Big Data, Mashups, & Geographic Information Systems: What They Mean for Business

Utah Association for Career and Technical Education 2/16
http://uacte.org/events/

Why Care?

It helps us with things like:

Yeah, yeah, this isn't making me care all that much. This could be a long hour.

How about if we use an example instead?

I am reading the Salt Lake Tribune and come across this gem of information:

Utah's Latino population soared by 78 percent during the past decade —growing by nearly 157,000 people to 358,000 total...three of every 10 new Utahns who arrived by birth or immigration during the decade were Latino. (1)

And it has the following visual:

WOW, I think to myself, what I need to do is make some Spanish-language advertising and get it out now, particularly in San Juan County!
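Before running with a headline number, it is worth a quick sanity check. The short sketch below recomputes the growth rate from the two counts in the Tribune quote. The variable names are my own, and the counts are the rounded figures from the quote, so treat the result as approximate.

```python
# Rounded figures from the Salt Lake Tribune quote (1).
added_over_decade = 157_000   # "growing by nearly 157,000 people"
total_now = 358_000           # "to 358,000 total"

# Population at the start of the decade, implied by the two figures.
total_before = total_now - added_over_decade

# Percent growth is measured against the starting population.
growth_pct = added_over_decade / total_before * 100

print(f"Implied starting population: {total_before:,}")
print(f"Implied growth: {growth_pct:.1f}%")
```

The implied growth comes out at roughly 78 percent, which matches the article. Note that the number describes statewide growth only and says nothing about any one county.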

Big Data

The term “big data” refers to a confluence of factors, including the nearly ubiquitous collection of consumer data from a variety of sources, the plummeting cost of data storage, and powerful new capabilities to analyze data to draw connections and make inferences and predictions.

Learn more:

Examples from recent news:

Let's shift back to our Spanish advertising needs based on that Salt Lake Tribune article we just read above. For my advertising I am wondering about the Hispanic population in Utah.

First, I check out the U.S. Census Bureau since it is a pretty darn reliable source:

Hispanic or Latino, percent, 2014:
Utah 13.5%
USA 17.4% (2)

Great! But then I check out San Juan County, based on the visual above:

Hispanic or Latino, percent, 2014: 5.1% (3)

Wait...what?

I look further at San Juan County:

White alone, percent, 2014: 50.0% (3)

What is going on? Well...

American Indian and Alaska Native alone, percent, 2014: 46.6% (3)

OK, so I misinterpreted the above news article. I can see how that might happen. No problem, maybe I just advertise in Spanish across all of Utah instead.

Next, I check out http://www.utahhcc.com/hispanic-res/demographics and feel pretty good about this decision to make Spanish advertising across Utah.

Well, before we hit the press, let's dive a wee bit deeper: http://www.pewhispanic.org/files/states/pdf/UT_11.pdf (look at languages spoken, for example, or age)

There has to be more to this story! And there is... check out

http://www.pewhispanic.org/states/state/ut/ to see some data from many sources combined. How does this drive our thinking?

Caveat: What is Hispanic, anyway?

All right, then, what if we go with Spanish speakers instead of the Hispanic tag?

Consider, now, how the above projections were made. How would you make them if it were up to you?

Well, what about income, education, age, family size or other stuff? Could that play a role? Where might I find some of that data?

Mashups

Mashups currently come in three general flavors: consumer mashups, data mashups, and business mashups.

Examples:

Do any of the following numbers potentially interest us? Why or why not? What else might we want to know data-wise? Why?

Not all data, however, has the same level of statistical validity:

OK, so if I mash up data from a bunch of different sources I can learn more, assuming I ask the right questions, get accurate data sets, and query properly. If not, I'll have mush instead.
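To make the mashup-versus-mush point concrete, here is a minimal sketch that combines two of the figures quoted earlier: the Tribune's statewide count and the Census Bureau's statewide percentage. The field names and the `mash_up` function are hypothetical, made up for illustration. Tagging the Tribune count as 2010 is also an assumption on my part, based on the article describing "the past decade."

```python
# Two records about the same place from two different sources.
# Values are the ones quoted in this handout; field names are illustrative,
# and the 2010 year on the Tribune count is an assumption.
tribune = {"place": "Utah", "year": 2010, "latino_count": 358_000}
census = {"place": "Utah", "year": 2014, "latino_pct": 13.5}


def mash_up(a, b):
    """Merge two records about the same place and list reasons for caution."""
    if a["place"] != b["place"]:
        raise ValueError("records describe different places")

    merged = {"place": a["place"]}
    # Copy every measured field from both sources into one record.
    for record in (a, b):
        for field, value in record.items():
            if field not in ("place", "year"):
                merged[field] = value

    # Flag the combination if the sources describe different years.
    warnings = []
    if a["year"] != b["year"]:
        warnings.append(f"years differ: {a['year']} vs {b['year']}")
    merged["warnings"] = warnings
    return merged


combined = mash_up(tribune, census)
print(combined)
```

The merged record holds both numbers, but the warning is the important part. Dividing a 2010 count by a 2014 percentage would produce a population figure that belongs to neither year, which is exactly the kind of mush to watch for.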

Geographic Information Systems

Sometimes visual images and mapping of data help us understand the picture a little better, at least for some of us. This is where stuff like heat maps and GIS comes into play.

For starters, let's check: http://www.census.gov/quickfacts/map/RHI725214/49,00

Or try some interactives relating to Hispanics at http://www.pewhispanic.org/category/interactives/pages/3/

Or even some visual diagrams from the U.S. Census Bureau:


But now, taking downloadable Census information from above, let's stick it in ArcGIS and map some of our own data. [Note: The maps below were made with the help of Dr. Michael Bunds.]

Let's consider the Hispanic population again, using data from the most recent U.S. Census (2010):


Now that we are able to control not only the data coming in but what is included and how it is mapped, what would you ask or add? Why?

Understanding, of course, that:

  1. 2010 was a long time ago
  2. Self-definitions play a role
  3. Many factors play a role, rarely just one (such as self-identification as Hispanic)
  4. Some data is more accurate than other data (bias, error, etc.)
  5. Attempting to understand, however, often can lead to new insights and ideas, which are usually good in the long run
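One concrete way "how it is mapped" changes the story is the choice of class breaks, meaning the cutoffs that decide which shade a place gets on a shaded map. The sketch below sorts the three 2014 percentages quoted earlier into classes under two sets of breaks. The break values and function names are hypothetical choices of mine, not Census Bureau or ArcGIS defaults.

```python
from bisect import bisect_right

# Percent Hispanic or Latino, 2014, as quoted earlier in this handout (2)(3).
SHARES = {"USA": 17.4, "Utah": 13.5, "San Juan County": 5.1}


def classify(value, breaks):
    """Return the 0-based class index of value for ascending break points."""
    return bisect_right(breaks, value)


def class_map(shares, breaks):
    """Assign every place to a map class."""
    return {place: classify(pct, breaks) for place, pct in shares.items()}


# Two hypothetical sets of class breaks (assumptions for illustration).
COARSE_BREAKS = [10.0]             # two classes: under 10%, 10% and over
FINE_BREAKS = [5.0, 10.0, 15.0]    # four classes

coarse = class_map(SHARES, COARSE_BREAKS)
fine = class_map(SHARES, FINE_BREAKS)

print("coarse:", coarse)
print("fine:  ", fine)
```

With the coarse breaks, Utah and the USA land in the same class and would get the same color. With the finer breaks, all three places fall into different classes. Same data, different picture.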

The Future

Our use of massive amounts of data from a massive number of sources, accessible and assessable in a massive number of ways, is not going to go away. The question, then, is how we handle it as individuals, businesses, and social entities.

http://www.crn.com/news/applications-os/video/300079562/ibm-big-data-is-the-means-to-artificial-intelligence.htm

This new world of data, and how companies can harness it, bumps up against two areas of public policy and regulation: 1) employment and 2) privacy (8). What would you add to the list?

References

(1) http://archive.sltrib.com/story.php?ref=/sltrib/home/51307553-76/percent-utah-latino-population.html.csp

(2) http://quickfacts.census.gov/qfd/states/49000.html

(3) http://quickfacts.census.gov/qfd/states/49/49037.html

(4) United States Census Bureau

(5) Pew Research Center’s Hispanic Trends Project

(6) U.S. Hispanic Leadership Institute

(7) U.S. Hispanic Chamber of Commerce

(8) https://www.technologyreview.com/s/538916/big-data-and-the-future-of-business/

Dr. Anne M. Arendt