The curious truths of big data

Bernard Marr

In the world of big data, strange truths about the world begin to emerge. Orange cars are the most reliable used cars to buy. Prepaid phone card sales can predict unrest in Africa. And women with larger breasts spend more money online.

That last one comes from a recent study released by Alibaba, the Chinese website that hopes to be the next Amazon. Data analysts looking at data points for ladies’ underwear sales noticed that women who purchased larger bra sizes spent more online overall.

But is that knowledge useful? Maybe, maybe not.

Correlation does not equal causation.

If you ever took a science class in school, you might have heard the phrase, “Correlation does not equal causation.” It tells us that just because women who purchase larger bra sizes spend more money online, that doesn’t mean their bra size caused them to spend more money.

And that can be the problem when data analysts are looking at these strange and interesting new truths that emerge from the mass quantities of data to which we now have access. If we take it as true that orange used cars are more reliable, the question then becomes why: Are owners of orange cars more careful? Does the color prevent people from getting in accidents? Or does the color orange have some other magical property that keeps a car running well? The data has no answers.

Tyler Vigen posts funny charts to his website, Spurious Correlations, that show the danger of simply matching two data sets without any deeper understanding of how the things are related. For example, if correlation were all you had to go by, you could conclude that the more films Nicolas Cage appears in during a given year, the more people drown in swimming pools, and that an increase in U.S. spending on science causes an increase in suicides by hanging. Spurious indeed, we hope, or U.S. researchers and Nick Cage’s film career are in trouble.
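To see how easily two unrelated sets of numbers can line up, here is a minimal Python sketch. The figures are made up purely for illustration (they are not Vigen's data), and the names `pearson`, `series_a` and `series_b` are assumptions of this example. Both series simply drift upward over ten "years", which is enough to produce a correlation close to 1, even though neither has anything to do with the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic, invented yearly figures: each is an upward trend plus a small wobble.
years = list(range(10))
series_a = [100 + 5 * t + (3 if t % 2 else -3) for t in years]
series_b = [20 + 2 * t + (1 if t % 3 == 0 else -1) for t in years]

# Correlation of the raw series: dominated by the shared upward drift.
r_raw = pearson(series_a, series_b)

# Correlation of year-over-year changes: the shared trend is removed,
# so only the (unrelated) wobbles are compared.
diff_a = [later - earlier for earlier, later in zip(series_a, series_a[1:])]
diff_b = [later - earlier for earlier, later in zip(series_b, series_b[1:])]
r_diff = pearson(diff_a, diff_b)

print(f"raw series:          r = {r_raw:.2f}")
print(f"year-over-year diff: r = {r_diff:.2f}")
```

The raw series look tightly linked only because both rise over time. Once you compare the year-to-year changes instead, the apparent relationship in this toy data all but disappears. It is one simple check, not a test for causation.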

The data-driven crystal ball.

Now that we have all this data, we’re just on the cusp of figuring out how to use it to our advantage. The goal is to be able to use these strange truths to try to predict everything from buying habits to the spread of the flu virus, and the results are just as varied.

Researchers have found that Twitter updates can predict flu outbreaks more quickly and more accurately than traditional CDC tracking methods. In fact, Twitter data can predict an outbreak up to 8 days in advance with more than 90 percent accuracy.

The African company CellTel discovered a similar predictive ability when it noticed an uptick in sales of prepaid phone cards before major incidents of violence and unrest in Congo. The cards were denominated in U.S. dollars, and people bought them to have something portable and valuable to take with them and to protect against local inflation.

Similarly, Alibaba hopes to use the incredible quantities of data it collects (as many as 14 million data points in a single day) to make predictions in the wide variety of businesses it may enter.

“For example, if we have a lot of data on what people purchase in terms of food, groceries, is that data going to be helpful when we do healthcare? I think so,” an executive told online magazine Quartz.

As more companies try to use their data to predict consumer behavior, don’t be surprised to see more of these curious truths emerge. Facebook, of course, has an entire team dedicated to data science, and they frequently post their findings to their Facebook page, like the fact that if your name is Yvette, you are more than 37 percent more likely than the average person to have a sister named Yvonne.

How that helps Facebook’s business plan is yet to be seen.

This article is published in collaboration with LinkedIn. Publication does not imply endorsement of views by the World Economic Forum.

Author: Bernard Marr is a globally recognized expert in strategy, performance management, analytics, KPIs and big data.

Image: Internet LAN cables are pictured in this photo illustration taken in Sydney June 23, 2011. REUTERS/Tim Wimborne.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.
