Why big data needs better questions

Elizabeth Sabet
Associate, SecondMuse

The term “big data” is much in the news lately, alternately touted by some as the next silver bullet that may contain answers to myriad questions on natural and human dynamics, and dismissed by others as hype.  We are only beginning to discover what value exists in the vast quantities of information we are now capable of generating, storing, and analyzing. But how can we begin to extract that value?  More importantly, how can we begin to apply it to improving the human condition by promoting development and reducing poverty?

That is precisely the question that motivated the World Bank Group and SecondMuse to collaborate on the recently released report Big Data in Action for Development. Interviews with big data practitioners around the world and an extensive review of the literature on the topic led us to some surprising answers.

Good questions help define scope of analysis, identify key behaviors
It is a common assumption that in order to engage effectively with big data, you have to start with the data itself and let it “speak”.  It turns out that most practitioners disagree.  We heard time and again from experts in the field that any work with big data must begin with questions.  Rather than being led by whatever dataset is available, practitioners who start with questions can define the setting and scope of their analysis and identify the behaviors or conditions in the world that interest them.  Questions help practitioners determine why they are seeking data and identify the media generating the data relevant to their purpose and scope.

In Big Data in Action for Development, we note that the purposes of most big data projects fall into three related categories: awareness, understanding, and forecasting. To share a few examples:

  • Real-time information on the extent of the damage from Typhoon Haiyan in the Philippines provided insight into the optimal direction of response efforts, while access to data raised awareness of the extent of mobile money transfers in Kenya and informed changes in banking policy in that country.
  • Mexico’s pilot project tracking population movements in response to the spread of epidemic disease deepened understanding of those dynamics, informing the need for policy levers that could reduce infection rates.
  • Assessing sentiments of “confusion” in conversations about employment in online forums in Ireland forecasted unemployment increases three months earlier than official statistics.

Awareness, understanding, and forecasting
These categories can give shape to the formulation of questions. If you’re interested in the changing price of wheat in a given country, big data may be used to answer one of the following questions:

  • How much are farmers currently receiving for the wheat they are selling?  (Awareness)
  • What is driving changes in wheat purchase prices? (Understanding)
  • What will wheat purchase prices be next month? (Forecasting)

Combining datasets can reveal insights
A well-articulated series of questions and a clear purpose help inform the selection of relevant data media.  Media that provide effective sources of big data include satellites, mobile phones, social media, internet text, internet search queries, and financial transactions, among others.

Cross-referencing the primary medium of a big data project with its primary purpose shows that such projects can take on a great variety of configurations depending on the context. Carefully combining datasets from various sources to create “mashups” can reveal further insights.

As we deepen our ability to gather insights from big data and put those insights into action, organizations working in international development can make more efficient use of their resources if they start out by posing the right questions and leveraging relevant data sources.

This post first appeared on The World Bank Open Data Blog. Publication does not imply endorsement of views by the World Economic Forum. 

Authors: Elizabeth Sabet is an Associate at SecondMuse. Oscar Calvo blogs about Latin America and Open Data. Andrea Coppola is a Country Economist at the World Bank. Neisan Massarrat is an Associate at SecondMuse. Ryan Siegel is an Associate at SecondMuse.

Image: A visitor stands in front of QR-codes. REUTERS/Maxim Shemetov 
