The dangers of data: why the numbers never tell the full story
Every day, we generate 2.5 quintillion bytes of data. Organised and analysed correctly, this information has the power to create a better future for all. Big data can help detect earthquakes, floods and famines early, so that resources can be allocated where they are most needed. Open data has the potential to improve access to education, healthcare and financial services.
Despite such promise, data sets are the product of human design: a reflection of the flaws, preferences and experiences of their imperfect creators. Nearly all data is defined, interpreted and manipulated by humans, who frequently make value judgements about what to include. Therefore, understanding the elements excluded from the data set is just as important as the data itself.
As humans, we suffer from confirmation bias: the tendency to seek out or interpret information in ways that reinforce our existing views, while ignoring any contradictory evidence. A good example is the 2016 US presidential election, when most pollsters failed to predict Donald Trump’s victory. On election day, the available data gave Trump a 15% chance of winning. But the experts neglected much of the following: the inaccuracy of turnout models, the overrepresentation of graduates in polls, the impact of hard-to-reach groups and the reluctance of voters to be honest with complete strangers. Such oversights will never be prevented by having access to more data, but rather by bursting the filter bubbles that trick us into believing that everyone thinks and acts like us.
At the same time, more data has been created in the last two years than in the previous 5,000 years of human history. This accelerated flow of new information is increasing our reliance on the “availability heuristic”: the tendency to make decisions based on the information that comes most readily to mind, which is often the most recent. Most governments, economists and journalists failed to foresee the 2007-8 financial crisis, since the prevailing mathematical models did not suggest a downturn. They might have fared better had they focused less on the latest data and more on the cyclical nature of financial markets.
Equally, there is far more data on certain individuals, groups and countries than on others. This has caused a data divide, particularly between developed and developing countries. If I were to rely on the data alone, it would tell me that my family in the Zagros Mountains are not real, simply because they don’t have access to the internet and have never filled out a survey in their lives. The under-representation of languages, cultures and geographies in data sets creates skewed narratives and distorted accounts of reality.
Making matters worse, the past is often a terrible indicator of the future. There’s little merit in using existing systems, which are reliant on known variables, constancy and logic, to create information where there may be none. Nassim Taleb, a professor of probability, addresses such uncertainty in his book, The Black Swan. A Black Swan is an outlier outside the realm of expectation that, despite its low probability, fundamentally changes the course of history. Most fascinating of all, a Black Swan is often rationalised with the benefit of hindsight, once there is sufficient data available. We live in a random world full of uncertainty, variance and coincidence; under such circumstances, data can only take us so far.
Data only measures the measurable. What about the things that can’t be measured? Binary values of 0 and 1 can’t capture the complex and often irrational realities of being human. Can we reliably quantify love, kindness or happiness? As the sociologist William Bruce Cameron stated: “Not everything that can be counted counts, and not everything that counts can be counted.” The obsession with metrics has culminated in quantification bias: a reverence for numbers above all else. Even if the data is 100% accurate, it can only tell us what is happening, not why. Without a human layer of understanding, data is devoid of context and nuance. Numbers only make sense once we have discovered the underlying human emotions, behaviours and motivations.
On paper, data promises to create a more informed, connected and inclusive society. Yet, as previously highlighted, our systems of thought are fragile and susceptible to bias. We tend to think of data as objective truth, independent of human influence. In reality, data is the continuation of our human biases into algorithms. As we enter the Fourth Industrial Revolution, with its proliferation of information and the exponential rise of AI, we have the opportunity to design a more equal society, since technology reinforces human thinking.
Data “for good” requires renewed discourse, fresh thinking and a new social contract to reflect the radical advances in technology. Above all, data needs to reflect and represent the diverse views and needs of all 7.6 billion people who reside on our planet. Ultimately, the future of human civilisation could depend on whether access to information is concentrated with a few or open to all. Data is not the answer; it’s merely a tool that empowers humans to discover new questions and create new possibilities.
License and Republishing
World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.
The views expressed in this article are those of the author alone and not the World Economic Forum.