Fourth Industrial Revolution

Can data lakes soak up the flood of information?

The digital universe is growing, and fast: its size is doubling every two years and could reach 44trn gigabytes by 2020, ten times its 2013 level. With machine-to-machine communications expected to grow at a compound annual rate of 83% over the next three years, finding ways to deal with this avalanche of data will be key to the success of the Industrial Internet.
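
As a rough, illustrative check on how these rates compound, the short sketch below applies a constant annual growth rate. It assumes that "doubling every two years" means smooth compounding at about 41% a year. The helper name `compound` is our own, not from any library.

```python
def compound(value: float, annual_rate: float, years: float) -> float:
    """Grow `value` at a constant compound annual rate for `years` years."""
    return value * (1 + annual_rate) ** years

# Assumption: doubling every two years is treated as steady annual growth
# of 2**(1/2) - 1, i.e. roughly 41% a year.
doubling_rate = 2 ** 0.5 - 1

# 2013 -> 2020 is seven years of that growth: a multiple of 2**3.5.
growth_2013_to_2020 = compound(1.0, doubling_rate, 7)

# Machine-to-machine traffic growing 83% a year for three years: 1.83**3.
m2m_growth_3y = compound(1.0, 0.83, 3)

print(f"2013->2020 multiple: {growth_2013_to_2020:.1f}x")
print(f"M2M 3-year multiple: {m2m_growth_3y:.1f}x")
```

Seven years of doubling every two years works out to roughly 11x, broadly in line with the "ten times" figure, while 83% annual growth sustained for three years multiplies machine-to-machine traffic about sixfold.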

Data lakes could be a solution. These enterprise-wide data-management platforms are designed to store and analyse huge amounts of information from disparate sources in its raw format. Because data are loaded as they arrive, data lakes avoid the upfront "extract-transform-load" (ETL) work that traditional data warehouses require, which gives them two important advantages for Industrial Internet applications: low-cost storage and high-speed analytics. “Data lakes are an essential component for storage of the data and we’re seeing the advent of new analytical frameworks to be able to process the data in real time, gain insights and even make decisions without human intervention,” says Peter Schlampp, VP products at Platfora. Once the data reside in the lake, they also become accessible to users across the organisation for analysis and reporting.
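
The difference from a warehouse's upfront ETL step is often described as "schema-on-read": raw records land in the lake untouched, and structure is imposed only when someone queries them. The minimal sketch below illustrates the idea under stated assumptions: a plain Python list stands in for lake storage, and the sensor readings, field names and helper functions are all hypothetical.

```python
import json
from statistics import mean

# Stand-in for lake storage: raw records kept exactly as they arrived,
# in different shapes from different (hypothetical) sources.
raw_lake: list[str] = [
    '{"turbine_id": "T1", "temp_c": 71.5, "ts": "2015-03-01T10:00:00"}',
    '{"turbine_id": "T2", "temp_c": 69.0, "ts": "2015-03-01T10:00:00", "rpm": 1500}',
    '{"device": "pump-7", "pressure_kpa": 310}',  # a different source and schema
]

def ingest(record: str) -> None:
    """Write path: no transform step, the raw record is simply appended."""
    raw_lake.append(record)

def read_temperatures(lake: list[str]) -> dict[str, float]:
    """Read path: parse and apply a schema only now, skipping records that don't fit."""
    temps: dict[str, float] = {}
    for line in lake:
        rec = json.loads(line)
        if "turbine_id" in rec and "temp_c" in rec:
            temps[rec["turbine_id"]] = float(rec["temp_c"])
    return temps

ingest('{"turbine_id": "T3", "temp_c": 74.2, "ts": "2015-03-01T10:01:00"}')
temps = read_temperatures(raw_lake)
print(temps)
print(round(mean(temps.values()), 2))
```

The pump record is stored alongside the turbine data without any reshaping; a different query could later read it with its own schema.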

Building data-lake architectures to support the Industrial Internet differs significantly from building traditional IT architectures, however. “IT architectures in the past did not face the big-data challenge of unpredictable growth, so IT teams had a good idea of how much hardware to procure for short- and medium-term needs. They also had manageable dataset sizes that could be handled by a single large server. The single-server model was good for maintaining a small data-centre footprint, but bad for scaling and for trying to parallelise work,” says Steve Wooledge, VP for product marketing at MapR Technologies, a leading Hadoop distributor.

By contrast, notes Mr Wooledge, “A data-lake-ready IT infrastructure starts with the recognition that predicting data growth is difficult. To counter that, you need an elastic, horizontally scalable architecture where you can handle more load by incrementally growing your architecture by adding more commodity servers.”
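
One common way to spread data across an elastic pool of commodity servers is consistent hashing, in which adding a server relocates only a fraction of the data. This is our own illustration of horizontal scaling in general, not a description of how MapR or any Hadoop distribution actually places data; the `HashRing` class, server names and key counts are hypothetical.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a point on the ring (stable across runs, unlike hash())."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring with virtual nodes to even out the load."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (point, server) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical server owns many points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]

keys = [f"sensor-{i}" for i in range(10_000)]
ring = HashRing(["server-1", "server-2", "server-3"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("server-4")  # scale out by adding one more commodity server
after = {k: ring.node_for(k) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # typically close to a quarter
```

Every key that moves goes to the new server, so growing the cluster means shifting roughly its fair share of data rather than reshuffling everything, which is the property an incrementally growing architecture needs.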

Scaling up data-lake solutions will also require common standards across data-lake providers. Launched in February of this year, the Open Data Platform initiative aims to support this effort by promoting open-source big-data technologies. Large companies such as GE, IBM and Infosys, as well as up-and-coming ones like Pivotal, have joined the effort.

Still, we are only at the beginning of what could be a big-data revolution for the industrial economy. “Connectivity and health-monitoring solutions for the Industrial Internet are fairly mature,” notes Sai Kumar Devulapalli, director of product marketing at Pivotal. “Analysing the Industrial Internet to solve business problems and provide compelling user experience is the next phase of the evolution.” With efforts under way to make these solutions more open and accessible, the next phase may come sooner than you think.

This article is published in collaboration with GE Look Ahead. Publication does not imply endorsement of views by the World Economic Forum.

Author: Daniel D Gutierrez is a writer specialising in big data. 

Image: Power connectors hang from server cages. REUTERS/Mike Segar. 

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.
