How businesses will benefit from joined-up data
Many SMEs fall below the threshold of what are usually considered Big Data users. It is, however, possible to segment this landscape into groups defined by the amount of data they consume and how central that data is to their core activities. These are:
- High data users, such as data-centric companies in the service sector that provide information and research and whose main products are reports; industries such as marketing, legal and compliance.
- Medium data users, who are also data dependent. Many of these are in the service sectors and typically maintain large inventories of product. This type of business is labour and knowledge biased and is based on providing services and resources: sectors such as recruitment, logistics, insurance and travel.
- Small data users, who primarily use data to support management and core business activities, in sectors such as manufacturing, retail, leisure and facilities. These companies use data to move product from the supply chain to their customers.
Although we have broken business users into segments defined by how much data they use, it is difficult to envisage any business that does not rely on data for at least a core set of management processes. We could expect the high data users to have more sophisticated tools and to be more proficient at storing and accessing data, yet all three segments share the same problems. Their systems normally comprise a number of distinct silos of structured and semi-structured data. The very large enterprises will have, or strive to have, integrated solutions, but this level of data integration is far from universal even among larger businesses. We have evidence that at the top end of the SME market there are many companies with little integration between pools of information. Lower down the scale, only a few companies, those with very simple supply and delivery models, have integrated data covering the main part of their processes.
Why SMEs need to have access to big data technology
Over a number of years we have kept finding a single problem that businesses come up against. This problem, and the others that stem from it, all result from a common cause. Every company we have looked at over the years has had information available within its systems, but could not easily distribute it to its decision makers, could not bring it together in a timely manner, and found it difficult to present and share in easy-to-understand formats. As an example, for one customer we gathered data from five of their distinct ERP applications. Across the business they ran Oracle, JDE, SAP and CODA alongside a totally different configuration of SAP. More often than not, the data comes out of these systems as manual CSV dumps that are then cleansed by hand through a sequence of spreadsheet manipulations. Although we can devise other ways of capturing data, and produce any number of RESTful or SOAP-based APIs that ingest or receive data streams, each approach involves either creating complex mappings to convert from one schema to another or adding a new table to accept data that follows the structure of the output. This tends to be a one-off event, and once the analysis is completed the customised tables are no longer needed. A lot of programming resource can be spent on these processes.
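To make the schema mapping problem concrete, here is a minimal Python sketch of converting CSV dumps from two systems into one common structure. The system names, column headers and sample rows are hypothetical and chosen only for illustration; they are not taken from the customer described above, and real exports would need far more cleansing than this.

```python
import csv
import io

# Hypothetical header mappings: each source system exports the same facts
# under different column names. These names are illustrative assumptions.
FIELD_MAPS = {
    "system_a": {"CUST_NO": "customer_id", "INV_AMT": "amount", "INV_DATE": "invoice_date"},
    "system_b": {"CustomerRef": "customer_id", "Total": "amount", "Date": "invoice_date"},
}


def normalise(source, csv_text):
    """Convert one system's CSV export into rows that share a common schema."""
    mapping = FIELD_MAPS[source]
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Rename each source column to its common name and trim stray spaces.
        row = {target: record[column].strip() for column, target in mapping.items()}
        row["amount"] = float(row["amount"])
        row["source"] = source  # keep track of where the row came from
        rows.append(row)
    return rows


# Made-up sample exports standing in for the manual CSV dumps.
export_a = "CUST_NO,INV_AMT,INV_DATE\nC001, 120.50 ,2015-01-31\n"
export_b = "CustomerRef,Total,Date\nC001,79.50,2015-02-28\n"

combined = normalise("system_a", export_a) + normalise("system_b", export_b)

# Once the rows share a schema, a cross-system total is a simple loop.
total_by_customer = {}
for row in combined:
    key = row["customer_id"]
    total_by_customer[key] = total_by_customer.get(key, 0.0) + row["amount"]

print(total_by_customer)
```

Every new source needs its own entry in the mapping table, which is the recurring programming cost described above; the sketch only shows the shape of the work, not a way to avoid it.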
These issues were evident some time ago, before Big Data had come of age. Although we are fluent in a number of data warehousing techniques, they all depend on having similarly structured data. The luxury of data sets with similar structures does not exist within one company, let alone across myriad companies. Added to this, there is a large amount of semi-structured data in the mix, as well as data in different formats. When what appear to be large amounts of data are gathered together to form snapshots showing core elements and crucial information, the data often comes from sources that are changing very quickly. The speed, or velocity, at which data arrives and is consumed keeps increasing; this is another factor that needs to be taken into account.
Looking to resolve the system problems of rapidly growing businesses that are quickly outgrowing their legacy applications reminds us that the same problem of fragmented information has existed for the last few decades. Some time ago we encountered a company that had grown quickly through acquisition and aggressive marketing and had commissioned a cross-group ERP application. This application took quite a while to design, build and deliver. Aside from the application costing a fortune, the specifiers, the business analysts and the developers all overlooked the fact that much of the group's revenue came from recurring lease sales. The system had no automated way of getting lease expiry dates to the sales people in the field, so those responsible for managing lease renewals were all wandering around with their own spreadsheets. Even in those days it was difficult to accept that a successful, sales-led business depended on its consultants creating their own spreadsheets, and even stand-alone databases, to deliver its core products efficiently. It is more difficult to accept that this condition still exists and is common today. A common problem with the solutions that the big integrators have provided is that they are not very flexible: getting a new data source into or out of the system can be an involved and expensive process compared with the flexibility that Big Data technology can provide.
To us the solutions are obvious: the internet, and specifically the web, has taught us that searching through data yields information. If it works for data on the internet, it can work with the data that a business creates, acquires and uses. Big data technology can be applied to private data, as well as to a mixture of private and public data, because it is designed to handle structured and semi-structured data. It is designed to work with data stored in the cloud, and it is efficient at dealing with dynamic processing requirements and changing volumes of source material.
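As a rough illustration of searching across structured and semi-structured business data, the Python sketch below builds a small inverted index over records whose fields differ from one source to the next. The records, identifiers and function names are hypothetical, and the approach is a toy stand-in for what a real search platform does at scale.

```python
import re
from collections import defaultdict

# Hypothetical records drawn from separate silos; note that no two
# records share the same set of fields.
records = [
    {"id": "crm-1", "customer": "Acme Ltd", "note": "Lease renewal due in March"},
    {"id": "erp-7", "account": "Acme Ltd", "invoice_total": 1200},
    {"id": "doc-3", "text": "Contract for Birch Logistics, lease of two vans"},
]


def tokens(value):
    """Split any value into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", str(value).lower())


def build_index(records):
    """Map every word to the ids of the records that contain it."""
    index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            if field == "id":
                continue
            for token in tokens(value):
                index[token].add(record["id"])
    return index


def search(index, query):
    """Return the ids of records containing every word in the query."""
    sets = [index.get(token, set()) for token in tokens(query)]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(records)
print(search(index, "acme"))
print(search(index, "acme lease"))
```

Because the index is built from whatever fields each record happens to have, no common schema has to be agreed before the data becomes searchable, which is the property the paragraph above relies on.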
We identify a solution because the technology to make it possible has emerged. This technology is mature, scales flexibly and is affordable. Why we need a solution is evident: it meets a need and solves a recurring problem. It will also change the way people and businesses work with data. It makes data-driven decision tools easy to access, it delivers on-demand business intelligence, and it allows the knowledge trapped within documents to be accessed without vast in-house technical resources. It permits knowledge collected from the silos of document and data stores to be blended and analysed.
The Business Case for SMEs to look at Big Data solutions
When looking at the general technology needs of any business, the single most prevalent problem is caused by having silos of information. All of the major business consultancies we have worked with recognise this as a critical problem. Some even use it as a measure of the maturity of the processes that the business has in place. There are many reasons companies suffer from fragmented information. They will normally have a system to manage their customer relationships (such as Salesforce), accounting software (such as Sage), applications that manage their sales, stock or services (such as an e-commerce application or a point-of-sale product), and resource management software.
As a company grows, its systems grow independently, and as the amount of information collected grows it becomes less manageable. As a result, IT departments become very stretched. Getting the data out of all these systems is problematic because, aside from not talking to each other, they all produce data held in completely different structures. Three characteristics can be attributed to the data that companies accumulate: velocity (the speed at which data is created and consumed), volume (the physical amount of data collected and stored) and variety (the different structures and forms the data is held in). High volume, high variety and high velocity is a standard description of ‘Big Data’.
While the amounts of data that an SME works with are much lower than the current definition of ‘big data’ implies, the technologies used by the ‘Big Data’ players can equally handle the data held by an SME. The characteristics of high volume, high variety and high velocity apply to the data sets that an SME typically encounters. It is therefore not surprising that the technology which drives Big Data companies such as Google, Amazon, Facebook and Twitter offers solutions to the problems facing a typical SME. The ability to work with unstructured and semi-structured data, distributed processing, cloud storage, elastic resources, optimal searching, easy classification and indexing, machine learning, visualisations and on-demand self-serve reporting are the doors that big data technology has opened.
This article is published in collaboration with Smart Data Collective. Publication does not imply endorsement of views by the World Economic Forum.
Author: Bruce Robbins is CEO of Xcipi.
License and Republishing
World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.
The views expressed in this article are those of the author alone and not the World Economic Forum.