Taking A Byte-Sized Approach To Big Data


Another day, another statistic on how the already astronomical volume of data generated across the world is growing with every passing second. According to IBM, 2.5 exabytes – that’s 2.5 billion gigabytes (GB) – of data was generated every day in 2012, and other estimates suggest that at least a zettabyte of data has accumulated over the last two years. Data is growing so fast that we will soon need new vocabulary to describe it – today’s yottabyte (equivalent to one trillion terabytes) will be considered small in five years’ time.
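For readers who want to check the unit arithmetic behind these figures, here is a minimal Python sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000, and the names `UNITS` and `convert` are purely illustrative. The two-year total is also an assumption: it simply holds the 2012 daily rate constant for 730 days, which the figures quoted above do not claim.

```python
# Decimal (SI) byte units expressed as powers of ten.
UNITS = {"GB": 10**9, "TB": 10**12, "EB": 10**18, "ZB": 10**21, "YB": 10**24}

def convert(value, src, dst):
    """Convert a quantity from one SI byte unit to another."""
    return value * UNITS[src] / UNITS[dst]

# 2.5 exabytes per day, expressed in gigabytes.
daily_gb = convert(2.5, "EB", "GB")

# One yottabyte, expressed in terabytes.
yotta_tb = convert(1, "YB", "TB")

# Assumption: the 2012 daily rate held constant for two years (730 days).
two_years_zb = convert(2.5 * 365 * 2, "EB", "ZB")

print(f"{daily_gb:,.0f} GB per day")
print(f"{yotta_tb:,.0f} TB in a yottabyte")
print(f"{two_years_zb:.3f} ZB over two years at a constant rate")
```

Under that constant-rate assumption the two-year total comes out above one zettabyte, which is consistent with the "at least a zettabyte" estimate.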

Of course, the Big Data revolution is not about the quantity of data generated but about the process of using tools to apply meaning to it and extract value from it. Data is only as good as the intelligence it produces, and that requires sophisticated analytics.

The Rise & Stall Of Big Data

Despite the hype, there are still many misconceptions about, and competing interpretations of, what Big Data means. Similarly, there are very few examples of organisations actually rolling out Big Data projects, and even fewer of organisations seeing value from them. In fact, only 8% of organisations surveyed by Gartner had actually deployed a “Big Data” project in 2014, with some 57% still claiming to be in the research and planning stages.

Why has the Big Data wave seemingly stalled? There are two reasons. First, given the breadth of unstructured data within organisations now, the scale of the task can appear insurmountable. For Big Data to be successful, data has to be harnessed and cleansed before value can be extracted, and with inconsistent processes, pockets of data and no clear business owner of the initiative, this can seem impossible. Second, many of those who have embarked upon Big Data projects using predictive analytics have discovered that it is far from straightforward.

This is largely because, while predictive analytics has been heralded as the solution to the Big Data problem, the available solutions tend to be overly complex and often require specialist skills to run. While a majority of organisations are beginning to understand the opportunity, they still lack the know-how, tools and/or budget to capitalise on it.

The Appliance Of Science

To address the shortfalls of predictive analytics, a new generation of Big Data capabilities is emerging. This new generation recognises Data Science for what it is: a process of proposing falsifiable hypotheses and testing them against opposing ideas, allowing the value of data, and of the software that uses it, to be taken to the next level.
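As one illustration of what testing a falsifiable hypothesis can look like in practice, here is a minimal Python sketch of a permutation test. This is a swapped-in textbook technique chosen for illustration, not a description of any particular vendor's method. The function name `permutation_test`, its parameters, and the sample figures are all hypothetical.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=5_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Null hypothesis: the group labels are irrelevant. If that were
    true, shuffling the labels should often produce a difference at
    least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical data: a metric measured with and without some change.
with_change = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
without_change = [10.2, 10.5, 9.9, 10.1, 10.4, 10.0]

p_value = permutation_test(with_change, without_change)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means the data would be surprising if the change made no difference, so the "no difference" hypothesis can be rejected. A large one means the opposing idea has survived the test.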

Instead of simply understanding why something happened, Data Science enables organisations to predict what will happen and offers suggestions on what can be done about it. And where applicable, solutions not only make recommendations to the business, but also pass them along to the system of execution. It’s not simply about making better decisions, but about reducing the workload required to make those decisions.

Crucially, these capabilities also recognise the complexity inherent in analysing Big Data but still make the solutions accessible to users. They achieve this by analysing the data with a rigorous scientific approach while providing a user experience that explains why a decision is being recommended in terms that can be universally understood. It is critical that the solution be intuitive and accessible – or it will simply go unused.

Data Science groups must also appreciate that the end solution needs to evolve. Not only does the solution need to have a measurable (and reportable) value that can be shared with the business, but it also needs to have internal measures that serve as a feedback loop for self-improvement. Otherwise, even today’s best solution will ultimately grow stale.

Part Of The Process

For Data Science to be successful, it has to be embedded in the business systems that are used within organisations rather than run as an add-on project that is complex and resource-intensive. From a user’s perspective, the route to value should appear straightforward, with minimal data entry and few steps or clicks. Behind the scenes, it is supported by a team of scientists – mathematicians, economists, engineers, computer scientists, statisticians, physicists, chemists, operations researchers – who manage the complex processes of cleansing data, applying algorithms, evaluating different views and delivering value.

Designed with Data Science capabilities as part of their DNA, the next generation of business software won’t just collect, report, and distribute information. It will anticipate problems, respond with solutions, identify opportunities and recommend next steps. Only then will we see the Big Data revolution truly flourish.

Ziad Nejmeldeen for BCW

Ziad Nejmeldeen is Chief Scientist, Dynamic Science Labs at Infor.