Big Data: It’s Not About Size, It’s About Value

Ever since 1951, when UK firm J. Lyons & Company introduced LEO, the world’s first business computer, to calculate weekly margins on bread, cakes and pies, businesses have relied on transactional data to make critical business decisions. Beyond that core data, however, lies a potential treasure trove of unstructured and semi-structured data, including weblogs, social media, sensor data and images, that can be mined for useful information.

The opportunity that this new pool of data presents has been quantified by the Centre for Economics and Business Research (Cebr), which predicts that Big Data will add £216 billion to the UK economy and create 58,000 new jobs by 2017.

Big Data, though, is about much more than size. It’s an opportunity to find insights in new and emerging types of data and content, to make businesses more agile and to answer questions that were previously considered beyond reach. For a retailer, big data offers the chance to turn the 12 terabytes of Tweets created each day into improved product sentiment analysis; for a telecommunications company, it offers the chance to analyse 500 million daily call detail records in near real time and predict customer churn faster.

However, while big data presents a huge opportunity for both individual businesses and the recovering UK economy, there are a number of hurdles to overcome before these benefits can be realised. The so-called three Vs of big data (volume, velocity and variety) are prompting businesses to rethink their existing architectures and invest in modern IT infrastructure to gain competitive advantage.

Yet it is the often overlooked fourth “V”, value (or, more precisely, business value), that is perhaps most critical. To gain value from big data, organisations must be able to integrate, optimise and migrate this relentlessly increasing data effectively, as well as store and analyse it properly.

Not unlike kissing in the schoolyard, delivering value from big data is a topic that everyone is talking about but few are actually doing (even fewer are doing it well). The cost alone associated with storing big data is a challenge that many organisations are struggling with mightily.

Too often, the data warehouse is treated as a dumping ground for all the data and files that the business accumulates. This is a major reason why chief information officers (CIOs) are spending more of their IT budgets on additional hardware to meet storage requirements: current systems are simply unable to cope with the gigabytes and terabytes of data being pumped into data warehouses.

To ensure data (regardless of size) is captured and stored in a quick and cost-efficient fashion, data integration or extract, transform and load (ETL) processes are needed. However, the truth is that most conventional data integration tools can no longer cope, as Big Data is exposing the shortcomings of legacy ETL tools. For example, two thirds of businesses now say that their data integration tools are impeding their organisation’s ability to achieve strategic business objectives.

A select group of organisations are taking the lead and showing the way. A good example is comScore, a global leader in measuring the digital world. comScore started off on a homegrown grid processing stack before adding Syncsort’s high-performance data integration software to sort and aggregate data before it is loaded into the data warehouse.

Where 100 bytes of unsorted data can be compressed to thirty or forty bytes, comScore has stated that 100 bytes of sorted data can typically be compressed to twelve bytes using the software. Applied to the billions of rows of new data collected daily, this dramatically reduces the volume of data that is stored. As a result, comScore has saved hundreds of terabytes of storage but, more importantly, it is delivering fresher data to its analysts, which helps the company bring new products to market and win new clients.
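The effect of sorting on compression can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: the record format is invented, the data is synthetic, and Python's standard zlib module stands in for whatever compression comScore's stack actually uses, so the ratios it prints will not match the figures quoted above.

```python
import random
import zlib


def make_records(n, seed=0):
    """Build n synthetic log lines (hypothetical format: site,user,page)."""
    rng = random.Random(seed)
    sites = [f"site{i:03d}.example" for i in range(50)]
    return [
        f"{rng.choice(sites)},user{rng.randrange(2000):05d},/page/{rng.randrange(100)}"
        for _ in range(n)
    ]


def compressed_size(records):
    """Return the zlib-compressed size in bytes of the newline-joined records."""
    payload = "\n".join(records).encode("utf-8")
    return len(zlib.compress(payload, 9))


records = make_records(50_000)
raw_size = len("\n".join(records).encode("utf-8"))
unsorted_size = compressed_size(records)
sorted_size = compressed_size(sorted(records))  # same rows, sorted first

print(f"raw bytes:              {raw_size}")
print(f"compressed as received: {unsorted_size}")
print(f"compressed after sort:  {sorted_size}")
```

Sorting places similar rows next to each other, so the compressor finds longer repeated runs within its window. The same rows therefore take fewer bytes once sorted, which is the saving the comScore example describes at a much larger scale.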

In some industries, the need to manage big data goes beyond the opportunity to generate commercial benefits. For example, in financial services one of the biggest issues facing IT departments is helping their organisation comply with new regulations such as Basel III and Solvency II.

These regulations require banks to change how they compute liquidity, meaning that financial institutions have to integrate all relevant data sources and develop a new approach to data analysis and modelling. Basel III demands more transparency and real-time insight, meaning financial institutions must capture, analyse and report on more information than they have done before, often in chunks of a terabyte or more.

Complying with regulations can be a painful yet necessary inconvenience in the short term. However, organisations that take this opportunity to upgrade their data integration environment will reap the benefits when they face the challenge of accommodating growing data volumes in the future.

Whether the goal is regulatory compliance or the identification of new revenue streams, big data technologies support a credible business case for turning low-value data into trusted sources of new, highly valuable insights. However, many companies risk missing out on the opportunity big data provides because their IT departments are overburdened with meeting existing service level agreements and maintaining the status quo.

Enlightened CIOs recognise that there is a better way: efficient, scalable technologies are at their disposal today to reduce the load on bursting data warehouses, deliver powerful, game-changing insights to the business and derive value from Big Data.

Steven Totman is the EMEA Integration Business Unit Executive for Syncsort. Totman has been working in the data integration space for over 15 years and, prior to Syncsort, was a key driver in the vision and creation of the IBM Information Server, working with more than 500 customers worldwide. His areas of specialty include Data Governance, Data Integration, Metadata, ETL, SOA and Connectivity. He was the DataStage worldwide product manager from release 3.5 to 7.5, co-created IBM’s Business Glossary and ran the Metadata product management team that created IBM’s metadata workbench. Totman holds several patents in data integration and metadata-related designs. Most recently he was a senior architect in IBM’s Information Agenda Team, focusing on Central and Eastern Europe, the Middle East and Africa. Totman is based in the United Kingdom but supports customers worldwide.

  • Great piece Steven. Neat to see the industry finally adopting the “3V”s of big data over 11 years after Gartner first published them. For future reference, and a copy of the original article I wrote in 2001, see: –Doug Laney, VP Research, Gartner, @doug_laney

    • Steven Totman

      It’s pretty special when the “father” of the Big Data term comments on a thread – talk about visionary. BTW I also love the whole ‘Dark Data’ definition you came up with: scarily appropriate.

  • Stefan Bernbo

    Good argument, I agree that scalable technologies can provide a better way to store Big Data. We have found that by using a decentralized and symmetric architecture, all storage nodes share the same functionality and responsibilities. The symmetric architecture also enables easy scalability, simply by adding more storage nodes. Let me know what you think: