Open Source: Big Deal For Big Data

Although the term has only recently entered mainstream use, ‘big data’ has quickly become one of the most talked-about concepts in IT today. The existence of big data is not in itself new, but the demands to extract meaningful information from it are increasingly numerous and complex.

At a time when the volume of data produced each day is growing exponentially, the possibilities opened up by analysing it are vast. Until very recently, only a handful of organisations had been able to use and exploit this data for competitive advantage. This is fast changing, however, and open source is playing an instrumental role in the democratisation of big data.

Open source technology has come a long way in the last decade and is now widely recognised as a viable alternative to traditional proprietary software. Built on a strong foundation of tools and technologies, open source has quickly emerged as a compelling building block for robust, cost-effective enterprise applications and infrastructure.

Many of the big data systems entering the market have their roots in the open source world, where developers routinely create new approaches to problems that haven’t yet hit the mainstream. Historically, big data was the preserve of big multinational banks and Internet giants that could afford to spend millions of pounds on software and hire PhDs for data analysis. Thanks to open source, SMEs that were previously disadvantaged by the high cost of processing massive amounts of data are now able to unlock the potential of big data.

At the crux of this open source movement lies Hadoop, which has gained more market traction than any other big data technology. Hadoop has been a game-changer for big data, making the collection and analysis of vast amounts of data possible on low-cost, easily scaled commodity hardware by distributing storage and computation across clusters of ordinary machines.

Taken alone, however, Hadoop remains complex to use, and data scientists must spend a great deal of time simply understanding what they are dealing with. Many people understand why big data is important and can see, at a hypothetical, high level, where it could take them, but companies struggle to find enough people with the advanced technical skills needed to realise that vision.
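To make that complexity concrete, the sketch below shows the canonical word count job written against Hadoop's Java MapReduce API, essentially the example from the Hadoop tutorial (input and output paths are supplied on the command line). Even this trivial analysis requires a mapper class, a reducer class and explicit job wiring before it can run on a cluster.

    // Canonical Hadoop word count, close to the example in the Hadoop MapReduce tutorial.
    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: emit (word, 1) for every token in every input line.
      public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer tokens = new StringTokenizer(value.toString());
          while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reduce phase: sum the counts emitted for each distinct word.
      public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      // Job wiring: mapper, combiner, reducer, key/value types, HDFS input and output paths.
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Before any of this runs, the code has to be packaged into a jar, the input copied into HDFS and the job submitted from the command line; none of it is beyond a competent Java developer, but it is a long way from the point-and-click analysis most business users expect.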

Skilled resources have not materialised at the same rate as the marketing hype for big data. While awareness of big data is growing, only a few organisations that rely on the management and exploitation of data, such as Facebook and Google, are currently in a position to capitalise on it.

It is thus essential to equip analysts with tools that abstract away the technical complexity of Hadoop and make it easier to load and transform big data.
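Apache Hive is one widely used example of such an abstraction layer: it compiles SQL-like queries into distributed Hadoop jobs so analysts never touch the MapReduce API directly. The sketch below runs the same word count through Hive's JDBC driver; the host, port, credentials and the 'documents' table (assumed to hold one string column, 'line') are hypothetical placeholders, not details from the article.

    // A minimal sketch of querying Hadoop data through Apache Hive's JDBC interface.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveWordCount {
      public static void main(String[] args) throws Exception {
        // HiveServer2 typically listens on port 10000; adjust for a real cluster.
        String url = "jdbc:hive2://localhost:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "analyst", "");
             Statement stmt = conn.createStatement()) {

          // The same word count as the MapReduce job above, expressed as one query.
          // Hive turns this into distributed jobs on the cluster behind the scenes;
          // 'documents' is assumed to be a table with a single string column 'line'.
          ResultSet rs = stmt.executeQuery(
              "SELECT word, COUNT(*) AS freq "
              + "FROM (SELECT explode(split(line, ' ')) AS word FROM documents) words "
              + "GROUP BY word ORDER BY freq DESC LIMIT 20");

          while (rs.next()) {
            System.out.println(rs.getString("word") + "\t" + rs.getLong("freq"));
          }
        }
      }
    }

The trade-off is typical of such abstraction layers: the analyst gives up fine-grained control over the job in exchange for a declarative language that hides the cluster plumbing entirely.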

The time has come when organisations that expect to leverage big data must not only understand the intricacies of foundational technologies like Hadoop, but also have the infrastructure to help them make sense of the data and secure it. Without these complementary capabilities, big data will remain an IT privilege, out of reach of business people and the lines of business they represent.

Yves de Montcheuil is Talend's Vice President of Marketing. He joined Talend in 2007, following 15 years of product marketing experience with various US and European software companies, including Sunopsis, Empirix, and SDP/Sybase. Yves holds a master's degree in electrical engineering and computer science from Supelec in France. He has presented at numerous industry events and conferences and has authored several published articles.

  • DataH

    Yves, I agree! Hadoop is complex. I think it is worth mentioning HPCC Systems, which provides a single platform that is easy to install, manage and code for. Its built-in analytics libraries for machine learning and integrations with open source tools like Pentaho provide an end-to-end solution for ETL, data mining and reporting. More at hpccsystems.com