Delving Into Dark Data


‘Big data’ has become an industry buzzword amongst today’s business leaders. Corporate spending on infrastructure to capture and store large volumes of diverse, rapidly changing data has risen significantly in recent years as organisations have scrambled to collect all of the consumer information they believe will help them stay ahead of the competition. However, it’s becoming clear that collecting data alone isn’t enough.

Indeed, this year’s Gartner Hype Cycle Special Report cites big data as moving beyond the peak of inflated expectations – with businesses now beginning to see that data must not only be harvested and stored, but also analysed and mined efficiently for new insights if it is to be of any strategic value. However, with so much data now being collected, how do organisations know what is useful and what isn’t, and how do they make the insights actionable?

Most organisations focus their data analysis efforts on transactional data – the information customers supply when they purchase a product or service – because they perceive it to be the most valuable. This usually includes names, addresses and credit card information. However, in the course of collecting transactional data, large amounts of additional customer information are also accumulated as a byproduct. This non-transactional data is commonly referred to as ‘dark data’, which Gartner defines as ‘information assets that organisations collect, process and store during regular business activities, but generally fail to use for other purposes.’

What Is Dark Data?

Dark data can consist of details such as which marketing pieces a specific individual responded to, on which platform they answered a questionnaire, or what they’ve said about an organisation or brand on social media. It can also include customer purchase history, frequency of website visits or the geographical spread of customers. While it can appear obscure and unhelpful, dark data approached in the correct way can reveal patterns and insights that would otherwise have been missed. In short, it is information that can really make a difference if interpreted correctly.

One key to unlocking dark data’s secrets lies in the ability to understand the relationships between seemingly unrelated pieces of information. The way that data is stored plays a critical role in this. Traditional relational databases, and indeed even many big data technologies, simply aren’t designed to show relationships and patterns between data records. You may be able to unearth some connections at a very high level, but the queries will be extremely slow and the results will lack real definition. It’s the difference between understanding that two people living in one house are married, siblings or flatmates, and then going a step further to predict how those differences might influence their decisions.

However, discovering the business value of dark data is starting to become more straightforward. NoSQL databases can offer completely new ways of reading traditional data sets. In particular, graph databases naturally lend themselves to the mapping of relationships between data, making it easy for businesses to see connections between the pieces of information they hold. As a result, businesses can ask questions of the database that bring to life insights with a real impact on their bottom line.
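To make the idea of relationship mapping concrete, here is a minimal sketch in Python. It is a toy in-memory structure, not a real graph database and not any vendor’s API; the class `TinyGraph`, the relationship names `LIVES_AT` and `MARRIED_TO`, and the people and addresses are all invented for illustration. It returns to the household example above: finding who shares an address, then checking how those people are related.

```python
from collections import defaultdict


class TinyGraph:
    """A toy in-memory graph (hypothetical; not a real graph database)."""

    def __init__(self):
        # node -> list of (relationship type, neighbouring node)
        self._edges = defaultdict(list)

    def relate(self, source, rel_type, target):
        """Store an undirected, typed relationship between two nodes."""
        self._edges[source].append((rel_type, target))
        self._edges[target].append((rel_type, source))

    def neighbours(self, node, rel_type=None):
        """Return neighbours of a node, optionally filtered by relationship type."""
        return sorted(
            target
            for rel, target in self._edges[node]
            if rel_type is None or rel == rel_type
        )

    def relationships_between(self, a, b):
        """Return the types of all direct relationships between two nodes."""
        return sorted(rel for rel, target in self._edges[a] if target == b)


def housemates(graph, person):
    """Follow LIVES_AT to each address, then back out to the other residents."""
    found = set()
    for address in graph.neighbours(person, "LIVES_AT"):
        for other in graph.neighbours(address, "LIVES_AT"):
            if other != person:
                found.add(other)
    return sorted(found)


# Invented sample data.
graph = TinyGraph()
graph.relate("Alice", "LIVES_AT", "12 Elm Road")
graph.relate("Bob", "LIVES_AT", "12 Elm Road")
graph.relate("Alice", "MARRIED_TO", "Bob")
graph.relate("Carol", "LIVES_AT", "9 Oak Lane")
graph.relate("Dave", "LIVES_AT", "9 Oak Lane")

for person in ("Alice", "Carol"):
    for mate in housemates(graph, person):
        print(person, mate, graph.relationships_between(person, mate))
```

Because each relationship is stored directly against the records it connects, the question ‘who lives with this person, and how are they related?’ becomes a short walk along edges rather than a series of table joins. A production graph database applies the same principle at far larger scale.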

For example, Gate Gourmet, an airline industry catering provider, was struggling to lower an unusually high 50 per cent attrition rate amongst its one thousand employees at O’Hare Airport in Chicago. Using dark data already accessible in internal systems, such as demographics, salaries and transportation options, the company confirmed its suspicion that the attrition rate was directly related to the distance and transportation options from employees’ homes to the airport.

This realisation enabled the company to change its hiring process and reduce attrition by 27 per cent. Gate Gourmet did not need to invest a huge amount of money in collecting data to solve its attrition problem. Rather, it needed to look more closely at the data it already had, in a way that revealed patterns and connections between the employees who were staying with the company and those who were leaving.
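The kind of question asked in this example can be sketched in a few lines of Python. The records below are entirely made up and are not Gate Gourmet’s data; the function name `attrition_by_band` and the 10 km band width are likewise assumptions chosen for illustration. The sketch simply groups employees by commute distance and compares the share who left in each group.

```python
from collections import defaultdict

# Hypothetical records: (commute distance in km, whether the employee left).
employees = [
    (4, False), (6, False), (9, True), (12, False),
    (18, True), (22, True), (27, False), (31, True),
]


def attrition_by_band(records, band_km=10):
    """Return the fraction of leavers per commute-distance band.

    Bands are keyed by their lower bound, so 0 means 0-9 km, 10 means 10-19 km.
    """
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for distance_km, left in records:
        band = (distance_km // band_km) * band_km
        totals[band] += 1
        leavers[band] += int(left)
    return {band: leavers[band] / totals[band] for band in sorted(totals)}


rates = attrition_by_band(employees)
for band, rate in rates.items():
    print(f"{band}-{band + 9} km: {rate:.0%} left")
```

With real data, a steady rise in the rate from one band to the next would support the suspicion that distance drives attrition, although it would not on its own prove the cause.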

Although many businesses are not yet leveraging their dark data, the example of Gate Gourmet demonstrates what can happen when they do. While companies will, and must, continue to actively collect data, it is essential not to neglect the information already available at no extra cost. There is a clear need to be more creative, asking new questions of the same old data to turn up exciting and surprising results.

The key to monetising dark data lies not only in gathering it, but in analysing it to discover hidden patterns, developing hypotheses, and then putting the insights to use. Doing this successfully requires a variety of technologies, each suited to a particular job. By combining data science and number crunching on large-scale analytic technologies with the real-time execution of complex algorithms in a graph database, businesses can bring transformative insights to their operational decisions and put the latest technologies to work alongside their existing data and systems.


Emil Eifrem is founder of the Neo4j open source graph database project.