How Businesses Can Adopt A Google-Style Approach To Understand Big Data

Dr. Roy Marsten wrote in March that graph theory is a key approach to understanding and making the most of big data. He notes that Google started the graph analysis trend in the modern era by using links between documents on the Web to understand their semantic context. The result was a Web search engine that massively outperformed its established competitors and jumped so far ahead that “to Google” became a verb. Of course, we know Google’s history since then very well: its graph-centric approach has seen it deliver innovation at scale and dominate not only its core search market but also the wider information management space.

Graphs For Everyone

But graphs aren’t just for the likes of Google with virtually limitless funds and armies of Ph.D.s at their disposal. While Google and its competitors might be content to build their own graph data infrastructure, that technology is also available off the shelf to the rest of us.

Graph databases have risen to prominence recently and, as 451 Research analyst Matt Aslett has observed, are moving out from under the general NoSQL umbrella into a category in their own right. They have become popular because, like Dr. Marsten, many thousands of other software and data professionals have seen that graphs are the best way of storing and querying their increasingly complex, interconnected data.

Graphs Everywhere

Web search isn’t the only domain where graphs provide competitive advantage. Facebook and Twitter have used the social graph to dominate their markets, and Facebook and Google are now using Graph Search and the Knowledge Graph respectively to gear up for the next wave of hyper-accurate and hyper-personal recommendations. But graphs are also being widely deployed in a host of other industries.

One concrete example of graph databases being used outside of search is eBay, which (owing to its recent acquisition of Shutl) provides a service that uses graphs to compute fast, localised door-to-door delivery of goods between buyers and sellers, scaling its business to include the supply chain. Incidentally, eBay observed that before it turned to graphs, its longest query took longer than its shortest physical delivery, with both at around 15 minutes. That can no longer happen now that an average query is powered by a graph database and takes 1/50th of a second!

And it doesn’t stop there. Organisations large and small are adopting and winning with graphs in retail, finance, telecoms, IT, gaming, real estate, healthcare, science, and dozens more areas. It’s an existence proof that Dr. Marsten’s hunch about the power of graphs is right.

The Power Of The Web Inside Your Application

Having a graph database is like having a mini-web inside your application. You crawl that “web” of nodes via named, directed relationships until you find your goal, be it the location of your keys, your long-lost university friend, evidence about the efficacy of a clinical trial, or access permissions for computer systems (all graph problems, by the way). The graph database’s role is to store that data safely, and to make querying it both fast and easy (since neither fast nor easy is enough on its own).

For example, finding missing keys using a graph database is both fast and easy. We write a Cypher query that visually describes the graph structure we’re looking for (a pattern) and let the database find matches for that pattern in amongst the network of data it holds.

In this case, we’re asking to find matches in the database for a person called Alice who owns some keys. But what we’re particularly interested in is the location of those keys, which is what we ultimately return. In sketching the graph structure (in ASCII-art!) we close the cognitive gap between the user and the graph, so that querying even densely interconnected business data is practically child’s play.
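
To make the idea concrete outside of any particular database, here is a minimal Python sketch of the same kind of pattern match over a tiny in-memory graph. The labels (Person, Keys, Place), the relationship names (OWNS, LOCATED_AT) and the sample data are all assumptions made for illustration. In Cypher, a pattern of this shape would look roughly like `MATCH (:Person {name: 'Alice'})-[:OWNS]->(:Keys)-[:LOCATED_AT]->(place) RETURN place`, again with assumed names.

```python
class Graph:
    """A tiny in-memory graph: nodes plus named, directed relationships."""

    def __init__(self):
        self.nodes = {}      # node id -> dict of properties
        self.out_edges = {}  # node id -> list of (relationship name, target id)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, source, rel_type, target):
        # Relationships are directed: source -[rel_type]-> target.
        self.out_edges.setdefault(source, []).append((rel_type, target))

    def follow(self, node_id, rel_type):
        """Yield the nodes reached from node_id over one named relationship."""
        for rel, target in self.out_edges.get(node_id, []):
            if rel == rel_type:
                yield target


def find_key_locations(graph, owner_name):
    """Match (Person)-[OWNS]->(Keys)-[LOCATED_AT]->(place); return place names."""
    places = []
    for node_id, props in graph.nodes.items():
        if props.get("label") != "Person" or props.get("name") != owner_name:
            continue
        for keys in graph.follow(node_id, "OWNS"):
            if graph.nodes[keys].get("label") != "Keys":
                continue
            for place in graph.follow(keys, "LOCATED_AT"):
                places.append(graph.nodes[place]["name"])
    return places


# Hypothetical sample data, not taken from any real system.
g = Graph()
g.add_node("alice", label="Person", name="Alice")
g.add_node("bob", label="Person", name="Bob")
g.add_node("k1", label="Keys")
g.add_node("k2", label="Keys")
g.add_node("hall", label="Place", name="hallway table")
g.add_node("car", label="Place", name="car")
g.relate("alice", "OWNS", "k1")
g.relate("bob", "OWNS", "k2")
g.relate("k1", "LOCATED_AT", "hall")
g.relate("k2", "LOCATED_AT", "car")

print(find_key_locations(g, "Alice"))
```

The hand-written loops are exactly the part a graph database takes off our hands: we describe the shape we want, and the database works out how to find it efficiently across far larger networks of data.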

I strongly agree with Dr. Marsten that creating and analysing graphs will bring us to answers, and that when we let data connect itself, meaning will emerge. I also believe that our ability to understand graphs is greatly enhanced by the right tools, and I’m very excited about where graph technology is heading. You should be too.

Emil Eifrem is founder of the Neo4j open source graph database project.