The Multi-Structured Data Challenge: Choosing The Right Tool For The Job


The promise of big data is increasingly forcing organisations to rethink their data management strategies. With growing volumes of multi-structured data flooding into our datacentres, organisations are finding that the relational databases that they have relied on for the last 30 years are now too limiting and inflexible.

In a growing number of cases customers have attempted to build a solution on a relational database only to find that it is not fit for purpose. The Federal Aviation Administration in the US is a case in point: it initially tried to build its Emergency Operations Network on an RDBMS before turning to an Enterprise NoSQL database in order to easily exploit unstructured and geospatial information.

Relational databases are great when information is highly structured and consistent, low in volume, entered through forms and retrieved with a well-known set of queries. But if you need to dig deep inside your data to understand context, analyse details and assemble customer reports and views, these technologies quickly demonstrate their shortcomings.

This is because the relational approach to handling information requires that data be formatted to fit into tables. Whether it’s a customer, derivative trade, or legal document, it has to be shoehorned into a model that satisfies the referential integrity of the underlying relational database system. And while this can work, it clearly has limitations.

For many organisations, Enterprise NoSQL (Not Only SQL) is a faster and more efficient way to manage data in a variety of formats. With NoSQL you don’t need rigid schemas or formats to be predefined before loading data. Unstructured data, or any data that does not have a pre-defined data model, including social media, health records, financial documents, journals, videos, and web pages can be loaded into a NoSQL platform as is – providing an alternative to the pre-defined format required by relational database management systems.
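To make the idea of loading data "as is" concrete, here is a minimal sketch in Python. It is not any vendor's API: the `documents` list and the `load` and `find` helpers are hypothetical names invented for illustration, and a real Enterprise NoSQL platform would add indexing, persistence and security on top. The point is only that records of different shapes can sit side by side without a table definition being declared first.

```python
import json

# Hypothetical in-memory "document store": each record is kept as-is,
# with no table definition or fixed column list declared up front.
documents = []


def load(raw_json):
    """Parse a JSON document and store it unchanged; return its position."""
    documents.append(json.loads(raw_json))
    return len(documents) - 1


def find(field, value):
    """Return every stored document whose top-level `field` equals `value`."""
    return [doc for doc in documents if doc.get(field) == value]


# Three records with different shapes are loaded side by side.
load('{"type": "tweet", "author": "alice", "text": "Flight delayed again"}')
load('{"type": "health_record", "patient_id": 42, "allergies": ["penicillin"]}')
load('{"type": "tweet", "author": "bob", "text": "New paper out", "tags": ["chem"]}')

tweets = find("type", "tweet")
print(len(tweets))  # prints 2
```

In a relational design, the optional `tags` field and the differently shaped health record would each need a schema change or a separate table before the data could be loaded.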

Enterprise NoSQL databases can also manage traditional structured data, and they scale easily across commodity servers, virtual machines or the cloud. And, with some of today’s Enterprise NoSQL offerings, organisations do not have to go without support for ACID transactions, or other enterprise qualities such as fine-grained entitlements, point-in-time recovery and high availability.

Here are just three examples of how Enterprise NoSQL is in use today helping to address big data challenges:

Media & Publishing: New Revenue Streams

The media and publishing industry was one of the first to adopt Enterprise NoSQL. Faced with competition from new online publishers, traditional publishers have had to reinvent themselves. They have had to find new ways to bring their content together in one place, make it easily available to potential customers online and, most importantly, create new revenue streams.

The challenge is how to bring together large amounts of diverse data – including a complex mix of text, video, social media and a myriad of graphics formats – so that users can search and analyse it.

Wiley, a 250-year-old UK academic publisher, is using Enterprise NoSQL to consolidate 4 million articles, 14,000 online books and thousands of reference works into an online library. The library attracts 65 million page views each month with 240 million visitors annually and handles 1,600 queries per second, the equivalent of 4 billion queries each month. Wiley saw 50% growth in usage and, after strategic acquisitions of content libraries, was able to quickly absorb and monetise that new material.
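The per-second and per-month query figures can be cross-checked with a quick calculation. The sketch below assumes a 30-day month and a rate sustained around the clock; both are assumptions made for illustration, not details given in the article.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month

queries_per_second = 1_600  # figure quoted in the article

# Assumes the rate is sustained around the clock for the whole month.
queries_per_month = queries_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH
print(f"{queries_per_month:,}")  # prints 4,147,200,000
```

Under those assumptions the result is a little over 4 billion, in line with the monthly figure quoted.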

Wiley also used Enterprise NoSQL to launch Custom Select, an application that lets instructors build custom-made course materials in minutes. Instructors can search and select relevant content from the library as well as upload their own material. Since its launch, users have created more than 18,000 custom book and e-book projects.

The Royal Society of Chemistry (RSC) is using Enterprise NoSQL to support the UK government’s goal of publishing the estimated 80% of research data that does not end up in a final thesis or report. Vast tranches of publicly funded research get lost because only some of this information makes it into research reports.

The RSC has already opened up its vast library of scientific data by moving more than one million images, hundreds of thousands of articles and millions of scientific data files to an Enterprise NoSQL database. Unlike traditional databases, the Enterprise NoSQL database is able to automatically ingest, tag and index each piece of unstructured content – from spreadsheets and research papers through to pictures, social content and videos.

As a result, the RSC is able to unlock the real value of its scientific literature through visualisations, simulations and modelling technologies. An increasingly global audience can now discover content quickly, understand the context around it and then connect the dots between different pieces of research, journal articles and videos.

Financial Services: Valuable Information

Investment banks require strong governance policies. Once a trade is made it needs to be processed by the back office and reported to the regulators. Trade data is typically read off the message bus connecting the trading systems and persisted into a relational database, which becomes the system of record for post-trade processing and compliance.

A tier-1 bank had trouble developing risk profiles and conducting post-trade reporting because its data was spread across disparate, heterogeneous sources in legacy mainframes and relational databases. With Enterprise NoSQL, it is now able to bring that data into a single system, helping to save millions of pounds in IT costs and respond faster to regulators.

In another example, a global banking and financial services company wanted to build a new Trade Store and was struggling to manage multiple data inputs that needed to feed into tens of lines of business, all of which use the data in different ways. The challenge was to rationalise the data, create a single view of the truth and ensure data consistency.

The company first approached the challenge by using a relational database management technology, which proved impossible. Instead, the company implemented Enterprise NoSQL, which provided a single unified source for the multiple trade types and related data accessed by all the business lines' downstream systems and for various reporting requirements, including regulatory reporting. The new system now handles near real-time ingestion of up to 2 million trade events and reference data per day from upstream production systems, including validity and duplication checks and version management.
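The three ingestion steps named above (validity checks, duplication checks and version management) can be sketched in a few lines of Python. This is an illustrative sketch only, not the bank's system or any product's API: the `TradeStore` class, the field names `trade_id`, `event_id` and `payload`, and the status strings are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical minimum set of fields every incoming event must carry.
REQUIRED_FIELDS = {"trade_id", "event_id", "payload"}


@dataclass
class TradeStore:
    versions: dict = field(default_factory=dict)   # trade_id -> list of payloads
    seen_events: set = field(default_factory=set)  # event_ids already ingested

    def ingest(self, event: dict) -> str:
        # Validity check: reject events missing any required field.
        if not REQUIRED_FIELDS.issubset(event):
            return "rejected"
        # Duplication check: skip events that were already ingested.
        if event["event_id"] in self.seen_events:
            return "duplicate"
        self.seen_events.add(event["event_id"])
        # Version management: append rather than overwrite, so every
        # earlier state of the trade stays available.
        self.versions.setdefault(event["trade_id"], []).append(event["payload"])
        return "stored"

    def latest(self, trade_id: str):
        """Return (version number, payload) for the newest version of a trade."""
        history = self.versions[trade_id]
        return len(history), history[-1]


store = TradeStore()
results = [
    store.ingest({"trade_id": "T1", "event_id": "E1", "payload": {"qty": 100}}),
    store.ingest({"trade_id": "T1", "event_id": "E1", "payload": {"qty": 100}}),
    store.ingest({"trade_id": "T1", "event_id": "E2", "payload": {"qty": 150}}),
    store.ingest({"trade_id": "T2", "payload": {"qty": 5}}),
]
print(results)
print(store.latest("T1"))
```

Because each payload is stored as a free-form document, different trade types can carry different fields without a change to the store itself.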

Government: Increasing Efficiency

Government agencies love documents. But, as pressure mounts to move more services online, they frequently struggle to develop applications in a timely and efficient manner. Government agencies are also wary of getting locked into building an entirely new system or replicating their data again and again for each new application. They also have serious data security needs.

Agencies such as the Federal Aviation Administration (FAA), the Centers for Medicare & Medicaid Services (CMS), the Food and Drug Administration (FDA) and the Department of Defense (DoD) in the US, along with the intelligence community, are using Enterprise NoSQL today to help solve these problems.

These are just a few examples of how organisations are using Enterprise NoSQL to tackle data complexity, demonstrating that the best approach to multi-structured data management isn’t relational. Enterprise NoSQL is providing new levels of flexibility and agility without compromising key enterprise features such as ACID transactions, government-grade security, high availability, and disaster recovery. While relational databases and other types of NoSQL databases will continue to serve specific purposes, Enterprise NoSQL will increasingly help solve the most pressing big data challenges organisations face today and tomorrow.

Adrian joined MarkLogic in 2012 as the Vice President for EMEA. Prior to this, Adrian was VP Enterprise for Juniper Networks in Europe. He also worked as VP EMEA and Australasia at Chicago-based analytics software company SPSS, where he grew the business to represent nearly half of global revenues prior to its acquisition by IBM. This followed two years at Mercator Software prior to its acquisition by IBM. Adrian's career began in IT services with 11 years at EDS before time at ATOS Origin and Cambridge Technology Partners.