The Panama Papers investigation is set to have a massive impact on world finance, prompting a clampdown on tax evasion, fraud, and corruption. But it also raises interesting questions about the way we process big data. The International Consortium of Investigative Journalists (ICIJ) just published a searchable database that reveals the names of more than 360,000 people and companies behind secret offshore structures.
Financial organisations will have to dig into this data and unearth any potential exposure that may result from business dealings with any of those names. To do that, they’ll need to pull information from a wide variety of data sources. The traditional approach of pulling data from five or six different silos, linking records together, and centralising everything before analysis can begin is slow, painful, and prone to failure.
If we change our mindset about this problem, we can find a faster, more efficient way to solve it.
Solving Big Data’s Big Problem
How do organisations get at the data they need? A lot of resources have been thrown at this problem across the enterprise, but the return is disappointing. With Gartner predicting that 60% of big data projects through 2017 will fail to go beyond piloting, there’s clearly something wrong with this picture.
The sticking point is the centralisation of data. The conventional wisdom is that you have to bring all of the data together before you can start to analyse it. It may sound right intuitively, but all you’re doing is making a big problem bigger. You’re pulling every file from the cabinet and then sitting down at your desk to filter them, instead of just selecting the one or two files you actually need.
According to Forrester, companies spend around 64% of any Business Intelligence (BI) initiative on identifying and profiling data sources. If you know what you need to get at, why not go for it directly? By moving your analytics to the edge, and breaking the data into bite-sized, easily controllable pieces, you can extract results much more quickly and efficiently.
It makes sense to set up individual queries that seek out, and automatically match and compare, data based on your requirements.
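As a minimal sketch of what such a targeted query might look like, the snippet below filters each source against a watchlist before anything is centralised. The source names, fields, and watchlist entries are all hypothetical, and a real deployment would run each filter at the source system rather than in one process:

```python
# Hypothetical per-source targeted queries: each source is filtered
# independently against a watchlist, so only matches leave the source.

WATCHLIST = {"Alpha Holdings Ltd", "Beta Trust SA"}

def query_source(records, watchlist):
    """Filter a source in place: keep only records matching the watchlist."""
    return [r for r in records if r.get("counterparty") in watchlist]

# Illustrative stand-ins for two internal systems.
crm = [
    {"counterparty": "Alpha Holdings Ltd", "account": "A-100"},
    {"counterparty": "Ordinary Client Plc", "account": "A-101"},
]
payments = [
    {"counterparty": "Beta Trust SA", "amount": 250_000},
    {"counterparty": "Ordinary Client Plc", "amount": 40},
]

# Run the same query against each source; compare only the matches.
hits = {name: query_source(src, WATCHLIST)
        for name, src in {"crm": crm, "payments": payments}.items()}
```

The point of the sketch is the shape of the workflow, not the matching logic: the query travels to each source, and only the handful of relevant records travel back.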
Draining The Data Lake
Constructing a monolithic database or data lake is not going to deliver the expected results. Half of all big data lake investments will stagnate or be redirected this year because of a failure to deliver a measurable impact on winning, serving, and retaining customers, according to Forrester.
Building small pieces of the whole locally is the way to combat the diversity and distribution problems that companies are really struggling with. By breaking queries down into individual, independent components and then applying rules that let you analyse as close to the source as possible, you can get the results you need much, much faster.
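One way to picture this decomposition is below: a single query is split into independent components, each of which filters its own source before a final merge. Here separate threads stand in for processing that would, in practice, run at or near each source; all source and field names are illustrative:

```python
# Decomposing one query into independent per-source components.
# Each component filters locally; only pre-filtered results are merged.

from concurrent.futures import ThreadPoolExecutor

def local_component(source_name, records, predicate):
    # Runs "close to the source": filter before anything is shipped.
    return [dict(r, source=source_name) for r in records if predicate(r)]

# Illustrative stand-ins for distributed, heterogeneous sources.
sources = {
    "ledger": [{"name": "Alpha Holdings Ltd", "value": 1}],
    "kyc":    [{"name": "Gamma Partners", "value": 2},
               {"name": "Alpha Holdings Ltd", "value": 3}],
}
predicate = lambda r: r["name"] == "Alpha Holdings Ltd"

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(local_component, n, recs, predicate)
               for n, recs in sources.items()]
    merged = [row for f in futures for row in f.result()]
# Only two small, pre-filtered result sets ever cross the "network".
```

Because each component is independent, a slow or failed source delays only its own slice of the answer rather than the whole job.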
Finding Focus & Agility
For financial institutions dealing with the Panama Papers revelations, the ability to target the right data and link it together is obviously essential. Banks need to match names to find legitimate correlations between disparate internal records, cross-reference those records with shifting external sources, such as the ICIJ database, and stir in changing regulatory requirements. And they need all of this done with little delay.
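The name-matching step itself can be sketched with nothing more than Python's standard library. The similarity threshold and the record contents below are illustrative, and production matching would add normalisation, transliteration, and entity resolution on top:

```python
# A minimal fuzzy name-matching sketch using the standard library.
# Threshold and example names are illustrative only.

from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    """Crude string similarity between two names, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

# Hypothetical internal client records vs. an external name list.
internal_clients = ["Alpha Holdings Ltd.", "Delta Shipping LLC"]
external_names   = ["Alpha Holdings Ltd", "Epsilon Capital"]

matches = [(c, e) for c in internal_clients
                  for e in external_names if similar(c, e)]
print(matches)  # [('Alpha Holdings Ltd.', 'Alpha Holdings Ltd')]
```

Even this crude approach shows why exact joins are not enough: the trailing full stop would defeat a naive equality check, while a similarity measure still links the two records.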
It doesn’t make sense to pull in all the data and then discard 99% of it. Why not deploy the intelligence, the queries, the processing, and the rules where the data lies and filter it in place, before you begin to aggregate, correlate, and analyse?
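Filtering in place can be sketched as pushing the predicate down into each store's own engine. Here an in-memory SQLite database stands in for a remote system, and the schema and names are hypothetical; the point is that the WHERE clause executes inside the engine, so only matching rows ever reach the aggregation step:

```python
# Pushing the filter down to where the data lies: sqlite3 stands in
# for a remote store, and the WHERE clause runs inside its engine.

import sqlite3

def filtered_rows(db, watchlist):
    # The filter executes in the database engine, not in our process.
    placeholders = ",".join("?" for _ in watchlist)
    sql = f"SELECT name, amount FROM dealings WHERE name IN ({placeholders})"
    return db.execute(sql, list(watchlist)).fetchall()

# Illustrative stand-in data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dealings (name TEXT, amount REAL)")
db.executemany("INSERT INTO dealings VALUES (?, ?)",
               [("Alpha Holdings Ltd", 100.0),
                ("Ordinary Client Plc", 5.0),
                ("Alpha Holdings Ltd", 50.0)])

# Aggregate only what survived the in-place filter.
exposure = sum(amount for _, amount in
               filtered_rows(db, ["Alpha Holdings Ltd"]))
print(exposure)  # 150.0
```

The 99% that is irrelevant never leaves the store; only the rows needed for aggregation and correlation are moved.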
The current approach to extracting value from big data is not effective for a great many use cases. The Panama Papers is just one example, but any company trying to realise results from a wide range of distributed and disparate sources stands to benefit from thinking differently about attacking the problem.