Data governance disappoints

For the second year in a row I am on the judging panel for the best practice data governance award for the International Data Governance and Information Quality Conference in San Diego in June.

As a result, I get to see a lot of high-quality, detailed submissions from the various organisations that put themselves forward for this award. That is in addition, of course, to the various companies that I meet in the normal course of my work.

I have to say that in one respect I am extremely disappointed. It seems to me that what I see and hear is not data governance but data quality governance. The focus is invariably on data quality and data quality monitoring, possibly with master data management included and a sprinkling of business rules.

There is a lot of good stuff about getting buy-in from the business and about organisational issues but there is also a lot that is missing. For example, I have never run across a company that is monitoring adherence to data protection laws as a part of their governance projects or that is monitoring data retention policies.
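Monitoring a retention policy is, in principle, a simple data-driven check. As a purely hypothetical sketch (the categories, retention limits and record layout are my own illustrative assumptions, not drawn from any particular product or regulation), it might look something like this:

```python
from datetime import date

# Hypothetical retention policy: maximum age in days per record category.
# These categories and limits are illustrative assumptions only.
RETENTION_DAYS = {
    "marketing_contact": 365,       # keep for one year
    "transaction": 7 * 365,         # keep for seven years
}

def overdue_for_deletion(records, today):
    """Return the records held longer than their category's retention limit."""
    overdue = []
    for record in records:
        limit = RETENTION_DAYS.get(record["category"])
        if limit is not None and (today - record["created"]).days > limit:
            overdue.append(record)
    return overdue

today = date(2012, 6, 1)
records = [
    {"id": 1, "category": "marketing_contact", "created": date(2010, 1, 1)},
    {"id": 2, "category": "transaction", "created": date(2011, 1, 1)},
]
print([r["id"] for r in overdue_for_deletion(records, today)])  # [1]
```

The point is not the code itself but that a check of this sort could feed a governance dashboard in exactly the same way that data quality metrics do today.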

Now, you could argue that these fall under the aegis of the compliance officer and certainly the compliance officer would want to see that these regulations are being adhered to. But it seems to me that these are very much data-driven issues and should therefore be a part of data governance.

After all, if we think about data privacy, you would normally identify the fields that need to be masked, anonymised or redacted using a data profiling tool. This very much falls into the same realm as data governance, and it should be relatively easy to monitor whether relevant fields are, in fact, being suitably protected.
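Such monitoring could be as simple as scanning the supposedly protected fields for values that still look like raw personal data. The following is a minimal sketch under my own assumptions (the pattern names, the field names and the sample records are invented for illustration and do not come from any specific profiling tool):

```python
import re

# Hypothetical patterns for data that ought to have been masked or redacted.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk_ni_number": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}

def find_unmasked_pii(records, fields):
    """Return (record index, field, pattern name) for each value that still
    looks like raw personal data rather than a masked placeholder."""
    findings = []
    for i, record in enumerate(records):
        for field in fields:
            value = str(record.get(field, ""))
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    findings.append((i, field, name))
    return findings

sample = [
    {"customer": "A", "contact": "XXXX@XXXX"},            # masked: passes
    {"customer": "B", "contact": "jane.doe@example.com"}, # unmasked: flagged
]
print(find_unmasked_pii(sample, ["contact"]))
```

A real deployment would of course use the field classifications produced by the profiling tool rather than a hand-written list, but the monitoring itself is no harder than routine data quality monitoring.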

The problem, I think, is in part due to the fact that most regulations tend to be process-centric rather than data-centric. For example, SOX is about monitoring the processes that data goes through rather than the accuracy of the data. The same applies to PCI-DSS.

As a result, it is usually log management tools that are used to provide compliance monitoring. However, the situation is changing: Solvency II requires that information be “accurate, complete and appropriate”, and the forthcoming MiFID II mandates the same requirement. Moreover, neither regulation is a one-off: both require this on an ongoing basis.

While they don’t actually say so, they effectively require data governance. It wouldn’t at all surprise me if we start to see further regulations with the same focus; SOX II is not beyond the realms of possibility. So we are starting to see more and more compliance requirements that focus on the data rather than, or in addition to, data processes, and data governance really needs to take this on board.

Going beyond this, data governance as it is currently practised is not only specifically about data quality but also specifically about relational data. I have never met a company that has a single dashboard combining relational data quality monitoring with spreadsheet data quality monitoring, for example, nor one that includes data privacy information for both relational data and content.

In other words, data governance is too limited in this respect. And it’s going to get worse with the increasing use of NoSQL databases, not least because there aren’t any data profiling tools that work with non-relational databases to do the relevant monitoring.

I think the fundamental problem of data governance being too narrow is down to vendors. The market is dominated by data quality and MDM suppliers with limited vision and limited product sets, which are not going beyond their own agendas.

There are, of course, honourable exceptions. One is IBM, which has been pushing its Information Governance message for some time. This significantly broadens the scope of what is to be considered, though it notably does not mention spreadsheets, probably because IBM has no ability to support governance for these.

I would also mention Kalido, because its Data Governor is designed to support the definition and monitoring of governance policies, regardless of what these policies are applied to. Finally, Ataccama, although it is not expanding the boundaries of governance per se, has at least introduced a tool for tracking data quality remediation, which goes beyond purely the nuts and bolts of standard data quality products.

It is time, in my view, that vendors, consultants, systems integrators and industry analysts started to extend their vision of what data governance should be. It should not simply be about data quality for relational data. If it is to live up to its name, it needs to expand its horizons to include all sorts of data and to go beyond quality, however important that may be, to take on the full implications of “governance”.

Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.