There’s identity resolution and then there’s identity resolution

Hype aside, one of the biggest communication problems created by IT vendors arises when different suppliers use the same terminology to describe different sets of capabilities. Virtualisation is a case in point, especially with respect to data virtualisation. But that’s a discussion for another day. Today’s case in point is identity resolution.

It’s not really the vendors’ fault in this case: there are, genuinely, two different types of identity resolution.

The first type of identity resolution arises typically in conjunction with data quality. You want to know that P Howard and Philip Howard are the same person or you want to know that Bloor Research is different from The Bloor Group even though Robin Bloor is a shareholder in both companies. In other words, this sort of identity resolution is all about matching records and de-duplicating your data so that you can (in the case of customers) get a single customer view or a 360 degree view (which would also include what products were bought through which channels) of your customers.
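To make the matching idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any data quality product actually works: the function names, the rule that an initial matches any given name starting with that letter, and the tiny noise-word list are all assumptions made up for this example.

```python
# Hypothetical, deliberately naive matching rules for illustration only.
NOISE_WORDS = {"the"}  # assumed list of words to ignore in company names


def tokens(name):
    """Lower-case the name and split it into words, dropping stray punctuation."""
    return [t.strip(".,").lower() for t in name.split() if t.strip(".,")]


def given_names_compatible(a, b):
    """An initial is treated as matching any given name that starts with it."""
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return a == b


def likely_same_person(name_a, name_b):
    """Same surname plus compatible first given name counts as a likely match."""
    ta, tb = tokens(name_a), tokens(name_b)
    if len(ta) < 2 or len(tb) < 2:
        return False
    if ta[-1] != tb[-1]:
        return False
    return given_names_compatible(ta[0], tb[0])


def likely_same_company(name_a, name_b):
    """Companies match only if all their significant words are the same."""
    return set(tokens(name_a)) - NOISE_WORDS == set(tokens(name_b)) - NOISE_WORDS


print(likely_same_person("P Howard", "Philip Howard"))
print(likely_same_company("Bloor Research", "The Bloor Group"))
```

Under these assumed rules the two Howards are treated as a likely match, while the two Bloor companies are kept apart even though they share a word. Real matching would also weigh addresses, dates of birth and so on, and would return a confidence score rather than a flat yes or no.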

The second type of identity resolution is similar but different. The classic example is in police work. Here you want to know that some particular criminal has fifteen different aliases, say. Moreover, under each of those identities he or she will have multiple contacts and you may want to do social network analysis against those contacts to see who else might have criminal tendencies.

Moreover, whereas in data quality environments you would use address data to help match identities, here you are very much interested in the different addresses at which our criminal has lived because it may help to identify those contacts you are suspicious about. Further, those contacts may themselves know other people that you might be interested in, so you are also looking for non-obvious relationships with a measurement of the degree of separation between people.
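The degree-of-separation idea can be sketched as a breadth-first search over a graph of known contacts. This is a toy illustration under stated assumptions, not a description of how Identity Insight works internally: the names, the links and both function names are invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical contact links (e.g. a shared address, a phone call, a co-arrest).
LINKS = [
    ("suspect", "alice"),
    ("alice", "bob"),
    ("bob", "carol"),
    ("dave", "erin"),
]


def build_contacts(links):
    """Turn a list of pairs into an undirected contact graph."""
    contacts = defaultdict(set)
    for a, b in links:
        contacts[a].add(b)
        contacts[b].add(a)
    return contacts


def degrees_of_separation(contacts, start, target):
    """Return the fewest hops between two people, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for other in contacts.get(person, ()):
            if other in seen:
                continue
            if other == target:
                return distance + 1
            seen.add(other)
            queue.append((other, distance + 1))
    return None


contacts = build_contacts(LINKS)
print(degrees_of_separation(contacts, "suspect", "carol"))
print(degrees_of_separation(contacts, "suspect", "dave"))
```

In this made-up network carol sits three hops from the suspect, while dave has no known connection at all. In practice each of the suspect's aliases would first have to be resolved to a single identity before the graph could be built, which is exactly where the second type of identity resolution comes in.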

I know of only one vendor that specialises in this second type of identity resolution, and that is IBM. Its InfoSphere Identity Insight product is well named, even though it tends to be lumped in with the first category of identity resolution products. The key word is “insight”: Identity Insight is about getting insight into identities (who this person really is, whom he knows and what he does) rather than just matching records.

That includes, incidentally, identifying that person’s gender, which tends to be assumed in data quality environments. How do you tell whether Lesley is a man or a woman, how good are you with foreign first names, and isn’t this likely to be important for marketing campaigns? Moreover, Identity Insight deals not only with name and address data but also with other types of interesting information such as biometrics, IP addresses (for cybersecurity), buildings, building inspectors, benefit offices and so on.

You can also extract data from unstructured sources (something you can’t do with the first type of product discussed), although this is not yet a formal part of the product (it is planned). Finally, there’s also a built-in CEP (complex event processing) engine so you can report on events relevant to particular people in real time.

Actually you can think of both types of identity resolution as being complementary. If you are a bank you might build a single customer view (you have to now if you are in the UK) and know all about each client. But should you be doing business with this client? Identity Insight can tell you that.

Both sorts of identity resolution can provide significant payback but this is often difficult to quantify in the case of data quality. Not so in the case of Identity Insight, however: there is an independently audited study of its use at Alameda County Social Services where Identity Insight was deployed alongside a DB2-based data warehouse and Cognos. The ROI was 631%, the initial investment was recouped within two months of going live, and the department is saving $24m pa.

There are lots of good things about both types of identity resolution but they are not the same and Identity Insight tends to get put in the same bucket as the data quality offerings on the market. IBM really needs a clearer way of differentiating itself in this area but it’s not easy: the product was previously (in part) called entity analytics but that doesn’t really ring many bells for people and my thesaurus is not much help either. I am sure IBM would welcome any bright ideas.


Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.