Combining search and business intelligence


The idea behind combining search with business intelligence is that you want the analytic power of the latter but you also want the ease of use and simplicity of search. The question is: how do you get both? Or, more particularly, what is the best way to combine the two: should you start with BI and add search or start with search and add BI?

The conventional answer from companies like Cognos (IBM) and Business Objects (SAP) is that you should start with BI. But then they would say that, wouldn't they? Endeca, which is well known for its search engine, espouses starting with search. No surprise there. However, whatever your view on which approach is best, Endeca Latitude is worth a closer look.

Endeca Latitude, the company's BI/search product, was not in fact created by simply tacking BI functionality onto the back of a search product (which is how you might imagine BI vendors approaching search); it was designed with BI in mind from the outset. I should, however, qualify that statement: the product is not about reporting, for example, but about gaining new insights into your business. As we shall see, the product's architecture is focused on supporting analytics as well as search.

The core component is MDEX, the product's engine. Essentially, this is a hybrid of a search engine and an analytic database: it stores data in columns (good for analytics) and also tags data for search purposes. Note that this is universal: every data element is stored in a column for analytics and tagged for searching.
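To make that dual treatment concrete, here is a minimal sketch of the general technique of storing every value both columnwise and in an inverted search index. This is an illustration only; the class and method names are invented and bear no relation to MDEX's actual internals, and it assumes every record has the same fields.

```python
from collections import defaultdict

class HybridStore:
    """Toy hybrid of a column store (for analytics) and an
    inverted index (for search). Illustrative only; not MDEX."""

    def __init__(self):
        self.columns = defaultdict(list)  # field -> column of values
        self.index = defaultdict(set)     # token -> set of row ids
        self.rows = 0

    def load(self, record):
        row_id = self.rows
        self.rows += 1
        for field, value in record.items():
            # Every element goes into a column for analytics...
            self.columns[field].append(value)
            # ...and is tokenised and tagged for search.
            for token in str(value).lower().split():
                self.index[token].add(row_id)
        return row_id

store = HybridStore()
store.load({"product": "red widget", "sales": 120})
store.load({"product": "blue widget", "sales": 80})

total = sum(store.columns["sales"])  # analytic query over a column: 200
hits = store.index["widget"]         # search query over the index: {0, 1}
```

The point of the sketch is simply that loading once populates both structures, so no separate "indexing for search" pass is needed.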

I should not need to go into the benefits of using columns for analytic purposes, as I have discussed this on multiple previous occasions; it is precisely why so many data warehousing suppliers use this approach. Where MDEX differs from these is that the columns are indexed for search purposes and, of course, the tagged data is indexed too. In other words, you can query structured data conventionally, you can search against both structured and unstructured data, and you can combine these approaches within a single query.
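The combined query is the interesting part, so here is a hedged sketch of what mixing the two approaches in one operation amounts to: a search step over an inverted index narrows the rows, then an analytic step aggregates a column over just those rows. The data, field names and function are all invented for illustration; this is the general pattern, not Latitude's query language.

```python
from collections import defaultdict

# Toy data held columnwise, with an inverted index over the free-text
# column. Names and values are made up for this example.
columns = {
    "region":  ["EMEA", "EMEA", "APAC"],
    "comment": ["late delivery", "damaged box", "late payment"],
    "sales":   [100, 250, 75],
}
index = defaultdict(set)
for row_id, text in enumerate(columns["comment"]):
    for token in text.split():
        index[token].add(row_id)

def search_and_aggregate(term, measure):
    """One 'query' combining search (unstructured) with analytics
    (structured): filter rows by a search term, then sum a measure."""
    matching = index[term]                              # search step
    return sum(columns[measure][r] for r in matching)   # analytic step

result = search_and_aggregate("late", "sales")  # rows 0 and 2 -> 175
```

A conventional BI tool would need the "late" flag modelled up front; a pure search engine could find the rows but not aggregate the measure. Doing both in one pass is the point.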

Another major feature of MDEX is that it does not impose its own schema. Instead, the software takes your existing schema, or schemas if you are combining data from multiple sources (which can include data from a warehouse, web-derived data, other non-warehouse data and third-party data, as well as unstructured data), and combines them, with all data relationships created automatically when you load the data into MDEX.

This is achieved by means of Latitude's own ETL-like capabilities, combined with crawlers that seek out relevant information in source systems. To support this you define a "pipeline" that describes the relevant dataflow. Interestingly, this pipeline remains live, so that Latitude can be updated automatically when there is a change to a source system.
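For readers unfamiliar with the idea, a dataflow pipeline of this general kind can be sketched as a declarative description of sources and transforms plus a runner that pushes records into the engine. Everything here (the structure, the names, the data) is invented for illustration; Latitude defines its pipelines with its own tooling, not Python.

```python
# Hypothetical pipeline: named sources feed a chain of transforms,
# which feed a single load step into the engine.
pipeline = {
    "sources": {
        "warehouse": lambda: [{"sku": "A1", "sales": 100}],
        "web":       lambda: [{"sku": "A1", "review": "great product"}],
    },
    "transforms": [
        # e.g. tag each record with provenance-style metadata
        lambda record: {**record, "loaded": True},
    ],
}

def run(pipeline, sink):
    """Pull from every source, apply each transform in order, and
    push the result into the sink. Re-running this after a source
    changes is the moral equivalent of a 'live' pipeline keeping
    the engine up to date."""
    for name, fetch in pipeline["sources"].items():
        for record in fetch():
            for transform in pipeline["transforms"]:
                record = transform(record)
            sink.append({"source": name, **record})

engine = []
run(pipeline, engine)
# engine now holds one transformed, source-tagged record per source row
```

The detail worth noting is that both structured (warehouse) and unstructured-ish (web review) records flow through the same pipeline, which is consistent with the single-engine design described above.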

Processing is mostly memory resident, and Latitude makes extensive use of in-memory and on-disk caching. It also scales: one of the company's users has indexed 21TB of data.

At the front end is what Endeca calls the Discovery Framework, the end-user environment for search and analytics. Using it basically consists of assembling and configuring out-of-the-box components for filtering, querying, searching, visualisation and so forth. There is an SDK for developing your own components should you need them, although that will unfortunately mean getting IT involved rather than doing it yourself; I guess you can't have everything.

The bottom line is that it is the way Latitude combines search and BI that is significant. Few people would question the combined value of these technologies; the question is how you get the most value out of that combination. Endeca seems to have done a good job of answering that question, and the product is certainly worth taking a look at, especially if you want to combine query and search against both structured and unstructured data.


Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.