Expressor Introduces Semantic Types

Expressor Software has just announced version 3.0 of its eponymous product. There are three notable things about it. The first is that the company is introducing expressor Studio.

This is a new interface, with ribbons in the style of Office 2010, designed for business analysts and data stewards who do not want, or need, to know about expressor datascript (though you can go into that level of detail if you want to), as well as for developers. This is the way that the whole market is moving, so it is a timely introduction.

The second notable announcement is that Studio will be downloadable, for free, as a community edition of the general expressor product. It will come with a run-time engine, so it will provide a complete ETL/data integration environment, albeit one that supports only serial processing rather than the full parallel processing that is one of expressor’s strengths. This too is well timed and should help to drive uptake of the product.

The third, and the point I am most interested in, is the introduction of what the company calls semantic types. Readers may recall (or not) that expressor is semantically based. That is, it knows that “customerid” in database A is the same thing as “custno” in database B. That’s a big advantage because it helps to automate mappings from one to the other, and it helps when you need to join data from these two sources.

Semantic types go a step further. The idea is that you define a semantic type for, say, a customer, and that type specifies a datatype for each of its attributes. Thus, the customer ID might be a 15-character alphanumeric string. When you define an ETL or data integration process, incoming data about customers is mapped to the semantic type and then from the semantic type to the target.

During this process the software will automatically (based on pre-defined rules) transform the source data to match the attribute specification of the semantic type. So, if one source defines customer IDs as an integer field, the software will automatically convert them to the datatype required by the semantic type attribute (in this case, an alphanumeric string). The same process works for mapping the semantically typed data to the target.
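To make the mechanism concrete, here is a minimal sketch in Python. It is not expressor datascript and does not reflect the product's actual API. The names (CUSTOMER_TYPE, to_customer_id, to_semantic), the field names and the zero-padding rule are all assumptions made purely for illustration.

```python
from typing import Any, Callable, Dict

# Hypothetical conversion rule: the article only says the customer ID might be
# a 15-character alphanumeric string. Left-padding with zeros is an assumption.
def to_customer_id(value: Any) -> str:
    text = str(value).strip()
    if not text.isalnum() or len(text) > 15:
        raise ValueError(f"cannot convert {value!r} to a customer ID")
    return text.zfill(15)

# A "semantic type" modelled as attribute name -> conversion rule (illustrative only).
CUSTOMER_TYPE: Dict[str, Callable[[Any], Any]] = {
    "customer_id": to_customer_id,
    "name": str,
}

def to_semantic(record: Dict[str, Any],
                field_map: Dict[str, str],
                semantic_type: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Map source fields to semantic attributes, converting each value on the way."""
    return {
        attribute: semantic_type[attribute](record[source_field])
        for source_field, attribute in field_map.items()
    }

# Source A stores the customer ID as an integer.
source_a = {"customerid": 10482, "cust_name": "Acme Ltd"}
map_a = {"customerid": "customer_id", "cust_name": "name"}
customer = to_semantic(source_a, map_a, CUSTOMER_TYPE)

# Mapping from the semantic type to a (hypothetical) target is just a second field map.
target_map = {"customer_id": "CUST_NO", "name": "CUST_NAME"}
target_row = {target_map[attribute]: value for attribute, value in customer.items()}
```

The point of the sketch is that the conversion rule lives with the semantic type rather than with each individual mapping, so every source that maps to "customer_id" gets the same treatment.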

There are two big advantages to using this approach. First, it makes the definition of data integration processes much simpler. For example, usually if you want to join data from two sources you have to define the sources, define the transforms and then define the join. Using semantic types you cut out the intermediate process of having to define transforms and thus development becomes much simpler.
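The join case can be sketched in the same spirit. Again, this is illustrative Python under stated assumptions, not how expressor implements it: the helper names, the sample rows and the zero-padding rule are hypothetical. Once both sources are mapped to the same semantic attribute, the join itself needs no hand-written transform.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical rule for the shared semantic attribute (zero-padding is an assumption).
def normalise_id(value: Any) -> str:
    return str(value).strip().zfill(15)

CONVERTERS: Dict[str, Callable[[Any], Any]] = {"customer_id": normalise_id}

def to_semantic(rows: List[Row], field_map: Dict[str, str]) -> List[Row]:
    """Rename source fields to semantic attributes and apply any conversion rule."""
    identity = lambda v: v
    return [
        {attr: CONVERTERS.get(attr, identity)(row[src]) for src, attr in field_map.items()}
        for row in rows
    ]

def join_on(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Simple inner join; assumes at most one right-hand row per key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Database A holds integer IDs under "customerid"; database B holds strings under "custno".
customers_a = [{"customerid": 10482, "name": "Acme Ltd"},
               {"customerid": 20001, "name": "Globex"}]
orders_b = [{"custno": "10482", "total": 99.5}]

customers = to_semantic(customers_a, {"customerid": "customer_id", "name": "name"})
orders = to_semantic(orders_b, {"custno": "customer_id", "total": "order_total"})

joined = join_on(customers, orders, "customer_id")
```

Here the only per-source work is the field map. The integer-to-string conversion that would otherwise be a separately defined transform is supplied by the shared attribute definition.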

Secondly, this should make reuse much easier. Of course, reuse is the holy grail of all data integration suppliers (and, indeed, of development in many other environments) but it is never achieved to the extent that vendors might like. Nevertheless, this is a significant step in the right direction.

There is one further point about semantic types that ties back to the introduction of a community edition of the expressor product. Along with that free download, expressor is also introducing a community portal. I anticipate that members of the expressor community, encouraged by the company, will load semantic types that they have defined into the portal so that other users can share them. That will speed adoption and encourage further development, which must be good for all expressor customers.

The bottom line is that there is some serious meat in this release. I have liked the company’s approach ever since it was first introduced, and this release marks a major step forward.

Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.