Syncsort is fast – really fast!


Syncsort has always had a reputation for high performance, but historically it paid little attention to marketing and not much to sales, so it was only hearsay that suggested the product was worth looking at.

That has changed recently, with the company coming under new management, so I’ve been taking a look under the covers to see what is different about DMExpress, the company’s data integration product, and how it holds the ETL record (by a big margin) for loading data into a warehouse.

There are a number of important points. The first is that the technology was originally developed to run on mainframes back in the 1970s. At that time you needed to be parsimonious with your use of resources and you had to squeeze every bit of performance out of whatever little memory you could access.

That frugality has been carried through into today’s product (which runs on all leading platforms). For example, by default, DMExpress uses 15% of the available memory on whatever platform it is running on. Other tools in this space typically take all the available memory. Similarly, the product connects directly to the disk drives on the source and target rather than going through the operating system, cutting out the overhead associated with that layer.

The second point, and the biggest thing that makes Syncsort unique in the data integration space, is that the product is built around an optimiser, in much the same way that databases have optimisers. Of course, this only makes sense if you have lots of different ways of achieving the same results.

Most ETL and data integration platforms don’t have more than a few different algorithms for performing joins and sorting, for example, so it is arguable that they wouldn’t get much better performance if they did have an optimiser, because their choices are so limited. Syncsort, on the other hand, has some 30 different sort algorithms and a similarly large number of join and other algorithms. The optimiser then creates a transformation plan in the same way that a database optimiser creates a query plan.
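To make the idea concrete, here is a toy cost-based planner in the spirit described above: given simple statistics about the input, it estimates a cost for each candidate sort strategy and picks the cheapest, just as a database optimiser picks a query plan. The strategies, cost formulas and numbers are illustrative inventions of mine, not Syncsort’s actual algorithms.

```python
import math

def plan_sort(rows: int, row_bytes: int, memory_bytes: int) -> str:
    """Pick the cheapest sort strategy for the given input statistics.

    A real engine like DMExpress reportedly chooses among ~30 sort
    algorithms; this sketch chooses between just two, using made-up
    cost models, to show the shape of the decision.
    """
    data_bytes = rows * row_bytes
    candidates = {
        # Only feasible if the data fits in memory: cost ~ n log n comparisons.
        "in_memory_sort": (rows * math.log2(max(rows, 2))
                           if data_bytes <= memory_bytes else float("inf")),
        # External merge sort: same comparison cost plus extra passes over disk.
        "external_merge_sort": (rows * math.log2(max(rows, 2))
                                + 2 * rows),
    }
    # The "transformation plan" is simply the minimum-cost candidate.
    return min(candidates, key=candidates.get)
```

With 100KB of data and 1GB of memory this picks the in-memory sort; with 10GB of data and 1MB of memory it falls back to the external merge sort.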

Moreover, this optimiser is dynamic so that it monitors data movement as it is happening and, if it finds that the current algorithms being used are not optimal, then it can dynamically change the transformation plan.
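The dynamic part can be sketched too: process the stream in chunks, monitor a property of the data as it arrives (here, how close to sorted it looks), and switch strategy mid-run if the observations contradict the current plan. Again, the strategy names and thresholds are hypothetical, chosen only to illustrate the mechanism.

```python
def choose_strategy(sample):
    """Nearly-sorted data favours run-merging; otherwise do a full sort.

    Counts adjacent out-of-order pairs as a cheap 'sortedness' signal.
    """
    inversions = sum(1 for a, b in zip(sample, sample[1:]) if a > b)
    return "merge_runs" if inversions < len(sample) // 10 else "full_sort"

def process(stream, chunk_size=1000):
    """Sort a stream chunk by chunk, re-planning as the data is observed."""
    out, buffer, strategy = [], [], None
    for record in stream:
        buffer.append(record)
        if len(buffer) >= chunk_size:
            observed = choose_strategy(buffer)
            if observed != strategy:   # the plan changes while data is moving
                strategy = observed
            out.extend(sorted(buffer))  # both strategies produce sorted output;
            buffer.clear()              # a real engine differs in *how* it sorts
    out.extend(sorted(buffer))
    return sorted(out), strategy
```

Feeding this already-ordered data keeps it on the cheap `merge_runs` plan, while reversed data flips it to `full_sort` after the first chunk, without stopping the run.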

I could go on, but suffice it to say that DMExpress is extremely efficient and, for bulk loading at least, probably the fastest product on the market. But efficiency (which means better use of resources and therefore reduced costs) and performance aren’t everything, even if they count for a lot.

If we go back to the discussion about the optimiser, the other big advantage with having an optimiser (and the algorithms to support it) is that the engine is effectively self-tuning. This means that you do not constantly have to tune your ETL processes, which in turn means that developers need to spend less time on maintaining existing processes and can spend more time on servicing business requests for more functionality and capability.

Finally, I should say something about Syncsort’s positioning. You might think that it is a direct competitor to Informatica or IBM and in some cases that may be true. However, it can also be treated as complementary to those products.

Many companies have significant investments in IBM or Informatica as a platform, using them for B2B purposes, for example, or employing their data quality and profiling tools. Syncsort is not in those markets, but it does have the metadata support needed to act as a data movement engine in conjunction with those environments: you can continue to design your transformations within Informatica, say, but then use DMExpress to actually move the data. To support this, Syncsort is positioning DMExpress as a Data Integration Accelerator. It certainly is.


Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.
