REVIEW: Talend Open Studio

Over time the average enterprise is likely to accumulate numerous data sources in a variety of formats, from simple text files and spreadsheets all the way through to large and complex databases. That is all well and good until a user or application needs to access data from multiple sources, as with data mining and analytics tools for example, or until data needs to be migrated from one format to another or consolidated in some way.

Commercial tools are available to help with such tasks, but these can be hideously expensive, not to mention complex, and are often limited in the types of data they can handle. As a result most companies end up building custom programs and integration routines themselves, most often on a time-consuming and inefficient ad-hoc basis.

Talend Open Studio (TOS) addresses all this by bundling together a set of tools designed to handle common data integration tasks, on top of which it adds an easy-to-use graphical interface, plus connectors to understand common data sources and plug-in components to manipulate the data as it’s being processed.

Furthermore it’s an open source application which can be downloaded and deployed free of charge. Plus it’s very scalable, enabling it to handle the data integration needs of everyone from the smallest of small businesses to large corporate organisations.

Features

Built on the Eclipse development platform, TOS can be run on both Windows and most Linux distros, a Java VM being the only real prerequisite in either case. The host system can be a server or a workstation. For our evaluation we used Windows 7 on a standard desktop PC, where installation took around 20 minutes, including the download of the zipped install file.

A full set of connectors comes as standard to handle, for example, comma-delimited data files, Excel spreadsheets and all the leading SQL databases including Microsoft SQL Server, MySQL, Oracle, Ingres and DB2. AS400 support is also built in, along with connectors to integrate with SAP and other popular applications, including cloud-based services such as SugarCRM and Salesforce.com.

The integrated graphical user interface is responsive and easy to navigate, with plenty of wizards and other aids to help find and apply the various tools. It is, however, something of a specialist application, written by developers very much with other developers in mind. As such it can take a while to get to grips with the terminology employed and the steps required to perform common data management tasks.

On the plus side, users familiar with programming tools should have few problems, and there are some useful videos and tutorials on the Talend Web site, along with well-written documentation, to point new users in the right direction.

Another plus is the ability to get started straight away by dragging and dropping connectors and components onto the TOS workspace to build data integration jobs that can be run directly from the application. As you get more familiar with the options, however, you’ll want to customise the various connectors and components, which can then be stored as re-usable metadata in the TOS repository along with other job and project information.

Jobs can also be broken down into sub-jobs, and the information being handled can be transformed using a variety of operators to, for example, sort data, combine fields from different sources, deduplicate data, perform conditional loops and so on. Again these are easy to deploy by selecting from a palette of drag-and-drop components on the TOS workspace, making it very much like using a graphical integrated development environment (IDE), albeit one focused on data integration.
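
To give a feel for what such operators do, here is a minimal sketch in Python of a job that deduplicates, combines and sorts records. It is purely illustrative and is not code produced by TOS, which generates Java or Perl; the sample data and the function names (deduplicate, combine, run_job) are hypothetical.

```python
# Hypothetical sample data standing in for two sources
# (for instance a delimited file and a database table).
customers = [
    {"id": 2, "name": "Bravo Ltd"},
    {"id": 1, "name": "Acme Corp"},
    {"id": 2, "name": "Bravo Ltd"},  # duplicate row
]
regions = {1: "North", 2: "South"}  # second source, keyed by customer id


def deduplicate(rows, key):
    """Keep the first row seen for each value of key."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def combine(rows, lookup, key, field):
    """Add a field from a second source to each row (None if there is no match)."""
    return [{**row, field: lookup.get(row[key])} for row in rows]


def run_job(rows, lookup):
    """Chain the three steps: deduplicate, combine, then sort by id."""
    unique = deduplicate(rows, "id")
    combined = combine(unique, lookup, "id", "region")
    return sorted(combined, key=lambda row: row["id"])


result = run_job(customers, regions)
```

In TOS each of these steps would be a component dropped onto the workspace and wired to the next, rather than a hand-written function.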

There’s even a built-in debugging tool that lets you see, in real time, data as it’s being worked on when a job is run. Moreover, you don’t have to run integration jobs through TOS itself, as under the hood the application generates either Java or Perl code to do its work (you choose which when you start a new job) and this can be exported to power batch operations.

This ability to generate independent code is a real selling point, as alternative integration products tend to rely on an RDBMS or proprietary processing engine to do the work, both of which can be bottlenecks when it comes to large-scale deployments. Talend jobs, on the other hand, just need a Java VM or Perl interpreter to run, lending themselves to deployment on server grids and on hosts close to the data sources concerned to further enhance performance. Actual throughput will, of course, depend on the host platform, but our limited tests certainly whizzed through and we were very impressed with how slick it all was.
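
The appeal of a job that needs nothing but a language runtime can be illustrated with a short standalone script. The sketch below, again in Python rather than the Java or Perl that TOS actually emits, converts a CSV file to one JSON object per line; the file names and the convert and main functions are assumptions made for the example.

```python
import csv
import json
import sys


def convert(source, target):
    """Read CSV rows from source and write one JSON object per line to target."""
    count = 0
    for row in csv.DictReader(source):
        target.write(json.dumps(row) + "\n")
        count += 1
    return count


def main(argv):
    """Entry point for scheduled runs: python job.py INPUT.csv OUTPUT.jsonl"""
    in_path, out_path = argv[1], argv[2]
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        rows = convert(src, dst)
    print(f"{rows} rows written to {out_path}")


# Only run when invoked with exactly two file arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Because a script like this depends only on the interpreter, it can be copied to any host near the data and run from a scheduler, which is the same property the review credits to exported Talend jobs.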

Other features worth mentioning include tools to ensure data consistency and a built-in Business Modeller to allow non-technical staff to design high-level workflows. Admittedly fairly basic, this allows projects to be outlined in the form of annotated flow-charts which the techies can then flesh out using the main Job Designer app. Automated documentation tools are also provided and, of course, it’s possible to include custom code to handle integration tasks not covered by the built-in connectors and components, with additional automation tools included here too.

Summary

A very competent and comprehensive application, Talend Open Studio is worth looking at by anyone with data integration tasks to perform. All the more so given that it’s a freely distributable application in a market traditionally dominated by expensive proprietary alternatives. It is, however, a specialist tool, and familiarity with programming tools and data manipulation in general will help with the learning curve; otherwise expect to take a while getting to grips with how it works.

The fact that it’s open source shouldn’t be a concern, given active community support plus a variety of commercial evaluation, support and training services which can be purchased by those who want or need them. The company also sells a more scalable implementation, known as the Talend Integration Suite, which adds support for a shared repository and other team-oriented tools plus additional performance, job scheduling, failover and high availability features. Support services are also bundled as standard with this version.

It won’t suit everyone, and Talend Open Studio isn’t the only open source data integration program on the block, but it is one of the best. Plus it’s easier and more effective than both the majority of commercial alternatives and doing it all yourself.

