Dual Loading For Teradata

As you may know, Teradata offers an Active-Active solution: dual Teradata systems that not only act as backups for one another in the case of either planned or unplanned downtime but that also concurrently support your query load.

This differs from a conventional Active-Passive environment, whereby the second system is kept only for standby purposes. The advantage of the Active-Active arrangement is that each system can be less powerful, because both are serving user queries, with load balancing across the two systems to ensure optimal performance. Or, of course, you can retain more powerful systems and get superior performance.

Hitherto there has been something of an issue with such approaches: both Active-Active and Active-Passive configurations require the data to be updated in parallel across the data warehouse systems, and you need to ensure synchronisation between the two. Traditionally, Teradata has provided its Table Load facility, which is essentially a batch loading capability, and that is fine where batch updates are suitable. But, increasingly, as the world becomes more and more real-time, batch loading alone is not sufficient.

So, the traditional approach has been to use replication in order to update systems in real-time. And this is fine when volumes are relatively small but, again, data volumes are growing all the time. This means that there is a loading gap: there is high-speed, high-volume batch loading and there is low-volume real-time replication, but there is an increasing demand for high-volume real-time loading. This is where Informatica has stepped into the breach.

Working in conjunction with Teradata, Informatica has introduced a Dual Load solution for Teradata that offers high-speed, real-time loading across Teradata Active-Active and Active-Passive configurations, with synchronisation across the Teradata systems. Of course, this is available to new customers, but the focus will be on existing joint clients that are already using PowerCenter in conjunction with Teradata Dual Active, in which case they will simply need to extend their current Informatica environment through the addition of the Informatica Dual Load Option for Teradata.
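To make the dual-loading idea concrete, here is a minimal sketch of the pattern described above: the same batch is applied to both systems in parallel, the load succeeds only if both targets accept it, and a synchronisation check confirms the systems agree afterwards. All names here (`WarehouseSystem`, `dual_load`) are hypothetical illustrations, not Informatica's or Teradata's actual API.

```python
import threading

class WarehouseSystem:
    """Stand-in for one Teradata system: an in-memory table plus a lock."""
    def __init__(self, name):
        self.name = name
        self.rows = []
        self.lock = threading.Lock()

    def load(self, batch):
        # A real loader would run a bulk insert here; we just append rows.
        with self.lock:
            self.rows.extend(batch)

def dual_load(batch, primary, secondary):
    """Apply the same batch to both systems in parallel, then verify sync."""
    errors = []

    def worker(system):
        try:
            system.load(batch)
        except Exception as exc:
            errors.append((system.name, exc))

    threads = [threading.Thread(target=worker, args=(s,))
               for s in (primary, secondary)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        # In practice you would retry or resynchronise rather than just fail.
        raise RuntimeError(f"dual load failed on: {errors}")
    # Synchronisation check: both systems must hold identical data.
    assert primary.rows == secondary.rows

a = WarehouseSystem("active-1")
b = WarehouseSystem("active-2")
dual_load([("order", 1), ("order", 2)], a, b)
print(len(a.rows), len(b.rows))  # → 2 2
```

The point of the sketch is the commit discipline, not the threading: whatever the transport, both targets must receive every change and be checked for agreement, which is exactly the synchronisation guarantee that distinguishes dual loading from simply running two independent load jobs.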

Needless to say, the Informatica software is highly parallel in its own right. Perhaps strangely, high availability is only an option for the Informatica solution rather than being mandatory but, in any case, it is highly recommended. Alternatively, if you want the equivalent of an Active-Active environment from Informatica, then you can select their Enterprise Grid option.

Informatica does not see its Dual Load solution as competitive with either Table Load or replication but as filling an increasingly important gap that is not met by either of these technologies. It therefore sees the Dual Load option as complementary to both of these approaches and it expects them to coexist.

Of course, the number of companies that will require this level of performance is relatively small (perhaps a few hundred) but they are typically very large organisations with massive data handling issues. Combine the ability to support such volumes along with the real-time capabilities that are increasingly required and I think Informatica should be on to a winner.

Moreover, this sort of technology has the potential to be used elsewhere. For example, in zero-downtime migrations you require Active-Active or Active-Passive systems where both systems are updated in parallel, and kept synchronised, in exactly the same way that this Dual Load feature will be used in Teradata environments. Of course, this would mean generalising the software, but it would be a logical move going forward: I will be interested to see if this is the direction in which Informatica goes. In the meantime this makes a very sensible start.

Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.