RainStor Partners With Teradata

RainStor and Teradata have announced that they will be providing a retention appliance based on Teradata hardware and using RainStor software. Teradata is, of course, well known, but for those of you who haven't read my previous blogs about RainStor (which was previously Clearpace Software), here's a recap.

The company was founded by ex-MOD staff in the UK who had been working on a way to effectively store data derived from battlefield simulations. As you may imagine, this means very large volumes of data, which need to be ingested rapidly, stored for a long time and easily retrieved. The company built a file store (technically, a data repository that stores data in files) for the long-term retention of structured data, such as log data, SMS text messages, call detail records (CDRs), relational data, and so on.

RainStor does not use a database but a file system, which makes it very easy to install and implement and means it requires virtually no administration. Technically speaking, RainStor uses a form of tokenisation with a linked list to enable de-duplication of data values and patterns. I should emphasise this point: there is absolutely no duplication of data.
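To give a feel for the value-level half of that idea, here is a minimal sketch in Python: each distinct value is stored once and rows are encoded as lists of integer tokens pointing into a shared value store. This is purely illustrative and not RainStor's actual implementation (which also de-duplicates patterns via linked lists); the `TokenStore` class and its methods are my own invention.

```python
# Hypothetical sketch of value-level de-duplication via tokenisation.
# Each distinct value is stored exactly once and assigned an integer
# token; rows become lists of tokens referencing the shared store.

class TokenStore:
    def __init__(self):
        self.value_to_token = {}   # distinct value -> token id
        self.token_to_value = []   # token id -> value

    def tokenise(self, value):
        token = self.value_to_token.get(value)
        if token is None:
            token = len(self.token_to_value)
            self.value_to_token[value] = token
            self.token_to_value.append(value)
        return token

    def ingest_row(self, row):
        return [self.tokenise(v) for v in row]

    def restore_row(self, tokens):
        return [self.token_to_value[t] for t in tokens]

store = TokenStore()
rows = [
    ("alice", "London", "GBR"),
    ("bob", "London", "GBR"),
    ("carol", "Leeds", "GBR"),
]
encoded = [store.ingest_row(r) for r in rows]
# Nine field values collapse to six distinct stored values.
print(len(store.token_to_value))          # 6
print(store.restore_row(encoded[1]))      # ['bob', 'London', 'GBR']
```

On realistic data such as CDRs, where the same phone numbers, cell identifiers and tariff codes recur millions of times, the distinct-value store stays small relative to the row count, which is where the bulk of the savings comes from.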

This significantly reduces the amount of data that needs to be stored, and compression is then applied on top of that. Total space savings are typically around 50 times, rising towards 100 times for data such as CDRs, and the company claims (not publicly) to have found environments where individual tables can be compressed by over 3,000 times.
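To put those ratios in perspective, here is a back-of-the-envelope calculation; the 10 TB starting volume is purely an illustrative figure of mine, not one from either vendor.

```python
# Illustrative arithmetic only: what the quoted compression ratios
# mean for the stored footprint of a hypothetical 10 TB data set.
raw_tb = 10.0  # assumed raw data volume, in terabytes

for ratio in (50, 100, 3000):
    stored_gb = raw_tb * 1024 / ratio  # 1 TB = 1024 GB
    print(f"{ratio}x -> {stored_gb:.1f} GB stored")
# 50x   -> 204.8 GB stored
# 100x  -> 102.4 GB stored
# 3000x -> 3.4 GB stored
```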

I mention this because the company is relatively modest in its compression claims: when it says up to 50 times it probably means more like an average of 50 times, whereas certain other vendors (mentioning no names, but it begins with O) claim up to 100 times, which is probably comparable to the 3,000 just mentioned.

There are a couple of other things that are important to note. The first is that if you are using RainStor for relational data, it ingests the schema as well as the data and supports schema evolution, so you can make point-in-time queries (that is, you can look at the data exactly as it would have appeared at a particular point in time). The second is that it includes a query engine that supports (translates) incoming SQL, so you can run conventional business intelligence queries and analytics against data stored by RainStor.
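The point-in-time idea can be sketched as follows: each schema version is recorded with the time it took effect, and an "as-of" query resolves against whichever version was in force at the requested moment. This is a hypothetical illustration; the `SchemaHistory` class, its method names and the CDR columns are my own assumptions, and RainStor's actual mechanism is not public.

```python
# Hypothetical sketch of schema evolution with "as-of" resolution:
# a query at time T is answered against the schema in force at T.
import bisect

class SchemaHistory:
    def __init__(self):
        self.effective_from = []  # sorted effective timestamps
        self.schemas = []         # column lists, parallel to the above

    def evolve(self, timestamp, columns):
        # Assumes versions are registered in chronological order.
        self.effective_from.append(timestamp)
        self.schemas.append(list(columns))

    def as_of(self, timestamp):
        # Latest schema whose effective time is <= the query time.
        i = bisect.bisect_right(self.effective_from, timestamp) - 1
        if i < 0:
            raise ValueError("no schema in force at that time")
        return self.schemas[i]

history = SchemaHistory()
history.evolve(20100101, ["caller", "callee", "duration"])
history.evolve(20110301, ["caller", "callee", "duration", "cell_id"])

print(history.as_of(20100615))  # ['caller', 'callee', 'duration']
print(history.as_of(20120101))  # the later schema, with 'cell_id'
```

The same lookup can then drive SQL translation, so that a query dated mid-2010 simply never sees the `cell_id` column added later.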

Now add the Teradata hardware: an appliance for data retention purposes that can back-end onto a Teradata warehouse, with queries transparently accessing the appliance when necessary. You might want this either because you have long-term data that you want to keep for query purposes but don't access often, or for compliance reasons because you have to store that data. Incidentally, RainStor provides policy management to automatically delete any retained data the day after you are no longer required to keep it.
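That last point, policy-driven deletion, amounts to something like the following sketch: each record carries its ingest date, and a periodic sweep discards anything whose retention period has expired. This is a hypothetical illustration, not RainStor's policy engine; the seven-year period and the record layout are my own assumptions.

```python
# Hypothetical sketch of retention-policy enforcement: a sweep keeps
# only records still inside their retention period.
from datetime import date, timedelta

RETENTION = timedelta(days=7 * 365)  # assumed policy: keep for ~7 years

def sweep(records, today):
    """Return only the records whose retention period has not expired."""
    return [r for r in records if today - r["ingested"] < RETENTION]

records = [
    {"id": 1, "ingested": date(2003, 1, 1)},   # past the 7-year limit
    {"id": 2, "ingested": date(2010, 6, 1)},   # still within it
]
kept = sweep(records, today=date(2011, 1, 1))
print([r["id"] for r in kept])  # [2]
```

Running such a sweep daily gives the behaviour described above: data disappears as soon as the obligation to keep it lapses, which matters because over-retention can be as much of a compliance problem as under-retention.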

OK, this isn’t the first offering for this sort of environment, but it is a big and growing requirement and this appliance should be a lot less expensive (because of the compression and because of the minimal administration) than other such solutions. However, what is most interesting is that this is not just a new market for Teradata, so it can add on this appliance to its existing customer environments, but that it can potentially address retention issues in non-Teradata environments.

If this is as good as it looks then Teradata/RainStor will potentially go into Oracle, IBM and other shops and sell this appliance as a back end to those vendors' warehouses. And, of course, once it has that foot in the door it will have plenty of opportunity to try to sell its data marts and warehouses. So, this could prove disruptive. Of course, the product isn't available yet and it hasn't been priced yet, but if Teradata and RainStor get those things right then this could prove a serious threat to other vendors.

Philip Howard is Research Director (Data Management) at Bloor Research. Data management refers to the management, movement, governance and storage of data and involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management and log and event management. Philip also tracks spreadsheet management and complex event processing.