Compression And Deduplication In Flash Storage

Flash Storage

A number of factors have contributed to the widespread adoption of all-flash storage arrays. The flash devices themselves have become much denser, allowing companies to store data from data warehouses and other analytic systems at a per-gigabyte cost approaching that of an HDD-based storage array. Additionally, hyper-convergence and the consolidation of workloads onto virtual machines created demand for low latency across all storage.

All-flash storage arrays offer a number of benefits to the modern data centre. First and most important is high IOPS at low latency, which only flash storage can offer. Next is a more recent development: modern flash devices are now dense enough that these arrays can achieve nearly the same storage density as an HDD-based array.

Another impact of density is that flash storage devices can be packed more closely together than HDDs, giving higher physical density. With some arrays, this means you can deploy a petabyte or more of storage in a single rack. Given the data centre footprint of traditional arrays, which typically span many racks, this is a significant improvement. Data reduction technologies such as compression and deduplication shrink data volumes further, helping companies save money by lowering their storage expenditure and protecting existing investments. In addition, needing fewer storage arrays in the data centre leads to lower energy and cooling costs.

Inline Compression & Deduplication

One of the main benefits of many flash arrays is inline compression and deduplication, meaning the data is reduced as it is written, before it reaches the flash media. This allows more data to be stored on a given storage device. It works as a multi-phase process in which the data is first compressed and then deduplicated. Compressing the data first leaves fewer bytes for the more computationally expensive deduplication phase to process.
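The two-phase flow described above can be sketched in a few lines of Python. This is a toy model, not any vendor's implementation: the fixed 4 KiB block size, `zlib` as the compressor, SHA-256 fingerprints, and the in-memory dict standing in for flash media are all assumptions made for illustration. It follows the compress-then-deduplicate order given here; other designs may order the phases differently.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real arrays vary


class InlineReducer:
    """Toy inline pipeline: compress each block first, then deduplicate it."""

    def __init__(self) -> None:
        self.store = {}       # fingerprint -> compressed block (stands in for flash media)
        self.block_map = []   # logical block number -> fingerprint
        self.logical_bytes = 0

    def write(self, data: bytes) -> None:
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            self.logical_bytes += len(block)
            compressed = zlib.compress(block)            # phase 1: compress
            fp = hashlib.sha256(compressed).hexdigest()  # phase 2: fingerprint the smaller payload
            if fp not in self.store:                     # only previously unseen content is stored
                self.store[fp] = compressed
            self.block_map.append(fp)                    # duplicates just point at the existing copy

    def read(self, block_no: int) -> bytes:
        return zlib.decompress(self.store[self.block_map[block_no]])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.store.values())


reducer = InlineReducer()
reducer.write(b"A" * BLOCK_SIZE * 100)  # 100 identical logical blocks
print(len(reducer.store), reducer.logical_bytes, reducer.physical_bytes())
```

Because `zlib.compress` is deterministic for the same input, identical logical blocks produce identical compressed payloads, so deduplication still catches them after compression.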

Compression For Density & Performance

Early computer users may have poor memories of compression: when the HDD in your PC filled up, you had to compress folders to free up enough space, and suddenly the PC would grind to a halt. With powerful modern CPUs, the effect is reversed. Although compression does consume CPU cycles to compress and decompress data, reading compressed data requires fewer I/O operations, which can greatly improve overall throughput.
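A rough way to see the I/O saving is to count how many fixed-size device reads are needed to fetch the same data with and without compression. The sketch below assumes a hypothetical 4 KiB I/O size and a deliberately repetitive payload, so the ratio it prints is far better than typical real-world data would achieve; it also ignores the CPU time spent decompressing.

```python
import math
import zlib

IO_SIZE = 4096  # hypothetical device I/O size in bytes


def reads_needed(num_bytes: int, io_size: int = IO_SIZE) -> int:
    """Number of fixed-size device reads needed to fetch num_bytes."""
    return math.ceil(num_bytes / io_size)


# Hypothetical, highly repetitive log-style payload (real data compresses less well)
payload = b"sensor=probe-7 reading=21.5\n" * 20_000
compressed = zlib.compress(payload)

raw_reads = reads_needed(len(payload))
compressed_reads = reads_needed(len(compressed))
print(f"uncompressed: {raw_reads} reads, compressed: {compressed_reads} reads")

# The CPU pays to decompress, but the original bytes come back intact
assert zlib.decompress(compressed) == payload
```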

Many customers also associate deduplication with unacceptably poor performance and consider it suitable only for backup. However, this is not always the case: some vendors have developed methods that avoid the performance penalty and can even improve performance. Indeed, a 2012 IDC report made the case that deduplication is no longer solely for backup.

Data reduction techniques can also offer benefits at the caching layer. Because blocks occupy less space once reduced, a storage array can hold more hot blocks in a cache of the same size, which accelerates performance.
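The caching benefit can be illustrated with a least-recently-used cache whose limit is a byte budget rather than an entry count. In this sketch, `ByteBudgetLRU`, the 64 KiB budget, and the synthetic 4 KiB blocks are all hypothetical; the only difference between the two caches is whether blocks are stored compressed. Only compression is modelled here; deduplication could additionally let identical blocks share a single cache entry.

```python
import zlib
from collections import OrderedDict


class ByteBudgetLRU:
    """LRU cache bounded by total stored bytes; optionally keeps blocks compressed."""

    def __init__(self, capacity_bytes: int, compress: bool) -> None:
        self.capacity = capacity_bytes
        self.compress = compress
        self.entries = OrderedDict()  # block address -> stored bytes, oldest first
        self.used = 0

    def put(self, addr: int, block: bytes) -> None:
        data = zlib.compress(block) if self.compress else block
        if addr in self.entries:
            self.used -= len(self.entries.pop(addr))
        self.entries[addr] = data
        self.used += len(data)
        while self.used > self.capacity:  # evict least recently used until within budget
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)

    def get(self, addr: int):
        data = self.entries.get(addr)
        if data is None:
            return None                    # cache miss
        self.entries.move_to_end(addr)     # mark as most recently used
        return zlib.decompress(data) if self.compress else data


# 64 hypothetical 4 KiB "hot" blocks and a 64 KiB cache budget
blocks = [(f"record {i:06d} " * 300).encode()[:4096] for i in range(64)]
plain = ByteBudgetLRU(16 * 4096, compress=False)
packed = ByteBudgetLRU(16 * 4096, compress=True)
for addr, blk in enumerate(blocks):
    plain.put(addr, blk)
    packed.put(addr, blk)
print(len(plain.entries), len(packed.entries))
```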

Because compression and deduplication take place before data is written to the underlying storage devices, they also reduce the volume of writes to storage. This can improve the overall read and write performance of a storage array by more than 100-fold. Since flash cells tolerate only a limited number of program/erase cycles, fewer writes also extend the life of the SSDs, while giving the business greater agility and flexibility.
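The endurance effect is simple arithmetic: if inline reduction shrinks host writes by a given ratio, the drive's rated write endurance lasts proportionally longer. The figures below (a drive rated for 1,000 TB written, absorbing 1 TB of host writes per day) are hypothetical, and the model ignores write amplification and other wear inside the SSD.

```python
def endurance_years(tbw_rating_tb: float, host_writes_tb_per_day: float,
                    reduction_ratio: float) -> float:
    """Years until a drive's rated write endurance (TB written) is consumed.

    Inline reduction means only host_writes / reduction_ratio reaches the flash.
    Simplification: ignores write amplification inside the SSD.
    """
    physical_tb_per_day = host_writes_tb_per_day / reduction_ratio
    return tbw_rating_tb / physical_tb_per_day / 365


# Hypothetical drive rated for 1,000 TB written, absorbing 1 TB of host writes per day
for ratio in (1, 2, 4):
    print(f"{ratio}:1 reduction -> {endurance_years(1_000, 1, ratio):.1f} years")
```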

Integrating this functionality at the storage level avoids the disadvantages of other compression methods, such as database compression. Many databases offer in-database compression, which helps I/O performance but increases CPU utilisation on the database server. In most commercial databases, compression also requires additional licensing costs and time-consuming application changes. Array-based compression eliminates these concerns because it is transparent to both the application and the database.

Compression & Deduplication: Making Flash Affordable

Storage cost is typically measured with two metrics: cost per IOPS and cost per GB/TB. On cost per IOPS, flash outclasses HDD by a wide margin. For example, a PCIe SSD that costs $2,500 and delivers 400,000 IOPS works out to less than a penny per IOPS ($0.00625). Even a high-performing HDD, such as a 15k SAS disk, would cost $150 and deliver only about 180 IOPS. That is roughly $0.83 per IOPS, over a hundred times more than flash.
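The per-IOPS figures can be checked directly from the prices and IOPS ratings quoted above; `cost_per_iops` is simply a helper name chosen for this sketch.

```python
def cost_per_iops(price_usd: float, iops: int) -> float:
    """Purchase price divided by rated IOPS."""
    return price_usd / iops


ssd_cost = cost_per_iops(2_500, 400_000)  # PCIe SSD from the example
hdd_cost = cost_per_iops(150, 180)        # 15k SAS HDD from the example
print(f"SSD ${ssd_cost:.5f}/IOPS, HDD ${hdd_cost:.3f}/IOPS, ratio {hdd_cost / ssd_cost:.0f}x")
# prints: SSD $0.00625/IOPS, HDD $0.833/IOPS, ratio 133x
```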

When it comes to cost per GB, however, HDDs have the equation in their favour. The PCIe SSD in the previous example costs $2,500 and has a capacity of 1TB, giving it a cost of $2.44/GB (treating 1TB as 1,024GB). The HDD costs $150 and has a capacity of 600GB, giving it a cost per GB of $0.25.
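Data reduction changes the capacity side of the comparison, because each physical gigabyte of flash then holds more than one logical gigabyte. The sketch below reuses the article's prices and capacities (with 1TB taken as 1,024GB, which reproduces the $2.44/GB figure); the reduction ratios are hypothetical, and the break-even figure assumes the HDD array applies no data reduction of its own.

```python
GB_PER_TB = 1024  # binary terabytes, matching the $2.44/GB figure


def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    return price_usd / capacity_gb


def effective_cost_per_gb(price_usd: float, capacity_gb: float,
                          reduction_ratio: float) -> float:
    # With data reduction, each physical GB stores reduction_ratio logical GB.
    return cost_per_gb(price_usd, capacity_gb * reduction_ratio)


ssd = cost_per_gb(2_500, 1 * GB_PER_TB)  # PCIe SSD from the example
hdd = cost_per_gb(150, 600)              # 15k SAS HDD from the example

for ratio in (1, 3, 5):  # hypothetical reduction ratios, for illustration only
    print(f"{ratio}:1 -> ${effective_cost_per_gb(2_500, GB_PER_TB, ratio):.2f}/GB")

# Reduction ratio at which flash matches the HDD's raw $/GB
print(f"break-even ratio: {ssd / hdd:.1f}:1")
```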

What brings these two metrics together in favour of flash is that as workloads need more IOPS (due to the demands of real-time analytics and processing applications), adding IOPS to a traditional array becomes very expensive because of the sheer number of HDDs required to service them. The added functionality of compression, encryption, and snapshots helps tilt this equation in favour of all-flash arrays.
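To see how quickly HDD counts grow, the sketch below sizes both device types for a hypothetical 400,000 IOPS requirement using the per-device figures from the earlier cost example. It deliberately ignores RAID overhead, controllers, enclosures, and capacity requirements, all of which a real sizing exercise would include.

```python
import math

HDD_IOPS, HDD_PRICE = 180, 150        # 15k SAS disk figures from the cost example
SSD_IOPS, SSD_PRICE = 400_000, 2_500  # PCIe SSD figures from the cost example


def devices_for(target_iops: int, per_device_iops: int) -> int:
    """Minimum whole number of devices needed to reach target_iops."""
    return math.ceil(target_iops / per_device_iops)


target = 400_000  # hypothetical workload requirement
hdds = devices_for(target, HDD_IOPS)
ssds = devices_for(target, SSD_IOPS)
print(hdds, hdds * HDD_PRICE)  # 2223 333450
print(ssds, ssds * SSD_PRICE)  # 1 2500
```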

Paul Silver

Paul Silver is Vice President, EMEA for Tegile, a Western Digital brand. With over 20 years’ experience in the industry, Paul is a specialist in building and growing successful sales at organisations including EqualLogic (acquired by Dell) and DynamicOps (acquired by VMware). Paul has been responsible for Tegile’s operations across EMEA since 2013.