Flash Storage: Increase The IQ, Release The IOPS


Flash is revolutionising storage. A single commodity SSD (Solid State Drive) is 400 times faster than a hard disk. For comparison, the speed of sound is “only” 250 times faster than walking! However, while flash provides extraordinarily high IOPS (Input/Output Operations Per Second), it brings a whole new set of problems: write amplification, latency spikes, limited write endurance and, last but not least, very high cost/GB.

Commodity MLC SSDs cost about $1/GB, fifteen to twenty times more than SATA hard disks. This is too expensive to run many mainstream applications on SSD. To leverage the high IOPS of flash while compensating for its high $/GB, flash storage systems employ techniques such as caching, tiering, and inline compression and deduplication.
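
The cost argument can be made concrete with a little arithmetic. The sketch below uses the figures quoted above ($1/GB for flash, with disk assumed to sit at the 20x end of the fifteen-to-twenty-times range). The function names, the data-reduction ratio and the flash fraction are hypothetical inputs chosen for illustration, not measurements from any product.

```python
FLASH_COST_PER_GB = 1.00      # from the text: commodity MLC SSD
HDD_COST_PER_GB = 1.00 / 20   # assumption: flash is 20x the cost of SATA disk


def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Cost per GB of logical data after dedupe/compression.

    reduction_ratio is logical bytes divided by physical bytes,
    e.g. 2.0 means the data shrinks to half its size.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


def hybrid_cost_per_gb(flash_fraction, reduction_ratio=1.0):
    """Blended cost when only part of the raw capacity is flash.

    Assumes (for simplicity) the same reduction ratio on both tiers.
    """
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    blended = (flash_fraction * FLASH_COST_PER_GB
               + (1.0 - flash_fraction) * HDD_COST_PER_GB)
    return effective_cost_per_gb(blended, reduction_ratio)
```

Under these assumptions, a 2:1 reduction halves the effective cost of an all-flash system, while a system with 10 percent of its raw capacity on flash lands much closer to the cost of disk than of flash.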

Storage vendors and new entrants have attempted to exploit flash in different ways. These products can be grouped into two broad categories, based on their impact on latency:

  • Disk-based products with flash as a cache.
  • Flash-based products (with or without HDDs for expanded capacity).

Disk-based products are fundamentally designed to optimise the use of hard disk drives, with flash bolted on as a cache to accelerate read performance. Flash as a cache is relatively easy to implement, so it is not surprising that existing legacy storage vendors have taken this path.

Many flash-as-cache implementations are non-persistent and non-redundant, so performance plummets after crashes and/or component failures. Since the “master” copy remains on hard disk, reads benefit from flash, but writes do not. Therefore, overall performance will not scale proportionately with improvements in flash technology.
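
Why performance does not scale with faster flash can be seen in a simple latency model. This is a rough sketch: the function name and the latency figures in the usage note are hypothetical, and it assumes every write completes at disk latency because the master copy lives on disk.

```python
def mean_latency_ms(read_fraction, hit_ratio, flash_ms, disk_ms):
    """Average IO latency for a disk-based system with a flash read cache.

    read_fraction: share of IOs that are reads (0..1)
    hit_ratio:     share of reads served from the flash cache (0..1)
    flash_ms:      latency of a flash access, in milliseconds
    disk_ms:       latency of a disk access, in milliseconds
    """
    # Reads are served from flash on a hit and from disk on a miss.
    read_ms = hit_ratio * flash_ms + (1.0 - hit_ratio) * disk_ms
    # Assumption: writes always pay disk latency (master copy is on disk).
    write_ms = disk_ms
    return read_fraction * read_ms + (1.0 - read_fraction) * write_ms
```

With 70 percent reads, a 90 percent hit ratio and 10 ms disk latency, cutting flash latency from 0.1 ms to 0.01 ms moves the average only from about 3.76 ms to about 3.71 ms in this model. Cache misses and writes dominate, so a tenfold improvement in flash is barely visible.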

Because disk-based products rely on disks as a key part of their basic data path, they have difficulty achieving flash-level latency and will be left behind by rapid improvements in flash performance.

In contrast, flash-based products are designed specifically for flash rather than mechanical disk drives, thus delivering dramatically lower latency. The key distinction is that their basic data paths do not require accessing disk. Most flash-based products are flash-only, while some integrate hard disks to expand capacity and simplify management.

Initial flash-only products are basic arrays. Focused on getting the highest possible IOPS, they generally have very high $/GB, and are missing enterprise features such as HA, snapshots, and clones. Even with inline dedupe and compression, flash-only arrays are currently too expensive for running the majority of applications in an enterprise.

Consequently, flash-only arrays require separate low-cost disk-based storage systems for storing snapshots, replicas, infrequently accessed data, and the data of less IO-intensive applications. As a result, flash-only arrays require significant additional work to stage and de-stage data and applications between flash and disk.

This extra work, combined with their high $/GB and lack of enterprise features, means flash-only arrays are much better suited to very high-performance applications without significant data management requirements. Using flash-only products for enterprise applications will require extensive planning, monitoring and additional supporting infrastructure.

Intelligent flash-based products use a combination of flash and hard disk, but apply techniques such as inline deduplication, compression and working set analysis to service nearly all IO from flash. Most data evicted from flash is snapshots, replicas, unused applications, powered-off VMs and other very cold data.
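
To illustrate one of these techniques, inline deduplication can be sketched as a block store keyed by content fingerprint. This is a minimal illustration, not any vendor's implementation: the class name, the fixed block size and the use of SHA-256 are all assumptions, and it omits reference counting and garbage collection.

```python
import hashlib


class DedupeStore:
    """Toy inline-deduplicating block store (illustrative only)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> block bytes, stored once
        self.files = {}   # name -> ordered list of fingerprints

    def write(self, name, data):
        """Split data into fixed-size blocks; store each unique block once."""
        fingerprints = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            # Inline dedupe: an already-known block is not stored again.
            self.blocks.setdefault(fp, block)
            fingerprints.append(fp)
        # Note: overwriting a name leaves old blocks behind (no GC here).
        self.files[name] = fingerprints

    def read(self, name):
        """Reassemble a file from its block fingerprints."""
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        """Bytes as seen by applications."""
        return sum(len(self.blocks[fp])
                   for fps in self.files.values() for fp in fps)

    def physical_bytes(self):
        """Bytes actually stored after deduplication."""
        return sum(len(block) for block in self.blocks.values())
```

The ratio of `logical_bytes()` to `physical_bytes()` is the data-reduction ratio: the higher it is, the more application data fits in the same amount of flash.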

Unlike with flash-only products, you can fill 100 percent of the usable flash without worrying about running out of space and having your applications come to a screeching halt. Intelligent flash-based products achieve sub-millisecond flash latencies, and are operationally far simpler and more cost-effective than flash-only products.
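
The reason a full flash tier need not be a problem is that cold data can be evicted to disk to make room. The sketch below uses plain least-recently-used (LRU) eviction as a simple stand-in for working set analysis; real products may use quite different policies, and the class and method names here are hypothetical.

```python
from collections import OrderedDict


class HybridTier:
    """Toy flash tier that spills its coldest blocks to disk (LRU stand-in)."""

    def __init__(self, flash_capacity):
        if flash_capacity < 1:
            raise ValueError("flash_capacity must be at least 1 block")
        self.flash_capacity = flash_capacity
        self.flash = OrderedDict()  # key -> data, coldest first
        self.disk = {}              # key -> data evicted from flash

    def _place_in_flash(self, key, data):
        self.flash[key] = data
        self.flash.move_to_end(key)  # mark as most recently used
        # Flash is full: evict the coldest blocks to disk instead of failing.
        while len(self.flash) > self.flash_capacity:
            cold_key, cold_data = self.flash.popitem(last=False)
            self.disk[cold_key] = cold_data

    def write(self, key, data):
        """Writes always land in flash; any stale disk copy is dropped."""
        self.disk.pop(key, None)
        self._place_in_flash(key, data)

    def read(self, key):
        """Return (data, tier that served the read)."""
        if key in self.flash:
            self._place_in_flash(key, self.flash[key])
            return self.flash[key], "flash"
        data = self.disk.pop(key)  # raises KeyError for unknown keys
        self._place_in_flash(key, data)  # promote back into flash
        return data, "disk"
```

In this sketch a write never fails because flash is full: the least recently used block is moved to disk, and only reads of that cold data pay disk latency.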

Given that many application management problems today originate from storage, flash combined with application-awareness allows intelligent storage systems to simplify not only storage management, but also applications and the overall IT infrastructure.

So why hasn’t this been done before? Prior to the advent of flash, mechanical disk-based systems were too complex to support a high level of intelligence. It would be like trying to build a personal computer using vacuum tubes. The huge leap in flash performance at last puts intelligent storage within reach.

Flash-only architectures deliver performance but still require a disk tier for cold data. Intelligent flash architectures are built around flash, but can use disk as an integral part of the system to provide cost-effective data and performance management.

Although it is easy to think of flash as simply faster storage, it can offer far more. Consider that when transistors replaced vacuum tubes, we got much more than merely compact radios. We got more powerful and more intelligent systems.

Similarly, flash is a technology with potentially profound impact when properly harnessed: intelligent products that are far simpler and far more powerful. Such products automate many of the tough but tedious tasks, such as configuration, management, and tuning for efficiency and performance, that waste enormous amounts of system administrator effort.

Simple, smart, fast: This is the future of enterprise storage.

Edward Lee

Edward Lee is an architect at Tintri. Prior to Tintri, Ed was the Principal Systems Architect at Data Domain, and a key contributor to the first and subsequent releases of Data Domain’s file system. He was responsible for innovations like the BOOST deduplication protocol and replication. Prior to Data Domain, Ed was at Zambeel and Compaq Systems Research Center. He has a Ph.D. from UC Berkeley in Computer Science, where he was an original member of the Berkeley RAID team.