Enterprises increasingly rely on array-based snapshots as a critical component of their business continuity strategy. Gartner estimates that by 2016, 20% of large enterprises will abandon backup/recovery solutions using a traditional methodology and adopt those that employ only snapshot and replication techniques, up from less than 7% today (“Magic Quadrant for Enterprise Backup/Recovery Software”).
And survey results from the Enterprise Strategy Group show that 55% of customers plan to augment their traditional backup with snapshots or replication or both. But why are organisations turning to array snapshots to ensure business continuity in virtualised environments?
Array-based snapshots can provide recovery point objectives (RPOs) and recovery time objectives (RTOs) that are unmatched by traditional business continuity methods. This is because traditional backup methods require moving the data from one disk to another (or to tape), which is very I/O intensive and poorly suited to virtualised environments, which are characterised by higher server utilisation and higher performance demands on storage systems.
In contrast, an array snapshot can be thought of as a point-in-time virtual copy of the data. The copy is virtual because the data itself isn’t copied or moved when the snapshot is taken; only the metadata (essentially pointers to the data) is updated for each snapshot. This makes array snapshots ideally suited to virtual environments because the burden of moving data through heavily utilised servers and networks is eliminated.
From an operational point of view, you no longer have to worry about scheduling backup windows to minimise application disruption when using array snapshots because the snapshots happen almost instantaneously.
Perhaps even more importantly, recovering data from snapshots is also much faster than recovering from traditional backups. Restoring from incremental or differential backups can take hours or days before the application is available to users, because you first have to restore the full backup from the non-native backup format and then apply the necessary increments or differentials, all while copying a large amount of data across the network.
With array snapshots, you simply mount the snapshot, or roll back to it, and proceed with business as usual, resulting in much faster recovery times. Array snapshots deliver RPOs and RTOs that traditional methods cannot match, eliminate the need to plan for and schedule backup windows, and are in general a great fit for virtualised environments. So what’s the catch? As they say, the devil is in the details, and what follows are 5 key considerations you should understand before using array snapshots for business continuity and disaster recovery.
1. How Snapshot Implementations Impact Performance
The way array-based snapshots are implemented can have a huge impact on the performance and scalability of a business continuity strategy. Traditional ‘copy on write’ (COW) implementations create a snapshot instantly because no data is copied when the snapshot is taken. However, when a block of the original data is later overwritten, the original block must first be copied to a reserve area, and the metadata is updated to reflect that.
This copy adds extra I/O to the write path, and over time, as more snapshots are taken, performance gets progressively worse. A COW implementation therefore does not have the performance and scalability to be used in a business continuity scheme with long-term retention requirements.
For such requirements, other snapshot implementations are needed. ‘Redirect on write’ (ROW) implementations, for example, redirect writes that target original data to free locations on disk, leaving the original blocks in place for the snapshot. Because the original data is not copied, ROW implementations do not suffer from the same performance issues.
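The difference between the two approaches can be sketched with a toy model. The Python below is only an illustration under simplifying assumptions: the class names, the in-memory lists standing in for disk, and the `io_ops` counter are all hypothetical and do not describe any vendor's implementation. It counts the physical reads and writes caused by host writes after a snapshot has been taken.

```python
"""Toy model of copy-on-write (COW) vs redirect-on-write (ROW) snapshots.

Hypothetical classes for illustration only; real arrays work on disk
blocks and far richer metadata.
"""


class CowVolume:
    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data, overwritten in place
        self.snapshots = []          # per snapshot: {block index: preserved data}
        self.io_ops = 0              # physical reads + writes caused by host writes

    def snapshot(self):
        # Nothing is copied at snapshot time; the reserve area starts empty.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        if self.snapshots and index not in self.snapshots[-1]:
            # First overwrite since the newest snapshot: read the old block
            # and write it to the reserve area before overwriting it.
            self.snapshots[-1][index] = self.blocks[index]
            self.io_ops += 2
        self.blocks[index] = data
        self.io_ops += 1

    def read(self, index):
        return self.blocks[index]

    def read_snapshot(self, snap_id, index):
        # The first preserved copy in this or a later snapshot holds the
        # value the block had when snapshot `snap_id` was taken.
        for saved in self.snapshots[snap_id:]:
            if index in saved:
                return saved[index]
        return self.blocks[index]


class RowVolume:
    def __init__(self, blocks):
        self.store = list(blocks)                  # physical locations, append-only here
        self.live_map = list(range(len(blocks)))   # logical block -> physical location
        self.snapshots = []                        # frozen copies of the map (metadata only)
        self.io_ops = 0

    def snapshot(self):
        self.snapshots.append(list(self.live_map))
        return len(self.snapshots) - 1

    def write(self, index, data):
        # New data goes to a free location; the old block stays where it is.
        self.store.append(data)
        self.live_map[index] = len(self.store) - 1
        self.io_ops += 1

    def read(self, index):
        return self.store[self.live_map[index]]

    def read_snapshot(self, snap_id, index):
        return self.store[self.snapshots[snap_id][index]]


def demo():
    cow = CowVolume(["a", "b", "c", "d"])
    row = RowVolume(["a", "b", "c", "d"])
    for vol in (cow, row):
        vol.snapshot()
        vol.write(0, "A")
        vol.write(1, "B")
        vol.write(0, "AA")   # second overwrite of block 0: no further copy under COW
    return cow, row
```

In this model the three host writes cost 7 I/O operations on the COW volume (two first overwrites at 3 operations each, plus one plain write) and 3 on the ROW volume, while both still return the original data when the snapshot is read. The model ignores space reclamation and fragmentation, which a real ROW design has to manage.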
2. Snapshot Space and Media Usage
Because array snapshots consume some space on primary storage, the cost of using snapshots relates directly to how much space is consumed. A major factor in snapshot space overhead is the “page” size used for snapshot management. Smaller pages allow for more shared data between snapshots, but also result in more metadata, which itself consumes space.
Conversely, if a very large page size is used, even a small portion of changing blocks results in a new page of mostly duplicate data – in such a scenario the snapshot space overhead can be much larger than the amount of data actually changed. Further, applications and even different versions of the same application can use different block sizes.
In the optimal approach, snapshot “pages” would be exactly equal to the block size of each application, avoiding wasted space and minimising metadata overhead. Arrays that support such customisable block sizes can therefore offer extremely efficient snapshots.
The most cost-effective snapshot solutions combine data reduction methods with cost-effective media. Let’s consider data reduction first. Because snapshots don’t duplicate data in the first place, there is no need for de-duplication. You can, however, use compression for higher levels of space savings if the array snapshot implementation allows for that.
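The page-size trade-off can be made concrete with some simple arithmetic. The following Python sketch rests on stated assumptions rather than on any real array: the function name, the 8-byte map entry per page, and the workload (100 application writes of 8 KiB, spaced 1 MiB apart on a 1 GiB volume) are all hypothetical.

```python
def snapshot_overhead(write_offsets, write_size, page_size, volume_size, entry_bytes=8):
    """Return (data_bytes, metadata_bytes) held by one snapshot in this toy model.

    Assumptions: a page is preserved whole as soon as any byte in it changes,
    and the snapshot keeps one map entry of `entry_bytes` per page of the volume.
    """
    touched = set()
    for offset in write_offsets:
        first = offset // page_size
        last = (offset + write_size - 1) // page_size
        touched.update(range(first, last + 1))
    data_bytes = len(touched) * page_size
    metadata_bytes = (volume_size // page_size) * entry_bytes
    return data_bytes, metadata_bytes


KIB, MIB, GIB = 2**10, 2**20, 2**30

# Hypothetical workload: 100 application writes of 8 KiB, spaced 1 MiB apart.
offsets = [i * MIB for i in range(100)]
changed_bytes = len(offsets) * 8 * KIB

results = {
    page: snapshot_overhead(offsets, 8 * KIB, page, GIB)
    for page in (4 * KIB, 8 * KIB, MIB)
}

for page, (data, meta) in results.items():
    print(f"page={page // KIB:>5} KiB  data={data:>11} B  metadata={meta:>8} B")
```

Under these assumptions, 4 KiB and 8 KiB pages both preserve exactly the 819,200 bytes that changed, but the 8 KiB page (matching the application block size) needs half the metadata. A 1 MiB page cuts metadata to 8 KiB yet preserves 100 MiB of data, 128 times what actually changed.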
3. Virtual Machine Consistency
You should understand whether your array snapshots are virtual machine (VM) consistent. When an array snapshot is triggered, the VM file system could still have some operations that are “in flight”, and those operations need to be handled correctly. To be VM consistent, the VM needs to be quiesced (buffers should be flushed, etc.) before the snapshot is taken.
Without VM consistency, a snapshot is only crash consistent, which means that the state of the data on disk is what it would be if the system had crashed at that moment. If your array snapshots have the required integration with the virtualisation platform, the VM is quiesced before the array snapshot is triggered, so file system operations that would otherwise be “in flight” are handled correctly.
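The quiesce-then-snapshot sequence can be sketched as follows. This is a minimal illustration, not a real virtualisation API: `ToyVM`, `quiesced` and `array_snapshot` are hypothetical names, and in-memory lists stand in for guest buffers and array storage.

```python
from contextlib import contextmanager


class ToyVM:
    """Hypothetical VM: writes sit in a guest buffer until they are flushed."""

    def __init__(self):
        self.buffer = []     # writes still "in flight" inside the guest
        self.disk = []       # what the array actually sees
        self.frozen = False

    def write(self, record):
        if self.frozen:
            raise RuntimeError("writes are paused while the VM is quiesced")
        self.buffer.append(record)

    def flush(self):
        self.disk.extend(self.buffer)
        self.buffer.clear()


@contextmanager
def quiesced(vm):
    # Pause new writes, then flush buffered ones so the disk is consistent.
    vm.frozen = True
    vm.flush()
    try:
        yield vm
    finally:
        vm.frozen = False    # always resume the VM, even if the snapshot fails


def array_snapshot(vm):
    # The array can only capture what has reached the disk.
    return tuple(vm.disk)


def demo():
    vm = ToyVM()
    vm.write("txn-1")
    vm.write("txn-2")
    crash_consistent = array_snapshot(vm)    # taken without quiescing
    with quiesced(vm):
        vm_consistent = array_snapshot(vm)   # taken while quiesced
    vm.write("txn-3")                        # writes resume afterwards
    return crash_consistent, vm_consistent
```

Here the snapshot taken without quiescing misses both buffered writes, while the quiesced snapshot contains them. Using a context manager keeps the pause as short as possible and guarantees that the VM resumes.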
4. Application Consistency
In addition to being virtual machine consistent, snapshots need to be application consistent for organisations that run virtualised business-critical applications, such as Microsoft Exchange. The snapshot process needs to be able to handle application changes that are “in flight” so that application data in the snapshot retains its integrity. Without application consistency, restoring the application could result in corrupt data, because data that was “in flight” was not handled correctly.
5. Snapshot Replication for Disaster Recovery
Most of what we have discussed until now covers business continuity at a local site, but for disaster recovery scenarios where a local site is completely down, local array snapshots do not suffice. For such scenarios, replication is a must so that you have a known good copy of your data that you can bring up at a secondary site.
You will have to pair your array at the primary site with another array at the secondary site, and use your array’s replication capabilities to move the data to the secondary site. Here, too, space-efficient snapshots really help: if your array has space-efficient snapshots, you save time and bandwidth because you have less data to replicate. If your array has bandwidth-efficient replication that only replicates changed blocks, and intelligently avoids “re-inflating” data over the WAN, you save even more time and bandwidth.
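Replicating only changed blocks can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, snapshots are modelled as equal-length lists of byte blocks, and changes are found by comparing data, whereas a real array would typically consult its snapshot metadata.

```python
def changed_blocks(prev_snapshot, curr_snapshot):
    """Return {block index: new data} for blocks that differ between two snapshots."""
    return {
        index: new
        for index, (old, new) in enumerate(zip(prev_snapshot, curr_snapshot))
        if old != new
    }


def apply_delta(replica, delta):
    """Apply a changed-block delta to the replica at the secondary site."""
    for index, data in delta.items():
        replica[index] = data
    return replica


def demo():
    prev = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
    curr = [b"aaaa", b"BBBB", b"cccc", b"DDDD"]
    replica = list(prev)                      # secondary site already holds `prev`
    delta = changed_blocks(prev, curr)
    sent = sum(len(data) for data in delta.values())
    full = sum(len(data) for data in curr)
    apply_delta(replica, delta)
    return replica == curr, sent, full
```

In this example only 8 of the 16 bytes cross the WAN, and the replica still ends up identical to the newer snapshot.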
Using array snapshots for business continuity and disaster recovery can provide huge improvements in RPOs and RTOs over traditional methods, and can provide other benefits such as operational simplicity and minimal application disruption. Understanding the 5 key considerations discussed in this article will help IT managers simplify their use of array snapshots for business continuity and disaster recovery.