Coping With ‘Bad Weather’ Conditions: How To Ensure 24/7 Data Availability


Nothing is ever guaranteed. Even the biggest companies with the most robust plans can get caught out occasionally. Last week a Google data centre in Belgium suffered a power outage after being hit by multiple lightning strikes in quick succession. Not every company operates at the scale of a Google data centre, but there are lessons to be learned.

Generally speaking, data centres are protected against the elements far more stringently than most buildings, given the value of the information they hold. However, not even IT giants can control the full force of Mother Nature at all times.

Thanks to comprehensive protective measures, the damage was limited, with 99.999999 percent of stored data restored within a matter of hours. A handful of users, however, had to contend with system failures. French start-up Azendoo, for example, which provides collaboration software to enterprises, needed days to recover its data manually.

While Google’s provisions meant most of its customers were wholly unaffected, the downtime penalties suffered by a few shine a light on what it takes to ensure constant availability of data and services, bad weather or not.

The demands of being ‘always-on’ have risen sharply in recent years. The digitisation of enterprise business processes means that even the smallest system fault can have a profound knock-on effect. According to Veeam’s Data Center Availability Report 2014, enterprises lose up to $10,163,114 a year to the downtime and data loss caused by their legacy backup solutions.

An intelligent availability solution helps minimise the impact of unscheduled interruptions. Ideally, the recovery time and point objectives (how long it takes to bring a workload back, and how much recent data can be lost) should be no longer than 15 minutes. Every lost record and every second of downtime costs money and reputation. But it’s not just about speed: the quality and accessibility of the backups themselves are just as critical.
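
To make that 15-minute target concrete, here is a minimal, product-agnostic sketch in Python. The function and variable names are illustrative assumptions, not tied to any particular backup tool; it simply checks whether the newest restore point still satisfies a configured recovery point objective.

```python
from datetime import datetime, timedelta, timezone

# Illustrative 15-minute recovery point objective (RPO): the maximum
# acceptable age of the newest restore point at any given moment.
RPO = timedelta(minutes=15)

def rpo_breached(last_backup_at: datetime) -> bool:
    """Return True if more data than the RPO allows could be lost right now."""
    return datetime.now(timezone.utc) - last_backup_at > RPO

# Example: a restore point taken 40 minutes ago violates a 15-minute RPO
# and should raise an alert before the exposure grows any further.
last_backup = datetime.now(timezone.utc) - timedelta(minutes=40)
print(rpo_breached(last_backup))  # True
```

A real solution would track this per workload and feed it into monitoring, but the principle is the same: measure the gap, and alert the moment it exceeds the objective.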

To keep mission-critical data instantly on hand, most enterprises rely on onsite storage. Yet as companies like Azendoo found out when Google’s data centre was struck by lightning, that approach isn’t foolproof.

This is where the 3-2-1 rule can come close to guaranteeing the availability of data when one route to data or services is cut off. The rule says a comprehensive availability strategy keeps at least three copies of the data, stores them on at least two different types of media, and keeps one of those copies offsite. Cloud services like Google’s now play a central role in modern availability plans, and while Google’s precautions were impeccable, it took an extremely rare combination of circumstances to show that even more can be done.
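
As a rough illustration of the rule rather than any vendor’s implementation, the sketch below (the Copy structure and media names are assumptions made for the example) checks whether a set of backup copies meets the 3-2-1 criteria.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    """One copy of the data: the medium it lives on and whether it is offsite."""
    medium: str      # e.g. "disk", "tape", "cloud"
    offsite: bool

def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, on at least 2 different media, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )

# Example: production data on local disk, a backup on tape, and a cloud replica.
copies = [
    Copy("disk", offsite=False),   # primary, onsite
    Copy("tape", offsite=False),   # second medium, onsite
    Copy("cloud", offsite=True),   # offsite copy
]
print(satisfies_3_2_1(copies))  # True
```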

No matter how fast a workload can be restored from a backup and made available in an emergency, that backup must be up to date. If an enterprise is to minimise the risk of data loss, it needs very short recovery point objectives. The same applies to offsite replication: the copy transmitted to a cloud service or external data centre should reflect the most recent backup state.

Veeam’s research also shows that one in six backups is unrecoverable. Automated recovery testing should therefore be part of any availability strategy, a capability many legacy solutions fail to offer. It is a simple check that should be compulsory for every modern business.
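
How such a test is wired up varies between products, so the following is only a hypothetical outline of the idea in Python: restore a backup into an isolated location, then compare checksums against the source before trusting it.

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum a file in chunks so large backups don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_is_recoverable(source: Path, restored: Path) -> bool:
    """A backup only counts if a test restore produces byte-identical data."""
    return restored.exists() and sha256(source) == sha256(restored)

# Hypothetical usage: restore the latest backup into a sandbox directory,
# then verify it against the production copy before marking the job healthy.
# print(backup_is_recoverable(Path("/data/orders.db"),
#                             Path("/restore-test/orders.db")))
```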

Failures caused by lightning strikes are evidence of the extreme risks companies have to deal with. But they also show that a thoughtful availability strategy, one that accounts for every eventuality, can dramatically reduce the risk of serious data loss and system failures, along with the costs that follow. It’s like an insurance policy: you don’t always need it, but you shouldn’t go without one.

Rick Vanover (vExpert, MCITP, VCP) is Senior Product Strategy Manager for Veeam Software, based in Columbus, Ohio. Rick is a popular blogger, podcaster and active member of the virtualisation community. His IT experience spans system administration and IT management, with virtualisation the central theme of his career in recent years.