How To Mitigate Risk Of Power Outages In The Cloud

Power outages such as those experienced by the likes of Amazon, Google and Microsoft in recent months are not only hugely embarrassing for the cloud giants, but also incredibly disruptive for the affected users.

Take, for example, Amazon Web Services’ latency problems at its Virginia data centre in April. A number of sites that use Amazon’s cloud to run their web services, including HootSuite, Foursquare and Quora, were knocked offline for several days, which was hugely costly for all involved.

And with the cloud giants’ outage nightmares laid bare for all to see, the question must be asked: if things always break at scale, how can the risk be mitigated?

High-profile cloud outages are enough to put any customer off placing their confidence in a cloud computing supplier. After all, what company could survive without access to business-critical data for over 24 hours?

No cloud provider can guarantee 100% uptime, so before putting your trust in a third party and signing on the dotted line, ask yourself: has your supplier mitigated the risk of an outage to the best of their ability and considered the five factors below?

Bigger isn’t necessarily best

Regardless of what anyone says, uptime cannot be guaranteed 100% of the time. There are always going to be complications, and the bigger something gets, the more difficult it becomes to understand, manage, document, test and audit. A different approach to take is to build a series of smaller, pocket-sized data centres – 200 to 300 servers is optimum – within one larger data centre.

Building on a smaller scale provides the supplier with greater control and visibility of everything that is going on in the estate. Technical teams can easily be queried as to whether they have carried out a disaster recovery (DR) rehearsal recently, auditing procedures can be assessed, and each fibre cable, network line and rack can be tested individually.

Scale that up to tens or even hundreds of thousands of servers, as with Amazon or Google, and it becomes clear how easily certain testing procedures could be missed or become impractical, and thus how mistakes can be made. This is not to say that the cloud giants are using the wrong operational model. The simple fact, however, is that the bigger you make something, the bigger the opportunity for flaws becomes.

Multiple power substations and network cables

Power outages will always occur from time to time, but when a cloud provider is responsible for hosting other businesses’ data, it cannot take the risk of hosting all that data on one single estate. Tier III and Tier IV data centres are required to source power from different substations, reducing the risk of a total power outage if one supply cuts out.

Investing in multiple network lines and dedicated fibre optic cables between sites, which can take a number of different routes, will further increase the chance of solid business continuity. Yes, it becomes more expensive for the supplier to lay down multiple networks of cables where one would ordinarily suffice, but any one of the lines can be cut and traffic can still pass through without a hitch.
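To illustrate with assumed figures: if each of two independently routed lines offers 99.9% availability, and their failures really are uncorrelated, the chance of both being down at the same moment is 0.1% of 0.1%, for a combined availability of roughly 99.9999%. The numbers here are hypothetical, but they show why a second, separately routed cable buys far more continuity than its cost might suggest.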

National grid outages

However unlikely, it is possible for the national grid within a geographical area to collapse, such as in the event of a natural disaster, causing an entire region to suffer a massive power outage. So has your cloud supplier considered what they would do if this were to happen?

Uninterruptible Power Supplies (UPS) will kick in if power to the data centre cuts out, but these will only provide power for a matter of minutes. If the power is expected to be out for a considerable length of time, generators will take over, but has your cloud provider secured a guarantee from its fuel supplier that fuel will keep arriving until the situation is resolved?
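As a rough, assumed example: a standby generator burning 200 litres of diesel an hour will drain a 10,000-litre tank in around two days. Those figures are hypothetical, but they make the point that in a regional grid failure lasting any longer than that, continuity depends entirely on the refuelling contract rather than on the equipment on site.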

Flexibility

A hybrid cloud can be built that allows customers to place data that is neither particularly sensitive nor critical to business operations in a public cloud, while all sensitive data is held in a private model. For non-critical data held in a public cloud, even if a power outage did cut off access for a couple of hours, this would not necessarily threaten the business’s survival.

For the private element, the customer can be shown exactly where their sensitive data will be held, even down to being given a tour of the data centre premises, so that they can be confident their supplier is hosting the data where they want it hosted.

The customer is also given the chance to specify every last detail of the private element of their solution: they can pick and choose what type of DR capabilities are put in place and what level of security is provided, right down to which firewall is used, which make and model of hardware is installed, and which network management tools are deployed.

Communication is key

As with any business arrangement, it is important that your supplier is completely honest and upfront with you about what they can and cannot deliver. This way, trust can be built between client and supplier, and the client won’t be left frustrated when they lose access to their data, data that was hosted by a supplier who claimed it would be available 100% of the time.

Conclusion

Power outages will undoubtedly happen from time to time and cannot always be avoided. The risks to customers’ data, however, can at least be mitigated, and it is important that cloud suppliers have done everything in their power to do so.

Regardless of whether a customer chooses a public cloud supplier such as Amazon, Google or Microsoft, or a private cloud supplier operating on a smaller scale, organisations should remember that both options offer a far higher chance of uptime than an on-premise alternative. If a business were to opt for an in-house IT system, with all the power coming from one source on a single site, who knows what the potential downtime might be?

Keith Bates has been Chairman of the Cloud Computing Centre since its launch at the beginning of 2010. Formerly Chairman of ESG Group, he has been at the forefront of the technology industry for over 25 years. Keith is driving The Cloud Computing Centre’s financial and technological future and is responsible for the development of its long term strategy and planning. Prior to ESG Group, Keith co-founded Concept Integrated Systems, the company that developed the Concept Agency Management system which is today widely recognised as the leading advertising and media agency management system deployed by more than 140 of the top tier communications companies. Prior to this Keith held a number of senior management positions in software and hardware companies including Unisys and NCR.