Anybody involved in running high-availability services in a virtualised world ought to read this very informative article.
It discusses the various techniques used to prioritise the restarting of services when failures occur on a particular host.
Firstly, it is unclear whether the article is about true Disaster Recovery (DR), by which I mean a geographically remote standby site that takes over should your primary datacentre suddenly fill up with water. I can only conclude that the article is really referring to a local site, since it talks about software lockstepping, partnerships with third parties and so on. These things are only really suitable for a geographically close site because they rely on very low latencies between the various systems.
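To put rough numbers on the latency point, here is a back-of-the-envelope sketch. It assumes light travels through optical fibre at about 200,000 km/s (roughly two thirds of its speed in a vacuum) and it ignores switching, routing and indirect cable paths, so real figures will be higher. The function name and the sample distances are my own, purely for illustration.

```python
# Assumption: signal speed in optical fibre is roughly 200,000 km/s,
# which works out at about 5 microseconds per kilometre, one way.
FIBRE_KM_PER_S = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip propagation delay over fibre, in milliseconds."""
    return 2 * distance_km / FIBRE_KM_PER_S * 1000


# Illustrative distances only: same campus, same region, a remote DR site.
for km in (10, 100, 1000):
    print(f"{km:>5} km: at least {round_trip_ms(km):.2f} ms round trip")
```

Every synchronous exchange between the two systems pays at least this round trip, which is why tightly coupled techniques such as lockstepping become harder to sustain as the sites move further apart.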
So at this point let me introduce another idea:
“To avoid having to worry about any restart scenarios, deploy your critical hosts on fault tolerant technology …”