On February 28th and 29th, 2012, if you had been peering into a crystal ball, I am sure you would have seen organizations that use Windows Azure, with their CIOs, CTOs, and staffs of IT professionals, “laying hands” on a network connection while saying a prayer that went something like this:
“Heavenly Father, we humbly beseech your divine intervention, from your cloud to ours. Lay your healing hands on our application and restore our cloud…”
“Let us pray: Our Application, who resides in Azure, offline be thy shame. Online become, or my job be done, in the cloud as it is on-premise. Give us our compute, and the cloud reboot, and forgive us our complacency as we forgive Microsoft for 24 hours latency. And lead us not into leap year calculations, but deliver us from application unavailability. For the cloud must be online, all of the time. With application availability there, Microsoft and Azure will share, the power and glory for ever and ever. Amen.”
I am sure there were a lot of CIOs sweating bullets during this outage. After all, when the CEO comes into your office and wants to know why your systems are offline, blaming Microsoft, Google, or Amazon isn’t an option. To be fair, Microsoft is not the first cloud provider to suffer an outage. While this definitely should not happen, the reality is that outages and glitches do happen. That does not nullify the advantages of the cloud. The crux of this issue was a control program that had not accounted for the leap year. Bottom Line: Human Error.
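To make that kind of mistake concrete, here is a minimal Python sketch of one common way date logic forgets about leap years: adding a year by bumping only the year field. This is an illustration under assumptions, not the actual Azure control program; the function names `naive_add_one_year` and `safe_add_one_year` are hypothetical, and clamping to February 28th is just one possible policy.

```python
from datetime import date


def naive_add_one_year(d: date) -> date:
    """Bump only the year field. Raises ValueError for Feb 29,
    because the following year has no Feb 29."""
    return d.replace(year=d.year + 1)


def safe_add_one_year(d: date) -> date:
    """Add one year, clamping Feb 29 to Feb 28 of the next year.

    Clamping to Feb 28 is an assumed policy for this sketch;
    rolling forward to Mar 1 would be an equally valid choice.
    """
    if d.month == 2 and d.day == 29:
        return date(d.year + 1, 2, 28)
    return d.replace(year=d.year + 1)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_add_one_year(leap_day)
    except ValueError as exc:
        print("naive version failed:", exc)
    print("safe version:", safe_add_one_year(leap_day))
```

The naive version works on every other day of the year, which is exactly why this class of bug can sit unnoticed until a leap day arrives.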
I am sure that in the coming days the cynics will dogpile on Microsoft and bash Windows Azure. However, at the core, our applications and our infrastructure, whether on-premise or in the cloud, are ultimately our responsibility. Simply moving to the cloud does not absolve us of that responsibility. The biggest benefits of the cloud are, at its core, elastic scalability, no commitment, and paying only for what you use. While reliability and higher availability are also benefits, you are ultimately responsible for your cloud strategy and architecture as an organization.
Applications need to be designed to run in the cloud. If you will be running a critical application at a cloud provider, you need to plan for an outage, design for redundancy, and expect to pay for that redundancy. I have spoken with organizations that actually believe that because they put their application on a single instance of Windows Azure, they have high availability. Let’s be real here. The Microsoft SLA guarantee requires a minimum of two Azure instances. To quote the Windows Azure Web Site: “...when you deploy two or more role instances in different fault and upgrade domains your Internet facing roles will have external connectivity at least 99.95% of the time.” That being said, if you want high availability and backup reliability, you need to implement dual cloud instances across multiple geographic locations.
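It helps to put numbers on that 99.95% figure. The sketch below is back-of-the-envelope arithmetic, not anything taken from the SLA text itself: the function names are hypothetical, and the combined figure assumes the two deployments fail completely independently, which is a strong assumption (a shared software bug, for example, would break it).

```python
# Hours in a 365-day year. A leap year has 24 more hours,
# which this rough estimate deliberately ignores.
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(availability: float,
                           hours: float = HOURS_PER_YEAR) -> float:
    """Downtime per period that still satisfies the given availability."""
    return (1.0 - availability) * hours


def combined_availability(availability: float, copies: int) -> float:
    """Chance that at least one of `copies` deployments is up,
    ASSUMING their failures are statistically independent."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    sla = 0.9995
    print(f"Allowed downtime at {sla:.2%}: "
          f"{allowed_downtime_hours(sla):.2f} hours per year")
    print(f"Two independent deployments: "
          f"{combined_availability(sla, 2):.6%} combined availability")
```

Even at 99.95%, the math still permits roughly four and a half hours of downtime a year, so an SLA is a floor to plan around, not a promise that outages will not happen.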
Those who do not learn from history are doomed to repeat it! As IT professionals, we need to prepare our CEOs so we are not put on the hot seat when the inevitable failure happens. We also need a plan for how to communicate a failure inside and outside our organization, and a clear strategy and plan of action for when it occurs.
Biggest lessons taken from Microsoft's cloud outage:
*Note: No beheadings or killings were perpetrated as a result of this religious analogy.