When Microsoft’s Outlook.com and Hotmail.com email services started becoming unavailable to users on the afternoon of March 12, it was inevitable that pundits would blame the migration of millions of users from Hotmail to the new Outlook.com service. But given the scope of the beta process and the high-profile nature of the migration, it seemed unlikely that Microsoft would have missed something capable of taking the service offline for the 16 hours of interruption that users experienced.
Microsoft’s root cause analysis, published on Wednesday afternoon, traced the outage not to the IT load equipment in the datacenter but to a software failure in the HVAC management system following a software upgrade. When the cooling and air management systems failed to operate properly, the resulting temperature spike triggered a cascade of failures as the automated measures in place to protect IT load and data took servers offline. The shutdowns also apparently disrupted the automated failover process: rather than being switched to the failover systems, users were locked out of their Hotmail and Outlook.com resources.
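
To make the cascade concrete, here is a minimal Python sketch of how automated thermal protection can defeat failover. It is an illustration only and does not reflect Microsoft's actual systems: the `Server` class, the `SHUTDOWN_TEMP_C` threshold, the temperatures, and both function names are hypothetical. It also assumes, purely for illustration, that the standby system sits in the same overheated environment as the primary, which the root cause analysis as described here does not state.

```python
from dataclasses import dataclass

# Hypothetical shutdown threshold; no real values were published in the article.
SHUTDOWN_TEMP_C = 40.0


@dataclass
class Server:
    name: str
    inlet_temp_c: float
    online: bool = True


def apply_thermal_protection(servers):
    """Take every online server at or above the threshold offline.

    Returns the names of the servers that were shut down.
    """
    tripped = []
    for server in servers:
        if server.online and server.inlet_temp_c >= SHUTDOWN_TEMP_C:
            server.online = False
            tripped.append(server.name)
    return tripped


def route_user(primary, standby):
    """Return the name of the first online server, or None if the user is locked out."""
    for server in (primary, standby):
        if server.online:
            return server.name
    return None


# Assumed scenario: both systems see the same temperature spike.
primary = Server("primary", inlet_temp_c=45.0)
standby = Server("standby", inlet_temp_c=44.0)

tripped = apply_thermal_protection([primary, standby])
print("shut down:", tripped)
print("user routed to:", route_user(primary, standby))
```

In this toy model the protective logic works exactly as designed on each server, yet the user ends up with nowhere to go, because the failover target was protected into the same offline state as the primary. Had the standby stayed below the threshold, `route_user` would simply have returned it.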