
Malfunctioning firmware update leads to Microsoft datacenter overheating

Microsoft has disclosed that a data centre malfunction caused service disruptions to Outlook.com, Hotmail.com and their associated email services last week.

Microsoft Data Centre Overheats Due to Failed Firmware Update

Microsoft Data Centre Outage: Air Conditioning Failure Causes 16-Hour Email Interruption

Microsoft has apologized for a data centre outage that affected its Outlook.com and Hotmail.com cloud email services on Tuesday afternoon. The outage, which lasted for 16 hours, was caused by a failed firmware upgrade to a component of the physical plant in one of Microsoft's data centres.

The failure caused a rapid and substantial temperature spike in the data centre. The physical plant of a typical data centre includes its air conditioning units, and in this case the failed cooling equipment took the affected systems offline. Cooling failures are a known vulnerability in data centres, and it is not uncommon for them to trigger IT outages.

Without proper cooling, hardware can overheat, be damaged, or trigger emergency shutdowns to protect equipment, causing service disruptions. Such failures may arise from equipment malfunction, poorly maintained systems, or unexpected increases in heat load.

Air conditioning failures may not rank among the most frequently cited causes of outages, but cooling system failures are well documented in the industry as a critical vulnerability. Data centre downtime is often attributed in part to faults in infrastructure components, including maintenance lapses and hardware failures such as HVAC faults.

When the safeguards designed to protect servers from overheating activated, they also prevented automatic failover to other parts of Microsoft's infrastructure. As a result, engineers had to intervene manually, which added significant time to the restoration process. Email inboxes hosted on the affected servers were inaccessible throughout this period.

This is not the first time air conditioning failures have caused IT outages. For example, in 2010, a failed air conditioning unit took music streaming site Spotify offline for several hours.

Microsoft says it takes data centre outages very seriously and invests significant time and effort in preventing them. In general, outages can stem from software bugs, configuration errors, network failures, power interruptions, hardware failures, and cooling problems such as air conditioning failures. The company has apologized for the email interruption and says it is committed to improving its systems to prevent similar incidents in the future.

