Uptime: Networks, Software Play Growing Role in Data Center Outages

April 19, 2021

Networking and software issues are emerging as two of the more common causes of data center outages, while power problems are becoming somewhat less of an issue, according to new data from Uptime Institute’s Annual Outage Analysis.

This trend is not surprising, given the growing role of cloud computing and SaaS (software as a service) applications, which increasingly use architectures that can route around physical failures of electrical components like UPS systems, transfer switches and generators. To be sure, power chain issues still have a major impact on downtime, as seen in several headline-grabbing incidents in early 2021.

“Overall, the causes of outages are changing,” said Andy Lawrence, executive director of research, Uptime Institute. “Software and IT configuration issues are becoming more common, while power issues are now less likely to cause a major IT service outage.”

Online services were more important than ever in 2020, as the COVID-19 pandemic and social distancing practices boosted remote work and learning. As a result, service outages were more broadly felt and drew wider notice.

“Although there were significant disruptions affecting financial trading, government services, internet and telecom, the outages that made headlines in 2020 were often about the impact to consumers and workers at home, with interruptions to applications such as Microsoft Exchange and Teams, Zoom, fitness trackers and the like,” Uptime noted.

Outages Create More Concern, More Cost

The cost of data center outages goes beyond headlines and user complaints on social media. More than half of the respondents who reported an outage to Uptime in the past three years estimated its cost at more than $100,000, and almost a third reported costs of $1 million or above.

“Resiliency remains near the top of management priorities when delivering business services,” said Lawrence. “The fact is outages remain common and justify the increased concern and investment in preventing them. Because of the disruption and high costs that result from disrupted IT services, identifying and analyzing the root causes of failures is a critical step in avoiding more expensive problems.”

Some of the findings from Uptime’s 2020 survey include:

  • Almost half (44%) of data center operators surveyed think that concern about resiliency of data center/mission-critical IT has increased in the past twelve months.
  • Serious and severe outages are less common (one in six reported having one in the past three years) but can have catastrophic results for stakeholders. Vigilance and investment are necessary.
  • More than half (56%) of all organizations using a third-party data service have experienced a moderate or serious IT service outage in the last three years that was itself caused by the provider.

As Architecture Shifts, So Do Outage Culprits

The focus on third-party service performance accompanies the ongoing shift from on-premises data centers to colocation facilities and cloud platforms. That shift has a positive impact on uptime, but it also amplifies any failures in the networks and software automation that drive the cloud delivery model. (For more on this shift, see our DCF article “Rethinking Redundancy: Is Culture Part of the Problem.”)

“This rise in outages caused by IT systems and network issues is due to the broad shift in recent years from siloed IT services running on dedicated, specialized equipment to an architecture in which more IT functions run on standard IT systems, often distributed or replicated across many sites,” Uptime says in its outage report. “As more organizations move to cloud-based, distributed IT (driven by a desire for greater agility and automation), the underlying data center infrastructure is becoming less of a focus or a single point of failure.

“This does not mean, however, that there is any case, at least at present, for de-emphasizing site-level resiliency or investing less,” Uptime added. “Site-level failures invariably cause major problems, regardless of whether distributed resiliency architectures are deployed.”

As the originator of the Tier System, which has long been used as a benchmark for reliability design, Uptime has an ongoing interest in equipment redundancy, a key focus for the tier ratings. As recent events have shown, power equipment continues to be central to uptime.

  • In March, an OVH data center in Strasbourg, France was destroyed by a fire. While no final analysis has been provided, early indications point to UPS units as the likely origin of the incident.
  • This month, an emergency generator caught fire at a WebNX data center in Ogden, Utah, causing the full shutdown of the data center and lengthy outages for customers.

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
