Data Center Frontier

Charting the future of data centers and cloud computing.

Problems With AWS Network Devices Caused Widespread Cloud Outage

By Rich Miller - December 7, 2021

Amazon Web Services data centers in Loudoun County, Virginia. (Photo: Rich Miller)

Problems with several network devices in Northern Virginia caused a major outage at Amazon Web Services, with the ripples spreading across the Internet to interrupt service for many popular web services that run their infrastructure on the AWS cloud.

The lengthy outage highlighted the essential role played by cloud platforms like AWS, which support the web operations of at least 1 million enterprise customers. The problems at AWS were blamed for performance issues at Netflix, Disney+, Ring, Ticketmaster, Venmo, Roku, Fidelity Investments, Hootsuite, and many others. The outage interrupted online finals for students using the Canvas learning management platform and even disrupted deliveries at Amazon warehouses, where it affected the apps required to scan packages and plan delivery routes.

The AWS outage was focused on US-East-1, a service region based in Northern Virginia which houses the largest concentration of Amazon data center infrastructure. The problems began at around 12:30 p.m. Eastern, when users started having trouble accessing AWS services. Approximately five hours later, at 5:47 p.m., AWS reported that it had “mitigated the underlying issue” and services were beginning to be restored.

“The root cause of this issue is an impairment of several network devices in the US-EAST-1 Region,” AWS said on its status page. As of 7:30 p.m. Eastern, AWS said the network device issues had been resolved, and it was “now working towards recovery of any impaired services.”

Large-scale IT service outages can be expensive. A 2021 survey from Uptime Institute found that data center outages cost companies an average of $100,000 per incident, with about a third of respondents citing costs of $1 million or more.

The stakes could be even higher for Amazon Web Services, which is the largest cloud computing platform. AWS had revenue of $16.4 billion in the third quarter of 2021, which works out to about $7.4 million per hour. Cloud workloads running outside the US-East-1 region apparently were unaffected, but an outage lasting more than six hours in the largest cloud region would add up quickly, although such “losses” at service providers are often accounted for through customer credits.
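The per-hour figure can be checked with a short calculation. The sketch below is only an illustration: it assumes the quarter runs July 1 through September 30 (92 days) and that revenue accrues evenly around the clock, and it uses the times AWS reported on its status page. It does not estimate actual losses, since only one region was affected.

```python
from datetime import datetime

# Figure quoted in the article: AWS revenue for the third quarter of 2021.
Q3_2021_REVENUE_USD = 16.4e9

# Assumption: the quarter spans July, August and September (92 days).
Q3_DAYS = 31 + 31 + 30

# Assumption: revenue is spread evenly across every hour of the quarter.
revenue_per_hour = Q3_2021_REVENUE_USD / (Q3_DAYS * 24)

# Timeline from the article (Eastern time, December 7, 2021).
start = datetime(2021, 12, 7, 12, 30)      # problems begin
mitigated = datetime(2021, 12, 7, 17, 47)  # "mitigated the underlying issue"
resolved = datetime(2021, 12, 7, 19, 30)   # network device issues resolved

hours_to_mitigation = (mitigated - start).total_seconds() / 3600
hours_to_resolution = (resolved - start).total_seconds() / 3600

print(f"Revenue per hour: ${revenue_per_hour / 1e6:.1f} million")
print(f"Hours until mitigation: {hours_to_mitigation:.2f}")
print(f"Hours until resolution: {hours_to_resolution:.2f}")
```

Under these assumptions the result rounds to the $7.4 million per hour cited above, and the window from the first reports to the resolution of the network device issues is seven hours.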

Why Networks Are So Important

The rise of cloud computing underscores the importance of networks and how they are configured. Networking and software issues are surpassing power outages as the most common causes of data center downtime, according to 2021 outage data from Uptime Institute. This trend reflects the growing role of cloud computing and SaaS (software as a service) applications, which often use architectures that can route around physical failures of electrical components like UPS systems, transfer switches and generators.

When Amazon Web Services experiences reliability problems, they often involve US-East-1, which is not surprising because it is the largest AWS region and also the oldest, as Amazon has had data centers in Virginia since 2004. AWS has spent $35 billion on its cloud computing infrastructure in Northern Virginia over the past 10 years, and operates about 50 data centers in the region. It’s the largest single concentration of corporate data centers on earth, positioned near a strategic Internet intersection in Ashburn, which serves as a global crossroads for data traffic.

Network problems are complicated by the highly automated nature of cloud platforms. These data traffic flows are designed to be large and fast and to work without human intervention, which makes them hard to tame when humans do intervene. Some of the largest outages impacting cloud platforms and social networks have been tied to network problems. Some examples:

  • On October 4, 2021, a configuration error broke Facebook’s connection to a key network backbone, disconnecting all of its data centers from the Internet and leaving its DNS servers unreachable, the company said.
  • A lengthy 2019 Google outage was caused by unusual network congestion in its operations in the Eastern U.S. In an incident report, Google said that YouTube measured a 10 percent drop in global views during the incident, while Google Cloud Storage measured a 30 percent reduction in traffic.

Resiliency Is Still a Challenge

At DCF we have often noted how cloud computing is changing the way companies approach uptime, introducing architectures that create resiliency using software and network connectivity (See “Rethinking Redundancy”). This strategy, pioneered by cloud providers, is creating new ways of designing applications. Historically, data center uptime has been achieved through layers of redundant electrical infrastructure, including uninterruptible power supply (UPS) systems and emergency backup generators.

Cloud providers like Google have been leaders in creating failover scenarios that shift workloads across data centers, spreading applications and backup systems across multiple data centers, and using sophisticated software to detect outages and redirect data traffic to route around hardware failures and utility power outages.

Amazon Web Services has been a pioneer in this effort by popularizing the use of availability zones (AZs), clusters of data centers within a region that allow customers to run instances of an application in several isolated locations to avoid a single point of failure. These architectures enable sophisticated approaches to failover and backup of applications. But even a distributed uptime plan can break down if the network fails, breaking the flow of data across cloud infrastructure.
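The difference between a hardware failure and a network failure can be sketched in a few lines. This is a toy model, not AWS's actual failover logic: the `Zone` class, the zone names, and the `pick_zone` helper are all hypothetical and exist only to illustrate the reasoning in the paragraph above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Zone:
    """Hypothetical stand-in for one availability zone running an app instance."""
    name: str
    instance_healthy: bool   # is the application instance in this zone working?
    network_reachable: bool  # can traffic actually reach this zone?


def pick_zone(zones: List[Zone]) -> Optional[str]:
    """Return the first zone that is both healthy and reachable, else None."""
    for zone in zones:
        if zone.instance_healthy and zone.network_reachable:
            return zone.name
    return None


# Scenario 1: a hardware failure takes out one zone; traffic shifts to another.
zones = [
    Zone("zone-a", instance_healthy=False, network_reachable=True),
    Zone("zone-b", instance_healthy=True, network_reachable=True),
    Zone("zone-c", instance_healthy=True, network_reachable=True),
]
hardware_failure_choice = pick_zone(zones)

# Scenario 2: every instance is healthy, but a shared network problem
# makes all zones unreachable, so there is nowhere left to fail over to.
network_impaired = [
    Zone(z.name, instance_healthy=True, network_reachable=False) for z in zones
]
network_failure_choice = pick_zone(network_impaired)

print("Hardware failure, traffic goes to:", hardware_failure_choice)
print("Network failure, traffic goes to:", network_failure_choice)
```

In the first scenario the helper routes around the failed zone; in the second it returns `None`, because redundancy across zones does not help when the path to all of them is impaired.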

As often happens with AWS downtime, the incident prompted some to wonder whether cloud computing has reached a scale where the downtime equation is shifting.

“A multi-day full outage of us-east-1 will have an observable effect on the world economy,” tweeted Corey Quinn, Chief Cloud Economist at The Duckbill Group. “That is not an exaggeration.”

“I don’t think AWS has done anything wrong here,” Quinn tweeted. “This is the natural end result of their success at massive scale.”

Tagged With: Amazon Web Services, Downtime

About Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.


Copyright Endeavor Business Media© 2022