Data Center Frontier

Charting the future of data centers and cloud computing.

Executive Roundtable: Learning From the British Airways Outage

By Rich Miller - July 24, 2017

Welcome to our seventh Data Center Executive Roundtable, a quarterly feature showcasing the insights of thought leaders on the state of the data center industry, and where it is headed. In our Second Quarter 2017 roundtable, we will examine four topics: How to eliminate data center downtime like the recent outage at British Airways, how the industry is being affected by robust M&A activity, opportunities for innovation in electrical infrastructure, and how the data center industry can adapt to an increasingly multi-cloud world.

Here’s a look at our distinguished panel:

  • Andrew Schaap, Chief Executive Officer of Aligned Energy, which designs and operates highly-efficient multi-tenant data centers.
  • Shay Demmons, EVP and General Manager of RunSmart software. Demmons is responsible for running all facets of the software business unit including sales, marketing, product development and customer service.
  • Robert McClary, Chief Operating Officer of FORTRUST, a colocation provider headquartered in Denver, Colorado.
  • Bob Woolley, Vice President of Critical Facilities Engineering and Design at RagingWire Data Centers. Woolley is responsible for the teams that design, develop and engineer RagingWire’s portfolio of data centers.

The conversation is moderated by Rich Miller, the founder and editor of Data Center Frontier. Each day this week we will present a Q&A with these executives on one of our key topics. We begin our discussion by asking our panel to address recent headlines about data center service outages.

Data Center Frontier: The recent British Airways data center outage caused widespread disruption to the airline’s operations, with early estimates placing its business impact at more than 80 million pounds ($104 million US). What are the most effective ways to eliminate these types of outages?

ANDREW SCHAAP, Aligned Data Centers

Andrew Schaap: It has been widely reported how much data center outages cost businesses, not just in capital losses but also in damage to reputation and brand over the long term. To help avoid these critical disruptions – not just in the airline industry, but across all business sectors – it’s important for company leadership to take an in-depth look at their existing technology systems – both hardware and software – and consider updating those systems to meet the demands of today’s ever-evolving digital world.

According to a 2016 Gartner report, digital business, 24/7 operations, cyberattacks and more business disruptions are switching the conversation from recovery to resilience. Continuous monitoring of your resilience will allow IT to take proactive measures to improve it without the risk of failure. But even today, a year after the report was published, businesses expect resiliency to be built into the application. Perhaps one of the most effective ways to eliminate outages is for management to select specific availability zones that are best suited to meet their needs. If one area has overflow, administrators can distribute that traffic out to other locations.
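As a rough illustration of distributing overflow traffic across locations, the sketch below moves load from any zone that exceeds its capacity into zones that still have headroom. The zone names, load figures, and the `distribute_overflow` function are hypothetical, and real traffic management would also weigh latency, data locality, and cost.

```python
def distribute_overflow(load, capacity):
    """Return a new zone -> load mapping with excess moved to zones with headroom.

    `load` and `capacity` are dicts keyed by zone name (hypothetical units).
    Any excess that cannot be placed elsewhere stays in its original zone.
    """
    balanced = dict(load)
    for zone in load:
        excess = balanced[zone] - capacity[zone]
        if excess <= 0:
            continue  # this zone is within capacity
        for other in load:
            if excess <= 0:
                break
            headroom = capacity[other] - balanced[other]
            if other == zone or headroom <= 0:
                continue
            moved = min(excess, headroom)
            balanced[zone] -= moved
            balanced[other] += moved
            excess -= moved
    return balanced


# Hypothetical example: zone-a is 20 units over its limit of 100.
load = {"zone-a": 120, "zone-b": 40, "zone-c": 70}
capacity = {"zone-a": 100, "zone-b": 100, "zone-c": 100}
print(distribute_overflow(load, capacity))
# {'zone-a': 100, 'zone-b': 60, 'zone-c': 70}
```

The total load is preserved; only its placement changes.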

ROBERT WOOLLEY, RagingWire Data Centers

Robert Woolley, RagingWire Data Centers: According to published accounts, the British Airways incident was caused by an operator at their Boadicea House (BoHo) data center. The operator improperly disconnected and then reconnected system power in such a way that it caused a power surge that damaged some IT equipment, taking a number of production systems offline. The ascribed cause of the failure was operator error.

While human error may have precipitated the incident, it’s clear that there were other contributing factors. Facilities such as BoHo are designed to tolerate the loss of a single electrical feed, and its protective devices should prevent power surges from reaching the IT equipment. Moreover, the BoHo facility is only one of several data centers that support BA’s critical operations. Failover to a secondary facility should have occurred automatically, but didn’t.

The evidence points to a cascading series of errors that resulted in a catastrophic failure, which is typical of major data center incidents. It may be convenient to point to a single cause such as human error, but every link needs to be strong for the chain to maintain integrity.

The lesson learned from this outage is that a simple error can compound into a large scale failure. Better procedures and training could minimize future human errors, but a component failure might produce the same result in the future if other remedies are not enacted.

The root cause is only part of the story. Emphasis should be on the proper operation of the failover scheme between data centers to protect against ANY facility failure. Secondly, the design of the electrical system should preclude the ability to harm the critical load due to a switching procedure. Of course, proper training and change control methodology are also essential.
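One way to reason about protection against ANY facility failure is a simple N-1 capacity check: remove each site in turn and ask whether the remaining sites can still carry the critical load. The sketch below is an illustration under stated assumptions, not a description of British Airways' actual failover scheme; the `Site` type, the site names, and the capacity figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    capacity: int  # units of critical load the site can carry (hypothetical)


def uncovered_failures(sites, critical_load):
    """Return the names of sites whose loss would leave too little capacity."""
    uncovered = []
    for failed in sites:
        # Capacity that remains if this one site is lost entirely.
        remaining = sum(s.capacity for s in sites if s.name != failed.name)
        if remaining < critical_load:
            uncovered.append(failed.name)
    return uncovered


# Hypothetical two-site deployment with a critical load of 80 units.
sites = [Site("primary", 100), Site("secondary", 60)]
print(uncovered_failures(sites, critical_load=80))
# ['primary']
```

An empty result means every single-site failure is covered on paper. It says nothing about whether the failover mechanism itself works, which still has to be exercised in practice.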

SHAY DEMMONS, BASELAYER RunSmart

Shay Demmons, RunSmart: The British Airways outage highlights the exposure that comes from not having a well-thought-out operational plan that accounts for failures and other expected operating conditions. Failures are part of a well-conceived operational plan, and it is the responsibility of the core infrastructure team to have the people, processes, and technology in place to identify and react to this wide range of operating conditions. All applications and services depend on this core infrastructure, and yet a single human error or cyber-attack can wreak havoc on an enterprise and its core business unless the operating plan includes provisions for those conditions and the optimal responses.

One of the most basic yet readily available technologies today is that associated with detection and response. Many data centers lack this basic real-time monitoring of infrastructure, which is the first step to truly eliminating these sorts of outages. Even once a condition is detected, most data centers lack the ability to respond to it. The concept of a software-defined data center (SDDC) includes both the detection of failures and automated response capabilities. The power of integration across the IT and Facilities layers can be leveraged by these software-defined structures, allowing business continuity goals to be met. Bottom line: No longer can IT and infrastructure management remain siloed systems. They need to complement each other and respond together to mitigate disruption with limited (or no) human intervention.

For example, after receiving real-time information that a system is failing or being removed from service for maintenance purposes, the automated response needs to determine the best course of action to shed demand and maintain a level of business services consistent with the business itself. This may include preemptively shutting down non-critical servers, throttling equipment, bursting into the cloud, or turning on other assets. With a well-conceived software-defined data center which includes failure detection and automated response, operational changes can happen quickly and automatically without human intervention.
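The demand-shedding step described above can be sketched as a small planning function: given the power still available after a failure, choose which non-critical workloads to shut down so that demand fits. This is a minimal sketch, not RunSmart's implementation; the `Workload` type, the workload names, and the kW figures are hypothetical, and a real system would also consider throttling, cloud bursting, and service dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    power_kw: float
    critical: bool  # critical workloads are never shed in this sketch


def plan_load_shedding(workloads, available_kw):
    """Pick non-critical workloads to shut down until demand fits capacity.

    Returns (names_to_shed, remaining_demand_kw). If the remaining demand is
    still above available_kw, shedding alone is not enough.
    """
    demand = sum(w.power_kw for w in workloads)
    # Shed the largest non-critical consumers first to limit the number of shutdowns.
    candidates = sorted(
        (w for w in workloads if not w.critical),
        key=lambda w: w.power_kw,
        reverse=True,
    )
    to_shed = []
    for w in candidates:
        if demand <= available_kw:
            break
        to_shed.append(w.name)
        demand -= w.power_kw
    return to_shed, demand


# Hypothetical fleet drawing 100 kW when only 60 kW remains available.
fleet = [
    Workload("booking", 40.0, critical=True),
    Workload("analytics", 30.0, critical=False),
    Workload("dev-test", 20.0, critical=False),
    Workload("reporting", 10.0, critical=False),
]
print(plan_load_shedding(fleet, available_kw=60.0))
# (['analytics', 'dev-test'], 50.0)
```

Returning the remaining demand lets the caller tell whether shedding was sufficient or whether another response, such as bringing other assets online, is still needed.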

Robert McClary, FORTRUST

Robert McClary: I would say two things along these lines. First of all, focus the most time and effort on the “most likely” causes of downtime; if the statistics are anywhere near true and 60 to 80 percent of outages occur due to human error, then, certainly, not enough time is spent on correcting human error. Secondly, another major cause of downtime, statistically, is poor maintenance and lifecycle strategy. These two items make up the majority of the most likely causes of downtime. These are both avoidable with effort. A lot of people believe that they can design around human error. And what happens is, when you start trying to design around human error, you create a lot of complexity in the infrastructure design, and then start creating more problems by overcomplicating designs.

It’s a self-fulfilling prophecy. You try to design around human error but you end up causing more of it. We aren’t spending enough time trying to eliminate or mitigate human error; we are spending time and money on complex designs to eliminate human error, and we’re causing complexity that causes more human error. Focusing on eliminating human error is much less expensive than designing around it. We just refuse to admit that human error is not inevitable, and that it can be minimized or eliminated. We just don’t want to do it; it’s too hard, it’s psychological, it’s not comfortable. Equipment failures are also avoidable. Comprehensive predictive and preventive maintenance is about ten times cheaper over the long haul than corrective maintenance. We think we save time and money but end up paying a bigger price on the back end.

NEXT: How is the ongoing M&A impacting the data center industry?

About Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.

Comments

  1. Dr Theresa Simpkin says

    July 27, 2017 at 10:47 am

    Sadly, the issue of human influences in DC failures and downtime is poorly understood. The matter is not only embedded in systems design; it is also played out in misplaced trust in the competence of people. Over time even the most experienced individual will become unconsciously incompetent – that is, they’ll become so blasé about their abilities that it is almost inevitable that they will make mistakes or have misplaced confidence in how they should perform or respond in a non-routine task.

    Training spend can be misplaced on those who are already confident in their actions and competent to perform those actions, while it is almost taken as a given that staff will perform appropriately just because they’ve been along to training. Measuring competence and confidence is the key and should be implemented as a routine measurement of individual and team capability.

    Systems design is better informed by understanding where humans will opt to create workarounds or shortcuts – the practical creep that is often found in failures of what were thought of as robust processes.

Copyright Endeavor Business Media© 2022