Data Center Frontier

Charting the future of data centers and cloud computing.

5-Point Plan for Eliminating Human Error in Data Center Operations

By Voices of the Industry - August 26, 2020

What is missing in our industry, and within most distributed organizations, is centralized command-and-control to see the big interconnected picture and the potential for cascading failures. (Photo: Courtesy of BCS)

John Hevey, Vice President, Corporate Technical Service at BCS Data Center Operations, offers a step-by-step plan to cut down on human error and outages in data center operations.

Can we talk?

Once again, Uptime Institute’s Global Data Center Survey reports the high likelihood, disturbing frequency, and increased damage caused by data center outages. Findings from 2020 show that 78% of organizations say they had an IT-related outage in the past three years. Three-quarters (75%) say their most recent incident could have been prevented with better management or improved processes, which suggests that the vast majority of outages trace back to human error.

Let that sink in. By operators’ own account, three-quarters of their most recent outages could have been prevented, yet we’re not talking about it. We seem to accept that number.

To put this in perspective, what if the commercial airline industry had a 75% human error failure rate? Thankfully, this is not the case. The aviation industry promotes strict discipline, engineered processes, a checklist regimen, and a predictive and preventative approach as the foundation for operational rigor.

The mission-critical industry is accountable for operating and protecting some of the world’s most vital workloads in some of the most complex critical facilities in the data center ecosystem. 100% availability of critical IT applications is the de facto performance standard.

Yet, Uptime Institute reports that the industry achieves this less than 25% of the time, with most failures associated with human actions and behaviors. It’s time for a paradigm shift for the mission-critical industry – one that changes the human error statistics.

Root Causes

Why do critical facility outages still happen, and why have human error failure rates remained flat? This article is not an assault on hard-working, intelligent, and dedicated data center engineers. Failure analysis routinely exposes underinvestment in the people, processes, and technology needed to operate mission-critical facilities in the manner in which they were designed.

Key Findings

  • More significant outages are becoming more painful
  • Operators admit that most outages are their fault
  • Power problems are still the most significant cause of major outages

Disasters do happen; (most) outages, however, should not. On the surface, outages look the same, affecting access and availability of applications, systems, and services. However, if we look behind the curtain, we can see areas where we can prevent outages and increase overall availability. At BCS, we believe the path forward lies in closing gaps in process, training, and resources.

Looking Ahead: 5-Point Plan

Create and improve operational procedures and documentation.

Despite the broad availability of established standards and best practices from Uptime Institute, NFPA, ISO, IEEE, and OSHA, many people accountable for the reliability of mission-critical environments are not applying them within a unified framework.

Organizations should adopt a playbook as the foundation of their operations program model, one that is site-specific and able to scale with changing business demands and infrastructure needs. Doing so will introduce a strict operational discipline that helps eliminate human errors.

Institute training as part of operational DNA.

Data center operators have historically struggled with training. Some operators talk about it, few actually do it, and even fewer do it well. Comprehensive training must be a foundational element of any good critical facility operations program.

Critical facility training best practices include:

  • Develop training programs that apply to specific, site-level assets and systems
  • Institute a skills assessment program that identifies gaps in skill and knowledge
  • Increase team preparedness through the continuous execution of drills using site-specific emergency operating procedures (EOPs)
  • Leverage a variety of industry accreditation and certification programs

Establish continuous improvement and risk mitigation programs that target human behaviors.

Next-generation operators have developed programs designed to instill continuous-improvement rigor and proactively eliminate operational risks. Program examples include:

  • A Find, Flush, and Fix program is a systematic method of identifying system deficiencies before they become problems: find anomalies, flush out what drives each one, and engineer a plan to fix it before it intrudes on normal operations.
  • A Near-Miss Program encourages transparent reporting and open discussion about incidents that could have happened but were identified and remedied before they did. The ability to debrief as a team and talk about a near-miss is the height of program maturity.
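
For teams that track such a program in software, the find, flush, and fix sequence can be sketched as a small state machine. This is only an illustrative sketch, not a BCS tool or an industry standard: the `Stage`, `Anomaly`, and `open_items` names are hypothetical, and the example anomaly is invented. The point it shows is the enforced order: a fix cannot be recorded until the driver of the anomaly has been flushed out.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical stages mirroring the find, flush, fix sequence."""
    FOUND = 1    # anomaly observed and logged
    FLUSHED = 2  # what drives the anomaly has been identified
    FIXED = 3    # corrective plan has been carried out


@dataclass
class Anomaly:
    description: str
    stage: Stage = Stage.FOUND
    root_cause: str = ""
    fix: str = ""

    def flush(self, root_cause: str) -> None:
        # Record the driver of the anomaly; only valid right after it is found.
        if self.stage is not Stage.FOUND:
            raise ValueError("only a newly found anomaly can be flushed")
        self.root_cause = root_cause
        self.stage = Stage.FLUSHED

    def apply_fix(self, fix: str) -> None:
        # Refuse to record a fix until the root cause is known.
        if self.stage is not Stage.FLUSHED:
            raise ValueError("flush out the root cause before fixing")
        self.fix = fix
        self.stage = Stage.FIXED


def open_items(anomalies):
    """Return the anomalies that have not yet reached the FIXED stage."""
    return [a for a in anomalies if a.stage is not Stage.FIXED]
```

A team might log an item with `Anomaly("UPS battery string reads warm")`, call `flush(...)` once the cause is understood, then `apply_fix(...)`, and review `open_items(...)` in a regular meeting. The same list could also hold near-miss reports so they are debriefed rather than forgotten.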

Consolidate operations into a single-source, self-performance operations model.

Original Equipment Manufacturers (OEMs) complement any good mission-critical program and add value through their proprietary knowledge, firmware updates, and required annual preventative maintenance. However, OEMs are not invested in the day-to-day success or long-term performance of your facility, especially those that lack the formal training and discipline to work in mission-critical environments.

A reliance on numerous vendors creates operational risk. Most owners are moving towards working with a dedicated, single-source operations provider that is committed to self-performance of maintenance and operations.

Operators must adopt a stewardship mindset.

Data center owners entrust their critical IT environment, assets, and services to individuals. These individuals must adopt a stewardship mindset that puts the operation and protection of the critical environment and the services delivered as their top priority.

John Hevey is Vice President, Corporate Technical Service at BCS Data Center Operations. 
