What Every Risk Officer Should Think About in 2021

Dec. 28, 2020
John Hevey, Vice President, Corporate Technical Service at BCS Data Center Operations, highlights what a Risk Officer should be focusing on as a turbulent year in the data industry and beyond comes to a close. 

This year’s global pandemic has forced organizations to challenge conventional wisdom, alter business practices, and try to define a new normal. The data center and critical infrastructure industry is no exception. COVID-19 and its impact on our workforces and communities have motivated data center owners and operators to dust off their business continuity and contingency capabilities and test their effectiveness. Let’s be honest – no one was fully prepared for what 2020 had to offer.

To date, real-time operational reviews and tweaks have minimized risks for many operators. But what lies ahead may prove to be a larger and more threatening challenge. In fact, circumstances are converging to create a period of increased operational risk – a perfect storm that needs to be prioritized and addressed with the correct risk mitigation strategy.

On a macro level, the environment for this approaching storm is shaped by an increased reliance on outsourcing to cloud solutions, growth in digital and mobile technologies, heavier workloads, and the rapidly changing complexity of today’s data center infrastructure.

Is your organization susceptible to this perfect storm? If you are a Risk Officer, ask yourself these questions:

  • Is your organization seeing large increases in capacity utilization?
  • Do you have tech debt or heavily leveraged, aging infrastructure?
  • Have you deferred preventative or corrective maintenance, or postponed planned infrastructure CAPEX improvements?
  • Have you experienced employee attrition, or been asked to reduce staffing levels in the last 12 months?
  • Is your strategy to shift toward lights-out management?
  • If you experience a workload outage, do seconds and minutes, rather than hours, make a difference?

If you answered ‘yes’ to any combination of these questions, I encourage you to continue reading. If you answered ‘no’ to all of them, are you assuming too much? A site’s risks may be out of sight, but they should never be out of mind.

Change of State is Predictable Failure – Capture it and Respond Swiftly

Change of state is a core tenet of our physical world. For the human body, change of state is both essential to our existence (e.g., our ability to absorb oxygen into the bloodstream) and a warning sign of something that needs our immediate attention (e.g., a spiked temperature). When systems work well – life (literally) is good. When systems fail, things can go bad very quickly.

Most organizations manage and prioritize change of state when it comes to applications and digital environments. It’s our experience, however, that most organizations don’t place the same level of scrutiny, rigor and discipline around the data center physical environment where their most critical workload assets and applications reside.

Today’s critical infrastructure is an increasingly complex and sophisticated environment composed of interconnected systems. While traditional data center operations have some disparate systems to report on operational changes of state, most do not have centralized resources and systems dedicated to the detection, reaction, triage, response and timely mitigation of such anomalies. A proven approach that can immediately identify and respond to a change of state is often referred to as eyes on glass.

Case in point: a night-shift Data Center Engineer is performing maintenance at 1:00 a.m. At the same time, a critical failure occurs within the heat-rejection system, triggering an email alert. The Engineering Team doesn’t immediately see the email (and may not until much later in the shift), resulting in cascading thermal concerns within the data center environment. What required immediate attention and response didn’t get it.
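One way to picture the gap in this scenario is an alert that escalates on its own when nobody acknowledges it. The sketch below is a minimal illustration under stated assumptions, not a description of any particular monitoring product: the escalation ladder, its time limits, and the names `Alert` and `channels_used` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical escalation ladder: (minutes without acknowledgement, channel).
# The channels and time limits are illustrative assumptions only.
ESCALATION_STEPS = [
    (0, "email to on-shift engineer"),
    (5, "page to on-shift engineer"),
    (10, "call from central monitoring desk"),
]


@dataclass
class Alert:
    source: str
    raised_at_min: float                        # minutes since start of shift
    acknowledged_at_min: Optional[float] = None  # None means nobody has responded


def channels_used(alert: Alert, now_min: float) -> List[str]:
    """Return every channel that should have fired by `now_min`."""
    # Escalation stops at the moment the alert is acknowledged.
    if alert.acknowledged_at_min is None:
        end = now_min
    else:
        end = min(now_min, alert.acknowledged_at_min)
    elapsed = end - alert.raised_at_min
    return [name for delay, name in ESCALATION_STEPS if elapsed >= delay]


if __name__ == "__main__":
    # The heat-rejection alert from the scenario, still unseen 12 minutes later.
    unseen = Alert(source="heat-rejection", raised_at_min=60)
    for channel in channels_used(unseen, now_min=72):
        print(channel)
```

The point of the design is that an email alone is never the last line of defense: if the engineer is busy with maintenance, the alert keeps moving until a person responds.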

Let’s look at a non-data center analogy. The pilot of a commercial airliner flying at 39,000 feet can fly the aircraft and monitor waypoints between Kansas City and Dallas. An air traffic controller sees that the aircraft has unexpectedly changed altitude. The change (of state) doesn’t correspond with the flight plan or the last instructions from air traffic control. An anomaly has occurred, and triage and action are required to resolve it and return things to normal. What is missing in our industry – and within most distributed organizations – is centralized command-and-control to see the big interconnected picture and the potential for cascading failures.

Get Ahead of Future Events Rather Than React to Them

Today’s data center owners and operators need to see the approaching storm, quickly respond to rapidly changing conditions, and address gaps in incident management when changes of state occur within the critical facility. What’s needed are solutions that extend detection and response capabilities by aggregating telemetry and correlating multiple data points in real time to enable timely and effective incident response.
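As a rough illustration of what correlating multiple data points can mean in practice, the sketch below groups abnormal readings by zone and flags a zone only when several different systems misbehave within a short time window. Everything in it is an assumption made for the example: the `Reading` fields, the zone and system names, the 10-minute window, and the two-system threshold are hypothetical, and a production platform would work on live telemetry rather than a list.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Reading(NamedTuple):
    minute: int     # minutes since midnight
    zone: str       # hypothetical zone label, e.g. "hall-a"
    system: str     # hypothetical system label, e.g. "chiller"
    abnormal: bool  # True when the reading signals a change of state


WINDOW_MIN = 10  # assumed correlation window, in minutes
MIN_SYSTEMS = 2  # assumed number of distinct systems that makes an incident


def correlated_incidents(readings: List[Reading]) -> Dict[str, List[str]]:
    """Map each affected zone to the systems that went abnormal together."""
    by_zone = defaultdict(list)
    for r in readings:
        if r.abnormal:
            by_zone[r.zone].append(r)

    incidents = {}
    for zone, events in by_zone.items():
        events.sort(key=lambda r: r.minute)
        for i, first in enumerate(events):
            # Distinct systems abnormal within WINDOW_MIN of this event.
            systems = {
                e.system
                for e in events[i:]
                if e.minute - first.minute <= WINDOW_MIN
            }
            if len(systems) >= MIN_SYSTEMS:
                incidents[zone] = sorted(systems)
                break
    return incidents


if __name__ == "__main__":
    sample = [
        Reading(60, "hall-a", "chiller", True),
        Reading(64, "hall-a", "rack_temp", True),
        Reading(65, "hall-b", "rack_temp", True),
        Reading(66, "hall-a", "crah", False),
        Reading(200, "hall-b", "chiller", True),
    ]
    print(correlated_incidents(sample))
```

A single abnormal reading may be noise; two related systems changing state in the same zone within minutes of each other is a stronger signal that a cascading failure is under way and deserves immediate triage.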

Next-generation operators (BCS Data Center Operations included) combine people, processes and technology through a centralized, single-source deployment solution that leverages:

  • Centralized, 7x24x365, eyes-on-glass visibility into critical facilities and physical operations
  • Trained surveillance and ITIL-certified analysts who constantly monitor critical environments, looking for and analyzing change-of-state data
  • Purpose-built, computerized maintenance management systems and business intelligence capabilities
  • An extensive operational playbook to guide real-time incident response actions, communications, root-cause analysis, post-incident action and reporting

With this approach, data center owners and operators can mitigate known (and unknown) operational risks to their data centers, their workloads, and their businesses.

John Hevey is the Vice President, Corporate Technical Service at BCS Data Center Operations.

About the Author

Voices of the Industry

Our Voice of the Industry feature showcases guest articles on thought leadership from sponsors of Data Center Frontier. For more information, see our Voices of the Industry description and guidelines.
