Sponsored

What Every Risk Officer Should Think About in 2021

John Hevey, Vice President, Corporate Technical Service at BCS Data Center Operations, highlights what a Risk Officer should be focusing on as a turbulent year in the data industry and beyond comes to a close.

Voices of the Industry

Dec. 28, 2020



What is missing in our industry – and within most distributed organizations – is centralized command-and-control to see the big interconnected picture and the potential for cascading failures. (Photo: Courtesy of BCS)

John Hevey, Vice President, Corporate Technical Service, BCS Data Center Operations

This year’s global pandemic has forced organizations to challenge conventional wisdom, alter business practices, and try to define a new normal. The data center and critical infrastructure industry is no exception. COVID-19 and its impact on our workforces and communities have motivated data center owners and operators to dust off their business continuity and contingency capabilities and test their effectiveness. Let’s be honest – no one was fully prepared for what 2020 had to offer.

To date, real-time operational reviews and adjustments have minimized risks for many operators. But what lies ahead may prove to be a larger and more threatening challenge. In fact, circumstances are converging to create a period of increased operational risk – a perfect storm that needs to be prioritized and addressed with the right risk mitigation strategy.

On a macro level, this approaching storm is being shaped by increased reliance on outsourcing to cloud solutions, the growth of digital and mobile technologies, rising workloads, and the rapidly growing complexity of today’s data center infrastructure.

Is your organization susceptible to this perfect storm? If you are a Risk Officer, ask yourself these questions:


Is your organization seeing large increases in capacity utilization?
Do you have tech debt or heavily leveraged, aging infrastructure?
Have you deferred preventative or corrective maintenance, or planned infrastructure CAPEX improvements?
Have you experienced employee attrition, or been asked to reduce staffing levels, in the last 12 months?
Is your strategy to shift toward lights-out management?
If you experience a workload outage, do seconds and minutes make a difference – versus hours?

If you answered ‘yes’ to any combination of these questions, I encourage you to continue reading. If you answered ‘no’ to all of them, ask yourself whether you are assuming too much. A site’s risks may be out of sight, but they should never be out of mind.

Change of State is Predictable Failure – Capture it and Respond Swiftly

Change of state is a core tenet of our physical world. For the human body, change of state is essential to our existence (e.g., our ability to absorb oxygen into the bloodstream), and it can also be a warning sign of something that needs our immediate attention (e.g., a spiking temperature). When systems work well, life (literally) is good. When systems fail, things can go very bad rather quickly.

Most organizations manage and prioritize change of state when it comes to applications and digital environments. It’s our experience, however, that most organizations don’t place the same level of scrutiny, rigor and discipline around the data center physical environment where their most critical workload assets and applications reside.

Today’s critical infrastructure is an increasingly complex and sophisticated environment composed of interconnected systems. While traditional data center operations have some disparate systems to report operational changes of state, most do not have centralized resources and systems dedicated to the detection, reaction, triage, response, and timely mitigation of such anomalies. A proven approach to immediately identifying and responding to a change of state is often referred to as “eyes on glass.”
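To make “change of state” concrete, here is a minimal Python sketch of the detection step, assuming readings from temperature sensors and two hypothetical thresholds (WARN_C and CRIT_C). The class and function names are illustrative only and do not describe any particular monitoring product. The idea it shows is that an event is raised only when a sensor moves from one discrete state to another, rather than on every reading.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical thresholds for a supply-air temperature sensor, in degrees C.
WARN_C = 27.0
CRIT_C = 32.0


@dataclass(frozen=True)
class Reading:
    sensor: str      # hypothetical sensor identifier
    timestamp: int   # seconds since monitoring started
    value: float     # temperature in degrees C


@dataclass(frozen=True)
class StateChange:
    sensor: str
    timestamp: int
    old_state: str
    new_state: str


def classify(value: float) -> str:
    """Map a raw reading onto a discrete operating state."""
    if value >= CRIT_C:
        return "CRITICAL"
    if value >= WARN_C:
        return "WARNING"
    return "NORMAL"


def detect_state_changes(readings: Iterable[Reading]) -> Iterator[StateChange]:
    """Yield an event only when a sensor's state differs from its last known state."""
    last_state: dict[str, str] = {}
    for r in readings:
        state = classify(r.value)
        # Assumption: a sensor with no history is treated as starting in NORMAL.
        previous = last_state.get(r.sensor, "NORMAL")
        if state != previous:
            yield StateChange(r.sensor, r.timestamp, previous, state)
        last_state[r.sensor] = state


readings = [
    Reading("crah-01", 0, 24.0),
    Reading("crah-01", 60, 24.5),
    Reading("crah-01", 120, 28.1),
    Reading("crah-01", 180, 33.0),
]
for change in detect_state_changes(readings):
    print(change.timestamp, change.old_state, "->", change.new_state)
# 120 NORMAL -> WARNING
# 180 WARNING -> CRITICAL
```

Four readings produce only two events, which is the point: the people watching the glass see transitions, not a stream of unchanged values.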

Case in point: a night-shift data center engineer is performing maintenance at 1:00 a.m. At the same time, a critical failure occurs within the heat-rejection system, triggering an email alert. The engineering team doesn’t immediately see the email (and may not until much later in the shift), resulting in cascading thermal problems within the data center environment. What required immediate attention and response didn’t get it.

Let’s look at an analogy from outside the data center. The pilot of a commercial airliner flying at 39,000 feet has the ability to fly, control, and monitor progress along waypoints between Kansas City and Dallas. An air traffic controller sees that the aircraft has unexpectedly changed altitude. The change (of state) doesn’t correspond with the flight plan or the last instructions from air traffic control. An anomaly has occurred, and triage and action are required to respond to and resolve it and return things to normal. What is missing in our industry – and within most distributed organizations – is the equivalent of that controller: centralized command-and-control to see the big interconnected picture and the potential for cascading failures.

Get Ahead of Future Events Rather Than React to Them

Today’s data center owners and operators need to see the approaching storm, quickly respond to rapidly changing conditions, and address gaps in incident management when changes of state occur within the critical facility. What’s needed are solutions that extend detection and response capabilities by aggregating telemetry and correlating multiple data points in real time to enable timely and effective incident response.
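One simple way to correlate multiple data points is to group events that arrive close together in time into a single incident, so an analyst sees a heat-rejection failure and the downstream temperature alarms as one unfolding event rather than as separate emails. The Python sketch below assumes events have already been normalized into a common format; the names (Event, Incident, correlate, window_s), the sample events, and the 300-second window are hypothetical. Production platforms typically also correlate using equipment topology and dependencies, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    source: str      # hypothetical telemetry source, e.g. "chiller-2"
    timestamp: int   # seconds since midnight
    message: str


@dataclass
class Incident:
    start: int
    end: int
    events: list[Event] = field(default_factory=list)

    @property
    def sources(self) -> set[str]:
        """Distinct systems involved in this incident."""
        return {e.source for e in self.events}


def correlate(events: list[Event], window_s: int = 300) -> list[Incident]:
    """Group events into one incident while each arrives within window_s of the previous one."""
    incidents: list[Incident] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        current = incidents[-1] if incidents else None
        if current is not None and event.timestamp - current.end <= window_s:
            # Close enough in time to the last event: extend the open incident.
            current.events.append(event)
            current.end = event.timestamp
        else:
            # Gap exceeded (or first event): start a new incident.
            incidents.append(Incident(event.timestamp, event.timestamp, [event]))
    return incidents


events = [
    Event("chiller-2", 3600, "heat-rejection system failure"),  # 1:00 a.m.
    Event("crah-01", 3780, "supply air above warning threshold"),
    Event("rack-a12", 3900, "inlet temperature high"),
    Event("ups-1", 20000, "scheduled self-test started"),
]
for incident in correlate(events):
    print(incident.start, incident.end, sorted(incident.sources))
# 3600 3900 ['chiller-2', 'crah-01', 'rack-a12']
# 20000 20000 ['ups-1']
```

Here the failure at 1:00 a.m. and the two temperature alarms that follow within minutes become one incident, while the unrelated UPS event hours later becomes a separate one. Grouping by time alone is a deliberate simplification; during a busy period it can merge unrelated events.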

Next-generation operators (BCS Data Center Operations included) combine people, processes and technology through a centralized, single-source deployment solution that leverages:

Centralized, 7x24x365, eyes-on-glass visibility into critical facilities and physical operations
Trained surveillance and ITIL-certified analysts who constantly monitor critical environments, looking for and analyzing change-of-state data
Purpose-built, computerized maintenance management systems and business intelligence capabilities
An extensive operational playbook to guide real-time incident response actions, communications, root-cause analysis, post-incident action and reporting

With this approach, data center owners and operators can mitigate known (and unknown) operational risks to their data centers, their workloads, and their businesses.

John Hevey is the Vice President, Corporate Technical Service at BCS Data Center Operations.

About the Author

Voices of the Industry

Our Voice of the Industry feature showcases guest articles on thought leadership from sponsors of Data Center Frontier. For more information, see our Voices of the Industry description and guidelines.
