Getting the Most Reliability from Your Assets with Remote Monitoring

Feb. 7, 2024
Max Hamner and Ray Daugherty of Modius explain why hyperscalers should include environmental and power compliance as part of their colocation contracts.

As a data center operator, you depend on uptime and SLA compliance for the success of your business model, and both are strongly affected by your assets’ reliability and longevity. There is a direct correlation between reliability and your ROI.

Power distribution and cooling gear (generators, PDUs, UPSs, CRACs, CRAHs) is costly to purchase and maintain, but downtime and lost contracts are far more expensive.

When it comes to maximizing availability and SLA compliance, two of the most important factors (even more important than vendor quality) are the environmental conditions in which this equipment operates and the quality of the power supplied to it.

ASHRAE, ITIC, and other organizations have long demonstrated the direct effect of power quality and operating environment on hardware uptime and reliability, and they have produced many standards that define the challenges and goals for environmental and power conditions. This awareness has led all major hyperscalers to include environmental and power compliance as part of their colocation contracts.

How Power Affects Your Infrastructure Gear

Power quality directly affects the life of your hardware. Beyond significant power events, ongoing variations in power quality accumulate as degraded reliability in your infrastructure and IT gear: the many small sags, surges, and spikes slowly erode the components in an electrical device. This effect has been well researched and documented by multiple organizations, which have produced reports and created standards that define the operating conditions that maximize equipment life. The most prevalent industry standard comes from ITIC (Information Technology Industry Council), which provides and maintains the ITIC curve (shown below). The curve plots voltage magnitude against event duration, so each individual power event can be placed on it to assess its risk of disrupting or damaging a device.
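To make the ITIC curve concrete, here is a minimal sketch that classifies a single power event by its RMS voltage (as a percentage of nominal) and its duration. It assumes a step-wise simplification of the curve's published breakpoints, treats the sub-millisecond transient portion conservatively, and uses hypothetical names (`classify_event`, `ItiCurveRegion`); check the breakpoints against ITIC's own application note before relying on them.

```python
from enum import Enum


class ItiCurveRegion(Enum):
    NO_INTERRUPTION = "no interruption in function"  # inside the envelope
    NO_DAMAGE = "no damage"      # below the lower limit: function may be lost, damage not expected
    PROHIBITED = "prohibited"    # above the upper limit: risk of damage


# Assumption: step-wise approximation of the ITIC curve.
# Each entry is (maximum duration in seconds, limit in % of nominal RMS voltage).
# The sub-millisecond transient region (which rises toward 500%) is treated
# conservatively as 200%.
UPPER_LIMITS = [(0.001, 200.0), (0.003, 140.0), (0.5, 120.0), (float("inf"), 110.0)]
LOWER_LIMITS = [(0.02, 0.0), (0.5, 70.0), (10.0, 80.0), (float("inf"), 90.0)]


def _limit(table, duration_s):
    """Return the limit of the first band whose duration bound covers duration_s."""
    for max_duration_s, limit_pct in table:
        if duration_s <= max_duration_s:
            return limit_pct
    raise ValueError("duration must be a non-negative number")


def classify_event(voltage_pct, duration_s):
    """Classify an event by RMS voltage (% of nominal) and duration (seconds)."""
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    if voltage_pct > _limit(UPPER_LIMITS, duration_s):
        return ItiCurveRegion.PROHIBITED
    if voltage_pct < _limit(LOWER_LIMITS, duration_s):
        return ItiCurveRegion.NO_DAMAGE
    return ItiCurveRegion.NO_INTERRUPTION
```

For example, a sag to 50% of nominal lasting 0.1 s falls below the lower envelope, so `classify_event(50, 0.1)` returns `ItiCurveRegion.NO_DAMAGE`, while a full dropout of 10 ms stays inside the envelope.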

How Environmental Conditions Affect Your Infrastructure Gear

Humidity, air pressure, air quality, and temperature together influence condensation and oxidation on surfaces, as well as airflow, which in turn affects cooling efficiency.

High humidity can lead to condensation and moisture buildup inside IT equipment, causing electrical short circuits and corrosion. Low humidity can lead to static electricity buildup, causing electrostatic discharge events that can damage electronic components.
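Condensation risk can be estimated from temperature and relative humidity, since water condenses on any surface colder than the air's dew point. The sketch below uses the Magnus approximation with commonly used coefficients (a = 17.62, b = 243.12 °C); the function names and the 2 °C safety margin are illustrative assumptions, not values from this article or from ASHRAE.

```python
import math

# Magnus-type coefficients for saturation over liquid water.
MAGNUS_A = 17.62
MAGNUS_B_C = 243.12  # °C


def dew_point_c(air_temp_c, rh_pct):
    """Approximate dew point (°C) from air temperature (°C) and relative humidity (%)."""
    if not 0 < rh_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_pct / 100.0) + MAGNUS_A * air_temp_c / (MAGNUS_B_C + air_temp_c)
    return MAGNUS_B_C * gamma / (MAGNUS_A - gamma)


def condensation_risk(surface_temp_c, air_temp_c, rh_pct, margin_c=2.0):
    """True if a surface is at or within margin_c (hypothetical margin) of the dew point."""
    return surface_temp_c <= dew_point_c(air_temp_c, rh_pct) + margin_c
```

With air at 25 °C and 50% RH, `dew_point_c(25, 50)` is about 13.9 °C, so a surface at 12 °C would be flagged as a condensation risk.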

Dust and other particulate matter can accumulate inside IT equipment, blocking airflow and causing overheating. It can also interfere with moving parts, such as fans and drives, reducing their efficiency and potentially leading to equipment failure.

Excessive ambient heat can cause IT equipment to overheat, reducing performance and potentially causing hardware failures. Extremely cold conditions can also affect IT equipment, making it less efficient and possibly leading to condensation and moisture-related problems. Most IT equipment has recommended operating temperature ranges, and it’s important to ensure that the operating environment stays within these ranges.

The above factors can be controlled by climate control systems (HVAC) to maintain proper temperature and humidity levels, and by dust filters and clean rooms to reduce particulate contamination.

ASHRAE (the American Society of Heating, Refrigerating and Air-Conditioning Engineers) has published results from studies of the impact of environmental conditions on hardware. One example is provided here:

From this research, ASHRAE developed a recommended standard (Equipment Thermal Guidelines for Data Processing Environments) specifically for data center operations. It specifies the environmental conditions that maximize equipment reliability and life.

ASHRAE’s thermal guidelines for the data center can be found in the 2016 ASHRAE TC9.9 white paper Data Center Power Equipment Thermal Guidelines and Best Practices. The guidelines were last updated in 2021, and a PDF reference card for the 2021 Equipment Thermal Guidelines for Data Processing Environments is also available.
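As an illustration of checking sensor readings against thermal guidelines, the sketch below compares a dry-bulb temperature, dew point, and relative humidity with a configurable envelope. The default limits (18 to 27 °C dry bulb, -9 to 15 °C dew point, 60% RH maximum) are commonly cited recommended-envelope values included only as placeholders; the `Envelope` type and `envelope_violations` function are hypothetical, so confirm the limits against the edition, equipment class, and any SLA you operate under.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Envelope:
    # Placeholder limits; confirm against the ASHRAE edition/class and your SLA.
    min_dry_bulb_c: float = 18.0
    max_dry_bulb_c: float = 27.0
    min_dew_point_c: float = -9.0
    max_dew_point_c: float = 15.0
    max_rh_pct: float = 60.0


def envelope_violations(dry_bulb_c: float, dew_point_c: float, rh_pct: float,
                        env: Envelope = Envelope()) -> List[str]:
    """Return a message for each limit the reading violates (empty list if compliant)."""
    checks = [
        (dry_bulb_c < env.min_dry_bulb_c, "dry bulb below minimum"),
        (dry_bulb_c > env.max_dry_bulb_c, "dry bulb above maximum"),
        (dew_point_c < env.min_dew_point_c, "dew point below minimum"),
        (dew_point_c > env.max_dew_point_c, "dew point above maximum"),
        (rh_pct > env.max_rh_pct, "relative humidity above maximum"),
    ]
    return [message for violated, message in checks if violated]
```

Keeping the limits in a frozen dataclass makes it easy to hold one envelope per customer contract and evaluate the same readings against each.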

Visibility of Your Power Quality and Impact

How do you know if you are being affected by these factors – is your critical infrastructure gear slowly degrading in reliability, or is an increase in downtime looming ahead?

A DCIM (Data Center Infrastructure Management) solution can help monitor power conditions on your gear and provide easier access to power events and waveforms logged by PQMs (Power Quality Meters). General monitoring of individual device report points (such as voltage and current) allows you to track and trend different aspects of your power quality and detect when thresholds have been exceeded:

  • Power Quality Meter monitoring: high-resolution power monitoring with event capture.
  • Event capture: visibility into very short-duration events that power gear such as UPSs, PDUs, and RPPs generally cannot detect or report. When an event occurs, a PQM saves a high-resolution snapshot of the period around the event so it can be reviewed afterward, often at sub-millisecond resolution and, for advanced PQMs, with harmonic detail up to the 63rd harmonic.
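The snapshot-around-the-event behavior can be sketched with a ring buffer: the recorder always keeps the most recent samples, and when a sample trips a threshold trigger it saves those buffered samples plus a fixed number that follow. This is a conceptual illustration only, not how any particular PQM's firmware works; `EventRecorder` and the ±10% threshold around a 230 V nominal are hypothetical, and a real meter triggers on RMS, waveform, or harmonic criteria at far higher sample rates.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class EventRecorder:
    """Capture a window of samples around each triggering sample."""

    def __init__(self, pre_samples: int, post_samples: int,
                 trigger: Callable[[float], bool]) -> None:
        self._history: Deque[float] = deque(maxlen=pre_samples)  # ring buffer
        self._post_samples = post_samples
        self._trigger = trigger
        self._capture: Optional[List[float]] = None  # snapshot being filled
        self._remaining = 0
        self.snapshots: List[List[float]] = []

    def add(self, sample: float) -> None:
        if self._capture is not None:
            # Still collecting post-trigger samples for the current event.
            self._capture.append(sample)
            self._remaining -= 1
        elif self._trigger(sample):
            # Start a snapshot with the buffered pre-trigger history.
            self._capture = list(self._history) + [sample]
            self._remaining = self._post_samples
        if self._capture is not None and self._remaining <= 0:
            self.snapshots.append(self._capture)
            self._capture = None
        self._history.append(sample)


if __name__ == "__main__":
    # Hypothetical trigger: any sample more than 10% away from 230 V nominal.
    recorder = EventRecorder(pre_samples=2, post_samples=2,
                             trigger=lambda v: abs(v - 230.0) > 23.0)
    for v in [230.0, 231.0, 229.0, 180.0, 205.0, 228.0, 230.0]:
        recorder.add(v)
    print(recorder.snapshots)  # prints [[231.0, 229.0, 180.0, 205.0, 228.0]]
```

Samples that arrive while a snapshot is being filled are recorded in that snapshot rather than starting a new one, which keeps one capture per event.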

Visibility of Your Environmental Conditions and Impact

Environmental monitoring in data centers is crucial to ensure the efficient and reliable operation of IT equipment while protecting against environmental factors that can lead to downtime or hardware failures. Usually this is accomplished with sensors and detectors:

  • Temperature/humidity sensors help confirm that your HVAC systems keep you operating within the ASHRAE guidelines or within the limits set by your Service Level Agreements (SLAs).
  • If you have sufficient sensor density, you can create heat maps of your data center to find and correct hot and cold spots and improve cooling.
  • Monitoring the supply and return temperatures of your cooling equipment helps ensure you are not overcooling your data center and wasting energy.
  • Airflow sensors monitor air movement and help confirm that hot and cold aisles are well structured.
  • Water leak detectors alert you to leaks or flooding, helping to prevent damage to hardware and electrical systems.
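A heat map needs temperature estimates at points between sensors. One simple approach, shown here as an assumption rather than any particular DCIM product's method, is inverse distance weighting (IDW): each grid point takes a weighted average of the sensor readings, with weights proportional to 1/distance^p. The sensor layout, grid step, and power p = 2 are illustrative.

```python
import math
from typing import List, Sequence, Tuple

# (x position in m, y position in m, temperature in °C); layout is hypothetical.
Sensor = Tuple[float, float, float]


def idw_estimate(x: float, y: float, sensors: Sequence[Sensor], power: float = 2.0) -> float:
    """Inverse-distance-weighted temperature estimate at point (x, y)."""
    if not sensors:
        raise ValueError("at least one sensor is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, temp_c in sensors:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return temp_c  # exactly on a sensor: use its reading
        weight = 1.0 / distance ** power
        weighted_sum += weight * temp_c
        weight_total += weight
    return weighted_sum / weight_total


def heat_map(width_m: float, depth_m: float, step_m: float,
             sensors: Sequence[Sensor], power: float = 2.0) -> List[List[float]]:
    """Grid of estimated temperatures; rows run along y, columns along x."""
    if step_m <= 0:
        raise ValueError("grid step must be positive")
    nx = int(math.floor(width_m / step_m + 1e-9)) + 1
    ny = int(math.floor(depth_m / step_m + 1e-9)) + 1
    return [[idw_estimate(i * step_m, j * step_m, sensors, power) for i in range(nx)]
            for j in range(ny)]
```

IDW never produces values outside the range of the sensor readings, so it cannot reveal a hot spot no sensor sees; that is one reason sensor density matters.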

How to Minimize the Risk

These factors all contribute to a risk of reduced reliability and downtime for your critical infrastructure gear. Both the risk and its impact have been established through research and data published by independent organizations. Tracking these risk factors should be part of your ongoing monitoring and data collection for your hardware.

A powerful tool for managing these risks is a full-featured DCIM solution capable of tracking these conditions, raising real-time alarms, and analyzing historical data. Gaining visibility into these factors can be challenging, but a quality DCIM solution provides this visibility in addition to meeting your basic monitoring and alerting needs.

A DCIM solution that tracks these aspects of your hardware also provides the underlying data and advanced views, such as thermal distribution maps and psychrometric charts, that help you maximize the efficiency of your power distribution and cooling infrastructure.
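On a psychrometric chart, each reading is plotted by dry-bulb temperature against humidity ratio (mass of water vapor per mass of dry air). A minimal sketch of that conversion follows, assuming sea-level pressure (101.325 kPa) and a Magnus-type approximation for saturation vapor pressure; `humidity_ratio_g_per_kg` and `chart_point` are hypothetical helpers, and a production tool would more likely use the ASHRAE Handbook formulations or a dedicated psychrometric library.

```python
import math
from typing import Tuple

SEA_LEVEL_KPA = 101.325      # assumption: standard atmospheric pressure
MOLAR_MASS_RATIO = 0.621945  # molar mass of water vapor / dry air


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Magnus-type approximation of saturation vapor pressure over liquid water."""
    return 0.6112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def humidity_ratio_g_per_kg(temp_c: float, rh_pct: float,
                            pressure_kpa: float = SEA_LEVEL_KPA) -> float:
    """Grams of water vapor per kilogram of dry air."""
    if not 0 <= rh_pct <= 100:
        raise ValueError("relative humidity must be between 0 and 100")
    vapor_kpa = rh_pct / 100.0 * saturation_vapor_pressure_kpa(temp_c)
    return 1000.0 * MOLAR_MASS_RATIO * vapor_kpa / (pressure_kpa - vapor_kpa)


def chart_point(temp_c: float, rh_pct: float) -> Tuple[float, float]:
    """(x, y) coordinates of a reading on a psychrometric chart."""
    return (temp_c, humidity_ratio_g_per_kg(temp_c, rh_pct))
```

At 25 °C and 50% RH this gives roughly 9.9 g/kg, in line with a standard sea-level psychrometric chart.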

About the Author

Max Hamner

Max Hamner is a Research and Development Engineer at Modius. Contact Modius to learn more about its DCIM solutions. Modius OpenData provides integrated tools including machine learning capability to manage the assets and performance of colocation facilities, enterprise data centers, and critical infrastructure. OpenData is a ready-to-deploy DCIM featuring an enterprise-class architecture that scales incredibly well. In addition, OpenData gives you real-time, normalized, actionable data accessible through a single sign-on and a single pane of glass.

Delivering DCIM solutions since 2017, Modius is passionate about helping clients run more profitable data centers and providing operators with the best possible view into a managed facility’s data. Modius is based in San Francisco, California, and is proudly a Veteran Owned Small Business (VOSB Certified).

About the Author

Ray Daugherty

Ray Daugherty is a Senior Services Consultant at Modius. Contact Modius to learn more about its DCIM solutions and other critical infrastructure management software that optimize the infrastructure and operations of critical facilities, including data centers, telecom, smart buildings and other IoT environments.
