Getting the Most Reliability from Your Assets with Remote Monitoring
As a data center operator, the success of your business model depends on your uptime and SLA compliance, both of which are strongly affected by the reliability and longevity of your assets. Reliability correlates directly with your ROI.
Power distribution and cooling gear for data centers (generators, PDUs, UPSs, CRACs, CRAHs) is costly to purchase and maintain, but downtime and lost contracts are far more expensive.
To maximize availability and SLA compliance, two of the top factors (above vendor quality) are the environmental conditions in which your equipment operates and the quality of the power it receives.
ASHRAE, ITIC, and other organizations have long demonstrated the direct effect of power quality and operating environment on hardware uptime and reliability. They have produced many standards to define the challenges and goals of environmental and power conditions. This awareness has led all major hyperscalers to include environmental and power compliance as part of their colocation contracts.
How Power Affects Your Infrastructure Gear
Power quality directly affects the life of your hardware. The cause is not just significant power events but also ongoing variations in power quality, whose effects accumulate as degraded reliability in your infrastructure and IT gear. The many small sags, surges, and spikes slowly degrade the components in an electrical device. This effect has been well researched and documented by multiple organizations, which have produced reports and created standards that define the optimum operating conditions to maximize equipment life. The most prevalent industry standard is provided by ITIC (Information Technology Industry Council), which publishes and maintains the ITIC curve (shown below). The curve allows the damage risk to a device to be assessed for individual power events.
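To make the idea of scoring an individual power event concrete, here is a minimal Python sketch that classifies one event by its duration and its magnitude as a percentage of nominal voltage. The breakpoints are a simplified step approximation of commonly cited ITIC envelope values and are assumptions for illustration; the function and table names are hypothetical, and the published curve should be consulted for real assessments.

```python
from bisect import bisect_left

# ASSUMPTION: simplified step approximation of the ITIC envelope, in percent
# of nominal voltage. Each tuple is (max_duration_seconds, limit_percent);
# the last entry is the steady-state limit. Verify against the published curve.
UPPER_LIMITS = [(0.001, 200.0), (0.003, 140.0), (0.5, 120.0), (float("inf"), 110.0)]
LOWER_LIMITS = [(0.02, 0.0), (0.5, 70.0), (10.0, 80.0), (float("inf"), 90.0)]


def _limit(table, duration_s):
    """Return the limit of the first band whose max duration covers the event."""
    durations = [d for d, _ in table]
    return table[bisect_left(durations, duration_s)][1]


def classify_event(duration_s, percent_nominal):
    """Classify one voltage event against the approximated envelope."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if percent_nominal > _limit(UPPER_LIMITS, duration_s):
        return "prohibited"        # above the upper boundary: damage risk
    if percent_nominal < _limit(LOWER_LIMITS, duration_s):
        return "no_damage"         # below the lower boundary: may malfunction
    return "within_tolerance"      # inside the envelope


# Example: a 100 ms sag to 65% of nominal, and a 1 s swell to 115% of nominal.
print(classify_event(0.1, 65.0))
print(classify_event(1.0, 115.0))
```

Because the real curve slopes between some breakpoints, this step version is deliberately conservative on the over-voltage side for events between 1 ms and 3 ms.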
How Environmental Conditions Affect Your Infrastructure Gear
Humidity, air pressure, air quality, and temperature combine to affect condensation and oxidation on surfaces, as well as airflow, which in turn can affect cooling efficiency.
High humidity can lead to condensation and moisture buildup inside IT equipment, causing electrical short circuits and corrosion. Low humidity can lead to static electricity buildup, causing electrostatic discharge events that can damage electronic components.
Dust and other particulate matter can accumulate inside IT equipment, blocking airflow and causing overheating. It can also interfere with moving parts, such as fans and drives, reducing their efficiency and potentially leading to equipment failure.
Excessive heat can cause IT equipment to overheat, leading to reduced performance and potentially causing hardware failures. Extremely cold conditions can also affect IT equipment, causing it to become less efficient and possibly leading to condensation and moisture-related problems. Most IT equipment has recommended operating temperature ranges, and it’s important to ensure that the operating environment stays within these ranges.
The above factors can be controlled by climate control systems (HVAC) to maintain proper temperature and humidity levels, and by dust filters and clean rooms to reduce particulate contamination.
ASHRAE (The American Society of Heating, Refrigerating and Air-Conditioning Engineers) has provided research results from studies into the impact of environmental conditions on hardware. One example is provided here:
From this research they have provided a recommended standard (Equipment Thermal Guidelines for Data Processing Environments) specifically for data center operations. This specifies optimal environmental conditions to maximize equipment reliability and life.
The ASHRAE standard for thermal guidelines in the data center can be found in the 2016 ASHRAE white paper TC9.9 Data Center Power Equipment Thermal Guidelines and Best Practices. The guidelines were last updated in 2021, and a PDF reference card is available for the 2021 Equipment Thermal Guidelines for Data Processing Environments.
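As an illustration of how a sensor reading might be compared with a recommended envelope, the following Python sketch checks dry-bulb temperature and dew point. The limits used here (18 to 27 °C dry bulb, -9 to 15 °C dew point) are assumptions based on commonly quoted recommended values and should be confirmed against the current reference card. The sketch leaves out the relative humidity cap, the function names are hypothetical, and the dew point is estimated with the Magnus approximation.

```python
import math

# ASSUMPTION: commonly quoted recommended envelope; confirm against the
# current ASHRAE reference card before relying on these numbers.
RECOMMENDED_TEMP_C = (18.0, 27.0)        # dry-bulb inlet temperature
RECOMMENDED_DEW_POINT_C = (-9.0, 15.0)   # dew point


def dew_point_c(temp_c, rh_percent):
    """Estimate dew point (deg C) with the Magnus approximation."""
    if not 0 < rh_percent <= 100:
        raise ValueError("rh_percent must be in (0, 100]")
    a, b = 17.62, 243.12
    gamma = math.log(rh_percent / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


def check_reading(temp_c, rh_percent):
    """Return a list of violations; an empty list means the reading is in range."""
    issues = []
    t_lo, t_hi = RECOMMENDED_TEMP_C
    d_lo, d_hi = RECOMMENDED_DEW_POINT_C
    if temp_c < t_lo:
        issues.append("temperature below recommended range")
    elif temp_c > t_hi:
        issues.append("temperature above recommended range")
    dp = dew_point_c(temp_c, rh_percent)
    if dp < d_lo:
        issues.append("dew point below recommended range")
    elif dp > d_hi:
        issues.append("dew point above recommended range")
    return issues


print(check_reading(22.0, 45.0))   # a typical in-range reading
print(check_reading(22.0, 80.0))   # humid air at an acceptable temperature
```

Checking dew point rather than relative humidity alone reflects the fact that condensation risk depends on temperature and moisture together.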
Visibility of Your Power Quality and Impact
How do you know whether these factors are affecting you? Is your critical infrastructure gear slowly degrading in reliability, or is an increase in downtime looming ahead?
A DCIM (Data Center Infrastructure Management) solution can help monitor power conditions on your gear and provide easier access to power events and waveforms logged by PQMs (Power Quality Meters). General monitoring of individual device report points (e.g., voltage and current) allows you to track and trend different aspects of your power quality and detect when thresholds have been exceeded:
Power Quality Meter monitoring: high-resolution power monitoring with event capture.
Event capture: seeing very short-duration events that power gear such as UPSs, PDUs, and RPPs generally cannot detect or report. PQMs (Power Quality Meters) provide high-resolution power monitoring and, when an event occurs, save a snapshot around the moment of the event so it can be accessed afterward, often down to the sub-millisecond level (the 63rd harmonic for advanced PQMs).
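The snapshot-around-an-event behavior described above can be sketched with a rolling buffer. This is a minimal Python illustration, not how any particular PQM is implemented: the class name, thresholds, and window sizes are hypothetical, and a real meter works on far higher sample rates than this toy list of readings.

```python
from collections import deque


class EventCapture:
    """Hypothetical helper: snapshot samples around an out-of-range reading."""

    def __init__(self, low, high, pre=4, post=4):
        if pre < 0 or post < 1:
            raise ValueError("pre must be >= 0 and post must be >= 1")
        self.low, self.high = low, high
        self.post = post
        self.history = deque(maxlen=pre)  # rolling window of pre-trigger samples
        self.pending = None               # snapshot still collecting post-trigger samples
        self.remaining = 0
        self.snapshots = []               # completed snapshots, oldest first

    def add(self, sample):
        """Feed one sample; start or extend a snapshot when needed."""
        if self.pending is not None:
            self.pending.append(sample)
            self.remaining -= 1
            if self.remaining == 0:
                self.snapshots.append(self.pending)
                self.pending = None
        elif not (self.low <= sample <= self.high):
            # Trigger: copy the pre-trigger window, then collect `post` more samples.
            self.pending = list(self.history) + [sample]
            self.remaining = self.post
        self.history.append(sample)


# ASSUMPTION: illustrative limits of +/-10% around a 120 V nominal.
capture = EventCapture(low=108.0, high=132.0, pre=2, post=2)
for volts in [120, 121, 119, 90, 118, 120, 121]:
    capture.add(volts)
print(capture.snapshots)
```

In this sketch, further out-of-range samples that arrive while a snapshot is still being filled are stored in that same snapshot rather than starting a new one.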
Visibility of Your Environmental Conditions and Impact
Environmental monitoring in data centers is crucial to ensure the efficient and reliable operation of IT equipment while protecting against environmental factors that can lead to downtime or hardware failures. Usually this is accomplished with sensors and detectors:
- Temperature/humidity sensors ensure your HVAC systems are allowing you to operate within the ASHRAE guidelines or within Service Level Agreement (SLA) guidelines imposed by your
- If you have sufficient sensor density, you can create heat maps of your data center to correct hot and cold spots and improve cooling. Monitoring the supply and return temperatures of your cooling equipment helps ensure you are not overcooling your data center and wasting energy.
- Airflow sensors monitor the air movement in and around They ensure that hot and cold aisles are well-structured.
- Water leak detectors alert you to leaks or flooding, helping to prevent damage to hardware and electrical systems.
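As a rough illustration of the hot/cold spot and supply/return checks in the list above, the Python sketch below flags rack inlet readings outside a target range and computes the temperature difference across a cooling unit. The sensor names, the 18 to 27 °C range, and the 5 °C minimum difference are hypothetical values chosen for the example, not figures from a standard or a specific product.

```python
def find_spots(inlet_temps_c, low_c=18.0, high_c=27.0):
    """Return (hot, cold) lists of sensor names outside the target range.

    ASSUMPTION: low_c and high_c are illustrative defaults.
    """
    hot = sorted(name for name, t in inlet_temps_c.items() if t > high_c)
    cold = sorted(name for name, t in inlet_temps_c.items() if t < low_c)
    return hot, cold


def delta_t(return_c, supply_c):
    """Temperature rise across the cooling loop (return minus supply)."""
    return return_c - supply_c


def is_low_delta_t(return_c, supply_c, min_delta_c=5.0):
    """Flag a small delta T, which can hint at overcooling or bypass air.

    ASSUMPTION: min_delta_c is an illustrative threshold only.
    """
    return delta_t(return_c, supply_c) < min_delta_c


# Hypothetical rack inlet readings keyed by sensor name.
readings = {"A1": 24.0, "A2": 29.5, "B1": 16.0}
hot, cold = find_spots(readings)
print("hot:", hot, "cold:", cold)
print("low delta T:", is_low_delta_t(return_c=24.0, supply_c=21.0))
```

With enough sensors, the same per-sensor comparison can feed a heat map by plotting each reading at its rack position.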
How to Minimize the Risk
These factors all contribute to a risk of reduced reliability and downtime for your critical infrastructure gear. The risk and impact have been demonstrated through thorough research, with documented data provided by independent agencies. Tracking these risk factors should be part of your ongoing monitoring and data collection for your hardware.
A powerful tool for managing these risks is a full-featured DCIM solution capable of tracking these conditions, providing real-time alarms, and analyzing historical data. Gaining visibility into these factors can be challenging, but a quality DCIM solution provides it in addition to meeting your basic monitoring and alerting needs.
A DCIM solution that can track these aspects of your hardware also provides core data, and advanced views like thermal distribution maps and psychrometric charts, which will allow you to maximize the efficiency of your power distribution and cooling infrastructure.
Max Hamner
Max Hamner is Research and Development Engineer at Modius. Contact Modius to learn more about its DCIM solutions. Modius OpenData provides integrated tools including machine learning capability to manage the assets and performance of colocation facilities, enterprise data centers, and critical infrastructure. OpenData is a ready-to-deploy DCIM featuring an enterprise-class architecture that scales incredibly well. In addition, OpenData gives you real-time, normalized, actionable data accessible through a single sign-on and a single pane of glass.
Delivering DCIM solutions since 2007, Modius is passionate about helping clients run more profitable data centers and providing operators with the best possible view into a managed facility’s data. Modius is based in San Francisco, California, and is proudly a Veteran Owned Small Business (VOSB Certified).
Ray Daugherty
Ray Daugherty is Senior Services Consultant at Modius.
Modius, a San Francisco-based Veteran Owned Small Business (VOSB Certified), is a premier provider of end-to-end solutions for managing the availability, capacity, and efficiency of critical facilities, including data centers, smart buildings, telecommunications, and IoT environments. Our flagship product, OpenData, offers a comprehensive suite of tools for managing the performance of mission-critical infrastructure, from device integration to analytics and dashboards.