10 KPIs You Need to Monitor to Improve Data Center Health and Efficiency

Oct. 24, 2019
Modern data center managers are under constant pressure to do more with less while simultaneously being tasked with balancing data center uptime and optimizing for efficiency and capacity utilization. Herman Chan, President of Sunbird Software, offers readers 10 KPIs to track if your goal is improving data center health and efficiency. 

Modern data center managers are under constant pressure to do more with less, balancing uptime against efficiency and capacity utilization. To gauge success and ensure business objectives are met, they are increasingly turning to big data analytics for the necessary insights. With networked smart devices such as intelligent rack PDUs, busways, branch circuit meters, and UPSs providing an abundance of power and environmental sensor data, it has never been easier to collect and analyze this data holistically.

But how do you know where to begin, what to track, and what your goals should be? Based on our experience with hundreds of customers participating in our global user groups, we’ve consolidated feedback on what data matters the most and compiled a list of the top 10 Key Performance Indicators (KPIs) that all data center managers should be monitoring to improve the overall health and efficiency of their data centers.

Measuring these KPIs and strategically leveraging the insight provided allows for smarter, more data-driven decision-making across all facets of data center management from asset management to capacity planning to energy efficiency.

  1. Capacity by Key Data Center Resource (Space, Power, Cooling, and Power/Network Port Connections). Data center managers need to make the most informed and data-driven decisions when it comes to reserving space to provision new IT equipment, using power resources more efficiently, saving on operating expenses, and showing management when more capacity is necessary. Therefore, having accurate, real-time information on physical space, power, cooling, and network connectivity capacity is essential to making such decisions. For the most comprehensive view, monitor capacity at the site, room/floor, cabinet, and port levels.
  1. Data Center Energy Cost. IDC reports that energy consumption per server is growing by 9% per year globally as growth in performance pushes demand for energy. Energy can account for up to 50% of total data center operating expenses, so its cost needs to be monitored and intelligently reduced. Track your energy consumption and costs by site, department, or application/service so that you can set reduction targets, bill back users, meet corporate sustainability and green initiatives, and collect energy rebates and carbon credits.
  1. Change Requests by User, Stage, and Type. In a typical data center environment, up to 30% of servers get replaced annually. Servers older than five years fail three times more often and cost 200% more to support than a new server. To maintain SLAs while improving efficiency and productivity of data center staff, it is important to simplify the management of moves, adds, and changes for server and network equipment. Data center managers and operators should track the number of change requests, tickets, and work orders, who is making them, what progress is being made, and exactly what type of changes are being requested. By tracking work that happens in the data center from creation to completion, you can ensure work order quality and transparency to business users while improving staff efficiency through improved collaboration.
  1. Available Cabinet and Floor Space Remaining. Intelligent space capacity planning is the key to navigating constant data center expansion and optimization. Track available cabinet space in terms of open rack units, including contiguous rack units, to know how efficient your use of space is and correlate how much space vs. power capacity you have to deploy new devices. You should also track remaining floor space in terms of the number of open cabinet positions to know how much white space is available to deploy new cabinets on the data center floor. Including planned decommissions and future planned deployments in your reporting provides for the most accurate view of actual remaining space capacity.

  1. Cabinets with Most Free Data Ports and Power Ports. Finding the best place to reserve cabinet space for new equipment also requires knowing which cabinets have available data and power port capacity. Make the most of your physical network and power port capacity by tracking all available ports within your cabinets. Only then can you identify the optimal cabinets in which to provision new servers and network equipment. By knowing which cabinets have the most free physical ports, you can make more informed capacity planning decisions, use power and network resources more efficiently, and reduce operating expenses.
  1. Peak Load Per Cabinet Over Last 30 Days. Data center power resources are increasingly constrained, and managing for uptime competes with driving efficient power utilization. As such, data center managers need a complete view of how much power is being used, how much is available, and where efficiency can be improved. Measuring active power from rack PDU inlet readings and setting warning and critical alert thresholds for cabinet-level loads ensures that you are notified as a load approaches its limit and can react before service is impacted. Not only will you improve uptime, but you will also be able to find cabinets that have stranded power capacity.
  1. Cabinet Power Failover Redundancy Compliance. In a survey by Uptime Institute, 33% of respondents said that power outages were the number one cause of downtime, and 80% said their most recent outage was preventable. Data center managers are under increasing pressure to deliver more power to cabinets densely packed with power-hungry hardware. Therefore, it is vital that you have power redundancy in place to ensure that power is always available to IT equipment. To help minimize downtime, use predictive what-if analysis and reporting to test and track your cabinet power failover redundancy with the goal of achieving 100% compliance.
  1. Power Usage Effectiveness (PUE). PUE, a metric developed by the Green Grid Association, is the most commonly used KPI for reporting data center energy efficiency. It is a ratio of the total amount of energy used by a facility to the energy delivered to IT devices. You should target a PUE of less than 1.5, and even 1.2 if you have a newer data center or are moving to a newer colocation facility. If your PUE is exceedingly high, then there is great potential for cost savings by implementing energy efficiency best practices in your data center. Track PUE over time to see the impact resulting from efficiency changes, historically and by seasonality.
  1. Percentage of Cabinets Compliant with ASHRAE Standards. Maximize energy efficiency and ensure optimal environmental conditions for your IT equipment by maintaining your temperature and humidity within the ranges provided by the American Society of Heating, Refrigerating and Air-Conditioning Engineers. Use environmental sensors to identify hot spots, overcooling, and extreme humidity levels by visualizing all sensor points in thermal envelopes within ASHRAE’s psychrometric charts. Then, track the percentage of cabinets in your data center that are compliant with ASHRAE standards with the goal of reaching and maintaining 100% compliance. 
  1. Hot Spots Occurrence and Duration. Hot spots, or locations at the intake of IT equipment where insufficient cooling causes the temperature to exceed the recommended range, pose a threat to equipment and increase outages. Proactively monitor and trend rack inlet temperatures in your data center environment with the aim to minimize the occurrence, size, and duration of all service-impacting hot spots. You can help remediate hot spots by ensuring raised floor tiles are placed properly, using appropriate tile perforation, implementing hot- and cold-aisle containment, positioning racks and CRAC units correctly, and spreading high-density servers throughout the data center.
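
As a worked example of the PUE ratio described in the list above, the sketch below divides total facility energy by IT equipment energy and flags months above the 1.5 target. The function name, the month labels, and all energy figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


TARGET_PUE = 1.5  # target suggested in the article; 1.2 for newer facilities

# Hypothetical monthly totals in kWh: (total facility, IT equipment).
monthly_kwh = {
    "Jan": (150_000.0, 100_000.0),
    "Jul": (180_000.0, 100_000.0),
}

# PUE per month, rounded for reporting.
pue_by_month = {
    month: round(pue(total, it), 2) for month, (total, it) in monthly_kwh.items()
}

# Months whose PUE exceeds the target and deserve a closer look.
over_target = [month for month, value in pue_by_month.items() if value > TARGET_PUE]
```

Tracking the monthly values side by side is one simple way to see the seasonal effect the article mentions, such as a higher cooling load in summer.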
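
The cabinet peak load and failover redundancy KPIs above can be approximated with a simple what-if check: if one of a cabinet's two power feeds failed, could the other carry the combined peak load? The sketch below is one possible way to express that, not how any particular DCIM product does it. The `Cabinet` fields, the 80% derating factor, and the sample readings are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cabinet:
    name: str
    feed_a_peak_kw: float    # 30-day peak active power drawn from feed A
    feed_b_peak_kw: float    # 30-day peak active power drawn from feed B
    feed_capacity_kw: float  # rated capacity of each individual feed


def failover_headroom_kw(cab: Cabinet, derating: float = 0.8) -> float:
    """Capacity left on one feed if it had to carry both feeds' peak load.

    Summing the two peaks is conservative because they may not coincide.
    The derating factor is an assumed safety margin, not a fixed rule.
    """
    usable_kw = cab.feed_capacity_kw * derating
    return usable_kw - (cab.feed_a_peak_kw + cab.feed_b_peak_kw)


def redundancy_compliance_pct(cabinets: List[Cabinet], derating: float = 0.8) -> float:
    """Percentage of cabinets whose surviving feed could carry the full load."""
    if not cabinets:
        raise ValueError("at least one cabinet is required")
    compliant = sum(failover_headroom_kw(c, derating) >= 0 for c in cabinets)
    return 100.0 * compliant / len(cabinets)


# Hypothetical readings, for illustration only.
cabinets = [
    Cabinet("A1", 1.5, 1.5, 5.0),
    Cabinet("A2", 2.5, 2.0, 5.0),  # 4.5 kW combined exceeds 4.0 kW usable
    Cabinet("A3", 0.5, 0.5, 5.0),  # large headroom: possible stranded capacity
    Cabinet("A4", 1.0, 1.2, 5.0),
]

at_risk = [c.name for c in cabinets if failover_headroom_kw(c) < 0]
compliance = redundancy_compliance_pct(cabinets)
```

Cabinets with negative headroom are the ones to rebalance first, while cabinets with large positive headroom are candidates for the stranded power capacity mentioned above.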
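
The last two KPIs, ASHRAE compliance and hot spot duration, both come down to comparing rack inlet temperatures against a range. The sketch below assumes ASHRAE's recommended inlet temperature range of 18 to 27 °C and ignores humidity for brevity. The function names, the five-minute sampling interval, and the sample readings are hypothetical.

```python
RECOMMENDED_MIN_C = 18.0  # assumed lower bound of the recommended inlet range
RECOMMENDED_MAX_C = 27.0  # assumed upper bound of the recommended inlet range


def in_range(temp_c):
    """True if an inlet temperature falls inside the recommended range."""
    return RECOMMENDED_MIN_C <= temp_c <= RECOMMENDED_MAX_C


def pct_cabinets_compliant(inlet_temps_c):
    """Percentage of cabinets whose readings all stay inside the range."""
    if not inlet_temps_c:
        raise ValueError("at least one cabinet is required")
    compliant = sum(
        all(in_range(t) for t in temps) for temps in inlet_temps_c.values()
    )
    return 100.0 * compliant / len(inlet_temps_c)


def hot_spot_durations_min(temps_c, interval_min=5):
    """Duration in minutes of each consecutive run above the upper bound."""
    durations = []
    run = 0
    for t in temps_c:
        if t > RECOMMENDED_MAX_C:
            run += 1
        elif run:
            durations.append(run * interval_min)
            run = 0
    if run:  # a hot spot still in progress at the end of the series
        durations.append(run * interval_min)
    return durations


# Hypothetical inlet readings in degrees Celsius, sampled every 5 minutes.
readings = {
    "A1": [22.0, 24.0],
    "A2": [24.0, 28.0, 29.0, 25.0, 28.0, 24.0],  # two hot spot episodes
    "A3": [17.0, 20.0],                          # overcooled
    "A4": [20.0, 21.0],
}

compliance_pct = pct_cabinets_compliant(readings)
a2_hot_spots = hot_spot_durations_min(readings["A2"])
```

Counting the entries in each cabinet's duration list gives the occurrence figure, and summing them gives total time spent in a hot spot condition.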

KPIs Made Easy with DCIM Software

It’s more critical than ever to integrate, analyze, and act on the KPIs that have the most impact on your data center, but how do you begin to monitor so many metrics? With a comprehensive Data Center Infrastructure Management (DCIM) solution, it’s easy.

A modern, second-generation DCIM tool provides the top 10 KPIs, and 100 more, right out of the box with zero-configuration dashboard widgets, reports, and visual analytics. An enterprise-class data and health poller gathers data directly from facility equipment to ensure accurate, high-quality information that leads to deeper, more reliable insights. Second-generation DCIM makes it simple for data center professionals to make smarter, more informed decisions that improve data center health and efficiency while dramatically simplifying capacity management.

Herman Chan is the President of Sunbird Software.

About the Author

Voices of the Industry

Our Voice of the Industry feature showcases guest articles on thought leadership from sponsors of Data Center Frontier. For more information, see our Voices of the Industry description and guidelines.
