Data Centers Feeling the Heat! The History and Future of Data Center Cooling

May 7, 2020
Dror Shenkar, Senior Architect of Intel Data Center Management Solutions, and Shahar Belkin, VP R&D at Zuta-Core, explore the past, current and future landscape of data center cooling. What’s next?

Dror Shenkar, Senior Architect of Intel Data Center Management Solutions

Looking back on the data center environment of the past 10 to 15 years, server rack power densities remained stable at 3 to 5 kW. During this period, air-cooled data centers using chillers and computer room air conditioning (CRAC) units were sufficient to overcome the heat dissipated by the servers, keeping both the facilities and the CPUs under their roofs below their maximum temperatures. This was possible because the CPUs did not produce more than 130 W of heat.

Data centers relied on raised floor systems with hot aisles and cold aisles as the main cooling method. Cold air from the CRAC and computer room air handler (CRAH) units was distributed to the space below the raised floor and then passed through perforated tiles into the main space in front of the servers. This method was simple and the most common for years, and although improved cooling methods have since gradually taken over, it is still employed today.

In recent years, as rack power densities have trended upward to 10 kW or more, air-cooled configurations have evolved into hot- and cold-aisle containment layouts, delivering significant energy savings. The idea behind these layouts is to separate the cool server intake air from the heated server exhaust air with a physical barrier, preventing the two streams from mixing. Another method for air-based cooling is in-rack heat extraction, in which hot air is removed by compressors and chillers built into the rack itself.

In 2018, rack densities continued to increase, approaching 20 kW and pushing air-cooled systems to the limits of their economic viability. As rack densities continue to grow, by some estimates reaching as high as 100 kW per rack, direct-on-chip liquid cooling becomes a viable solution.

Shahar Belkin, VP R&D at Zuta-Core

Data Centers are Feeling the Heat

Artificial Intelligence (AI), gaming, high performance computing, 3D graphics and the Internet of Things (IoT) all demand faster and more complex computing services. The rapidly growing cloud services business, along with the growth of the Edge and the competition between providers, is creating a need for efficient use of data center space and driving providers to demand more computing cores per square foot. The power consumption of graphics processing units (GPUs) and central processing units (CPUs) has continued to grow, and with it the heat they produce: from 100 W to 130 W-plus five years ago to 200 W to 600 W for new processors released to the market in the last two years. In fact, IDC reports that annual energy consumption per server is growing by 9 percent globally, even as growth in performance pushes the demand for energy further upwards.

Air-cooled configurations cope very well with processors that generate up to 130 W of heat, and when stretched to the limit can accommodate processors of 200 W. Above 200 W, processors can still be cooled by air, but only with larger enclosures, wasting rather than saving rack space. Direct-on-chip liquid cooling appears to be the solution that can enable the use of high-power processors while keeping the enclosure size small and the density high.

The two most common designs for liquid cooling are direct-to-chip cold plates, or evaporators, and immersion cooling. Direct-to-chip cold plates sit atop the board’s processors to draw off heat. Cold plates are divided into two main groups: single-phase and dual-phase evaporators. Single-phase cold plates mostly use cold water, which is looped into the cold plate to absorb the heat and leaves the server as warm or hot water. With dual-phase evaporators, a safe, low-pressure dielectric liquid flows into the evaporators, the heat generated by the cooled components boils the liquid, and the heat is released from the evaporator as vapor. The heat, carried as hot water or vapor, is then transferred to a heat rejection unit that either returns it to the cooling plant through a chilled water loop or releases it to the outside world via free air flow.
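To make the heat balance behind these two approaches concrete, the short sketch below compares the coolant flow a single-phase water cold plate and a dual-phase evaporator would need for a given processor. The 300 W heat load, 10 K water temperature rise, and 100 kJ/kg latent heat are assumed illustrative values, not figures from the article.

```python
# Back-of-envelope heat balance for direct-to-chip cooling (illustrative only).
# The chip power, temperature rise, and latent heat below are assumed values.

CHIP_POWER_W = 300.0                 # assumed processor heat load
WATER_CP_J_PER_KG_K = 4186.0         # specific heat of water
WATER_DELTA_T_K = 10.0               # assumed inlet-to-outlet temperature rise
DIELECTRIC_LATENT_J_PER_KG = 100e3   # assumed latent heat of a low-pressure dielectric

# Single-phase cold plate: Q = m_dot * c_p * dT  ->  m_dot = Q / (c_p * dT)
single_phase_kg_s = CHIP_POWER_W / (WATER_CP_J_PER_KG_K * WATER_DELTA_T_K)

# Dual-phase evaporator: Q = m_dot * h_fg  ->  m_dot = Q / h_fg
dual_phase_kg_s = CHIP_POWER_W / DIELECTRIC_LATENT_J_PER_KG

print(f"Single-phase water flow:   {single_phase_kg_s * 60:.2f} kg/min")
print(f"Dual-phase dielectric flow: {dual_phase_kg_s * 60:.2f} kg/min")
```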

Immersion cooling submerges the full hardware in a large, leak-proof bath of dielectric fluid. The fluid absorbs the heat and, in some cases, turns to vapor, then cools or condenses and returns to the bath as fluid.

Whether the cooling method is air-based or liquid-based, monitoring server temperatures is a critical part of the cooling system. In all of these cases, granular temperature monitoring of the servers and their internal components is required to ensure healthy and efficient operations.
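As one illustration of what granular, per-component temperature monitoring can look like in practice, the sketch below polls a server’s baseboard management controller over the DMTF Redfish API and flags sensors that have reached their critical threshold. The BMC address, credentials and chassis ID are placeholder assumptions, and production code would verify TLS certificates.

```python
# Minimal sketch: read per-sensor temperatures from a server BMC via the
# DMTF Redfish API. Host, credentials, and chassis ID below are placeholders.
import requests

BMC = "https://10.0.0.42"      # assumed BMC address
AUTH = ("admin", "password")   # assumed credentials
CHASSIS = "1"                  # assumed chassis ID

resp = requests.get(
    f"{BMC}/redfish/v1/Chassis/{CHASSIS}/Thermal",
    auth=AUTH,
    verify=False,              # many BMCs ship self-signed certificates; verify in production
    timeout=10,
)
resp.raise_for_status()

for sensor in resp.json().get("Temperatures", []):
    name = sensor.get("Name", "unknown")
    reading = sensor.get("ReadingCelsius")
    critical = sensor.get("UpperThresholdCritical")
    if reading is not None and critical is not None and reading >= critical:
        print(f"ALERT  {name}: {reading} C (critical threshold {critical} C)")
    else:
        print(f"OK     {name}: {reading} C")
```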

The Future of Data Center Cooling is Now

There are many innovations from different companies that promise to change the landscape of data center cooling, from using sea or rainwater to reduce precious natural resource usage, to leveraging AI to analyze how data centers are working and adjusting cooling accordingly in real-time, to cooling robots that can monitor the temperature and humidity of the servers in the rack.

But as we look towards the future of the data center and cooling, that future is now. The thermal design of traditional data centers can lead to hot spots, and today’s high-density computing environments present even greater liabilities because of the heat produced by continuous processing. If a data center manager lacks visibility into actual device power consumption, IT staff may overprovision and drive energy usage far beyond the levels needed to maintain safe cooling margins. In fact, Gartner estimates that ongoing power costs are rising at least 10 percent per year due to cost per kilowatt-hour (kWh) increases, especially for high power density servers.

Fortunately, there are data center management solutions that improve data-driven decision-making and enable more precise operational control by providing visibility into power and thermal consumption, server health, and utilization. Using a data center management solution’s cooling analysis function, IT staff can lower cooling costs by safely raising the temperature of the room, thereby improving power usage effectiveness (PUE) and energy efficiency, while continuously monitoring hardware for temperature issues.
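For reference, PUE is the ratio of total facility energy to the energy delivered to the IT equipment, so trimming cooling energy lowers it directly. The brief sketch below illustrates the arithmetic; the load figures are assumed for illustration and are not measurements from the article.

```python
# Illustrative PUE calculation with assumed figures (not from the article).
# PUE = total facility energy / IT equipment energy; 1.0 is the ideal.

it_load_kw = 1000.0        # assumed IT equipment load
cooling_kw = 400.0         # assumed cooling load before raising set points
other_overhead_kw = 100.0  # assumed lighting, power distribution losses, etc.

pue_before = (it_load_kw + cooling_kw + other_overhead_kw) / it_load_kw

# If raising room set points trims cooling energy by an assumed 25 percent:
cooling_after_kw = cooling_kw * 0.75
pue_after = (it_load_kw + cooling_after_kw + other_overhead_kw) / it_load_kw

print(f"PUE before: {pue_before:.2f}")   # 1.50
print(f"PUE after:  {pue_after:.2f}")    # 1.40
```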

When a data center manager overseeing a high-density computing environment is provided the data needed to safely raise the overall set-point temperatures of the room, this capability can significantly lower annual cooling costs across the organization’s entire data center footprint. To cite just one example, a global cybersecurity company was able to raise the temperatures in its server rooms by 3°C, based on the historical temperature readings of each server, making possible a 25 percent overall savings on cooling for the year.

Data center managers today are faced with multiple, global challenges. These include protecting rapidly expanding volumes of data and a growing number of mission-critical applications, managing any number of remote locations, and implementing increasingly pressing sustainability initiatives, which must be precariously balanced against rising energy costs.

To solve these and other challenges, data center management tools not only provide real-time monitoring of the environment with a high degree of data granularity, but also predictive analysis of thermal data that can identify temperature issues before they cause critical incidents. Moreover, monitoring and aggregating real-time power and thermal consumption data helps IT staff analyze and manage data center capacity against actual utilization, so that power and cooling infrastructure is used as efficiently as possible.
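The article does not describe the specific algorithms these tools use; as a simple illustration of what trend-based prediction can look like, the sketch below fits a linear trend to recent inlet-temperature readings and flags sensors whose extrapolated reading would cross a critical threshold. The threshold, look-ahead window and sample readings are all assumptions.

```python
# Illustrative trend-based thermal alert (not the algorithm of any specific
# product). Fits a linear trend to recent readings and flags sensors whose
# extrapolated temperature would cross a critical threshold.
import numpy as np

CRITICAL_C = 35.0        # assumed inlet-temperature threshold
LOOKAHEAD_MINUTES = 30   # how far ahead to extrapolate

# Assumed sample data: one reading per minute for the last 10 minutes.
history = {
    "rack12-server03-inlet": [27.0, 27.2, 27.5, 27.9, 28.4, 28.8, 29.3, 29.9, 30.4, 31.0],
    "rack12-server04-inlet": [24.0, 24.1, 24.0, 24.2, 24.1, 24.0, 24.2, 24.1, 24.0, 24.1],
}

for sensor, readings in history.items():
    minutes = np.arange(len(readings))
    slope, intercept = np.polyfit(minutes, readings, 1)  # degrees C per minute
    projected = slope * (minutes[-1] + LOOKAHEAD_MINUTES) + intercept
    if projected >= CRITICAL_C:
        print(f"WARN {sensor}: trending toward {projected:.1f} C within {LOOKAHEAD_MINUTES} min")
    else:
        print(f"OK   {sensor}: projected {projected:.1f} C")
```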

Dror Shenkar is the Senior Architect of Intel Data Center Management Solutions at Intel, and Shahar Belkin is the VP of R&D at Zuta-Core.

