Data Centers Feeling the Heat! The History and Future of Data Center Cooling
Dror Shenkar, Senior Architect of Intel Data Center Management Solutions, and Shahar Belkin, VP R&D at Zuta-Core, explore the past, present and future landscape of data center cooling. What’s next?
Looking back on the data center environment of the past 10 to 15 years, the power densities of server racks remained stable at 3 to 5 kW. During this period, air-cooled data centers using chillers and computer room air conditioning (CRAC) units were sufficient to remove the heat dissipated by the servers, keeping both the facilities and the CPUs under their roofs below their maximum temperatures. This was possible because CPUs did not produce more than 130 W of heat.
Data centers used raised floor systems with hot aisles and cold aisles as the main cooling method. Cold air from the CRAC and computer room air handler (CRAH) units was distributed into the space below the raised floor and rose through perforated tiles into the cold aisles in front of the servers. This approach was simple and remained the most common for years, and although improved cooling methods have since gradually taken over, it is still employed today.
In recent years, as rack power densities have trended upward to 10 kW or more, air-cooled configurations have evolved into hot- and cold-aisle containment layouts, delivering significant energy savings. The idea behind these methods is to separate the cool server intake air from the heated server exhaust air with a physical barrier, preventing the two from mixing. Another air-based method is in-rack heat extraction, in which hot air is removed by compressors and chillers built into the rack itself.
In 2018, rack densities continued to increase, approaching 20 kW and pushing air-cooled systems to the limit of what is economically practical. As rack densities continue to grow, by some estimates reaching as high as 100 kW per rack, direct-on-chip liquid cooling becomes a viable solution.
Data Centers are Feeling the Heat
Artificial Intelligence (AI), gaming, high-performance computing, 3D graphics and the Internet of Things (IoT) all demand faster and more complex computing services. The rapidly growing cloud services business, along with the growth of the edge and competition between providers, is creating a need for efficient use of data center space and driving providers to demand more computing cores per square foot. The power consumption of graphics processing units (GPUs) and central processing units (CPUs) has continued to grow, and with it the heat they produce: from 100 W to 130 W-plus five years ago to new processors, released in the last two years, that emit 200 W to 600 W. In fact, IDC reports that annual energy consumption per server is growing by 9 percent globally, even as growth in performance pushes the demand for energy further upwards.
Air-cooled configurations cope very well with processors that generate up to 130 W of heat, and when stretched to the limit they can accommodate 200 W processors. Above 200 W, processors can still be cooled by air, but only in larger enclosures, wasting rather than saving rack space. Direct-on-chip liquid cooling appears to be the solution that enables high-power processors while keeping enclosures small and density high.
The two most common designs for liquid cooling are direct-to-chip cold plates, or evaporators, and immersion cooling. Direct-to-chip cold plates sit atop the board’s processors to draw off heat. Cold plates fall into two main groups: single-phase and dual-phase evaporators. Single-phase cold plates mostly use cold water, which is looped through the cold plate to absorb heat and leaves the server as warm or hot water. With dual-phase evaporators, a safe, low-pressure dielectric liquid flows into the evaporators, the heat generated by the cooled components boils the liquid, and the heat is released from the evaporator as vapor. The heat, carried as hot water or vapor, is then transferred to a heat rejection unit that uses either a chilled water loop back to the cooling plant or free air flow to release the heat to the outside world.
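To give a rough sense of the difference between the two approaches, the back-of-the-envelope sketch below estimates the coolant flow needed to remove a given processor heat load, using sensible heat for single-phase water and the latent heat of vaporization for a dual-phase dielectric. The fluid properties are assumed ballpark values for illustration, not figures from this article.

```python
# Rough estimate of coolant mass flow needed to remove a processor's heat load.
# The fluid properties below are assumed, approximate values -- not from the article.

CP_WATER = 4186.0        # J/(kg*K), specific heat of liquid water
H_FG_DIELECTRIC = 1.4e5  # J/kg, assumed latent heat of a low-pressure dielectric fluid

def single_phase_flow(heat_w: float, delta_t_k: float = 10.0) -> float:
    """Mass flow (kg/s) of water warmed by delta_t_k while absorbing heat_w watts."""
    return heat_w / (CP_WATER * delta_t_k)

def dual_phase_flow(heat_w: float) -> float:
    """Mass flow (kg/s) of dielectric fully vaporized while absorbing heat_w watts."""
    return heat_w / H_FG_DIELECTRIC

if __name__ == "__main__":
    for cpu_watts in (130, 200, 600):
        print(f"{cpu_watts} W processor: "
              f"single-phase ~{single_phase_flow(cpu_watts) * 60:.2f} kg/min water, "
              f"dual-phase ~{dual_phase_flow(cpu_watts) * 60:.2f} kg/min dielectric")
```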
Immersion cooling submerges the full hardware in a large, leak-proof bath of dielectric fluid. The fluid absorbs the heat and, in some cases, turns to vapor, then cools or condenses and returns to the bath as fluid.
Whether the cooling method is air-based or liquid-based, monitoring server temperatures is a critical part of the cooling system. In all of these cases, granular temperature monitoring of the servers and their internal components is required to ensure healthy and efficient operation.
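One common way to collect such readings out of band is the DMTF Redfish API exposed by most modern server baseboard management controllers. The minimal sketch below polls a chassis Thermal resource and flags sensors approaching their critical thresholds; the BMC address, credentials and chassis ID are placeholders, and the exact resource layout varies by vendor and Redfish version.

```python
import requests

# Placeholder BMC address, credentials and chassis ID -- adjust for your environment.
BMC = "https://10.0.0.42"
AUTH = ("admin", "password")
CHASSIS_ID = "1"
MARGIN_C = 5.0  # flag sensors within 5 degrees C of their critical threshold

def poll_thermal() -> None:
    """Read per-sensor temperatures from the Redfish Thermal resource and flag hot spots."""
    url = f"{BMC}/redfish/v1/Chassis/{CHASSIS_ID}/Thermal"
    resp = requests.get(url, auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    for sensor in resp.json().get("Temperatures", []):
        name = sensor.get("Name")
        reading = sensor.get("ReadingCelsius")
        critical = sensor.get("UpperThresholdCritical")
        if reading is None:
            continue
        flag = ""
        if critical is not None and reading >= critical - MARGIN_C:
            flag = "  <-- near critical threshold"
        print(f"{name}: {reading:.1f} C (critical {critical}){flag}")

if __name__ == "__main__":
    poll_thermal()
```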
The Future of Data Center Cooling is Now
There are many innovations from different companies that promise to change the landscape of data center cooling, from using sea or rainwater to reduce precious natural resource usage, to leveraging AI to analyze how data centers are working and adjusting cooling accordingly in real-time, to cooling robots that can monitor the temperature and humidity of the servers in the rack.
But as we look towards the future of the data center and cooling, that future is now. The thermal design of traditional data centers can lead to hot spots, and today’s high-density computing environments present even greater liabilities because of the heat produced by continuous processing. When a data center manager lacks visibility into actual device power consumption, IT staff may overprovision and drive energy usage far beyond the levels needed to maintain safe cooling margins. In fact, Gartner estimates that ongoing power costs are rising at least 10 percent per year due to increases in cost per kilowatt-hour (kWh), especially for high-power-density servers.
Fortunately, there are data center management solutions that improve data-driven decision-making and enable more precise operational control by providing visibility into power and thermal consumption, server health, and utilization. Using a data center management solution’s cooling analysis function, IT staff can lower cooling costs by safely raising the room temperature, improving power usage effectiveness (PUE) and energy efficiency while continuously monitoring hardware for temperature issues.
When a data center manager overseeing a high-density computing environment is given the data needed to raise the overall set-point temperature of the room, the result can be significantly lower annual cooling costs across the organization’s entire data center footprint. To cite one example, a global cybersecurity company was able to raise the temperatures in its server rooms by 3°C, based on the historical temperature readings of each server, yielding a 25% overall savings on cooling for the year.
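A minimal sketch of the reasoning behind such a decision is shown below: given each server’s historical maximum inlet temperature and an allowed inlet limit, it computes how far the room set point could be raised while preserving a safety margin. The 27°C limit and 2°C margin are assumptions for illustration, not values from the example above.

```python
# Sketch: how much can the room set point be raised, given historical inlet temperatures?
# The allowed inlet limit and safety margin below are assumed for illustration.

ALLOWED_INLET_C = 27.0   # assumed upper inlet limit (top of a typical recommended envelope)
SAFETY_MARGIN_C = 2.0    # assumed margin kept below that limit

def safe_setpoint_raise(historical_max_inlet_c: dict[str, float]) -> float:
    """Largest uniform set-point increase keeping every server below the limit minus margin."""
    headrooms = [
        ALLOWED_INLET_C - SAFETY_MARGIN_C - max_temp
        for max_temp in historical_max_inlet_c.values()
    ]
    return max(0.0, min(headrooms))

if __name__ == "__main__":
    # Hypothetical per-server historical maxima, in degrees C.
    observed = {"srv-01": 21.5, "srv-02": 22.0, "srv-03": 20.8}
    print(f"Room set point can be raised by about {safe_setpoint_raise(observed):.1f} C")
```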
Data center managers today face multiple global challenges. These include protecting rapidly expanding volumes of data and a growing number of mission-critical applications, managing any number of remote locations, and implementing increasingly pressing sustainability initiatives, which must be balanced against rising energy costs.
To solve these and other challenges, data center management tools not only provide real-time monitoring of the environment with a high degree of data granularity, but also predictive analysis of thermal data that can identify temperature issues before they cause critical incidents. Moreover, monitoring and aggregating real-time power and thermal consumption data helps IT staff analyze and manage data center capacity against actual utilization, so that power and cooling infrastructure is used as efficiently as possible.
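One simple form such predictive analysis can take is trend extrapolation: fitting a line to recent sensor readings and estimating when the trend would cross a critical threshold. The sketch below is an assumed, simplified illustration of that idea, not a description of any particular product’s analytics; the 85°C threshold and the sample data are hypothetical.

```python
from statistics import mean

CRITICAL_C = 85.0  # assumed critical temperature threshold for illustration

def hours_until_threshold(readings: list[tuple[float, float]]) -> float | None:
    """Fit a least-squares line to (hour, temperature) samples and return the hours
    from the latest sample until the trend crosses CRITICAL_C, or None if it never does."""
    xs = [hour for hour, _ in readings]
    ys = [temp for _, temp in readings]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    if slope <= 0:
        return None  # temperature flat or falling -- no predicted crossing
    intercept = y_bar - slope * x_bar
    crossing_hour = (CRITICAL_C - intercept) / slope
    return max(0.0, crossing_hour - xs[-1])

if __name__ == "__main__":
    # Hypothetical hourly samples trending upward.
    samples = [(0, 70.0), (1, 71.2), (2, 72.5), (3, 73.9), (4, 75.0)]
    eta = hours_until_threshold(samples)
    print("No upward trend detected" if eta is None else f"Threshold reached in ~{eta:.1f} h")
```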
Dror Shenkar is the Senior Architect of Intel Data Center Management Solutions at Intel, and Shahar Belkin is the VP of R&D at Zuta-Core.