Sustainability Meets High Density Data Center Cooling

Oct. 25, 2021
Managing high density data center heat loads is a growing challenge for data center designers and operators. A new special report from Nautilus Data Technologies and Data Center Frontier examines methods for sustainably meeting high-density cooling challenges.

Data centers have seen a steady increase in IT equipment power densities over the past 20 years. The past five years have brought a significant rise in the power requirements of CPUs, GPUs, other processors, and memory. Managing these heat loads presents a challenge for IT equipment manufacturers, as well as for data center designers and operators. This launches our special report series on “Sustainably Meeting High Density Cooling Challenges: When, Where, and How.”

Get the Full Report.

The power density for mainstream off-the-shelf 1U servers with multiple processors now typically ranges from 300 to 500 watts, and some models can reach 1,000 watts. When 40 are stacked in a cabinet, they can demand 12 to 20 kW. The same is true for racks loaded with multiple blade servers.
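For illustration, here is a quick back-of-the-envelope sketch in Python of how those per-server figures roll up to rack-level load; the server count and wattages are simply the representative values cited above, not measured data.

```python
# Rough rack-load estimate from per-server power draw.
# Figures are the representative values cited above, not measurements.
servers_per_rack = 40            # 1U servers stacked in one cabinet
watts_per_server = (300, 500)    # typical low/high draw per multi-processor 1U server

low_kw = servers_per_rack * watts_per_server[0] / 1000
high_kw = servers_per_rack * watts_per_server[1] / 1000
print(f"Estimated rack load: {low_kw:.0f} kW to {high_kw:.0f} kW")  # -> 12 kW to 20 kW
```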

Cooling at this power density has already proven nearly impossible for older facilities and is challenging for some data centers designed and built only five years ago. Even many newer data centers can accommodate only some cabinets at this density level through various workarounds, and have found that this impacts their cooling energy efficiency.

The demand for more powerful computing for artificial intelligence (AI) and machine learning (ML) will continue to drive power and density levels higher. Processor manufacturers have product roadmaps for CPUs and GPUs expected to exceed 500 watts per processor in the next few years.

The world is trying to mitigate climate change by addressing core sustainability issues. For data centers, energy efficiency is an important element of sustainability; however, energy usage is not the only factor. Today, many data centers use a significant amount of water for cooling.

This paper will examine the issues and potential solutions to efficiently and sustainably support high-density cooling while reducing energy usage and minimizing or eliminating water consumption.

IT Equipment Thermal Management

What is thermal management, and how is it different from cooling (free or otherwise)? While it may seem like semantics, there is an important difference between a design approach and a technical approach. Generally speaking, we have traditionally “cooled” the data center by means of so-called “mechanical” cooling. This process requires energy to drive the motor of a mechanical compressor, which drives the system (in reality, it is a “heat pump,” since it transfers heat from one side of the system to the other). Getting the heat from the chip to the external heat rejection is the key to end-to-end thermal management effectiveness and energy efficiency.

Traditional mainstream data centers use air-cooled IT equipment (ITE). However, the power density of IT equipment has risen so significantly that it has become more difficult to effectively and efficiently cool IT equipment beyond 20 kW per cabinet using traditional perimeter cooling systems.

While improved airflow management, such as cold or hot aisle containment systems, has helped improve the effectiveness of the IT thermal management within the whitespace (technical space), it still requires a significant amount of fan energy for the facility cooling units and the ITE internal fans. There are also close-coupled cooling systems, such as rear-door heat exchangers and row-based cooling units, which can support higher power densities more effectively.

Air Cooling of Whitespace

Although IT equipment has continuously improved its overall energy efficiency (i.e., power consumed vs. performance), total power draw has increased tremendously. This has resulted in a rise in average watts per square foot in the whitespace, from under 100 W/sq ft to 200-300 W/sq ft or even higher for mainstream data centers being designed and built today. While this average power density can be cooled using conventional methods, such as a raised floor with perimeter cooling units, it becomes a greater challenge every year.
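As a rough illustration of how rack density translates into whitespace watts per square foot, the short Python sketch below assumes a nominal floor allocation of 25 to 30 square feet per rack (the cabinet plus its share of aisles and clearances); that allocation is an assumption for the example, not a figure from the report.

```python
# Illustrative conversion from rack load to average whitespace density.
# The 25-30 sq ft per rack allocation (cabinet plus aisle/clearance share)
# is an assumed planning figure for this sketch, not a value from the report.

def avg_watts_per_sqft(rack_kw: float, sqft_per_rack: float) -> float:
    """Average whitespace density (W/sq ft) for a given rack load and floor allocation."""
    return rack_kw * 1000 / sqft_per_rack

for rack_kw in (5, 10, 20):
    for alloc_sqft in (25, 30):
        print(f"{rack_kw:>2} kW/rack at {alloc_sqft} sq ft/rack -> "
              f"{avg_watts_per_sqft(rack_kw, alloc_sqft):.0f} W/sq ft")
```

Under these assumptions, even mid-density racks push the whitespace average into the 200-300 W/sq ft range noted above.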

The bigger challenge starts at the processor level and moves through the heat transfer process within the IT equipment and eventually impacts the rack power density.

The thermal design power (TDP) of processors (CPUs, GPUs, TPUs, and other upcoming devices) and of many other components, such as memory, has increased significantly over the past decade. Today, even the CPUs using air-cooled heat sinks in low-profile commodity and mid-level servers can range from 100 to 150 watts each, but most have difficulty moving up to the 200 W per processor level. This has resulted in a significant increase in the power density of individual IT equipment, as well as in the power density per rack, producing an overall rise in watts per square foot in the whitespace.

As noted in the introduction, the power density for mainstream off-the-shelf 1U servers with multiple processors now typically ranges from 300 to 500 watts (some models can reach 1,000 watts). When 40 are stacked in a cabinet, they can demand 12-20 kW. The same is true for racks loaded with multiple blade servers.

Understanding Airflow Physics

The nature of the challenge begins with the basic physics of using air as the medium of heat removal. The traditional cooling unit is designed to operate on approximately a 20°F differential between the air entering and the air leaving the unit (i.e., delta-T or ∆T). However, modern IT equipment has a highly variable delta-T that depends on its operating conditions as well as its computing load, so ∆T may vary from 10°F to 40°F during normal operation. This in itself creates airflow management issues, resulting in hotspots in many data centers that were not designed to accommodate such a wide range of temperature differentials. It also limits the power density per rack. Various forms of containment have been applied to try to minimize or mitigate this issue; ideally, this is accomplished by providing closer coupling between the IT equipment and the cooling units.

The most common disconnect is not just the delta-T but its companion: the rate of airflow required per kilowatt of heat. This is expressed by the basic airflow formula (BTU/hr = CFM x 1.08 x ∆T °F), which in effect defines the inverse relationship between ∆T and the airflow required for a given unit of heat. For example, it takes 158 CFM at a ∆T of 20°F to transfer one kilowatt of heat; conversely, it takes twice that airflow (316 CFM) at a 10°F ∆T. This is considered a relatively low ∆T, which increases the overall facility fan energy required to cool the rack (increasing PUE). It also increases the IT equipment's internal fan energy, which raises the IT load without any computing work and thus artificially “improves” facility PUE. This also limits the power density per rack.

*Note: For purposes of these examples, we have simplified the issues related to dry-bulb vs. wet-bulb temperatures and latent vs. sensible cooling loads.
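As a minimal Python sketch of that inverse relationship, the snippet below applies the formula above together with the standard conversion of 3,412 BTU/hr per kilowatt; per the note, it treats the load as sensible-only.

```python
# Required airflow per kilowatt of sensible heat, from BTU/hr = CFM x 1.08 x dT (degF).
# Uses the standard conversion 1 kW = 3,412 BTU/hr; sensible-only per the note above.

BTU_PER_HR_PER_KW = 3412
AIR_FACTOR = 1.08  # standard-air sensible heat factor (BTU/hr per CFM per degF)

def cfm_per_kw(delta_t_f: float) -> float:
    """Airflow (CFM) needed to remove 1 kW at a given air-side delta-T (degF)."""
    return BTU_PER_HR_PER_KW / (AIR_FACTOR * delta_t_f)

for dt in (10, 20, 30, 40):
    print(f"delta-T {dt:>2} degF -> {cfm_per_kw(dt):.0f} CFM per kW")
# 20 degF gives ~158 CFM and 10 degF gives ~316 CFM, matching the example above.
```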

To overcome these issues, manufacturers of higher-density ITE, such as blade servers, design their equipment to operate at a higher ∆T whenever possible. This allows them to save IT fan energy, but it can also create higher return temperatures at the cooling units. For most chilled-water cooling units (CRAHs), this is not an issue; in fact, it is beneficial, since it improves heat transfer at the cooling coil for a given airflow. However, for other types of cooling units, such as a Direct Expansion (DX) Computer Room Air Conditioner (CRAC), which uses internal refrigerant compressors, these higher return temperatures can become a problem and stress the compressor beyond its specified maximum return temperature.

Download the full report, “Sustainably Meeting High Density Cooling Challenges: When, Where, and How,” courtesy of Nautilus Data Technologies, to learn more about cooling high density data centers. In our next article, we’ll look at three more IT equipment heat removal challenges.

About the Author

Julius Neudorfer

Julius Neudorfer is the CTO and founder of North American Access Technologies, Inc. (NAAT). NAAT has been designing and implementing Data Center Infrastructure and related technology projects for over 25 years. He also developed and holds a patent for high-density cooling. Julius is a member of AFCOM, ASHRAE, IEEE and The Green Grid. Julius has written numerous articles and whitepapers for various IT and Data Center publications and has delivered seminars and webinars on data center power, cooling and energy efficiency.
