Breaking Barriers in Rack Density: Why Liquid Cooling is the Key to Tomorrow's Data Centers
There is an adage that says, "Everything old is new again." But it only tells part of the story. Because even when the old and new look the same or sound the same, they are inevitably very different. A car company could put fins on the back of a new car model to recall the designs of the 1950s, but everything else would be different. Newer brakes, newer fuel specifications, newer safety features, and completely different construction materials. Looking back can trigger updated ideas in fashion, but there is no going back.
Liquid cooling is not a new concept for data centers. Mainframe data centers in the 1980s used what they commonly referred to as "water cooling." Going back even further, we can see that air and liquid cooling have each taken turns dominating how temperatures were maintained, and each has been replaced by the other as chip efficiencies have changed. More efficient chipsets that can run cooler have sometimes allowed air cooling to take precedence. Then, computing power overtook efficiency, and liquid cooling became the dominant trend.
But this article is not about looking back. Today, the advances we are seeing in computing power and rack density requirements are not just a step forward—they're a massive leap. And for data center providers and customers today, the journey from air cooling to liquid cooling is not at all a matter of reverting to technologies used decades ago, even if some of the terminology is shared.
Why is liquid cooling such a pervasive topic of conversation among data center designers, builders, operators, and customers of data center services? At a high level, rack densities are growing rapidly as new chips enter the marketplace. Uptime Institute reports that in 2011, densities of 2.4kW per rack were commonplace and, based on their survey results, the average rose to 8.4kW per rack in 2020, a 3.5x increase. It's also worth noting that in 2020, 17% of survey respondents reported more than 20kW per rack, and 5% reported more than 40kW. So the numbers were already climbing steadily.
Looking ahead, we can provide air cooling of CPU racks using row-based containment solutions such as hot aisle or cold aisle containment up to 25-40kW, which is a substantial increase. GPU racks, however, can require over 40kW, and even today's air-cooled GPU racks need to be cooled with rear-door heat exchangers. That number will grow in the coming months to over 100kW with the advent of NVIDIA's Grace Blackwell chip, which requires direct-to-chip liquid cooling. And in just a couple of years, NVIDIA's GPUs are projected to reach the equivalent of 250-300kW per rack.
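To see why air alone runs out of headroom at these densities, a rough back-of-the-envelope calculation helps. The sketch below is purely illustrative (it is not an EdgeConneX design figure) and assumes standard air properties and a typical 15°C temperature rise across the servers; it estimates how much air a rack would have to move to carry away its heat:

```python
# Rough illustration: airflow needed to remove rack heat with air alone.
# Assumes standard air properties and a ~15 degree C server inlet-to-exhaust rise.

AIR_DENSITY = 1.2          # kg/m^3, near sea level
AIR_SPECIFIC_HEAT = 1005   # J/(kg*K)
DELTA_T = 15               # K, assumed temperature rise across the servers

def required_airflow_cfm(rack_kw: float) -> float:
    """Volumetric airflow needed to carry away rack_kw of heat, in cubic feet per minute."""
    watts = rack_kw * 1000
    m3_per_s = watts / (AIR_DENSITY * AIR_SPECIFIC_HEAT * DELTA_T)
    return m3_per_s * 2118.88  # 1 m^3/s is roughly 2118.88 CFM

for kw in (8.4, 40, 100, 300):
    print(f"{kw:>5} kW rack -> ~{required_airflow_cfm(kw):,.0f} CFM of air")
```

Under these assumptions, a 40kW rack already needs on the order of 4,700 CFM and a 100kW rack well over 11,000 CFM, at which point fan power, noise, and airflow management become impractical. Liquid carries far more heat per unit volume than air, which is why direct-to-chip cooling takes over at these densities.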
As an example, EdgeConneX Ingenuity is a next-gen data center solution designed for demanding cloud and AI/HPC customers, enabling HPC, AI Training, and AI Inferencing workloads essential for the AI shift. With flexible designs that support 300+ kW per rack, Ingenuity enables dedicated and mixed AI/HPC workloads within the same data center.
It is clear that today's liquid cooling is different from past iterations. It is also clear that, for many customers and providers, there is urgency attached to the transition from air cooling to liquid cooling. New chips are coming to market faster. They offer astonishing computing power for new AI-based workloads. And they are paving the way for applications and services that were the stuff of science fiction just a decade or two ago.
It is imperative that customers understand how a new, liquid-cooled data center will differ from a more traditional air-cooled facility and how the new generations of chips will accelerate the evolution in cooling requirements.
Today's journey from air-cooled to liquid-cooled data centers requires expertise, proven experience, and a keen understanding of how this transition impacts almost every aspect of data center design, construction, and operations in markets that may be remote or located in proximity to major population centers.
There will not be a universal path for businesses migrating to chips and servers that need liquid cooling for their upcoming deployments. In some cases, for example, a customer may want a data center that accommodates both air and liquid options. Most data center customers will therefore benefit from the right partnership: a provider who brings proven experience to each project in markets around the globe and can help them navigate the complex air-to-liquid journey.
Let's look at just some of the tasks and metrics that need to be tracked as we move from CPUs to air-cooled GPUs to the superpowered chips slated for release in just the next couple of years:
- Construction standards that can support heavier weights to accommodate new equipment and liquid supply
- Local regulations for handling and disposing of cooling fluids
- Thermal Storage, which is critical to ensure expensive GPU chips can still be cooled if power is lost to the data center
- Cooling Distribution Units (CDUs), which supply coolant to the servers and ensure particulates are filtered out of the cooling fluids
- Leak Detection and Containment, critical to protect expensive GPU chips
- New DCIM and telemetry requirements for tracking data and metrics (a simple illustration follows this list)
- Remote management tools
- New implementation and operational procedures that need to be crafted
- Working with chip manufacturers to future-proof new data centers so they can accommodate planned specifications
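As a purely illustrative example of the kind of telemetry and leak-detection logic these systems introduce, the sketch below shows per-rack checks a monitoring platform might run. The sensor names and thresholds are hypothetical, invented for this sketch rather than taken from any specific DCIM product or EdgeConneX design:

```python
# Hypothetical illustration of liquid-cooling telemetry checks for a single rack.
# Sensor names and thresholds are invented for this sketch.

from dataclasses import dataclass

@dataclass
class CoolantTelemetry:
    supply_temp_c: float   # coolant supply temperature from the CDU
    return_temp_c: float   # coolant return temperature from the servers
    flow_lpm: float        # coolant flow rate, liters per minute
    leak_detected: bool    # reading from the rack's leak-detection sensor

def evaluate(rack_id: str, t: CoolantTelemetry) -> list[str]:
    """Return alarm messages for any reading outside its assumed safe range."""
    alarms = []
    if t.leak_detected:
        alarms.append(f"{rack_id}: LEAK detected - isolate loop and dispatch on-site team")
    if t.flow_lpm < 30:                        # assumed minimum flow for this rack class
        alarms.append(f"{rack_id}: coolant flow low ({t.flow_lpm} L/min)")
    if t.supply_temp_c > 45:                   # assumed maximum supply temperature
        alarms.append(f"{rack_id}: supply temperature high ({t.supply_temp_c} C)")
    if (t.return_temp_c - t.supply_temp_c) > 15:
        alarms.append(f"{rack_id}: delta-T high - check for blocked cold plates or low flow")
    return alarms

print(evaluate("rack-a01", CoolantTelemetry(44.0, 58.0, 28.0, False)))
```

The point is not the specific thresholds but the operational shift: coolant temperature, flow, and leak status become first-class metrics alongside power and air temperature, and they need to feed alarms, runbooks, and remote management tools.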
Obviously, there is a lot more that needs to be done. Still, this list gives us a sense of how important it will be for data center providers to fully engage with their customers to understand their near-term needs, their longer-term plans, and how to plot their journey to tomorrow's data centers. Providers will need to be flexible enough to respond to shifting customer requirements and adapt to the product roadmaps developed by hardware manufacturers, from chips to cooling units.
As for the notion that everything old is new again, the most prepared and experienced data center providers aren't looking back. They're focused on the future and preparing to help lead their customers on an exciting new journey.
At EdgeConneX, we are operationally ready to help our customers navigate this exciting and evolving landscape on their air-to-liquid journey.
Phillip Marangella
Phillip Marangella is Chief Marketing and Product Officer for EdgeConneX. EdgeConneX is a global data center provider focused on driving innovation. Contact EdgeConneX to learn more about their 100% customer-defined data center and infrastructure solutions.