The Future of Data Center Cooling: From Direct Liquid Cooling to Immersion Cooling

July 29, 2024
Daniel Pope, founder & CTO of Submer, explains that by rethinking the medium used for cooling, data centers can achieve greater reliability, lower energy consumption, and optimized workload performance.

As the demand for high-performance computing continues to rise, data centers are increasingly challenged to manage the heat generated by advanced components. Direct Liquid Cooling (DLC) has emerged as a solution to the immediate problem of cooling new AI chips. However, DLC addresses only part of the issue: other high-density components such as DIMMs, NICs, SSDs, PSUs, and networking equipment like 200/400/800G switches still need efficient cooling. The Thermal Design Power (TDP) of these components keeps rising, which makes a comprehensive cooling strategy crucial.

The Limitations of Direct Liquid Cooling

While DLC effectively cools AI chips, it does not eliminate the need for substantial air cooling. Because of this dual requirement, facility water production plants must still produce water at temperatures of at least 15-17°C. Additionally, servers, rear door heat exchangers, and Computer Room Air Handler (CRAH) units still rely heavily on fans. This setup depends on chillers or evaporative cooling to maintain optimal temperatures, which in turn prevents significant reductions in Power Usage Effectiveness (PUE) and Water Usage Effectiveness (WUE).
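
For reference, PUE and WUE are simple ratios: PUE is total facility energy divided by IT equipment energy, and WUE is site water use in liters divided by IT equipment energy in kWh. The sketch below is a minimal illustration with hypothetical annual figures (none of them come from a real facility or from this article). It shows how chiller, air handler, and evaporative cooling overhead keeps both metrics well above their ideal values of 1.0 and 0.

```python
def pue(total_facility_kwh: float, it_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_facility_kwh / it_kwh


def wue(site_water_liters: float, it_kwh: float) -> float:
    """Water Usage Effectiveness in liters per kWh of IT equipment energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return site_water_liters / it_kwh


# Hypothetical annual figures, for illustration only.
it_kwh = 1_000_000.0             # IT equipment energy (assumed)
cooling_kwh = 300_000.0          # chillers, air handler fans, pumps (assumed)
other_overhead_kwh = 100_000.0   # lighting, power distribution losses (assumed)
water_liters = 1_800_000.0       # evaporative cooling make-up water (assumed)

example_pue = pue(it_kwh + cooling_kwh + other_overhead_kwh, it_kwh)
example_wue = wue(water_liters, it_kwh)
print(f"PUE = {example_pue:.2f}, WUE = {example_wue:.2f} L/kWh")
```

Note that server fan power is normally counted as part of IT equipment energy, so PUE alone does not reveal how much of the IT load is spent moving air.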

Complicating the Infrastructure

Rather than simplifying data center infrastructure, the integration of DLC adds complexity. New points of failure are introduced, strict water quality control becomes essential, and the risk of catastrophic leaks increases. Moreover, other critical components for AI and general compute workloads, such as memory, are often neglected in the cooling strategy. This oversight can lead to inefficiencies and potential failures in the data center environment.

Rethinking Data Center Cooling

To address these limitations, it is essential to rethink the medium used for cooling in data centers. Immersion cooling presents a promising solution that can meet the challenges posed by high-density components and increasing TDP. By submerging components in a thermally conductive but electrically insulating liquid, immersion cooling can efficiently dissipate heat from all parts of the system.
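
One way to see why the cooling medium matters is the standard heat transport relation Q = ρ · V · c_p · ΔT, where Q is the heat removed, ρ the density, V the volumetric flow rate, c_p the specific heat, and ΔT the temperature rise of the coolant. The sketch below compares the flow needed to carry away 1 kW with a 10 K temperature rise. The air properties are approximate room-condition values; the dielectric fluid properties are assumptions chosen for illustration and do not describe any specific product.

```python
def volumetric_flow_m3_per_s(heat_w: float, density_kg_m3: float,
                             cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Flow rate needed to remove heat_w, from Q = rho * V * c_p * delta_T."""
    return heat_w / (density_kg_m3 * cp_j_per_kg_k * delta_t_k)


HEAT_W = 1000.0     # heat load to remove (assumed)
DELTA_T_K = 10.0    # allowed coolant temperature rise (assumed)

# Air at roughly room conditions: ~1.2 kg/m^3, ~1005 J/(kg*K).
air_flow = volumetric_flow_m3_per_s(HEAT_W, 1.2, 1005.0, DELTA_T_K)

# Hypothetical dielectric fluid: 800 kg/m^3, 2000 J/(kg*K) (assumed values).
fluid_flow = volumetric_flow_m3_per_s(HEAT_W, 800.0, 2000.0, DELTA_T_K)

ratio = air_flow / fluid_flow
print(f"air:   {air_flow * 1000:.1f} L/s")
print(f"fluid: {fluid_flow * 1000:.4f} L/s")
print(f"air needs about {ratio:.0f}x the volumetric flow")
```

Under these assumptions the liquid moves the same heat with a volumetric flow three orders of magnitude smaller, which is the basic reason a liquid bath can replace fans.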

The Advantages of Immersion Cooling

Immersion cooling offers several benefits over traditional air and liquid cooling methods:

  1. Comprehensive Cooling: Immersion cooling can handle the heat generated by all components, including AI chips, memory, and networking equipment.
  2. Reduced Complexity: By eliminating the need for fans and separate air and liquid cooling systems, immersion cooling simplifies the data center infrastructure.
  3. Lower PUE: Immersion cooling can significantly reduce, and in some cases eliminate, the need for chillers, leading to lower PUE and improved energy efficiency.
  4. Enhanced Reliability: With fewer moving parts and a reduced risk of leaks, immersion cooling increases the reliability of data center operations.

Takeaways from Crypto Mining: Optimizing Workloads with Immersion Cooling

One of the most exciting prospects of immersion cooling is the potential to optimize workloads based on available cooling capacity, with dry cooling as the worst-case scenario and overclocking as the best-case scenario. This approach is already implemented at scale in the Bitcoin and cryptocurrency mining industry, which operates in some of the harshest environments on the planet, often where water is scarce.

In the worst-case scenario of a heat wave, where minimal cooling capacity is available, workloads should be managed conservatively to prevent overheating. This includes reducing CPU/GPU clock speeds, spreading tasks across multiple nodes to avoid localized hotspots, and continuously monitoring system temperatures to adjust workloads dynamically and maintain safe operating conditions.

In the best-case scenario where enhanced cooling capability exists, the system can handle higher CPU/GPU clock speeds and more intensive workloads, maximizing performance while ensuring the system remains within safe temperature limits.
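
The two scenarios above can be sketched as a simple control policy: scale clock speed down when the coolant is hot, scale it up when there is cooling headroom, and place tasks on the coolest nodes first. This is a minimal sketch and not a description of how Submer or any mining operator implements it. The `ThermalPolicy` class, its thresholds, and the function names are all hypothetical, and real limits depend on the hardware and coolant in use.

```python
from dataclasses import dataclass


@dataclass
class ThermalPolicy:
    """Hypothetical thresholds; real values depend on hardware and coolant."""
    throttle_above_c: float = 55.0  # coolant temp that forces minimum clocks (assumed)
    boost_below_c: float = 40.0     # coolant temp that allows overclocking (assumed)
    min_scale: float = 0.6          # clock multiplier in the worst case (assumed)
    max_scale: float = 1.15         # clock multiplier in the best case (assumed)


def clock_scale(coolant_temp_c: float, policy: ThermalPolicy) -> float:
    """Map coolant temperature to a clock multiplier relative to nominal (1.0)."""
    if coolant_temp_c >= policy.throttle_above_c:
        return policy.min_scale
    if coolant_temp_c <= policy.boost_below_c:
        return policy.max_scale
    # Between the two thresholds, interpolate linearly.
    span = policy.throttle_above_c - policy.boost_below_c
    fraction = (coolant_temp_c - policy.boost_below_c) / span
    return policy.max_scale - fraction * (policy.max_scale - policy.min_scale)


def spread_tasks(tasks: list[str], node_temps_c: dict[str, float]) -> dict[str, list[str]]:
    """Assign tasks round-robin, coolest nodes first, to avoid localized hotspots."""
    if not node_temps_c:
        raise ValueError("at least one node is required")
    ordered = sorted(node_temps_c, key=node_temps_c.get)
    placement: dict[str, list[str]] = {node: [] for node in ordered}
    for i, task in enumerate(tasks):
        placement[ordered[i % len(ordered)]].append(task)
    return placement


policy = ThermalPolicy()
for temp in (35.0, 47.5, 60.0):
    print(f"{temp:.1f} C -> clock x{clock_scale(temp, policy):.3f}")

placement = spread_tasks(["job-a", "job-b", "job-c"], {"node-1": 50.0, "node-2": 42.0})
print(placement)
```

In practice such a policy would run in a loop against live temperature telemetry, which corresponds to the continuous monitoring described above; the linear interpolation is only one of many possible mappings.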

Conclusion

Direct Liquid Cooling has provided a short-term solution for integrating new AI chips into data centers, but it does not address the long-term cooling needs of other high-density components, nor does it improve overall data center efficiency and environmental footprint. The continued reliance on air cooling and chillers complicates infrastructure and limits opportunities for improving PUE. Immersion cooling offers a comprehensive solution that can meet the cooling demands of all components, simplify infrastructure, and enhance data center efficiency. By rethinking the medium used for cooling, data centers can achieve greater reliability, lower energy consumption, and optimized workload performance. The future of data center cooling lies in innovative solutions like immersion cooling, which can take efficiency and workload optimization to the next level.

About the Author

Daniel Pope

Daniel Pope is founder & CTO of Submer. With over 20 years of experience in data center design and operations, Daniel is dedicated to driving the transition to sustainable and future-proof digital infrastructure.

With hundreds of megawatts deployed in production, Submer is on a mission to optimize data center operations and lead the way to a greener future with its single-phase immersion cooling technology.
