The Future of Data Center Cooling: From Direct Liquid Cooling to Immersion Cooling
As the demand for high-performance computing continues to rise, data centers are increasingly challenged to manage the heat generated by advanced components. Direct Liquid Cooling (DLC) has emerged as a solution to the immediate problem of cooling new AI chips. However, DLC addresses only part of the issue, leaving other high-density components such as DIMMs, NICs, SSDs, PSUs, and networking equipment like 200/400/800G switches still in need of efficient cooling solutions. These components have increasing Thermal Design Power (TDP), making it crucial to find a comprehensive cooling strategy.
The Limitations of Direct Liquid Cooling
While DLC effectively cools AI chips, it does not eliminate the need for substantial air cooling. This dual requirement means that facility water plants must still produce chilled water as cold as 15-17°C. Additionally, servers, rear door heat exchangers, and Computer Room Air Handler (CRAH) units still rely heavily on fans. Keeping the air side within operating temperatures therefore still depends on chillers or evaporative cooling, which in turn prevents significant reductions in Power Usage Effectiveness (PUE) and Water Usage Effectiveness (WUE).
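For reference, PUE and WUE are simple ratios as commonly defined by The Green Grid: PUE is total facility energy divided by IT equipment energy, and WUE is site water use in liters divided by IT equipment energy in kWh. The minimal Python sketch below computes both; the function names and the sample figures are illustrative assumptions, not measurements from any real facility.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def wue(site_water_liters: float, it_equipment_kwh: float) -> float:
    """Water Usage Effectiveness in liters per kWh of IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return site_water_liters / it_equipment_kwh


# Hypothetical figures for one metering period (assumed, not from the article).
it_kwh = 1000.0        # energy delivered to servers, storage, and network gear
facility_kwh = 1500.0  # IT energy plus fans, chillers, pumps, and power losses
water_liters = 1800.0  # water consumed, e.g. by evaporative cooling

print(f"PUE = {pue(facility_kwh, it_kwh):.2f}")
print(f"WUE = {wue(water_liters, it_kwh):.2f} L/kWh")
```

In this hypothetical case, every kWh delivered to IT equipment costs another 0.5 kWh of overhead. Fans and chillers are a large part of that overhead, which is why retaining them limits how far PUE can fall.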
Complicating the Infrastructure
Rather than simplifying data center infrastructure, the integration of DLC adds complexity. New points of failure are introduced, strict water quality control becomes essential, and the risk of catastrophic leaks increases. Moreover, other critical components for AI and general compute workloads, such as memory, are often neglected in the cooling strategy. This oversight can lead to inefficiencies and potential failures in the data center environment.
Rethinking Data Center Cooling
To address these limitations, it is essential to rethink the medium used for cooling in data centers. Immersion cooling presents a promising solution that can meet the challenges posed by high-density components and increasing TDP. By submerging components in a thermally conductive but electrically insulating liquid, immersion cooling can efficiently dissipate heat from all parts of the system.
The Advantages of Immersion Cooling
Immersion cooling offers several benefits over traditional air and liquid cooling methods:
- Comprehensive Cooling: Immersion cooling can handle the heat generated by all components, including AI chips, memory, and networking equipment.
- Reduced Complexity: By eliminating the need for fans and separate air and liquid cooling systems, immersion cooling simplifies the data center infrastructure.
- Lower PUE: Immersion cooling can significantly reduce, and in some cases eliminate, the need for chillers, leading to lower PUE and improved energy efficiency.
- Enhanced Reliability: With fewer moving parts and a reduced risk of leaks, immersion cooling increases the reliability of data center operations.
Takeaways from Crypto Mining: Optimizing Workloads with Immersion Cooling
One of the most exciting prospects of immersion cooling is the potential to optimize workloads based on available cooling capacity, with dry cooling as the worst-case scenario and overclocking as the best-case scenario. This approach is already implemented at scale in the Bitcoin and cryptocurrency mining industry, which operates in some of the harshest environments on the planet, often where water is scarce.
In the worst-case scenario of a heat wave, where minimal cooling capacity is available, workloads should be managed conservatively to prevent overheating. This includes reducing CPU/GPU clock speeds, spreading tasks across multiple nodes to avoid localized hotspots, and continuously monitoring system temperatures to adjust workloads dynamically and maintain safe operating conditions.
In the best-case scenario where enhanced cooling capability exists, the system can handle higher CPU/GPU clock speeds and more intensive workloads, maximizing performance while ensuring the system remains within safe temperature limits.
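The two scenarios above can be sketched as a simple policy that maps measured coolant temperature to a clock scaling factor and steers new work toward the coolest node. This is a minimal illustration only: the thresholds, scaling factors, and names such as `ClockPolicy` and `coolest_node` are hypothetical and do not describe any vendor's control logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockPolicy:
    """Hypothetical thresholds; real limits depend on the hardware and the fluid."""
    throttle_above_c: float = 50.0   # assumed coolant temperature that triggers throttling
    overclock_below_c: float = 35.0  # assumed coolant temperature that leaves headroom
    throttled_scale: float = 0.8     # run at 80% of nominal clock when cooling is scarce
    overclock_scale: float = 1.1     # run at 110% of nominal clock when cooling is abundant


def clock_scale(coolant_temp_c: float, policy: ClockPolicy = ClockPolicy()) -> float:
    """Return a multiplier for the nominal CPU/GPU clock given coolant temperature."""
    if coolant_temp_c >= policy.throttle_above_c:
        return policy.throttled_scale   # worst case: heat wave, minimal cooling capacity
    if coolant_temp_c <= policy.overclock_below_c:
        return policy.overclock_scale   # best case: spare cooling capacity
    return 1.0                          # otherwise stay at nominal clock


def coolest_node(node_temps_c: dict[str, float]) -> str:
    """Pick the node with the lowest temperature to avoid localized hotspots."""
    if not node_temps_c:
        raise ValueError("no nodes to choose from")
    return min(node_temps_c, key=node_temps_c.get)


# Hypothetical readings for three tanks.
readings = {"tank-a": 52.0, "tank-b": 41.5, "tank-c": 33.0}
for name, temp in readings.items():
    print(name, clock_scale(temp))
print("schedule next job on:", coolest_node(readings))
```

A production controller would also need hysteresis and rate limiting so that clocks do not oscillate around a threshold; the sketch omits both for brevity.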
Conclusion
Direct Liquid Cooling has provided a short-term solution for integrating new AI chips into data centers, but it does not address the long-term cooling needs of other high-density components, nor does it improve overall data center efficiency or environmental footprint. The continued reliance on air cooling and chillers complicates infrastructure and limits opportunities for improving PUE. Immersion cooling offers a comprehensive solution that can meet the cooling demands of all components, simplify infrastructure, and enhance data center efficiency. By rethinking the medium used for cooling, data centers can achieve greater reliability, lower energy consumption, and optimized workload performance. The future of data center cooling lies in innovative solutions like immersion cooling, which can take efficiency and workload optimization to the next level.
Daniel Pope
Daniel Pope is founder & CTO of Submer. With over 20 years of experience in datacenter design and operations, Daniel is dedicated to driving the transition to sustainable and future-proof digital infrastructure.
With hundreds of megawatts deployed in production, Submer is on a mission to optimize datacenter operations and lead the way to a greener future.