Gamers have long used the combination of overclocking CPUs and water cooling to squeeze peak performance out of PCs. Can that same approach create more powerful cloud computing platforms?
New research from Microsoft suggests that it might. The company has been test-driving the use of overclocked processors running in immersion cooling tanks, and says the combination allows servers to perform at a higher level.
“Based on our tests, we’ve found that for some chipsets, the performance can increase by 20 percent through the use of liquid cooling,” said Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group. “This demonstrates how liquid cooling can be used not only to support our sustainability goals to reduce and eventually eliminate water used for cooling in datacenters, but also generate more performant chips operating at warmer coolant temperatures for advanced AI and ML workloads.”
Performance improvements of that kind could be significant when applied at cloud scale, and could enable new approaches to how data centers are built and operated.
“Because of the efficiencies in both power and cooling that liquid cooling affords us, it unlocks new potential for data center rack design,” said Belady.
The research was shared as part of Microsoft’s update on its sustainability initiatives, including a plan to slash its data center water usage by 95 percent by 2024. Increased use of liquid cooling is an important component of this effort, as it allows a waterless design. Microsoft will also begin running warmer data centers, raising the set point for server halls to reduce – and possibly eliminate – the company’s reliance on water-intensive evaporative cooling.
Overclocking: More Power, But Also More Heat
In overclocking, CPUs are run at a higher clock rate than designed – in effect, forcing the computer to run faster than it’s supposed to go. Overclocking boosts performance, but also causes components to generate more heat.
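A rough way to see why a faster clock means more heat is the standard first-order approximation for CMOS dynamic power, P ≈ C × V² × f. The sketch below uses that approximation only. The capacitance, voltage, and frequency values are hypothetical placeholders, not figures from Microsoft's tests, and real chips also have leakage power that this ignores.

```python
def dynamic_power_watts(capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS dynamic power estimate: P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def relative_power_increase(freq_scale, voltage_scale=1.0):
    """Ratio of overclocked to baseline dynamic power under the same model."""
    return voltage_scale ** 2 * freq_scale


# Hypothetical chip: all three baseline values are assumptions for illustration.
baseline = dynamic_power_watts(1e-8, 1.0, 3.0e9)
# Hypothetical overclock: 20% higher clock with a 10% voltage bump to stay stable.
overclocked = dynamic_power_watts(1e-8, 1.1, 3.6e9)

print(f"baseline:    {baseline:.1f} W")
print(f"overclocked: {overclocked:.1f} W")
print(f"ratio:       {relative_power_increase(1.2, 1.1):.3f}x")
```

Under these assumptions, a 20 percent clock increase raises dynamic power by more than 20 percent once voltage has to rise with it, and nearly all of that extra power ends up as heat the cooling system must remove.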
That’s where immersion cooling comes in. In two-phase immersion, servers are submerged in a coolant fluid that boils off as the chips generate heat, removing the heat as it changes from liquid to vapor. The vapor then condenses into liquid for reuse, all without a pump. Immersion cooling can deliver exceptional power efficiency because it uses sealed tanks that don’t require the raised floors or room-level air cooling found in most commercial data centers.
We’ve been tracking Microsoft’s adoption of liquid cooling, and the company is signaling that two-phase immersion cooling will play a much larger role in future data center designs.
“Liquid cooling paves the way for more densely-packed servers in smaller spaces, meaning increased capacity per square foot in a datacenter – or the ability to create smaller datacenters in more strategic locations in the future,” Belady said. “This adds to the benefits of waterless cooling design.”
In March, Microsoft revealed that it was test-driving cooling technology used in bitcoin mining facilities in which servers are dunked in tanks of cooling fluid to manage rising heat densities. In April it placed a single rack of 48 servers using two-phase immersion cooling into production in its data center in Quincy, Washington.
More Immersion Cooling on the Horizon?
Microsoft isn’t yet building entire data centers of immersion tanks, but says it is ready to expand their use.
“Our plan is to scale this deployment to multiple tanks to understand how to scale liquid immersion cooling and maintain services reliability,” the company said. “Depending on the outcome, we are going to develop a more optimized version of this technology across other datacenters.”
Microsoft isn’t the first company to connect immersion and performance gains, as this has been a driver in the adoption of immersion in cryptocurrency mining. Riot Blockchain says its research found that using immersion with ASIC chips can boost its hash rate by 25 percent, with the potential for larger gains. This week Riot began construction on 200 megawatts of immersion cooling capacity at a new hashing center in Rockdale, Texas.
“We anticipate observing an increase in the company’s hash rate and productivity through 2022, without having to rely solely on purchasing additional ASICs,” said Jason Lee, CEO of Riot.
Microsoft’s Azure cloud and Office apps require much higher reliability than bitcoin mining, which can be interrupted without knocking global businesses offline. A key motivation for Microsoft is beefing up its cloud for growing adoption of artificial intelligence (AI) and other high-density applications, which pose challenges for data center design and management.
Powerful new hardware for AI workloads is packing more computing power into each piece of equipment, boosting the power density – the amount of electricity used by servers and storage in a rack or cabinet – and the accompanying heat.
AI hardware also can create high “flux,” in which power use in a rack increases rapidly as hardware commences a new workload, which can be difficult to manage with traditional air cooling. A number of service providers have focused on air-cooled solutions optimized for high-density workloads, but as densities rise past 25 to 30 kW a rack, users increasingly turn to liquid cooling to manage these workloads.
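As a rough illustration of the density arithmetic, the sketch below adds up per-server power draw for one rack and compares the total with the lower edge of the 25 to 30 kW range mentioned above. The 700 W per-server figure is a hypothetical value, and the cutoff is treated here as a rule of thumb from this article, not an engineering limit.

```python
# Lower edge of the 25-30 kW per-rack range cited in the article (rule of thumb).
AIR_COOLING_UPPER_KW = 25.0


def rack_density_kw(server_watts):
    """Total rack power in kilowatts from a list of per-server draws in watts."""
    return sum(server_watts) / 1000.0


def suggested_cooling(density_kw, threshold_kw=AIR_COOLING_UPPER_KW):
    """Return 'liquid' above the threshold, otherwise 'air'."""
    return "liquid" if density_kw > threshold_kw else "air"


# Hypothetical rack: 48 servers (the rack size mentioned in the article)
# at an assumed 700 W each.
rack = [700.0] * 48
density = rack_density_kw(rack)

print(f"rack density: {density:.1f} kW -> {suggested_cooling(density)} cooling")
```

With these assumed numbers the rack lands above the range where the article says users increasingly turn to liquid cooling. Halving the per-server draw would put the same 48-server rack back in air-cooled territory.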
Belady says liquid cooling represents “a major step function for managing density.”
“It paves the way for higher density and more power-efficient data centers,” said Belady. “We’re only at the beginning of that density curve. We’re really bullish on the technology.”