Data center cooling: Machine learning is the problem and the solution

Feb. 26, 2019
Sabey Data Centers' John Sasser, Sr. Vice President of Data Center Operations, takes a look at how machine learning presents both challenges and opportunities for today's data center cooling technology.


John Sasser, Sr. Vice President of Data Center Operations, Sabey Data Centers

It’s no secret that rack kW is steadily increasing in the data center, nor is it any wonder why. Processing power is greater than ever, and there’s only one direction for it to go: up.

However, the massive, sustained computational power required by machine learning workloads is anything but business as usual. Most data center operators can grapple with gradual increases in IT footprint, but high-density GPU clusters for machine learning raise the stakes, particularly where cooling is concerned.

Newer data centers, especially those using containment strategies, may have the infrastructure to adequately cool what, in some cases, amounts to 30 kW per rack or more. Most older data centers, though, aren't ready to sustain these requirements. This could prove problematic as artificial intelligence, machine learning and deep learning workloads become more commonplace.

Indeed, some colocation providers that operate older raised-floor data centers without hot aisle containment already serve customers that want to load up their cabinets but lack the ability to cool their desired densities. But the next wave of customers will have an even bigger ask: cooling infrastructure that can support machine learning workloads.

How can this be done efficiently, and cost-effectively?

Fighting fire with fire

If there's one thing we've learned from Google in the past year or so, it's that the solution to cooling high-density machine learning workloads may be more machine learning. The Mountain View giant spent several years testing an algorithm that can learn how best to adjust cooling infrastructure. The result was a 40 percent reduction in the amount of energy used for cooling. Phase two of that deployment puts the algorithm on autopilot rather than having it make recommendations to human operators.

Clearly, machine learning can and has been used to achieve greater data center cooling efficiency. While most data centers are not yet equipped to do the same, the theory behind how machine learning can optimize cooling efficiency is fairly well understood.

It starts with a PID (proportional-integral-derivative) loop. This tried-and-true method lets an industrial system (cooling infrastructure, in this case) make real-time adjustments by comparing the actual temperature of the data center to the desired setpoint and calculating the error between them. The loop then uses that error to make a course correction that drives the temperature back toward the setpoint.
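To make the mechanics concrete, here is a minimal sketch of a PID control step as it might apply to cooling. The class name, gain values, and setpoint are all illustrative assumptions, not taken from any real controls product.

```python
# Minimal PID (proportional-integral-derivative) controller sketch.
# The output would drive something like fan speed or a chilled-water valve.

class PIDController:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd  # tuning gains
        self.setpoint = setpoint                # desired temperature (deg C)
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured_temp, dt):
        """One control step: compare actual vs. desired, return a correction."""
        error = self.setpoint - measured_temp
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PIDController(kp=2.0, ki=0.1, kd=0.5, setpoint=24.0)
output = pid.update(measured_temp=26.5, dt=1.0)  # room is 2.5 degrees too warm
```

A negative output here signals "more cooling needed"; how that maps onto actual equipment depends entirely on the facility.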

PID loops work well; however, they optimize based on a finite set of conditions, and when it comes to data center cooling, there are many conditions that are constantly in flux. This is where machine learning comes into play. Rather than tasking a person with optimizing and re-optimizing based on shifting conditions, an algorithm can monitor PID loops and constantly adjust as needed.

In other words, the PID loops are perpetually reconfigured based on changing factors that influence cooling infrastructure efficiency. Everything from internal humidity, to external weather, to utilization fluctuations within the facility, to interactions between different elements within the cooling infrastructure can influence the desired temperature stability in a high-density data center, and also how efficiently that desired temperature is achieved. It is impractical and costly for a human to constantly optimize PID loops to ensure the most efficient configuration is always in place.

But a machine learning algorithm can. It can theoretically learn the optimal settings for each individual circumstance and apply them automatically, without human intervention, based on real-time external and internal conditions. Think of it as autopilot for data center cooling.

Turning concept into reality

Google building an application like this is one thing, but what about other data center operators?

The development and implementation for the type of application we’re describing could come with a colossal upfront cost, one that’s hard to justify for most data center operators – even those with many data centers.

However, developing and training software to act in this way could be a competitive advantage for forward-thinking controls companies. Arguably, in the future, it will be table stakes. Of course, even with such advanced cooling controls, the data center’s physical infrastructure is still important. Legacy data centers using raised floor and inefficient cooling infrastructure have lower limits to capacity and efficiency – regardless of how smart the controls program is.

The sense of urgency for this type of system is nascent. But we know with certainty that the majority of data center operators (67 percent, according to AFCOM) are seeing increasing densities. We also know that machine learning's power requirements have the potential to spur this growth on at a blistering pace in the years ahead.

What we don't know yet is how we'll handle this transformation, but I suspect that the solution is already right under our noses.

John Sasser is Sr. Vice President of Data Center Operations for Sabey Data Centers. Connect with John on LinkedIn.

About the Author

Voices of the Industry

Our Voice of the Industry feature showcases guest articles on thought leadership from sponsors of Data Center Frontier. For more information, see our Voices of the Industry description and guidelines.
