Sabey Data Centers’ John Sasser, Sr. Vice President of Data Center Operations, takes a look at how machine learning presents both challenges and opportunities for today’s data center cooling technology.
It’s no secret that rack kW is steadily increasing in the data center, nor is it any wonder why. Processing power is greater than ever, and there’s only one direction for it to go: up.
However, the massive, sustained computational power required by machine learning workloads is anything but business as usual. Most data center operators can grapple with gradual increases in IT footprint, but high-density GPU clusters for machine learning raise the stakes, particularly where cooling is concerned.
Newer data centers, especially those using containment strategies, may have the infrastructure to adequately cool what, in some cases, amounts to 30 kW per rack or more. Most older data centers, though, aren’t ready to sustain these requirements. This could prove problematic as artificial intelligence, machine learning and deep learning workloads become more commonplace.
Indeed, some colocation providers that operate older raised-floor data centers without hot aisle containment already serve customers who want to load up their cabinets but cannot cool their desired densities. But the next wave of customers will have an even bigger ask: cooling infrastructure that can support machine learning workloads.
How can this be done efficiently, and cost-effectively?
Fighting fire with fire
If there’s one thing we’ve learned from Google in the past year or so, it’s that the solution to cooling high-density machine learning workloads may be more machine learning. The Mountain View giant spent several years testing an algorithm that learns how best to adjust its cooling infrastructure, and the result was a 40 percent reduction in the energy used for cooling. Phase two of that deployment is to put the algorithm on autopilot rather than having it make recommendations to human operators.
Clearly, machine learning can and has been used to achieve greater data center cooling efficiency. While most data centers are not yet equipped to do the same, the theory behind how machine learning can optimize cooling efficiency is fairly well understood.
It starts with a PID (proportional-integral-derivative) loop. This tried-and-true control method lets an industrial system (cooling infrastructure, in this case) make real-time adjustments by comparing the actual temperature of the data center to the desired temperature to calculate an error value. It then uses that error to make a course correction that will yield the desired temperature while minimizing electricity consumption.
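The loop described above can be sketched in a few lines of code. The gains, setpoint and toy thermal model below are illustrative assumptions for the sake of the sketch, not values from any real facility or controls product:

```python
class PID:
    """Reverse-acting PID: output (cooling effort) rises when the hall runs hot."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint        # desired supply-air temperature
        self.integral = 0.0
        self.prev_error = None

    def update(self, measured, dt):
        error = measured - self.setpoint                      # positive when too warm
        self.integral += error * dt                           # accumulated error
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        # Course correction: proportional + integral + derivative terms
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Drive a toy "data hall" whose temperature rises with heat load (hypothetical numbers)
temp = 30.0                                                   # current temperature, deg C
pid = PID(kp=2.0, ki=0.5, kd=0.1, setpoint=24.0)
for _ in range(200):
    cooling = pid.update(temp, dt=1.0)
    temp += 0.3 - 0.05 * cooling                              # heat load minus cooling effect
```

After a couple hundred control cycles the simulated temperature settles at the 24 °C setpoint; the integral term is what absorbs the constant heat load that a purely proportional controller would leave as a steady-state offset.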
PID loops work well; however, they optimize based on a finite set of conditions, and when it comes to data center cooling, there are many conditions that are constantly in flux. This is where machine learning comes into play. Rather than tasking a person with optimizing and re-optimizing based on shifting conditions, an algorithm can monitor PID loops and constantly adjust as needed.
In other words, the PIDs are perpetually configured based on changing factors that influence cooling infrastructure efficiency. Everything from internal humidity, to external weather, to utilization fluctuations within the facility, to interactions between different elements within the cooling infrastructure can influence the desired temperature stability in a high-density data center, and also how efficiently that desired temperature is achieved. It is impractical and costly for a human to constantly optimize PID loops to ensure the most efficient configuration is always in place.
But a machine learning algorithm can. It can, in theory, learn the optimal settings for each circumstance and apply those adjustments automatically, without human intervention, based on real-time external and internal conditions. Think of it as autopilot for data center cooling.
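As a sketch of that outer layer (not Google’s actual system), the learning component can be stood in for by a simple optimizer that periodically re-tunes the loop’s gains against current conditions. A real controls product would use a trained model in place of the hill-climbing search shown here, and the thermal model is again a toy assumption:

```python
import random


def run_episode(kp, ki, heat_load, steps=150, setpoint=24.0):
    """Simulate the hall under the given gains; return total absolute error (cost)."""
    temp, integral, cost = 30.0, 0.0, 0.0
    for _ in range(steps):
        error = temp - setpoint              # positive when too warm
        integral += error
        cooling = kp * error + ki * integral
        temp += heat_load - 0.05 * cooling   # toy thermal model
        cost += abs(error)
    return cost


def retune(kp, ki, heat_load, trials=30, seed=0):
    """Hill-climb toward gains that minimize error under the current conditions."""
    rng = random.Random(seed)
    best = (run_episode(kp, ki, heat_load), kp, ki)
    for _ in range(trials):
        cand_kp = max(0.0, best[1] + rng.uniform(-0.3, 0.3))
        cand_ki = max(0.0, best[2] + rng.uniform(-0.1, 0.1))
        cost = run_episode(cand_kp, cand_ki, heat_load)
        if cost < best[0]:                   # keep a candidate only if it does better
            best = (cost, cand_kp, cand_ki)
    return best


# Conditions shift (say, utilization rises and heat load climbs);
# the outer layer adapts the loop rather than waiting for a human to re-tune it.
cost, new_kp, new_ki = retune(kp=2.0, ki=0.5, heat_load=0.6)
```

The structure is the point: the inner PID loop runs at control-system speed, while the outer layer re-optimizes its parameters whenever conditions drift, which is exactly the job that is impractical to assign to a human operator.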
Turning concept into reality
Google building an application like this is one thing, but what about other data center operators?
Developing and implementing the type of application we’re describing could come with a colossal upfront cost, one that’s hard to justify for most data center operators – even those with many data centers.
However, developing and training software to act in this way could be a competitive advantage for forward-thinking controls companies. Arguably, in the future, it will be table stakes. Of course, even with such advanced cooling controls, the data center’s physical infrastructure is still important. Legacy data centers using raised floor and inefficient cooling infrastructure have lower limits to capacity and efficiency – regardless of how smart the controls program is.
The sense of urgency around this type of system is still nascent. But we know with certainty that the majority of data center operators (67 percent, according to AFCOM) are seeing increasing densities. We also know that machine learning’s power requirements have the potential to spur this growth on at a blistering pace in the years ahead.
What we don’t know yet is how we’ll handle this transformation, but I suspect that the solution is already right under our noses.