AI and HPC Drive Demand for Higher Density Data Centers, New As-a-Service Offerings

Aug. 15, 2023
The power and cooling demands of AI and HPC require specialized knowledge and capabilities, and operations and engineering teams will need that expertise as well.

Just as data centers were coming to grips with new pressures on power, space, and sustainability, the latest wave of cutting-edge technology, artificial intelligence and machine learning, is compounding all three concerns.

These issues themselves aren’t new; what is new is the unprecedented pace of AI adoption. Within a year or so, colocation providers will need to have adjusted to a new, larger demand for high-density data centers.

So what does that actually mean? It doesn’t seem likely that a large pool of customers will be investing millions in AI-specific hardware and beating down the doors of colocation providers, so what do data centers need to offer?

Power and Cooling First

From the infrastructure side, the availability of power and cooling will be the first thing customers look for. But do providers need to support an entire data hall of Nvidia H100-class GPUs, or is it more likely that a single rack housing a four- or eight-GPU server and appropriate storage is all that’s necessary?

Granted, that single-rack solution might need to support 50 kW or more, even for a relatively simple single-server AI deployment. As we recently pointed out, the technology to handle this workload is already in place in many facilities, but it represents just a starting point for deploying AI into colocation.
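To put rough numbers on that, the sketch below estimates per-server draw and how many such servers fit within a 50 kW rack budget. Every figure in it is an illustrative assumption, not a vendor specification.

```python
# Back-of-the-envelope power math for a single-rack AI deployment.
# All figures are illustrative assumptions, not vendor specifications.

GPU_TDP_KW = 0.7        # assumed draw per high-end training GPU (~700 W class)
GPUS_PER_SERVER = 8     # typical large AI training server
HOST_OVERHEAD_KW = 3.0  # assumed CPUs, memory, NICs, fans
PSU_EFFICIENCY = 0.94   # assumed power-supply efficiency
STORAGE_NET_KW = 2.0    # assumed in-rack storage and networking gear
RACK_BUDGET_KW = 50.0   # the rack power envelope discussed above

server_kw = (GPU_TDP_KW * GPUS_PER_SERVER + HOST_OVERHEAD_KW) / PSU_EFFICIENCY
servers_per_rack = int((RACK_BUDGET_KW - STORAGE_NET_KW) // server_kw)

print(f"Per-server draw: {server_kw:.1f} kW")   # roughly 9 kW per server
print(f"Servers in a {RACK_BUDGET_KW:.0f} kW rack: {servers_per_rack}")
```

Under these assumptions a single eight-GPU server draws roughly 9 kW, so a 50 kW rack envelope is what lets a customer grow from one server to a small cluster without taking more floor space.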

What is happening now, and what will likely serve as the mid-term answer for both providers and customers, is AI-as-a-Service.

The market for hosting customers’ large-scale AI deployments is beginning to drive demand, as seemingly everyone wants to get their feet wet. Microsoft, Google, Amazon, and even Oracle will see customers turn to their respective clouds to test, evaluate, and potentially deploy AI services using on-demand provisioning.

Who Will Be Driving this Demand?

But it’s not just the top-tier players getting into offering AI cloud services. At the recent Computex 2023 conference, Nvidia made a point of identifying its cloud partners beyond the big four.

One example is Cirrascale, which originally made its name as a provider of on-demand HPC computing. It now offers its AI Innovation Cloud, which builds on that support infrastructure to give customers the opportunity to evaluate AI/ML systems. The company’s commitment to AI hosting is reflected in the range of choices available to customers.

Not only can customers choose previous and current generations of Nvidia AI hardware, the company also hosts Graphcloud, built on Graphcore’s Bow IPU; Cerebras’ AI Model Studio, running on a hosted Cerebras cloud; and SambaNova’s Dataflow-as-a-Service and foundation models.

These are four of the leading accelerated AI/ML technologies outside of Google and AWS, both of which offer Nvidia GPUs alongside their own in-house designs. Cirrascale may also be the only single source for these competing and, in some cases, complementary technologies, and it even publishes pricing models for each of them within its cloud.

Taking a slightly different approach is Lambda Labs, which offers five tiers of Nvidia-based hosting plus full colocation services designed for a customer’s ML hardware and software stack. The company provides high-density power and cooling designed specifically for GPU compute workloads, and it can deploy its engineered GPU clusters either on-premises in the customer’s data center or in its own.

Other providers highlighted by Nvidia include CoreWeave, Paperspace, and Vultr. What these providers have in common is that they are dedicated cloud service providers with multiple data centers and a focus on supporting AI/ML workloads. Some look beyond that AI focus and offer more standardized cloud data center options, such as a full range of storage, managed databases, Kubernetes, and bare-metal deployments.

This gives us some idea of where the future of colocation lies. Developing the necessary support infrastructure for high-density computing, whether that means racks with passive rear-door cooling, full data halls equipped for liquid-cooled IT equipment, or anything in between, needs to be on the radar as new facilities are built and existing spaces refurbished.

It Will Become an Industry Driver, If It Isn't Already

While not every data center needs to be equipped to run the most intensive AI workloads, the premium on space in desirable data center locations alone means that building for higher-density racks is the only path forward.

This isn’t to imply that every data center must be built along the lines of Colovore’s facilities, which offer 35 kW per rack as their standard density and have discussed building racks that exceed 200 kW (a number that seems excessive until you look at the power demands of hardware dedicated to AI/ML workloads). But it does mean that, especially in space-constrained locations, data centers will need to standardize on significantly higher power and cooling capacities.
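The space argument is easy to make concrete: for a fixed deployment size, rack density directly determines floor footprint. The cluster size and density tiers in this sketch are illustrative examples, not any provider’s actual figures.

```python
import math

# Illustrative comparison: racks required to host a 1 MW AI deployment
# at different per-rack density tiers (example figures only).
CLUSTER_KW = 1000
DENSITY_TIERS_KW = [10, 35, 70, 200]

for density_kw in DENSITY_TIERS_KW:
    racks = math.ceil(CLUSTER_KW / density_kw)
    print(f"{density_kw:>3} kW/rack -> {racks:>3} racks")
```

A deployment that needs 100 racks at a legacy 10 kW density fits in 29 racks at 35 kW and just five at 200 kW, which is why density, not raw square footage, is becoming the currency of valuable locations.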

Fortunately for existing facilities, advances in cooling technology don’t require a wholesale rip-and-replace of existing cooling infrastructure. A sufficiently broad range of options, including new rack designs, passive and liquid cooling techniques, and solutions that scale from a single rack to an entire data center, allows for cost-effective, on-demand upgrades.

As an excellent example of these technologies, in the first week of August 2023 Digital Realty announced that its colocation facilities in 28 markets will begin supporting rack densities of up to 70 kW. It is doing this with what it calls Air-Assisted Liquid Cooling, introducing liquid-cooled rear-door heat exchangers into its existing colocation facilities.
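For a sense of what a rear-door heat exchanger at that density implies, the sketch below applies the standard sensible-heat relation Q = ṁ·cp·ΔT to estimate the coolant flow needed for a 70 kW rack. The temperature rise is an assumed figure for illustration, not part of Digital Realty’s announcement.

```python
# Estimate coolant flow for a rear-door heat exchanger using
# Q = m_dot * cp * delta_T. The delta-T is an illustrative
# assumption, not a figure from Digital Realty's announcement.

RACK_LOAD_KW = 70.0   # target rack density from the announcement
CP_WATER = 4.186      # specific heat of water, kJ/(kg*K)
DELTA_T_K = 10.0      # assumed coolant temperature rise across the door

mass_flow_kg_s = RACK_LOAD_KW / (CP_WATER * DELTA_T_K)
flow_l_min = mass_flow_kg_s * 60  # 1 kg of water is roughly 1 L

print(f"Coolant flow: {mass_flow_kg_s:.2f} kg/s (~{flow_l_min:.0f} L/min)")
```

Roughly 100 liters per minute of water per rack is well within the reach of standard facility water loops, which is why rear-door retrofits can go into existing halls without wholesale replacement of the cooling plant.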

Support for high-density hosting isn’t the future; it’s the present. Finding solutions that scale well will be the goal of many providers, and the business driving these changes is ramping up quickly as customers begin to understand the value that high-performance computing and AI will bring to their business.

About the Author

David Chernicoff

David Chernicoff is an experienced technologist and editorial content creator with a knack for seeing the connections between technology and business, getting the most from both, and explaining the needs of business to IT and IT to business.
