New AI Chips Seek to Reshape Data Center Design, Cooling

Cerebras Systems has emerged from stealth mode with a wafer-sized chip that completely rethinks the form factor for data center computing. The challenge: Each chip uses 15 kilowatts of power.

Rich Miller

Aug. 20, 2019

The Cerebras Wafer-Scale Engine (WSE) is optimized for artificial intelligence workloads and is the largest chip ever built. (Image: Cerebras Systems)

The rise of artificial intelligence is transforming the business world. It could shake up the data center along the way.

Powerful new hardware for artificial intelligence (AI) workloads has the potential to reshape the design of data centers and how they are cooled. This week’s Hot Chips conference at Stanford University showcased a number of startups offering new takes on customized AI silicon, as well as new offerings from the incumbents.

The most startling new design came from Cerebras Systems, which emerged from stealth mode with a chip that completely rethinks the form factor for data center computing. The Cerebras Wafer-Scale Engine (WSE) is the largest chip ever built, at nearly 9 inches in width. At 46,225 square millimeters, the WSE is 56 times larger than the largest graphics processing unit (GPU).

Is bigger better? Cerebras says size is “profoundly important” and its larger chips will process information more quickly, reducing the time it takes AI researchers to train algorithms for new tasks.

The Cerebras design offers a radical new take on the future of AI hardware. Its first products have yet to hit the market, and analysts are keen to see if performance testing validates Cerebras’ claims about its capabilities.

Cooling 15 Kilowatts per Chip

If it succeeds, Cerebras will push the existing boundaries of high-density computing, a trend that is already beginning to create both opportunities and challenges for data center operators. A single WSE contains 400,000 cores and 1.2 trillion transistors and uses 15 kilowatts of power.

I will repeat for clarity – a single WSE uses 15 kilowatts of power. For comparison, a recent survey by AFCOM found users were averaging 7.3 kilowatts of power for an entire rack, which can hold as many as 40 servers. Hyperscale providers average about 10 to 12 kilowatts per rack.
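To put those numbers side by side, here is a minimal sketch of the arithmetic in Python. It uses only the figures quoted above; the constant and function names are illustrative assumptions, not anything published by Cerebras or AFCOM.

```python
# Figures quoted in the article. All names here are illustrative.
WSE_POWER_KW = 15.0                # power draw of a single Cerebras WSE
AFCOM_AVG_RACK_KW = 7.3            # average rack density in the AFCOM survey
HYPERSCALE_RACK_KW = (10.0, 12.0)  # approximate hyperscale range per rack


def racks_equivalent(chip_kw: float, rack_kw: float) -> float:
    """Return how many racks of the given density draw as much power as one chip."""
    return chip_kw / rack_kw


if __name__ == "__main__":
    ratio = racks_equivalent(WSE_POWER_KW, AFCOM_AVG_RACK_KW)
    print(f"One WSE vs. average AFCOM rack: {ratio:.2f}x")
    for rack_kw in HYPERSCALE_RACK_KW:
        ratio = racks_equivalent(WSE_POWER_KW, rack_kw)
        print(f"One WSE vs. {rack_kw:.0f} kW hyperscale rack: {ratio:.2f}x")
```

By this rough comparison, one chip draws about as much power as two average racks from the AFCOM survey, and more than a full hyperscale rack.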

The heat thrown off by the Cerebras chips will require a different approach to cooling, as well as to the server chassis. The WSE will be packaged as a server appliance that includes a liquid cooling system, which reportedly features a cold plate fed by a series of pipes. The chip is positioned vertically in the chassis to better cool its entire surface.

A look at the manufacturing process for the Cerebras Wafer Scale Engine (WSE), which was fabricated at TSMC. (Image: Cerebras)

Most servers are designed to use air cooling, and thus most data centers are designed to use air cooling. A broad shift to liquid cooling would prompt data center operators to support water to the rack, which is often delivered through a system of pipes under a raised floor.

Google’s decision to shift to liquid cooling with its latest hardware for artificial intelligence raised expectations that others might follow. Alibaba and other Chinese hyperscale companies have adopted liquid cooling.

“Designed from the ground up for AI work, the Cerebras WSE contains fundamental innovations that advance the state-of-the-art by solving decades-old technical challenges that limited chip size—such as cross-reticle connectivity, yield, power delivery, and packaging,” said Andrew Feldman, founder and CEO of Cerebras Systems. “Every architectural decision was made to optimize performance for AI work. The result is that the Cerebras WSE delivers, depending on workload, hundreds or thousands of times the performance of existing solutions at a tiny fraction of the power draw and space.”

Data center observers know Feldman as the founder and CEO of SeaMicro, an innovative server startup that packed more than 750 low-power Intel Atom chips into a single server chassis.

Much of the secret sauce for SeaMicro was in the networking fabric that tied those cores together. Thus, it’s no surprise that Cerebras features an interprocessor fabric called Swarm that combines massive bandwidth and low latency. The company’s investors include two networking pioneers, Andy Bechtolsheim and Nick McKeown.

For deep dives into Cerebras and its technology, see additional coverage in Fortune, TechCrunch, The New York Times and Wired.

New Form Factors Bring More Density, Cooling Challenges

We’ve been tracking progress in rack density and liquid cooling adoption for years at Data Center Frontier as part of our focus on new technologies and how they may transform the data center. New hardware for AI workloads is packing more computing power into each piece of equipment, boosting the power density – the amount of electricity used by servers and storage in a rack or cabinet – and the accompanying heat.

Cerebras is one of a group of startups building AI chips and hardware. The arrival of startup silicon on the AI computing market follows several years of intense competition between chip market leader Intel Corp. and rivals including NVIDIA, AMD and several players advancing ARM technology. Intel continues to hold a dominant position in the enterprise computing space, but the development of powerful new hardware optimized for specific workloads has been a major trend in the high performance computing (HPC) sector.

This won’t be the first time that the data center market has had to reckon with new form factors and higher density. The introduction of blade servers packed dozens of server boards into each chassis, bringing higher heat loads that many data center managers struggled to manage. The rise of the Open Compute Project also introduced new standards, including a 21-inch rack that was slightly wider than the traditional 19-inch rack.

There’s also the question of whether the rise of powerful AI appliances will compress more computing power into a smaller space, prompting redesigns or retrofits for liquid cooling, or whether high-density hardware will be spread out within existing facilities to distribute its impact on existing power and cooling infrastructure.

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
