Microsoft Unveils Custom-Designed Data Center AI Chips, Racks and Liquid Cooling

Nov. 16, 2023
At Microsoft Ignite, the company unveiled two custom-designed AI chips and associated integrated systems, including racks and liquid cooling.

Today at Microsoft Ignite, the company unveiled two custom-designed chips and integrated systems, the fruits of a multi-step process for meticulously testing homegrown silicon that Microsoft's engineers have been refining in secret for years, as detailed on the company's Source blog.

The end goal is an Azure hardware system that offers maximum flexibility and can also be optimized for power, performance, sustainability or cost, said Rani Borkar, corporate vice president for Azure Hardware Systems and Infrastructure (AHSI).

“Software is our core strength, but frankly, we are a systems company. At Microsoft we are co-designing and optimizing hardware and software together so that one plus one is greater than two,” Borkar said. “We have visibility into the entire stack, and silicon is just one of the ingredients.”

The Chips

The newly introduced Microsoft Azure Maia AI Accelerator chip is optimized for artificial intelligence (AI) tasks and generative AI. For its part, the Microsoft Azure Cobalt CPU is an Arm-based processor tailored to run general-purpose compute workloads on the Microsoft Cloud.

Microsoft said the new chips will begin to appear by early next year in its data centers, initially powering services such as Microsoft Copilot, an AI assistant, and its Azure OpenAI Service. They will join a widening range of products from the company's industry partners geared toward customers eager to take advantage of the latest cloud and AI technology breakthroughs.

“Microsoft is building the infrastructure to support AI innovation, and we are re-imagining every aspect of our data centers to meet the needs of our customers,” noted Scott Guthrie, executive vice president of Microsoft’s Cloud + AI Group. 

Guthrie emphasized, “At the scale we operate, it’s important for us to optimize and integrate every layer of the infrastructure stack to maximize performance, diversify our supply chain and give customers infrastructure choice.”

The Microsoft Azure Maia 100 AI Accelerator is the first chip designed by Microsoft for large language model training and inferencing in the Microsoft Cloud. The chip will power some of the largest internal AI workloads running on Microsoft Azure. 

Additionally, the company said its partner OpenAI is providing feedback on the Azure Maia platform, sharing how its workloads run on infrastructure tailored for its large language models; that input will help inform future Microsoft designs.

“Since first partnering with Microsoft, we’ve collaborated to co-design Azure’s AI infrastructure at every layer for our models and unprecedented training needs,” explained Sam Altman, CEO of OpenAI. 

Altman added, “We were excited when Microsoft first shared their designs for the Maia chip, and we’ve worked together to refine and test it with our models. Azure’s end-to-end AI architecture, now optimized down to the silicon with Maia, paves the way for training more capable models and making those models cheaper for our customers.”

The Maia 100 AI Accelerator was designed specifically for use with the Azure hardware stack, pointed out Brian Harry, a Microsoft technical fellow leading the Azure Maia team. 

That vertical integration – the alignment of chip design with the larger AI infrastructure designed with Microsoft’s workloads in mind – can yield huge gains in performance and efficiency, Harry said. “Azure Maia was specifically designed for AI and for achieving the absolute maximum utilization of the hardware,” he added.

According to Wes McCullough, Microsoft's corporate vice president of hardware product development, the new Cobalt 100 CPU is built on an Arm architecture that embodies an energy-efficient approach to chip design, optimized to deliver greater efficiency and performance in cloud-native offerings.

Choosing Arm technology was a key element in Microsoft’s sustainability goal, McCullough said, adding that the company aims to optimize “performance per watt” throughout its data centers, which essentially means getting more computing power for each unit of energy consumed.

“The architecture and implementation is designed with power efficiency in mind,” he said. “We’re making the most efficient use of the transistors on the silicon. Multiply those efficiency gains in servers across all our data centers, it adds up to a pretty big number.”
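To put “performance per watt” in concrete terms, the back-of-the-envelope Python sketch below runs the kind of arithmetic McCullough is describing. Every figure in it (throughput, server power draw, fleet size) is an illustrative assumption, not a published Microsoft number.

```python
# Illustrative performance-per-watt arithmetic; all figures below are
# hypothetical assumptions, not published Microsoft numbers.

def perf_per_watt(throughput_ops: float, power_watts: float) -> float:
    """Compute delivered throughput per unit of power: ops/sec per watt."""
    return throughput_ops / power_watts

# Hypothetical baseline server vs. a more efficient design.
baseline = perf_per_watt(throughput_ops=1.0e12, power_watts=400.0)   # 2.5 GOPS/W
efficient = perf_per_watt(throughput_ops=1.1e12, power_watts=350.0)  # ~3.14 GOPS/W

gain = efficient / baseline - 1.0
print(f"Per-server efficiency gain: {gain:.1%}")  # ~25.7%

# The same work at fleet scale: energy consumed by an assumed 100,000
# servers running flat out for a year, before and after the gain.
HOURS_PER_YEAR = 8766
fleet = 100_000
baseline_mwh = fleet * 400.0 * HOURS_PER_YEAR / 1e6
efficient_mwh = baseline_mwh / (1.0 + gain)  # same total work at higher perf/W
print(f"Annual fleet energy: {baseline_mwh:,.0f} MWh -> {efficient_mwh:,.0f} MWh")
```

Under these assumed figures, a modest per-server gain multiplied across a hypothetical fleet shaves tens of thousands of megawatt-hours per year, which is the “pretty big number” effect McCullough describes.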

Send In the NVIDIA GPUs

To complement its custom silicon efforts, Microsoft said it is also expanding on its industry partnerships to provide more infrastructure options for customers. 

For example, the company has launched a preview of its new NC H100 v5 Virtual Machine Series, built for NVIDIA H100 Tensor Core GPUs, offering greater performance, reliability and efficiency for mid-range AI training and generative AI inferencing. 

Microsoft also said it will add the latest NVIDIA H200 Tensor Core GPU to its fleet next year to support larger model inferencing with no increase in latency.
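For readers who want to check which GPU-backed sizes their own Azure subscription exposes in a given region, here is a minimal sketch using the Azure SDK for Python. The subscription ID is a placeholder, and whether NC H100 v5 sizes appear in the output depends on regional availability and account quota.

```python
# Minimal sketch: list NC-series (GPU) VM sizes in an Azure region using
# the Azure SDK for Python (pip install azure-identity azure-mgmt-compute).
# The subscription ID is a placeholder; which "Standard_NC..." sizes appear
# depends on your subscription's quota and regional availability.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<your-subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Enumerate sizes in a region and keep the NC-series entries.
for size in client.virtual_machine_sizes.list(location="eastus"):
    if size.name.startswith("Standard_NC"):
        print(f"{size.name}: {size.number_of_cores} vCPUs, "
              f"{size.memory_in_mb / 1024:.0f} GiB RAM")
```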

New Racks with a Side of Liquid Cooling

Meanwhile, no existing racks could accommodate the unique requirements of the Maia 100 server boards, so Microsoft built its own from scratch.

These new racks are wider than what typically sits in the company’s data centers. That expanded design, the company said, provides ample space for both power and networking cables, essential for the unique demands of AI workloads.

As Microsoft's blog notes, such AI tasks come with intensive computational demands that consume more power, and traditional air-cooling methods often fall short for these high-performance chips. Liquid cooling, which employs circulating fluids to dissipate heat, has therefore emerged as the preferred industry solution to these thermal challenges, ensuring that chips run efficiently without overheating.

But Microsoft’s current data centers weren’t designed for large liquid chillers. As a workaround, the company has developed a “sidekick” element that sits next to the Maia 100 rack.

The sidekick works much like a car's radiator. Cold liquid flows from the sidekick to cold plates attached to the surface of the Maia 100 chips. Each plate has channels through which the liquid circulates, absorbing and transporting heat.

The warmed liquid then flows back to the sidekick, which removes the heat and returns the cooled liquid to the rack to absorb more, in a continuous loop. Microsoft's McCullough said the tandem design of rack and sidekick underscores the value of a systems approach to infrastructure.

"By controlling every facet — from the low-power ethos of the Cobalt 100 chip to the intricacies of data center cooling — Microsoft can orchestrate a harmonious interplay between each component, ensuring that the whole is indeed greater than the sum of its parts in reducing environmental impact," he added.

The Best Set of Options

Before 2016, most layers of the Microsoft cloud were bought off the shelf, noted Pat Stemen, partner program manager on the company's AHSI team. Then the company began custom-building its own servers and racks to reduce costs and deliver a more consistent customer experience. Over time, silicon became the system's primary missing piece.

Today's announcements go a long way toward closing that gap. The silicon architecture unveiled today lets Microsoft not only enhance cooling efficiency but also optimize the use of its current data center assets while maximizing server capacity within its existing footprint, the company said.

Meanwhile, Stemen noted that Microsoft has shared with its industry partners design discoveries gleaned from its custom rack, which can be used no matter what piece of silicon sits inside.

“All the things we build, whether infrastructure or software or firmware, we can leverage whether we deploy our chips or those from our industry partners,” he said. “This is a choice the customer gets to make, and we’re trying to provide the best set of options for them, whether it’s for performance or cost or any other dimension they care about.”


About the Author

Matt Vincent

A B2B technology journalist and editor with more than two decades of experience, Matt Vincent is Editor in Chief of Data Center Frontier.
