AWS Unveils AI Data Center Designs Supporting 6X Increase in Density

Dec. 5, 2024
Security, availability, and performance are the focus of the newest changes to AWS data center components. Newly announced data center designs are focused on improving AI support and data center efficiency.

Amazon Web Services' (AWS) latest update to its data center infrastructure design encompasses a suite of components tailored to meet the growing demands of artificial intelligence (AI) and sustainability.

The company has unveiled new data center designs to support more powerful hardware for AI, including refinements in liquid cooling, power distribution and rack design. 

AWS said these innovations will support a 6x increase in rack power density over the next two years.

With an announcement at AWS re:Invent, Amazon is claiming these changes will result in a 12% increase in compute power per site, enhance energy efficiency, and significantly improve availability. 

The announcement also takes a “do more with less” approach, with AWS saying it will be able to deliver on customer demand with fewer data centers once the announced infrastructure changes are implemented.

New Data Center Architecture

With nearly two decades of experience in large-scale infrastructure and 13 years dedicated to GPU-based servers for AI workloads, AWS has the expertise and the accumulated data to build new approaches to data center architecture.

As generative AI applications like Amazon Bedrock grow in adoption, the newly announced plans directly address the higher power densities and performance requirements that AWS has identified in the use of, and need for, AI systems.

While the announced changes can seem, on reflection, like obvious places for improvement, AWS's long experience with its data centers means these changes target the areas that can return the greatest benefit.

By starting with a simplified electrical and mechanical design, these improvements help AWS ensure its guaranteed infrastructure availability by reducing potential failure points.

The company's claim is that the simplified power conversion steps announced will reduce inefficiency and minimize failure risks by 20%. AWS has also found that by bringing backup power closer to the racks as a standard practice, it has cut the number of racks affected by electrical issues by 89%.

Energy savings have also been achieved by finding ways to reduce dependence on cooling fans. This has been achieved by using natural pressure differentials to exhaust hot air, which means the power that would have been used to run the fans has been minimized.

Collectively, these electrical and mechanical changes lower energy consumption, while providing customers with an infrastructure that can still support growing demand.

"These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads," noted Prasad Kalyanaraman, vice president of Infrastructure Services at AWS. "But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint."

More Effective, Efficient Cooling Still Dominates Next-Gen Data Center Designs

Much like the Vertiv CoolPhase Flex, AWS is moving to a multimodal cooling approach that integrates liquid and air cooling, allowing the most efficient, and necessary, cooling to be applied as needed.

AWS said its "unique liquid cooling rack design" was developed in collaboration with leading chip manufacturers, and offers configurable liquid-to-chip cooling in both its new and existing data centers. The hybrid approach will support both liquid and air-cooled equipment in the same facility.

This system is, however, primarily designed for next-generation AI chipsets like AWS Trainium2 and NVIDIA GB200 NVL72.

The liquid cooling solution is designed to provide efficient thermal management for dense compute chips like AI GPUs, can reduce mechanical energy consumption by up to 46%, and is retrofittable, meaning that AWS will be able to apply this cooling technology to its existing data centers, where applicable.

Rack design also plays a role in improving the cooling and energy efficiency of their data centers, with AWS working with the chip manufacturers to develop rack designs that permit optimal AI processing while minimizing the ongoing operational costs.

The new rack design also includes a new power shelf to improve power delivery in the data center, a design that AWS expects to achieve a 6x increase in rack power density over the next two years, with a projected 3x increase after that initial period.

This should be on top of the reduced energy loss achieved from the previously mentioned improved power conversion process.

It's Not Just About the Racks: It's How You Use Them

By using generative AI-powered software to predict and manage server placement, AWS will be minimizing the amount of stranded power.

Using this software-driven approach enables more efficient energy usage, supporting higher density racks required for AI without sacrificing performance.
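
The idea of placing servers so as to minimize stranded power can be illustrated with a simple packing heuristic. The Python sketch below is not AWS's generative AI-powered software, whose workings the announcement does not describe. It is a minimal best-fit-decreasing example, assuming that stranded power means rack power budget that is provisioned but left unused. The Rack class, the place_servers function, and all kilowatt figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rack:
    """Hypothetical rack with a fixed power budget in kilowatts."""
    name: str
    power_budget_kw: float
    servers: list = field(default_factory=list)  # list of (server_name, kw)

    @property
    def used_kw(self) -> float:
        return sum(kw for _, kw in self.servers)

    @property
    def stranded_kw(self) -> float:
        # Budget that is provisioned for the rack but not drawn by any server.
        return self.power_budget_kw - self.used_kw


def place_servers(servers, racks):
    """Best-fit decreasing: largest servers first, each into the rack
    where it leaves the least unused headroom. Returns unplaced servers."""
    unplaced = []
    for name, kw in sorted(servers, key=lambda s: s[1], reverse=True):
        candidates = [r for r in racks if r.stranded_kw >= kw]
        if not candidates:
            unplaced.append((name, kw))
            continue
        best = min(candidates, key=lambda r: r.stranded_kw - kw)
        best.servers.append((name, kw))
    return unplaced


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the announcement.
    racks = [Rack("rack-1", 10.0), Rack("rack-2", 10.0)]
    servers = [("a", 6.0), ("b", 4.0), ("c", 5.0), ("d", 5.0)]
    leftover = place_servers(servers, racks)
    for rack in racks:
        print(rack.name, rack.servers, "stranded kW:", rack.stranded_kw)
    print("unplaced:", leftover)
```

A production system would also have to account for cooling limits, redundancy, and changing load over time, which is presumably where a predictive model adds value over a static heuristic like this one.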

To assure itself that power is being used appropriately, AWS is also standardizing on a new control system for monitoring, alarming, and tracking operational processes across electrical and mechanical devices.

The deployment of these real-time telemetry tools will provide up-to-the-minute diagnostics and improve troubleshooting in operational data centers.

AWS partners expressed their approval of these development plans at the announcement.

For example, James Bradbury, distinguished engineer, Compute, at Anthropic, which uses AWS infrastructure to train foundation models, commented:

"As Anthropic develops our leading foundation models, having access to secure, performant, and energy-efficient infrastructure is crucial to our success. AWS’s commitment to building cutting-edge data centers is one of the key reasons we’ve chosen them as our primary cloud provider and training partner. Their design improvements represent a significant step forward in providing secure, scalable, and efficient infrastructure to power AI models and drive innovation in this field."

Where Data Center Construction Meets Sustainability

While knowing what is going on in your data centers at a detailed level gives you control and insight into ongoing issues that impact sustainability, new construction must also be designed with sustainability in mind.

To this end, AWS has announced that it has adopted lower-carbon materials in its concrete and steel infrastructure for data center construction, cutting the embodied carbon in its data center shells by up to 35%.

Other commitments to sustainable construction practices include structural optimizations to reduce material usage, further improving overall sustainability.

Plans are also underway to transition backup generators in Europe and North America to renewable diesel, a biodegradable and non-toxic fuel. This change has the potential to reduce greenhouse gas emissions by up to 90% compared to fossil diesel.

The current plan is to roll out these changes to data center infrastructure and construction across AWS's existing and upcoming infrastructure. New AWS facilities being built will get the full suite of components beginning in 2025.

 


About the Author

David Chernicoff

David Chernicoff is an experienced technologist and editorial content creator with the ability to see the connections between technology and business while figuring out how to get the most from both and to explain the needs of business to IT and IT to business.
