AWS Unveils AI Data Center Designs Supporting 6X Increase in Density
Amazon Web Services' (AWS) latest change to its design for data center infrastructure encompasses a suite of data center components tailored to meet the growing demands of artificial intelligence (AI) and sustainability.
The company has unveiled new data center designs to support more powerful hardware for AI, including refinements in liquid cooling, power distribution and rack design.
AWS said these innovations will support a 6x increase in rack power density over the next two years.
With an announcement at AWS re:Invent, Amazon is claiming these changes will result in a 12% increase in compute power per site, enhance energy efficiency, and significantly improve availability.
The announcement also takes a “do more with less” approach: AWS says that, once the infrastructure changes are in place, it can meet customer demand with fewer data centers.
New Data Center Architecture
With nearly two decades of experience in large-scale infrastructure and 13 years dedicated to GPU-based servers for AI workloads, AWS has the operational history and the accumulated data to build new approaches to data center architecture.
As generative AI services like Amazon Bedrock grow in adoption, the newly announced plans directly address the higher power densities and performance requirements that AWS has identified as the use of, and demand for, AI systems increases.
While the announced changes can seem, on reflection, like obvious places for improvement, AWS's long experience with its data centers means these changes target the areas that can return the greatest benefit.
By starting with a simplified electrical and mechanical design, these improvements help AWS maintain infrastructure availability by reducing potential failure points.
AWS claims that the simplified power conversion steps will reduce inefficiency and minimize failure risks by 20%. The company has also found that by bringing backup power closer to the racks as a standard practice, it has cut the number of racks affected by electrical issues by 89%.
Energy savings have also been achieved by finding ways to reduce dependence on cooling fans. This has been achieved by using natural pressure differentials to exhaust hot air, which means the power that would have been used to run the fans has been minimized.
Collectively, these electrical and mechanical changes lower energy consumption, while providing customers with an infrastructure that can still support growing demand.
"These data center capabilities represent an important step forward with increased energy efficiency and flexible support for emerging workloads," noted Prasad Kalyanaraman, vice president of Infrastructure Services at AWS. "But what is even more exciting is that they are designed to be modular, so that we are able to retrofit our existing infrastructure for liquid cooling and energy efficiency to power generative AI applications and lower our carbon footprint.”
More Effective, Efficient Cooling Still Dominates Next-Gen Data Center Designs
Much like the Vertiv CoolPhase Flex, AWS is moving to a multimodal cooling approach that integrates liquid and air cooling, so that the most efficient form of cooling can be applied where it is needed.
AWS said its "unique liquid cooling rack design" was developed in collaboration with leading chip manufacturers, and offers configurable liquid-to-chip cooling in both its new and existing data centers. The hybrid approach will support both liquid and air-cooled equipment in the same facility.
This system is, however, primarily designed for next-generation AI chipsets like AWS Trainium2 and NVIDIA GB200 NVL72.
The liquid cooling solution is designed to provide efficient thermal management for dense compute chips like AI GPUs, can reduce mechanical energy consumption by up to 46%, and is retrofittable, meaning that AWS will be able to apply this cooling technology to its existing data centers, where applicable.
Rack design also plays a role in improving the cooling and energy efficiency of their data centers, with AWS working with the chip manufacturers to develop rack designs that permit optimal AI processing while minimizing the ongoing operational costs.
The new rack design also includes a new power shelf to improve power delivery in the data center, a design that AWS expects to achieve a 6x increase in rack power density over the next two years, with a projected 3x increase after that initial period.
This should be on top of the reduced energy loss achieved from the previously mentioned improved power conversion process.
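To put the density figures in perspective, the short sketch below works through the arithmetic. It assumes that the projected 3x increase compounds on top of the initial 6x, which the announcement does not state explicitly, and it uses a hypothetical baseline of 10 kW per rack purely as a placeholder; AWS has not published a baseline figure here.

```python
def projected_density(baseline_kw: float, multipliers: list[float]) -> list[float]:
    """Return the rack power density after each successive multiplier is applied."""
    densities = []
    current = baseline_kw
    for multiplier in multipliers:
        current *= multiplier
        densities.append(current)
    return densities


# Hypothetical baseline (placeholder, not an AWS figure).
BASELINE_KW = 10.0

# Assumption: the later 3x increase compounds on the initial 6x increase.
after_two_years, after_follow_on = projected_density(BASELINE_KW, [6, 3])

print(f"Baseline:           {BASELINE_KW:.0f} kW per rack")
print(f"After 6x increase:  {after_two_years:.0f} kW per rack")
print(f"After a further 3x: {after_follow_on:.0f} kW per rack")
```

Under these assumptions, the two steps together amount to an 18x increase over the baseline. If the 3x figure is instead measured against today's density, the result would differ.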
It's Not Just About the Racks: It's How You Use Them
By using generative AI-powered software to predict and manage server placement, AWS aims to minimize stranded power, meaning power capacity that has been provisioned but goes unused.
Using this software-driven approach enables more efficient energy usage, supporting higher density racks required for AI without sacrificing performance.
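The idea of stranded power can be illustrated with a small placement exercise. The sketch below is not AWS's generative AI software; it is a simple first-fit-decreasing heuristic, chosen only to show how placement order changes the amount of unused rack capacity. The function names, the 10 kW rack budget, and the server power draws are all hypothetical.

```python
def place_servers(server_kw: list[int], rack_budget_kw: int, sort_first: bool = True) -> list[list[int]]:
    """Assign servers to racks with a first-fit heuristic.

    When sort_first is True, the largest draws are placed first
    (first-fit decreasing). Returns one list of server draws per rack.
    """
    order = sorted(server_kw, reverse=True) if sort_first else list(server_kw)
    racks: list[list[int]] = []
    for draw in order:
        if draw > rack_budget_kw:
            raise ValueError(f"server drawing {draw} kW exceeds the rack budget")
        for rack in racks:
            # Use the first rack that still has enough headroom.
            if sum(rack) + draw <= rack_budget_kw:
                rack.append(draw)
                break
        else:
            # No existing rack fits this server, so open a new one.
            racks.append([draw])
    return racks


def stranded_power(racks: list[list[int]], rack_budget_kw: int) -> int:
    """Total provisioned-but-unused power across all racks in use."""
    return sum(rack_budget_kw - sum(rack) for rack in racks)


# Hypothetical server power draws (kW) in arrival order, and a hypothetical rack budget.
servers = [2, 6, 3, 5, 4]
budget = 10

arrival_order = place_servers(servers, budget, sort_first=False)
largest_first = place_servers(servers, budget, sort_first=True)

print("Arrival order:", arrival_order, "stranded kW:", stranded_power(arrival_order, budget))
print("Largest first:", largest_first, "stranded kW:", stranded_power(largest_first, budget))
```

With these sample numbers, placing servers in arrival order opens three racks and strands 10 kW, while placing the largest servers first fills two racks exactly. A production system would also have to account for cooling, redundancy, and forecast demand, which this sketch ignores.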
To verify that power is being used appropriately, AWS is also standardizing on a new control system for monitoring, alarming, and tracking operational processes across electrical and mechanical devices.
The deployment of these real-time telemetry tools will provide up-to-the-minute diagnostics and improve troubleshooting in operational data centers.
AWS partners expressed their approval of these development plans at the announcement.
For example, James Bradbury, distinguished engineer, Compute, at Anthropic, which uses AWS infrastructure to train foundation models, commented:
"As Anthropic develops our leading foundation models, having access to secure, performant, and energy-efficient infrastructure is crucial to our success. AWS’s commitment to building cutting-edge data centers is one of the key reasons we’ve chosen them as our primary cloud provider and training partner. Their design improvements represent a significant step forward in providing secure, scalable, and efficient infrastructure to power AI models and drive innovation in this field."
Where Data Center Construction Meets Sustainability
Detailed knowledge of what is happening inside a data center gives operators control over, and insight into, ongoing issues that affect sustainability, but new construction also has to be designed with sustainability in mind.
To this end, AWS has announced that it has adopted lower-carbon materials in the concrete and steel used for data center construction, cutting the embodied carbon in its data center shells by up to 35%.
Other commitments to sustainable construction practices include structural optimizations to reduce material usage, further improving overall sustainability.
Plans are also underway to transition backup generators in Europe and North America to renewable diesel, a biodegradable and non-toxic fuel. This change has the potential to reduce greenhouse gas emissions by up to 90% compared to fossil diesel.
The current plan is to roll out these infrastructure and construction changes across AWS's existing and upcoming facilities. New AWS facilities will get the full suite of components beginning in 2025.