Amazon Web Services has begun designing its own rack-level uninterruptible power supply (UPS) units for its data centers, a move that will dramatically improve the power efficiency of its cloud computing operations, the company said this week.
In the infrastructure keynote of the virtual re:Invent conference, AWS Senior VP of Global Infrastructure Peter DeSantis said the switch to a distributed UPS architecture will reduce the energy conversion loss in its data center power chain by 35 percent.
“Rather than using a big third-party UPS, we now use small battery packs and custom power supplies that we integrate into every rack,” said DeSantis. “You can think about this as a micro UPS, but it’s far less complicated. And because we designed it ourselves, we know everything about it, and we control all the pieces of the software.”
DeSantis said AWS believes it will reduce its failure risk by removing third-party software from the most important components in its power infrastructure.
“This allows us to eliminate complexity from features we don’t need, and we can iterate at Amazon speed to improve the design,” he said. “Now the batteries can also be removed and replaced in seconds rather than hours. You can do this without turning off the system. This allows us to drastically reduce the risk of maintenance we need to do to the batteries.”
Why a Distributed UPS Is Important
Traditional data centers use a centralized UPS system that manages large banks of batteries, which can provide emergency power for an entire data hall or perhaps an entire facility. Most data centers add extra UPS systems for redundancy, providing a backup should the primary UPS fail.
The design adopted by Amazon effectively shifts the UPS and battery backup functions from the data center into the server cabinet. By distributing the UPS units, a single failure will impact only the servers in a single rack, rather than an entire data hall or building. This approach also simplifies the path of electricity from the power grid to the rack, eliminating steps that can waste power, including conversions between AC and DC (a common practice in centralized UPS systems) and stepping down to lower voltages.
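To make the efficiency argument concrete, here is a minimal sketch of how removing conversion stages reduces loss. The stage efficiencies below are illustrative assumptions, not AWS figures; the article reports only the overall 35 percent reduction in conversion loss.

```python
# Each conversion stage wastes some power; end-to-end efficiency is the
# product of the stage efficiencies. All numbers here are assumed for
# illustration, not AWS's actual figures.

def chain_efficiency(stage_efficiencies):
    """Multiply per-stage efficiencies to get end-to-end efficiency."""
    eff = 1.0
    for e in stage_efficiencies:
        eff *= e
    return eff

# Hypothetical centralized double-conversion UPS path:
# AC->DC rectifier, DC->AC inverter, transformer step-down, server PSU.
centralized = chain_efficiency([0.96, 0.96, 0.98, 0.94])

# Hypothetical rack-level path: the battery rides on the rack's own
# power supply, removing the dedicated rectify/invert stages.
distributed = chain_efficiency([0.98, 0.94])

loss_c = 1 - centralized
loss_d = 1 - distributed
print(f"centralized loss: {loss_c:.1%}, distributed loss: {loss_d:.1%}")
print(f"loss reduction: {(loss_c - loss_d) / loss_c:.0%}")
```

Multiplied across a hyperscale fleet, even a few percentage points of avoided conversion loss adds up to a substantial amount of power.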
This design also eliminates a single point of failure (SPOF), addressing a core principle of data center reliability. DeSantis described this as reducing the “blast radius” – the scope of the operations that can be impacted by any single piece of equipment.
“This is exactly the sort of design that lets me sleep like a baby,” said DeSantis. “And indeed, this new design is getting even better availability” – better than “seven nines” or 99.99999 percent uptime, DeSantis said.
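For context, the “nines” shorthand translates directly into permitted downtime per year. A quick back-of-the-envelope calculation, for illustration only:

```python
# Translate availability "nines" into allowed downtime per year.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds

for nines in (4, 5, 7):
    unavailability = 10 ** -nines
    downtime = SECONDS_PER_YEAR * unavailability
    print(f"{nines} nines: ~{downtime:,.1f} seconds of downtime per year")
```

At seven nines, that works out to roughly three seconds of downtime per year, versus nearly an hour at the four nines typical of many enterprise SLAs.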
A Hyperscale History of UPS Design Innovation
The change is consequential because it will have an outsized impact on the energy efficiency of the world’s largest cloud operator, particularly as AWS actively deploys new data centers to boost its cloud capacity. But the shift to a distributed UPS design is not new, and several other hyperscale operators adopted this approach years ago.
- Google was the first major data center operator to move away from a centralized UPS. In 2009 it began using custom servers with a power supply that integrates a battery, effectively placing a UPS in every server chassis. It later adopted a design that brings 48V power directly to the server motherboard, further reducing conversions and power loss.
- In 2011 Facebook introduced its Open Compute data center designs, including a streamlined power design featuring in-row UPS units that each supported three racks of gear, eliminating conversions by having servers use 277-volt AC power instead of the usual 208 volts.
- In 2015 Microsoft introduced its Local Energy Storage (LES) design that adds a mini-UPS in each server chassis. The LES design was contributed to the Open Compute Project, allowing other data center operators to use it.
These efficiency improvements have helped make cloud data centers among the most energy-efficient facilities in the world.
AWS Custom Silicon Program Bears Fruit
DeSantis also hailed the impact of Amazon’s custom silicon operation, through which it builds its own chips, servers and network components. DeSantis noted how the success of its Nitro cards and software enabled one of the most popular announcements from re:Invent – the availability of AWS instances running on Mac Minis.
“There was a ton of excitement about this launch,” said DeSantis. “We did not need to make any changes to the Mac hardware. We simply connected a Nitro controller via the Mac’s Thunderbolt connection. When you launch a Mac instance, your Mac-compatible AMI (Amazon Machine Image, the template for a virtual server) runs directly on the Mac Mini, with no hypervisor. The Nitro controller sets up the instance, and provides secure access to the network and any storage attached. And that Mac Mini can now natively use any AWS service.”
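For developers, launching one of these instances looks like any other EC2 workflow. A hedged sketch using boto3 is below; Mac instances run on Dedicated Hosts, so a host is allocated before the launch. The AMI ID and availability zone are placeholders, not values from the article.

```python
# Sketch: launching an EC2 Mac instance with boto3.
# The AMI ID and availability zone below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Mac instances require a Dedicated Host, so allocate one first.
host = ec2.allocate_hosts(
    AvailabilityZone="us-east-1a",  # placeholder zone
    InstanceType="mac1.metal",
    Quantity=1,
)
host_id = host["HostIds"][0]

# Launch a macOS AMI on that host. Per DeSantis, the image runs on the
# Mac Mini itself with no hypervisor, with Nitro handling network and storage.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder macOS AMI
    InstanceType="mac1.metal",
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "host", "HostId": host_id},
)
```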
Analysts said the announcement was targeted at developers using the Apple platform rather than a direct challenge to hosting firms that specialize in running workloads on Mac Minis. Mac specialists such as MacStadium likely offer better economics for scale deployments, as noted in a Twitter thread from Corey Quinn, Chief Cloud Economist at Duckbill Group.
But adding Mac support is just one of a ridiculously long list of new features announced at re:Invent, as AWS brings its scale and staff to bear on the fast-growing market for cloud services. The re:Invent conference continues next week in its virtual setting.