Amazon Web Services is developing custom semiconductors to accelerate its cloud computing network, expanding its push into custom hardware, the company said Tuesday. AWS says its new Annapurna ASIC will enable it to move data faster across its huge data center network.
“We’re in the semiconductor business!” said James Hamilton, VP and Distinguished Engineer at Amazon Web Services, during a keynote last night at the AWS re:Invent conference in Las Vegas. “We think this is a really big deal.”
Amazon unveiled the new chip in a presentation showcasing its fiber network and the data infrastructure powering the growth of its massive AWS cloud operation, which now generates more than $10 billion in annual revenue.
Hamilton also provided an overview of Amazon’s data center deployment strategy, as well as a glimpse of recent-generation AWS custom cloud servers and storage units.
The Rise of Custom Chips for Hyperscale Workloads
The new networking chip will be a custom semiconductor known as an ASIC (Application Specific Integrated Circuit) that can be tailored for tasks like network management. This reflects a trend for hyperscale data centers to move beyond CPUs and turn to specialized chips like ASICs, FPGAs (Field Programmable Gate Arrays) and GPUs.
Microsoft is using FPGAs to accelerate its cloud servers, Google has developed a custom ASIC for artificial intelligence data crunching, and Facebook has opted for a GPU-driven machine learning server.
Amazon has historically been stingy in disclosing details about its infrastructure. But the cloud computing arena is becoming more competitive, with Google, Microsoft and Oracle aggressively adding data centers in a bid to gain ground on market leader AWS. With more customers doing comparison shopping for cloud platforms, Amazon has a vested interest in asserting the competitive advantages of AWS.
The company’s favorite venue for this is re:Invent, the customer conference that drew 32,000 attendees this year, packing event space across three hotels on the Las Vegas Strip.
Private Network Powers AWS Growth
Hamilton did not disappoint. The veteran technologist is something of a rock star among infrastructure geeks, with a long history of innovation at Microsoft and AWS. His 2010 critique of network hardware (“Datacenter Networks Are In My Way”) stoked momentum for disruption in data center networking.
On Tuesday evening Hamilton took the stage at re:Invent to share the details of the network he once dreamed about. He confirmed that Amazon has built a global private network to manage the flow of data between its data centers and availability zones, providing customers exceptional reliability as well as failover options for applications.
Building a private network is “really, really expensive,” said Hamilton. “But it’s the right thing to do. If you’ve got a packet, the more people that touch it, the less likely it is to be delivered. We always have (network) assets to survive a link failure.”
Amazon’s network spans everything from dense fiber connectivity between its data centers (which are grouped in regional clusters and availability zones) all the way up to long-haul fiber and even undersea cables.
Amazon Web Services organizes its cloud infrastructure into regions, each containing a cluster of data centers. Each region contains multiple Availability Zones, providing customers with the option to mirror or back up key IT assets to avoid downtime.
To illustrate the breadth and density of Amazon’s network, Hamilton provided an overview of the fiber that ties together an Amazon region with five availability zones. He didn’t name the region, but it’s clearly the US-East region in Northern Virginia, which is the only AWS region that has five AZs. A huge chunk of AWS infrastructure is concentrated in Loudoun and Prince William counties, where the company has at least 25 data centers and is expanding rapidly.
Each region is supported by two transit centers, data centers that connect to Amazon’s global fiber, provide interconnections with other networks, and offer 100 Gbps connections to the other facilities in the region.
“We’re running a lot of redundant fiber between these buildings,” said Hamilton.
How much fiber? A total of 3,456 fibers run through two-inch conduit spanning the region, with a total of 242,374 fiber strands running through US-East. AWS pays close attention to cable management within those conduits, enabling it to pack more capacity into each conduit and quickly identify and repair cabling problems. “It’s saved us a ton of money, because we have so much fiber,” he said.
Amazon’s private fiber network connects its 14 regions around the globe, which include US regions in Ohio, Oregon and Northern California as well as Virginia. AWS plans to add four more regions next year as it continues its global growth. It hasn’t announced the locations for the new regions, but Reuters reported Wednesday that AWS is in talks with an Italian utility about converting several former power plants into data centers.
One clear area of investment focus is Asia. That’s why Amazon is a partner in the Hawaiki trans-Pacific submarine cable project, which will provide added connectivity between Australia and New Zealand and the United States, running through Hawaii to a landing station on the Oregon coast. The cable is 14,000 kilometers in length, and runs at a depth of 6,000 meters, or about 3.7 miles beneath the sea.
Custom Gear Boosts Control, Efficiency
Amazon has been building its own network hardware for years. “We run our own custom-built routers,” said Hamilton. “It’s built to our specs. As big as the cost gain is – and it’s pretty big – the biggest gain is in reliability.
“Our networking gear has one requirement: ours,” Hamilton continued. “As fun as it would be to add a lot of features, it would be less reliable. So we just don’t do it.”
Amazon’s network gear currently uses a Tomahawk Ethernet ASIC from network vendor Broadcom, which supports 128 ports of 25Gbps Ethernet. But there’s more innovation to come.
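As a quick sanity check on those Tomahawk figures, the port count and per-port speed can be multiplied out to an aggregate capacity. The sketch below is only back-of-envelope arithmetic on the two numbers quoted above; the function name is illustrative and the result is a one-direction total in decimal units, not a vendor specification.

```python
# Back-of-envelope check of the switch ASIC figures quoted in the article.
PORTS = 128            # ports per ASIC, as quoted
PORT_SPEED_GBPS = 25   # per-port Ethernet speed in Gbps, as quoted


def aggregate_capacity_tbps(ports: int, port_speed_gbps: float) -> float:
    """Return total one-direction capacity in Tbps (1 Tbps = 1000 Gbps)."""
    return ports * port_speed_gbps / 1000


if __name__ == "__main__":
    total = aggregate_capacity_tbps(PORTS, PORT_SPEED_GBPS)
    print(f"{PORTS} ports x {PORT_SPEED_GBPS} Gbps = {total} Tbps")
```

In other words, 128 ports at 25 Gbps works out to 3.2 Tbps of switching capacity per chip.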
Amazon’s custom Annapurna ASIC will provide “second generation Enhanced Networking,” enabling AWS to boost performance and efficiency by controlling its networking silicon, hardware and software.
Hamilton said Amazon has also benefited from moving to a 25Gbps networking scheme, which emerged as an alternative to the IEEE standard 10G and 40G gear.
“We jumped on 25G early,” said Hamilton. “We love where we are right now. We’re confident that 25G is the right wave. We buy enough hardware that it doesn’t matter. Vendors are always willing to work with us.”
Sleek Servers Yield Power Savings
Hamilton also offered the re:Invent crowd a look at some recent (albeit not current) Amazon hardware. This included a 42U storage rack packed with 1,110 disks, which adds up to 8.8 petabytes of storage. The downside: the rack weighs 2,778 pounds. Be careful rolling that one around the data hall!
The presentation also included a look at a recently retired Amazon 1U server design, which was unusually roomy inside the chassis, leaving extra space for airflow and cooling.
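The rack figures imply a per-disk capacity and a metric weight that are easy to work out. The sketch below is simple arithmetic on the numbers quoted above, assuming decimal units (1 PB = 1,000 TB); the helper names are illustrative, and the per-disk figure is an average implied by the totals, not a drive model Hamilton disclosed.

```python
# Rough arithmetic on the storage rack figures quoted in the article.
DISKS = 1110               # disks per rack, as quoted
RACK_CAPACITY_PB = 8.8     # rack capacity in petabytes, as quoted
RACK_WEIGHT_LB = 2778      # rack weight in pounds, as quoted

KG_PER_POUND = 0.45359237  # exact definition of the avoirdupois pound


def per_disk_tb(disks: int, capacity_pb: float) -> float:
    """Average capacity per disk in TB, assuming 1 PB = 1000 TB."""
    return capacity_pb * 1000 / disks


def pounds_to_kg(pounds: float) -> float:
    """Convert pounds to kilograms."""
    return pounds * KG_PER_POUND


if __name__ == "__main__":
    print(f"Average per disk: {per_disk_tb(DISKS, RACK_CAPACITY_PB):.1f} TB")
    print(f"Rack weight: {pounds_to_kg(RACK_WEIGHT_LB):.0f} kg")
```

That comes to roughly 8 TB per disk and a rack weight of about 1,260 kilograms.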
“This is a winning design,” he said. “But it’s a very different design than what’s out there with most vendors.”
Hamilton said AWS likes a simple approach that trades component density for power efficiency and the ability to operate in warmer environments, which allows data center providers to save money on cooling. At the server level, tiny gains in power efficiency add up as they ripple across the huge AWS footprint to create significant savings. The company runs between 50,000 and 80,000 servers in each data center, he said, with several Availability Zones spanning more than 300,000 servers.
On the data center design front, Hamilton said AWS is building slightly larger data centers. The company has traditionally built new facilities with 25 to 30 megawatts of power capacity, but now targets new builds for 32 megawatts.
“We could easily build 250 megawatt data centers,” said Hamilton. “As you get bigger, the gains (from economies of scale) are relatively small.
“This is about the right size facility,” he added, noting the importance of limiting the size of its failure domains. “It costs us a little more, but we think it’s the right thing for our customers.”