Facebook Unveils New Hardware to Manage Data Center Traffic

March 21, 2018
Facebook has designed a distributed networking system to support the massive flows of data moving between its data centers. The debut of Fabric Aggregator lays the groundwork for even larger cloud campuses to come.

SAN JOSE, Calif. – The massive data traffic coursing through Facebook’s cloud campuses has outstripped the capabilities of commercial networking hardware. So the company has built its own distributed networking system to support its growing data needs – and to lay the groundwork for even larger cloud campuses to come.

At Tuesday’s Open Compute Summit 2018, Facebook unveiled the Fabric Aggregator, a new system to manage data traffic between its data centers. The company is donating the device design to the Open Compute Project (OCP), the open source hardware project which Facebook co-founded in 2011.

The development of Fabric Aggregator highlights how the super-sizing of data centers and cloud campuses has implications for network equipment. As hyperscale companies like Facebook continue to grow, the huge volumes of user data prompt them to add data centers. This leads to larger cloud campuses, with truly massive volumes of data moving between the buildings on each campus.

The volume of this “East-West” traffic between data centers far surpasses the volume of data traveling to other campuses and the Internet – known as “North-South” traffic.
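As a rough illustration of that distinction, the Python sketch below tallies traffic by class. The flow-record format, the building names and the traffic figures are invented for the example. The only rule taken from the article is that traffic between data centers on the same campus counts as East-West, while traffic to other campuses or the Internet counts as North-South.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical flow record: (source site, destination site, gigabits).
Flow = Tuple[str, str, float]


def classify(flow: Flow, campus: set) -> str:
    """Label one flow using the article's East-West / North-South definitions."""
    src, dst, _ = flow
    if src == dst:
        # In this toy model, traffic inside one building is kept out of both classes.
        return "intra-building"
    if src in campus and dst in campus:
        # Between two data centers on the same campus.
        return "east-west"
    # One end is another campus or the Internet.
    return "north-south"


def traffic_mix(flows: Iterable[Flow], campus: set) -> Counter:
    """Total gigabits per traffic class."""
    totals: Counter = Counter()
    for flow in flows:
        totals[classify(flow, campus)] += flow[2]
    return totals


campus = {f"bldg-{i}" for i in range(1, 7)}  # a six-building campus
flows = [  # made-up figures, for illustration only
    ("bldg-1", "bldg-2", 40.0),
    ("bldg-3", "bldg-6", 25.0),
    ("bldg-2", "internet", 5.0),
    ("bldg-4", "bldg-4", 3.0),
]
print(traffic_mix(flows, campus))
```

Every building added to a campus adds more building-to-building pairs, which is one simple way to see why East-West volume grows quickly as campuses expand.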

Even Bigger Campuses Ahead?

As we noted yesterday, Facebook has been increasing the scale of its data center campuses. Since the beginning of 2017, Facebook has announced five new cloud campuses, and now has 12 campuses around the globe, including nine in the U.S.

Early campuses featured two data centers, but the company has recently been building bigger.

“We’ve been moving to six buildings, creating a large increase in East-West traffic,” said Sree Sankar, Technical Product Manager at Facebook. “That required a big change.”

Facebook’s Sree Sankar describes how Fabric Aggregator manages traffic between data centers during a presentation at the Open Compute Summit Tuesday in San Jose, Calif. (Photo: Rich Miller)

The Fabric Aggregator is a distributed network system made up of a simple building block – Facebook’s Wedge 100 switch.

“Unfortunately using a large, general purpose network chassis no longer met our needs in terms of scale, power efficiency, and flexibility,” the Facebook Engineering Team said in a blog post. “Taking a disaggregated approach allows us to accommodate larger regions and varied traffic patterns, while providing the flexibility to adapt to future growth.”

What does “adapt to future growth” mean? One obvious possibility is even larger data center campuses, with more buildings. But the most important benefit of the new architecture is flexibility: Fabric Aggregator can be sized to manage campuses with different numbers of facilities.

Flexible Enough for Many Scenarios

The aggregator was designed to work within a single rack or in multi-rack configurations. Different flavors of the rack were designed to support the various networking technologies that tie together the many layers of data center traffic.

“The ability to tailor different Fabric Aggregator node sizes in different regions allows us to use resources more efficiently, while having no internal dependencies keeps failures isolated, improving the overall reliability of the system,” Facebook said.

The Fabric Aggregator uses Wedge 100S switches running the Facebook Open Switching System (FBOSS) as its base building blocks, with Border Gateway Protocol (BGP) running between all subswitches.

“A building block approach gives us the ability to operate the solution at either the subswitch or node level,” the Facebook team writes. “For example, if we detect a misbehaving subswitch inside a particular node, we can take that specific subswitch out of service for debugging. If there is a need to take all downstream and upstream subswitches out of service in a node, our operational tools abstract all the underlying complexities inherent to multiple interactions across many individual subswitches. We also implement redundancy at the node level so that we can take many nodes out of service simultaneously in a single region. The Fabric Aggregator layer can suffer many simultaneous failures without compromising the overall performance of the network.”
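To make the subswitch-versus-node distinction concrete, here is a minimal Python sketch. It is a toy capacity model, not FBOSS code: the class and function names (Subswitch, Node, drain_subswitch, drain_node, build_region, survives), the capacity figures and the worst-case failure check are all illustrative assumptions. It also ignores the BGP route changes that would actually steer traffic away from a device taken out of service.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subswitch:
    """One switch inside an aggregator node (hypothetical model)."""
    name: str
    capacity_tbps: float  # illustrative figure, not a real Wedge 100S spec
    in_service: bool = True


@dataclass
class Node:
    """A group of subswitches operated as one unit (hypothetical model)."""
    name: str
    subswitches: List[Subswitch] = field(default_factory=list)

    def drain_subswitch(self, name: str) -> None:
        """Take one misbehaving subswitch out of service for debugging."""
        for sw in self.subswitches:
            if sw.name == name:
                sw.in_service = False
                return
        raise KeyError(name)

    def drain_node(self) -> None:
        """Take every subswitch in the node out of service with one call,
        hiding the per-subswitch steps from the operator."""
        for sw in self.subswitches:
            sw.in_service = False

    def capacity(self) -> float:
        """Capacity still in service in this node."""
        return sum(sw.capacity_tbps for sw in self.subswitches if sw.in_service)


def build_region(num_nodes: int, subs_per_node: int, capacity_tbps: float) -> List[Node]:
    """Size one region's aggregation layer; other regions can use other sizes."""
    return [
        Node(f"node{i}", [Subswitch(f"node{i}-sw{j}", capacity_tbps) for j in range(subs_per_node)])
        for i in range(num_nodes)
    ]


def survives(nodes: List[Node], failed: int, demand_tbps: float) -> bool:
    """Worst case: can the rest still carry demand if the `failed`
    highest-capacity nodes go out of service at the same time?"""
    caps = sorted((n.capacity() for n in nodes), reverse=True)
    return sum(caps[failed:]) >= demand_tbps


region = build_region(num_nodes=8, subs_per_node=4, capacity_tbps=1.0)
region[0].drain_subswitch("node0-sw1")  # subswitch-level operation
print(region[0].capacity())  # 3.0
print(survives(region, failed=2, demand_tbps=20.0))  # True
```

Sorting by capacity makes the check pessimistic, since it assumes the failures land on the nodes carrying the most capacity. Facebook's post does not say how it sizes node-level redundancy, so treat this only as a way to reason about the idea.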

For additional technical details, see the Facebook Engineering blog post on Fabric Aggregator.

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
