Facebook Express Network Speeds Traffic Between Data Centers

May 1, 2017
As data volumes have soared along with its user base, Facebook has built a dedicated network to manage the huge flows of machine-to-machine (M2M) traffic between its data centers.

Facebook delivers an enormous amount of data to its 2 billion users, including the photos and videos you share with your friends. It turns out that’s just the tip of the data iceberg, dwarfed by the massive flow of data between Facebook’s data centers.

As data volumes have soared, Facebook has decided to separate its data traffic, building a dedicated network to manage the huge flows of machine-to-machine (M2M) traffic between its facilities. The company will continue to use its classic backbone (CBB) to deliver status updates and photos to its users. Traffic between data centers now travels across a new network called the Express Backbone (EBB).

The initiative, which was driven by the massive growth in video and photo data uploaded by Facebook users, will allow the company to fine-tune the data flows across each network and avoid the “traffic jams” that occur when machine traffic and user traffic compete for the same links.

“In recent years, bandwidth demand for cross-data center replication of rich content like photos and video has been increasing rapidly, challenging the efficiency and speed of evolution of the classic backbone,” write Facebook’s Mikel Jimenez and Henry Kwok in a blog post. “Furthermore, machine-to-machine traffic often occurs in large bursts that may interfere with and impact the regular user traffic, affecting our reliability goals. As new data centers were being built, we realized the need to split the cross-data center vs Internet-facing traffic into different networks and optimize them individually.”

The Express Backbone was built in less than a year and connects the company’s global network of data center campuses, including its U.S. campuses in Oregon, North Carolina, Iowa and Texas and its European facilities in Lulea, Sweden and Clonee, Ireland.

A diagram of the traffic volume across Facebook’s network, and the split between M2M and user traffic. (Source: Facebook)

With the Express Backbone, Facebook engineers sought to overcome some of the technical constraints of the “classic” backbone network. Their approach, which is explained in detail in the technical blog post, provides new levels of control over how different types of traffic are handled. That includes a hybrid model for traffic engineering (TE), using both distributed control agents and a central controller.

“This model allows us to control some aspects of traffic flow centrally, e.g., running intelligent path computations,” the Facebook team writes. “At the same time, we still handle network failures in a distributed fashion, relying on in-band signaling between Open/R agents deployed on the network nodes.

“Such a hybrid approach allows the system to be nimble when it encounters congestion or failure. In EBB, the local agents can immediately begin redirecting traffic when they spot an issue (local response). The central system then has time to evaluate the new topology and come up with an optimum path allocation, without the urgent need to react rapidly.”
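To make that division of labor concrete, here is a minimal sketch of a hybrid control pattern of the kind the engineers describe: a local agent reacts to a link failure immediately by shifting traffic to a surviving path it already knows, while a central controller later recomputes an optimized allocation for the new topology. The class and method names below are illustrative assumptions, not code from Open/R or EBB.

class LocalAgent:
    """Runs on a network node; reacts to failures without waiting for the controller."""

    def __init__(self, candidate_paths):
        # Each path is a tuple of link names, ordered by local preference.
        self.candidate_paths = list(candidate_paths)

    def handle_link_failure(self, failed_link):
        # Local response: discard paths that traverse the failed link and
        # shift traffic onto the best surviving path immediately.
        self.candidate_paths = [p for p in self.candidate_paths
                                if failed_link not in p]
        return self.candidate_paths[0] if self.candidate_paths else None


class CentralController:
    """Sees the whole topology; recomputes an optimized allocation asynchronously."""

    def recompute(self, surviving_paths_by_demand):
        # Stand-in for an intelligent path computation; here we simply pick
        # the shortest surviving path for each (source, destination) demand.
        return {demand: min(paths, key=len)
                for demand, paths in surviving_paths_by_demand.items()}


# The agent reroutes first (local response); the controller catches up later.
agent = LocalAgent([("A-B", "B-D"), ("A-C", "C-D")])
fallback = agent.handle_link_failure("B-D")                  # immediate reroute
controller = CentralController()
plan = controller.recompute({("A", "D"): [("A-C", "C-D")]})  # optimized later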

The result is a “path allocation algorithm” that can be tuned to the different requirements of each class of traffic (a simple sketch follows the list below). For example, it can:

  • Minimize latency for latency-sensitive traffic.
  • Minimize path congestion for latency-insensitive traffic.
  • Schedule latency-sensitive traffic ahead of latency-insensitive traffic.
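As a rough illustration of how one allocation routine could serve those different objectives, the sketch below scores candidate paths by end-to-end latency for latency-sensitive traffic and by congestion on the busiest link for latency-insensitive traffic, and it places latency-sensitive demands first. The data model and function are assumptions for illustration only, not Facebook’s algorithm.

def allocate(demands, candidate_paths):
    # demands: list of (name, traffic_class), where traffic_class is
    # "latency" (latency-sensitive) or "bulk" (latency-insensitive).
    # candidate_paths: dict mapping name -> list of paths; each path is a
    # list of (link, latency_ms) tuples.
    link_load = {}       # rough congestion measure: demands placed per link
    allocation = {}

    # Schedule latency-sensitive traffic ahead of latency-insensitive traffic.
    ordered = sorted(demands, key=lambda d: 0 if d[1] == "latency" else 1)

    for name, traffic_class in ordered:
        paths = candidate_paths[name]
        if traffic_class == "latency":
            # Minimize end-to-end latency for latency-sensitive traffic.
            best = min(paths, key=lambda p: sum(lat for _, lat in p))
        else:
            # Minimize the load on the busiest link for bulk traffic.
            best = min(paths, key=lambda p: max(link_load.get(link, 0)
                                                for link, _ in p))
        for link, _ in best:
            link_load[link] = link_load.get(link, 0) + 1
        allocation[name] = best
    return allocation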

“We’re very excited to have EBB running in production, but we are continuously looking to make further improvements to the system,” the Facebook engineers reported.

“For example, we’re extending the controller to provide a per-service bandwidth reservation system. This feature would make bandwidth allocation an explicit contract between the network and services, and would allow services to throttle their own traffic under congestion. In addition, we’re also working on a scheduler for large bulk transfers so that congestion can be avoided proactively, as opposed to managed reactively.”
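The per-service reservation idea can be pictured as a simple contract: each service declares a bandwidth budget, and when the network signals congestion the service throttles itself back to that budget. The sketch below is only a guess at the general shape of such a contract, with made-up names and numbers; it does not describe Facebook’s system.

class BandwidthReservation:
    def __init__(self, service, reserved_gbps):
        self.service = service
        self.reserved_gbps = reserved_gbps   # the explicit contract with the network

    def allowed_rate(self, requested_gbps, network_congested):
        # Under congestion the service throttles itself back to its reservation;
        # otherwise it may burst up to whatever it requested.
        if network_congested:
            return min(requested_gbps, self.reserved_gbps)
        return requested_gbps


replication = BandwidthReservation("photo-replication", reserved_gbps=40)
print(replication.allowed_rate(requested_gbps=90, network_congested=True))   # 40
print(replication.allowed_rate(requested_gbps=90, network_congested=False))  # 90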

About the Author

Rich Miller

I write about the places where the Internet lives, telling the story of data centers and the people who build them. I founded Data Center Knowledge, the data center industry's leading news site. Now I'm exploring the future of cloud computing at Data Center Frontier.
