How AI Is Affecting Data Center Networks

March 5, 2024
Following up on the contest between InfiniBand and Ethernet technologies as they jockey for position in present and future data center networks.

In the contest between InfiniBand and Ethernet technologies for primacy in data center networks, now and in the future, Ethernet may have nosed ahead by about a terabit last week, when Synopsys debuted what it says is the industry's first 1.6T Ethernet IP [intellectual property] core, geared to meet the high-bandwidth needs of AI and hyperscale data center chips.

In the gathering era of data-intensive AI workloads, hyperscale data center backbones require high-bandwidth, low-latency chips and interfaces to process petabytes of data quickly. Synopsys says its new 1.6T Ethernet IP offering enables design teams to create faster chips for greater AI and data center networking bandwidth.

For the data center industry, Synopsys asserts that its forthcoming multi-channel/multi-rate Ethernet controllers will offer networks 1.6T support with up to 40% lower latency and up to 50% less area than existing multi-rate 800G IP solutions. To accelerate AI data centers, the platform's 224G Ethernet PHY IP is customizable to support chip-to-chip, chip-to-module, and copper cable connections, optimizing power and performance tradeoffs.
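As a rough illustration of how a 224G PHY relates to a 1.6T link, the arithmetic can be sketched as follows. This is a minimal sketch under stated assumptions: the article does not give a lane count or a per-lane payload rate, so the 200 Gbps payload per lane used here is an assumption for illustration, not a Synopsys specification.

```python
import math

TARGET_LINK_GBPS = 1600      # 1.6T Ethernet, from the article
PHY_RAW_GBPS = 224           # 224G Ethernet PHY, from the article
ASSUMED_PAYLOAD_GBPS = 200   # ASSUMPTION: usable Ethernet payload per lane


def lanes_needed(link_gbps: float, lane_payload_gbps: float) -> int:
    """Smallest whole number of lanes whose combined payload covers the link rate."""
    return math.ceil(link_gbps / lane_payload_gbps)


lanes = lanes_needed(TARGET_LINK_GBPS, ASSUMED_PAYLOAD_GBPS)

# Share of the raw PHY rate left over for encoding and error-correction
# overhead, expressed relative to the assumed payload rate.
headroom_pct = (PHY_RAW_GBPS - ASSUMED_PAYLOAD_GBPS) / ASSUMED_PAYLOAD_GBPS * 100

print(f"lanes for 1.6T at {ASSUMED_PAYLOAD_GBPS}G payload each: {lanes}")
print(f"raw-rate headroom per lane: {headroom_pct:.0f}%")
```

Under that assumption, eight lanes would carry the full 1.6T, with about 12% of each lane's raw rate left for overhead.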

"The massive compute demands of hyperscale data centers require significantly faster Ethernet speeds to enable emerging AI workloads," said John Koeter, senior vice president of marketing and strategy for IP, Synopsys. "Our complete IP solution for 1.6T Ethernet, pre-verified subsystems, successful ecosystem interoperability, and decades of expertise in developing and delivering the industry's broadest interface IP portfolio allow designers to confidently integrate the necessary functionality into their SoCs [systems on chips] with less risk."

By building on Ethernet's extensive, interoperable, and proven IP, Synopsys contends it is enabling hyperscale data center providers, and the ecosystem that serves them, to future-proof their infrastructure through their silicon roadmaps.

"With growing demands from large language modeling, HPC simulation, and AI training in hyperscale data centers, network boundaries are crossing over the Terabits per second threshold," notes Peter Jones, chairman, Ethernet Alliance. "The availability of development tools capable of meeting these needs is critical to the success of next-generation Ethernet standards addressing this market."

Commenting on the Synopsys IP release in a new article, Forbes contributor Karl Freund, founder and principal analyst of Cambrian-AI Research LLC, noted that, as an AI industry analyst, he would not ordinarily "take notice of a new version of Ethernet; its IP is fairly staid technology these days." He continued, "But now the demands of high performance AI has changed the game again."

Freund observed that, as generative AI has pushed the performance boundaries of computation, memory, and networking, CPU, accelerator, and switch vendors depend on firms such as Synopsys to provide the technology-certified IP blocks for MACs, PHYs, and verification tools.

"Some vendors like NVIDIA and AMD have developed their own networking between accelerators, and others like Intel use industry-standard Ethernet for simplicity at scale," noted Freund. "Now Synopsys is enabling standard Ethernet networking to run at the speeds needed for AI with the first IP offering that delivers twice the bandwidth of today’s Ethernet, at half the power, to meet the growing demand for scaling AI."

While the 1.6T Ethernet standard is not expected to be finalized for a couple of years, the Synopsys IP and verification tools are available now for data center SoC designers to start building upon. "Since 60% of data center infrastructure power involves Ethernet connections, this solution should find significant adoption for AI-centric workloads," predicted Cambrian-AI Research's Freund.

 

Data Center AI Back End Networks Will Require Infrastructure Buildout 

As recently noted by Lightwave, Data Center Frontier's sibling publication covering the optical communications sphere, Dell’Oro Group forecasts that spending on switches deployed in AI back-end networks will expand the data center switch market by 50 percent. While current data center switch spending goes primarily to front-end networks connecting general-purpose servers, AI workloads will require a new back-end infrastructure buildout.

“Generative AI applications usher in a new era in the age of AI, standing out for the sheer number of parameters they have to deal with,” said Sameh Boujelbene, VP at Dell’Oro Group. “Several large AI applications currently handle trillions of parameters, with this count increasing tenfold annually. This rapid growth necessitates the deployment of thousands or even hundreds of thousands of accelerated nodes,” she added. “Connecting these accelerated nodes in large clusters requires a data center-scale fabric, known as the AI back-end network, which differs from the traditional front-end network used mostly to connect general-purpose servers.”
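To make the "tenfold annually" figure concrete, the compounding can be sketched in a few lines. This is an illustrative projection only: the one-trillion-parameter baseline and the three-year horizon are assumptions chosen for the example, since the quote gives neither an exact starting count nor an end date.

```python
ASSUMED_BASELINE = 10**12   # ASSUMPTION: one trillion parameters at year 0
ANNUAL_FACTOR = 10          # "increasing tenfold annually", per the quote
ASSUMED_YEARS = 3           # ASSUMPTION: horizon chosen for illustration


def project_parameters(start: int, annual_factor: int, years: int) -> list[int]:
    """Parameter count at year 0, 1, ..., years under constant annual growth."""
    return [start * annual_factor**year for year in range(years + 1)]


projection = project_parameters(ASSUMED_BASELINE, ANNUAL_FACTOR, ASSUMED_YEARS)

for year, count in enumerate(projection):
    print(f"year {year}: {count:,} parameters")
```

At that pace, a trillion-parameter model would be followed by a quadrillion-parameter one within three years, which helps explain why the forecast ties parameter growth to the number of accelerated nodes that must be connected.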

As AI networks accelerate the transition to higher speeds, Dell'Oro notes that growing bandwidth requirements will drive the need for optical 800G transceivers inside data centers. In its 2022 technology roadmap, the Ethernet Alliance forecast speeds of 800 Gbps and 1.6 Tbps to become IEEE standards between 2023 and 2025. Dell’Oro said 800 Gbps will comprise most AI back-end network ports through 2025, and that competition between InfiniBand and Ethernet is intensifying as manufacturers vie for market share in AI back-end networks. 

When it developed its forecast, Dell’Oro looked at the AI back-end network buildouts by the major cloud service providers including Google, Amazon, Microsoft, Meta, Alibaba, Tencent, ByteDance, Baidu, and others. It also looked at considerations driving their choices of the back-end fabric.

InfiniBand currently offers speeds of 200 Gbps and beyond, which benefits AI workloads involving massive data transfers. However, the analyst notes that modern Ethernet technologies such as 800 Gbps interfaces, which InfiniBand is not expected to support for up to two years, provide substantial bandwidth that meets the requirements of most AI applications.

While InfiniBand is expected to maintain its lead, Dell’Oro forecasts Ethernet to make substantial gains, up to 20 revenue-share points by 2027. “One could argue that Ethernet is a one-speed generation ahead of InfiniBand,” Boujelbene said. “Network speed, however, is not the only factor. Congestion control and adaptive routing mechanisms are also important.”

 

A Second Optical Network for AI

Surveying the same data center network landscape, Light Reading's Mike Dano recently observed:

"According to the financial analysts at Raymond James, the ongoing AI boom will create a 'bandwidth opportunity' inside data centers worth up to $6.2 billion in sales by 2027 [...] that's music to the ears of companies like Corning, Coherent, Lumentum and others that also play in the wider global telecom industry ... For example, [fiber cabling supplier] Corning CEO Wendell Weeks suggested big data center operators are going to need to build a 'second optical network' in order to connect all the GPUs that underpin the development of artificial intelligence [...] 'Overall orders grew in the fourth quarter and we're seeing the earliest edge of AI-related network builds in our order books,' Weeks said during Corning's quarterly conference call this week, according to Seeking Alpha."

Fleshing out this concept of a "second optical network," Mike O’Day, Vice President of Corning Optical Communications' Technology and Program Management Office (PMO), explained in a recent blog:

"Typically, hyperscale data center operators build a campus and interconnect it with high-fiber-count single-mode cables. These cables often contain more than 3,000 fibers. Then, this fiber comes inside the data center and routes to spine and leaf switches that are meshed together, creating an incredible number of optical links and connections to traditional processors (CPUs) located in server racks throughout the data center. When done correctly, these networks enable familiar uses like streaming movies or surfing on social platforms. Now, to enable an AI data center network, clusters of powerful servers with many GPUs are needed that require a high number of connections. This AI cluster is then connected back to the primary network and routed appropriately. 

[...] The amount of electrical power to fire up the AI server clusters is significantly more than on the front end for the comparable number of servers. We’re seeing leading hyperscales designing and beginning to build data centers with this second optical AI network, which is increasing optical connections by up to five times within data centers. As our teams here at Corning work with our hyperscale data center customers to meet the requirements of an AI data center, we’re focusing on innovation guided by the '4S' principles: speed, simplicity, size, and sustainability. Those key vectors are as important in the data center space as they are in the access network."

InfiniBand in the Rearview Mirror May Be Larger Than It Appears

In assessing the present InfiniBand vs. Ethernet stakes in the data center, we thought it might be useful to see how the world's hottest technology company is faring. To wit, in NVIDIA's fiscal Q4 2024 earnings conference call, the company noted that its record fourth-quarter data center revenue of $18.4 billion, up 27% sequentially and up 409% year-on-year, was driven by the company's Hopper GPU computing platform, along with InfiniBand end-to-end networking.
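For a sense of scale, the growth percentages above imply the earlier revenue figures they are measured against. The short calculation below backs them out; the results are approximations derived only from the rounded numbers quoted here, not figures reported by NVIDIA in this article.

```python
Q4_DATA_CENTER_REVENUE_B = 18.4   # $ billions, from the article
SEQUENTIAL_GROWTH_PCT = 27        # up 27% quarter-over-quarter
YEAR_ON_YEAR_GROWTH_PCT = 409     # up 409% year-on-year


def implied_prior(current: float, growth_pct: float) -> float:
    """Earlier-period value implied by a current value and a percent increase."""
    return current / (1 + growth_pct / 100)


prior_quarter = implied_prior(Q4_DATA_CENTER_REVENUE_B, SEQUENTIAL_GROWTH_PCT)
year_ago = implied_prior(Q4_DATA_CENTER_REVENUE_B, YEAR_ON_YEAR_GROWTH_PCT)

print(f"implied prior-quarter data center revenue: ${prior_quarter:.1f}B")
print(f"implied year-ago data center revenue:      ${year_ago:.1f}B")
```

In other words, data center revenue of roughly $3.6 billion a year earlier grew to about $14.5 billion in the prior quarter and then to $18.4 billion.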

"Compute revenue grew more than 5x and networking revenue tripled from last year," said NVIDIA CFO Colette Kress. "We are delighted that supply of Hopper architecture products is improving. Demand for Hopper remains very strong. We expect our next-generation products to be supply-constrained as demand far exceeds supply."

Kress emphasized, "From a product perspective, the vast majority of revenue was driven by our Hopper architecture, along with InfiniBand networking. Together, they have emerged as the de-facto standard for accelerated computing and AI infrastructure."

However, while NVIDIA Quantum InfiniBand is the standard for the company's highest performance AI-dedicated infrastructures, during the conference call, Kress revealed that the company is now entering the Ethernet networking space with the launch of its new Spectrum-X end-to-end offering.

Designed as AI-optimized networking for the data center, Spectrum-X introduces new technologies over Ethernet that are purpose-built for AI. "Technologies incorporated in our Spectrum switch, BlueField DPU and software stack deliver 1.6x higher networking performance for AI processing compared with traditional Ethernet," added NVIDIA's Kress.

Leading OEMs and their global sales channels, including Dell, HPE, Lenovo, and Super Micro, are partnering with NVIDIA to bring its AI solutions to enterprises worldwide, and Spectrum-X is reportedly on track to ship this quarter.

NVIDIA founder and CEO Jensen Huang concluded:

"InfiniBand is the standard for AI-dedicated systems. Ethernet is ... not a very good scale-out system, but with Spectrum-X, we’ve augmented, layered on top of Ethernet, fundamental new capabilities like adaptive routing, congestion control, noise isolation or traffic isolation, so that we could optimize Ethernet for AI. And so InfiniBand will be our AI-dedicated infrastructure; Spectrum-X will be our AI-optimized networking -- and that is ramping."

 


About the Author

Matt Vincent

A B2B technology journalist and editor with more than two decades of experience, Matt Vincent is Editor in Chief of Data Center Frontier.
