Shaping the Future of AI Data Centers via Connectivity and Cabling

Dec. 23, 2024
Recalling how announcements by Enfabrica and Point2 Technology at SC24 mark a significant milestone in the evolution of AI infrastructure.

At Supercomputing 2024 (SC24) in Atlanta, two announcements might have set a new course for the future of AI data centers.

Enfabrica Corporation, a pioneer in high-performance networking silicon, revealed the general availability of its 3.2 Terabit/sec (Tbps) Accelerated Compute Fabric (ACF) SuperNIC chip, which promises advancements in GPU networking capabilities.

Meanwhile, Point2 Technology, known for its energy-efficient interconnect solutions, introduced its P1B121 UltraWire Smart Retimer SoC, designed to address the increasing energy demands of AI-driven hyperscale data centers.

With AI, Connectivity Is Crucial

AI supercomputers are typically composed of thousands, sometimes millions, of interconnected processing units, so any technology that affects those interconnections can have a significant impact.

High-speed, low-latency internal networks, such as InfiniBand or high-performance Ethernet (currently 400/800 GbE), are critical to facilitating this communication. Without such connectivity, the synchronization of tasks across processing units suffers, resulting in inefficiencies and reduced computational performance.

In AI research and development, many supercomputers are part of global collaborations. Shared datasets, federated AI model training, and distributed simulations necessitate secure and ultra-reliable network connectivity across international boundaries.

Any interruptions or inefficiencies in these connections can disrupt the research process and delay potential breakthroughs.

Speed, Speed, and More Speed

Enfabrica's 3.2 Tbps ACF SuperNIC looks to accelerate AI data center technology, delivering four times the bandwidth of the current standard GPU-attached network interface card (NIC).

The chip was built from the ground up for large-scale AI workloads such as training, inference, and retrieval-augmented generation (RAG). Enfabrica identifies three key features that make its NIC a strong contender as an AI connectivity choice:

  • Bandwidth and scalability
    • Each ACF-S chip includes 32 network ports and 160 PCIe lanes, enabling GPU clusters of over 500,000 units.
    • The solution supports 800, 400, and 100 Gigabit Ethernet interfaces, offering unprecedented scale-out throughput.
  • Software-defined networking adds resiliency and flexibility
    • Resilient Message Multipathing (RMM) technology minimizes job stalls due to network failures, boosting GPU utilization without altering existing AI software stacks.
    • The chip's Software-Defined RDMA Networking enhances flexibility, allowing operators to customize transport layers for optimized cloud-scale networks.
  • Simplifies operation
    • Zero-copy data transfers and optimized memory management improve floating-point operations per second (FLOPs) utilization, which is critical for large-scale AI clusters.
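
As a rough sanity check on the figures above, the sketch below works out how many ports of each supported Ethernet speed would add up to the chip's 3.2 Tbps, and what baseline the "four times the bandwidth" claim implies. It assumes the aggregate bandwidth is split evenly across ports of a single speed, which is an illustrative simplification and not a configuration published by Enfabrica. The function and variable names are hypothetical.

```python
# Back-of-the-envelope arithmetic on figures quoted in the article.
# Assumption (not from Enfabrica): the aggregate bandwidth is divided
# evenly across ports that all run at the same speed.

TOTAL_GBPS = 3200                    # 3.2 Tbps aggregate, from the article
MAX_PORTS = 32                       # network ports per chip, from the article
PORT_SPEEDS_GBPS = (800, 400, 100)   # supported Ethernet speeds, from the article


def ports_needed(total_gbps: int, port_gbps: int) -> int:
    """Number of ports of a given speed needed to reach total_gbps."""
    # Ceiling division, so a partially used port still counts as one port.
    return -(-total_gbps // port_gbps)


def implied_baseline_gbps(total_gbps: int, multiple: int) -> float:
    """Baseline bandwidth implied by an 'N times the bandwidth' claim."""
    return total_gbps / multiple


port_counts = {speed: ports_needed(TOTAL_GBPS, speed) for speed in PORT_SPEEDS_GBPS}

for speed, count in port_counts.items():
    fits = count <= MAX_PORTS
    print(f"{count:>2} x {speed} GbE = {count * speed} Gbps (within {MAX_PORTS} ports: {fits})")

print(f"Implied baseline NIC: {implied_baseline_gbps(TOTAL_GBPS, 4):.0f} Gbps")
```

Under that assumption, 32 ports at 100 GbE account for exactly 3.2 Tbps, and the implied comparison point for the "four times" claim is an 800 Gbps NIC.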

 

Enfabrica CEO Rochan Sankar described the launch as a pivotal moment:

"Today is a watershed moment for Enfabrica. We successfully closed a major Series C fundraise and our ACF SuperNIC silicon will be available for customer consumption and ramp in early 2025. With a software and hardware co-design approach from day one, our purpose has been to build category-defining AI networking silicon that our customers love, to the delight of system architects and software engineers alike. These are the people responsible for designing, deploying and efficiently maintaining AI compute clusters at scale, and who will decide the future direction of AI infrastructure."

It's Not Just the NIC

While connectivity is key, managing power efficiently is also important in a sector facing ever-increasing energy demand.

Point2 Technology's P1B121 UltraWire Smart Retimer SoC targets this concern directly. Tailored for Active Electrical Cables (AECs) in hyperscale AI/ML data centers, it provides ultra-low power consumption and reduced latency. Key features of the SoC include:

Improved power utilization

  • Consuming just 3.0 watts per chip, the P1B121 achieves 2X lower power usage compared to other solutions.
  • The chip’s architecture allows for smaller copper wire gauges, reducing cable thickness and weight while improving cooling efficiency.

Significant latency reduction

  • At 3 nanoseconds, the P1B121 offers 20X lower latency than conventional PAM4 implementations, an important factor for real-time AI applications.

Flexibility and support for future growth

  • The SoC supports both 800G and 1.6T AEC configurations, ensuring compatibility with current and future AI workloads.
  • Its BER-aware architecture dynamically balances power consumption and signal integrity, enabling cost-effective scaling.
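
To put the "2X" and "20X" claims in absolute terms, the following sketch back-calculates the comparison figures they imply and estimates total retimer power for a hypothetical deployment. The cable count and the figure of two retimer chips per cable (one at each end) are assumptions chosen purely for illustration, not numbers from Point2.

```python
# Illustrative arithmetic only. Per-chip figures come from the article;
# the deployment size below is a made-up example.

CHIP_POWER_W = 3.0       # per-chip power, from the article
CHIP_LATENCY_NS = 3.0    # latency, from the article
POWER_FACTOR = 2         # "2X lower power"
LATENCY_FACTOR = 20      # "20X lower latency"


def implied_reference(value: float, factor: float) -> float:
    """Figure for the comparison solution implied by an 'N times lower' claim."""
    return value * factor


def retimer_power_kw(cables: int, chips_per_cable: int, watts_per_chip: float) -> float:
    """Total retimer power in kilowatts for a set of identical cables."""
    return cables * chips_per_cable * watts_per_chip / 1000.0


ref_power_w = implied_reference(CHIP_POWER_W, POWER_FACTOR)
ref_latency_ns = implied_reference(CHIP_LATENCY_NS, LATENCY_FACTOR)

# Hypothetical deployment (assumption): 10,000 AECs, two retimers per cable.
CABLES = 10_000
CHIPS_PER_CABLE = 2

p1b121_kw = retimer_power_kw(CABLES, CHIPS_PER_CABLE, CHIP_POWER_W)
reference_kw = retimer_power_kw(CABLES, CHIPS_PER_CABLE, ref_power_w)

print(f"Implied comparison chip: {ref_power_w} W, {ref_latency_ns} ns")
print(f"Retimer power: {p1b121_kw} kW vs {reference_kw} kW "
      f"(difference {reference_kw - p1b121_kw} kW)")
```

Taken at face value, the claims imply a comparison retimer drawing about 6 W with roughly 60 ns of latency. Any deployment-level saving scales linearly with the number of cables assumed.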

Sean Park, CEO of Point2 Technology, highlighted the chip’s role in bridging performance and sustainability:

"At Point2, we continue to push the limits on power efficiency and sustainability inside the datacenter. We recognize the challenges data centers face in balancing high-performance and low-latency operations with the need for energy efficiency. The Point2 P1B121 Smart Retimer SoC is a key solution in addressing those challenges directly and giving datacenters the ability to not only address current 800G data interconnect speeds but prepare for future 1.6T workloads as well."

Impact on the Future of AI Data Centers: A Fundamental Shift

Enfabrica's ACF SuperNIC and Point2's P1B121 Smart Retimer are complementary technologies that address different facets of AI data center operations.

While Enfabrica focuses on high-bandwidth, resilient GPU interconnectivity, Point2 targets the energy efficiency and latency of cable connections. Deployed together, these technologies can form an ecosystem capable of supporting demanding AI workloads.

With AI expected to account for nearly 20% of data center electricity consumption by 2030, sustainability is no longer optional—it is imperative. Both Enfabrica and Point2 are taking proactive steps to mitigate the environmental impact of AI data centers. From energy-efficient chips to scalable network designs, their solutions pave the way for a greener future.

Notably, the ripple effects of these innovations extend beyond AI data centers. Point2's advancements in lightweight, low-power cables have applications in automotive networks, where similar challenges of bandwidth and efficiency arise, as evidenced by some of its announced partnerships. Likewise, Enfabrica's programmable networking solutions aren't limited to AI; they can also provide flexibility in how cloud providers manage heterogeneous compute clusters.

Both companies understand the importance of ecosystem collaboration. Enfabrica’s involvement with the Ultra Ethernet Consortium and Ultra Accelerator Link Consortium can ensure its solutions align with industry standards.

Meanwhile, Point2's partnerships with Molex and Bosch Ventures highlight the growing convergence between data center technology and other fields, such as industrial and security applications. These developments represent a fundamental shift in how data centers are designed and operated, ensuring that the growth of AI can be sustained in an energy-conscious manner.

With commercial availability of many cutting-edge products slated for 2025, from those discussed here to core pieces like the Nvidia GB200 GPUs, the next 12 to 24 months look to be transformative for the infrastructure of hyperscale data centers and for AI hardware and applications.

 


About the Author

David Chernicoff

David Chernicoff is an experienced technologist and editorial content creator with the ability to see the connections between technology and business while figuring out how to get the most from both and to explain the needs of business to IT and IT to business.
