NVIDIA GTC 2025 Unveils Revolutionary Chips, Systems, and Optical Networking for Hyperscale AI Data Centers

April 8, 2025
NVIDIA’s GTC 2025 spotlighted a sweeping upgrade to its AI infrastructure stack, unveiling new CPU and GPU architectures built for extreme performance and efficiency. The debut of the NVL576 rack, powered by Vera Rubin Ultra SuperChips, marks a bold step forward in high-density, liquid-cooled AI systems.

Last month's NVIDIA GTC 2025 saw the expected announcement of next-generation products in the AI GPU/CPU market that Nvidia has been dominating, if not defining, for the last few years.

Interestingly enough, the message for these future generations of products focused not just on “bigger and better,” but also on “do more and work with your existing hardware investment.” That is not to say future improvements in performance, scalability, and energy efficiency went unaddressed.

Changes to the Core Blackwell GPU Architecture

At GTC 2024, Nvidia introduced its first-generation Blackwell GPU architecture designed for AI. The revolutionary GPU offered breakthroughs in accelerated computing, AI inference, ray tracing, and neural rendering, and was suitable for AI models with as many as a trillion parameters.

At GTC 2025, the Blackwell follow-on was announced: the Blackwell Ultra architecture. The new architecture adds performance, efficiency, and security to the existing Blackwell design, including:

Enhanced Tensor Cores: Built upon the original Blackwell architecture, Blackwell Ultra introduces Tensor Cores with twice the attention-layer acceleration and 1.5 times the AI compute FLOPS. These features are particularly suited to LLMs and other complex AI operations, enhancing the GPU's ability to accelerate deep learning tasks by performing mixed-precision calculations, which allows for faster training and inference of neural networks.

Transformer Engine: The architecture incorporates the NVIDIA Blackwell Transformer Engine, which uses micro-tensor scaling techniques to enable enhanced support for 4-bit floating point (FP4) and add 6-bit floating point (FP6) precision, effectively doubling the performance and capacity for next-generation models while maintaining accuracy sufficient for AI training (the general idea behind micro-tensor scaling is sketched just after this list).

Enhanced Security: The Blackwell Ultra architecture introduces NVIDIA Confidential Computing, which offers hardware-based security to protect sensitive data and AI models from unauthorized access. It will be the first GPU to feature trusted I/O virtualization (TEE-I/O), an architecture for securely connecting devices to a Trusted Execution Environment, ensuring secure data transmission with minimal performance overhead. Confidential computing rests on the ability to isolate critical tasks and data in a secure environment.

Memory Size: Blackwell Ultra chips are expected to be equipped with 288 GB of HBM3e/HBM4 memory across eight stacks, allowing the GPUs to handle substantially larger models than the first-generation GPU with its 192 GB of HBM3e memory.

Performance: The new chips are expected to deliver up to 30 petaflops of AI performance (25-30 PFLOPS), targeting demanding AI applications.
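To make the precision discussion concrete, here is a minimal Python sketch of the general idea behind micro-tensor (block-scaled) quantization: values are grouped into small blocks, and each block carries its own scale factor, letting a 4-bit format cover a much wider dynamic range than a single tensor-wide scale would allow. This is an illustrative toy only; the block size is an assumption, and Blackwell Ultra's Transformer Engine implements FP4 (a floating-point format, not the integer stand-in used here) in hardware.

    import numpy as np

    def quantize_block_scaled(x, block_size=32, max_mag=7):
        """Toy micro-tensor scaling: each block of values gets its own
        scale so the data fits a 4-bit range (integers in [-7, 7] here;
        real FP4 is a floating-point format)."""
        blocks = x.reshape(-1, block_size)
        scales = np.abs(blocks).max(axis=1, keepdims=True) / max_mag
        scales[scales == 0] = 1.0                  # avoid divide-by-zero
        q = np.round(blocks / scales).astype(np.int8)
        return q, scales

    def dequantize(q, scales):
        return (q * scales).reshape(-1)

    x = np.random.randn(1024).astype(np.float32)
    q, s = quantize_block_scaled(x)
    print(f"mean abs error: {np.abs(dequantize(q, s) - x).mean():.4f}")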

These ultra-high performance GPUs will be suitable for data centers, advanced inference, and training workloads that don't require the deployment of Nvidia's SuperChip model, which combines GPUs with ARM CPU cores, as exemplified by Nvidia's GH200 and GB200 SuperChip products. Nvidia expects to start delivering the Blackwell Ultra parts in 2025.

Nvidia's Vera Rubin Platform for Next-Gen AI

At last year’s Data Center Frontier Trends Summit, a number of presenters brought up the imminent possibility of hardware racks that would require a megawatt of power and cooling. While these discussions were received with a certain amount of humor, it is safe to say that the future is now. At the 2025 GTC event, Nvidia introduced its NVL576 data center rack, which is designed to use up to 600 kW of power, a fivefold increase over the existing 120 kW NVL72 rack system. Like the previous generation rack system, the NVL576 requires liquid cooling.

The rack is designed to house up to 576 of Nvidia's next-generation Vera Rubin Ultra SuperChips. Each chip consists of 88 Vera CPU cores, with each ARM core being dual-threaded, and a Rubin GPU, the successor to Blackwell. Interconnection bandwidth is also doubled compared to the GB200 by using the next-generation NVLink-C2C, offering 1.8 TB/s of bandwidth versus the GB200's 900 GB/s connection.
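As a back-of-envelope illustration of what the doubled link buys, the short calculation below compares transfer times at the two NVLink-C2C rates quoted above; the 288 GB payload is just a convenient figure borrowed from the Blackwell Ultra memory discussion, not a measured workload.

    payload_gb = 288                   # illustrative payload size (GB)
    links = {"GB200 NVLink-C2C": 900, "next-gen NVLink-C2C": 1800}  # GB/s

    for name, gbps in links.items():
        print(f"{name}: {payload_gb / gbps * 1000:.0f} ms to move {payload_gb} GB")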

The Vera Rubin SuperChip is designed for FP4 precision AI workloads (even more compact than FP8), leverages HBM4 for ultra-high-bandwidth memory (up to 2.5 TB/s), and is significantly more powerful, offering up to 50 PFLOPS of FP4 compute.

According to preliminary information, the new multichip package will no longer support 16- or 32-bit floating point precision, being optimized for LLM training/inference and future AI factories, which don't require the high-precision compute. The performance target is double that of the first-generation Grace Hopper platform (Grace ARM CPUs and Hopper H100 GPUs), with a goal of delivering high-efficiency, exascale-class AI computing.
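A quick calculation shows why 4-bit precision matters for model capacity: at half a byte per parameter, weight storage shrinks fourfold versus 16-bit formats. The figures below are illustrative only and ignore activations, KV caches, and other runtime overheads.

    def weight_gb(params_billion, bits):
        """Approximate weight storage in GB, ignoring runtime overheads."""
        return params_billion * 1e9 * bits / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{weight_gb(1000, bits):,.0f} GB "
              f"for a trillion-parameter model")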

As the product name implies, each liquid-cooled NVL576 rack houses 576 Rubin Ultra GPUs, organized into four pods. Each pod contains 18 blades, with each blade supporting up to eight Rubin Ultra GPUs. The new rack uses the Kyber rack design, which incorporates compute blades rotated 90 degrees for increased density.

A fully populated rack is expected to deliver up to 15 exaflops of FP4 inference performance, marking a 14-fold increase over earlier systems. The new racks also use upgraded NVLink modules, each carrying three next-generation NVLink connections, resulting in better data transfer rates with reduced latency.
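The rack arithmetic is easy to verify from the figures quoted above; note that the per-GPU number below is simply the rack average implied by those figures, not a published per-GPU rating.

    pods, blades_per_pod, gpus_per_blade = 4, 18, 8
    gpus = pods * blades_per_pod * gpus_per_blade       # 576
    rack_fp4_ef = 15                                    # quoted FP4 exaflops

    print(f"GPUs per rack: {gpus}")
    print(f"Implied average per GPU: {rack_fp4_ef * 1000 / gpus:.0f} PFLOPS FP4")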

The Future of 1 MW Racks

Nvidia's Rubin Ultra NVL576 rack is part of the roadmap to meet the escalating computational requirements of future AI applications. Its design reflects a focus on scalability, efficiency, and performance, positioning it as a cornerstone for next-generation AI data centers and providing a certain degree of future-proofing for data centers as we move towards the inevitable 1 MW computational rack deployments.

While we can expect future generations of CPUs, GPUs, and racks to continually increase performance, realistically there needs to be a significant focus on better bang for the buck. Given hardware and power costs, things such as zombie servers have to remain firmly in the past. From more efficient software stacks to hardware with reduced power consumption or significantly improved performance per watt of consumed energy, the tipping point for data centers, power, and efficiency is not far in the future.

As these next-generation CPU/GPU products take on more specific focuses, it looks like the future will have a significant component of purpose-designed features; that is, everything from system hardware to entire data centers built to deliver optimum performance and efficiency while fulfilling specific business roles and demands. We do live in interesting times. Let's hope it's not a curse.

NVIDIA's CPO Technology: Enhancing Performance and Efficiency in AI Data Centers

Significantly, Nvidia also used the GTC conference to announce its co-packaged optics (CPO) technology, which it positions as a key component of the AI factories the company believes are the future of AI data centers.

Nvidia's CPO technology, as positioned at GTC, is set to transform AI data centers with terabit-scale connectivity, enhanced energy efficiency, and simplified deployment, marking a significant advancement in AI infrastructure.

The integrated silicon photonics sit in the same package as the ASICs that provide network connectivity, bringing significant benefits in performance, efficiency, and deployment time.

Silicon Photonics Nuances

If this is your first exposure to silicon photonics and co-packaged optics, the concept is straightforward. Optical components, like lasers and photonic integrated circuits, are physically integrated next to or within the same package as the electronic components (such as switching chips or processors). This close integration allows for faster, more energy-efficient data transmission within high-performance systems like AI supercomputers and AI data centers.

AI models (e.g., GPT, LLaMA) require extreme interconnect bandwidth between thousands of GPUs/TPUs. CPO supports terabit-scale connectivity per chip, making it easier to scale model training infrastructure. Nvidia expects this first-generation product set to deliver 1.6 Tb/s per-port switching with 3.5X the energy efficiency of current pluggable interconnects. A 10X improvement in network resilience compared to pluggable transceivers is also expected.

CPO also eliminates the cost of adding pluggable transceivers, simplifying the bill of materials for an AI factory deployment and lowering the TCO. Nvidia also expects the switch to CPO to simplify deployment of the data center network, enabling the AI factory to begin generating insights 1.3X faster than with the current pluggable transceiver model.

Jensen Huang, founder and CEO of Nvidia, identified CPO as a critical part of the AI factory model, saying:

“AI factories are a new class of data centers with extreme scale, and networking infrastructure must be reinvented to keep pace. By integrating silicon photonics directly into switches, NVIDIA is shattering the old limitations of hyperscale and enterprise networks and opening the gate to million-GPU AI factories.”

CPO Upside for AI

By eliminating traditional pluggable transceivers, additional benefits can be achieved—not only a reduction in signal latency, which is critical for AI training workloads, but also improved signal integrity. CPO shortens the electrical path between switch ASICs and optical interfaces, minimizing signal degradation and enabling higher bandwidths at lower power budgets. This tighter integration reduces the need for high-power SerDes and re-timers, which are common in pluggable architectures, contributing further to energy efficiency. 

In addition, the simplified system architecture enabled by CPO supports more compact and streamlined designs, improving airflow dynamics and easing the implementation of direct-to-chip liquid cooling systems. As liquid cooling becomes standard in hyperscale and AI factory environments, these high-density, liquid-cooled racks will operate with lower thermal loads and greater reliability, enabling sustained performance at scale with reduced cooling overhead.

TSMC Weighs In

C. C. Wei, chairman and CEO of Taiwan Semiconductor Manufacturing Company (TSMC), one of Nvidia’s partners in CPO development, is also clear on the technology's importance to future AI deployments, saying:

“A new wave of AI factories requires efficiency and minimal maintenance to achieve the scale required for next-generation workloads. TSMC’s silicon photonics solution combines our strengths in both cutting-edge chip manufacturing and TSMC-SoIC 3D chip stacking to help NVIDIA unlock an AI factory’s ability to scale to a million GPUs and beyond, pushing the boundaries of AI.”

It should be noted here that TSMC's recent commitment to invest an additional $100 billion in U.S. semiconductor manufacturing marks a pivotal moment for the data center industry, particularly for companies like Nvidia that rely heavily on advanced chip production. For Nvidia, this development offers significant strategic advantages. By localizing chip production within the United States, the company can mitigate risks associated with international supply chain disruptions and geopolitical tensions. This proximity not only ensures a more stable supply of critical components but also aligns with Nvidia's broader strategy to invest hundreds of billions in U.S. manufacturing over the next four years.

The backdrop to TSMC's substantial U.S. investment includes the Trump administration's implementation of tariffs aimed at encouraging domestic manufacturing. The tariffs, if fully implemented, will have a pronounced impact on the technology sector, increasing costs for companies that rely on imported components. By establishing manufacturing facilities in Arizona, TSMC and its clients, such as Nvidia, can circumvent the tariffs, leading to more predictable production costs and pricing structures.

CPO Focus at GTC

In his GTC keynote, NVIDIA CEO Huang illustrated the staggering inefficiencies of today’s network interconnects by holding up a single pluggable fiber transceiver. Each unit, he noted, consumes approximately 30 watts of power and costs around $1,000 when purchased at scale.

Extrapolated to the scale of modern AI infrastructure, the numbers become daunting. In a reference architecture using a multi-layer switch fabric for 250,000 GPUs—representative of large-scale AI factories—each GPU would require six transceivers, totaling 180 watts per GPU just for optical I/O. That translates into 45 megawatts of power consumed exclusively by the optical transceivers. Scale that to a million GPUs, and the power demand balloons to 180 megawatts—just to move data.
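Huang's numbers are straightforward to reproduce. The sketch below redoes the keynote arithmetic with the per-unit figures he quoted; the six-transceivers-per-GPU count comes from the multi-layer switch fabric in his reference architecture.

    watts_per_transceiver = 30        # keynote figure
    cost_per_transceiver = 1_000      # USD, purchased at scale
    transceivers_per_gpu = 6          # multi-layer switch fabric design

    for gpus in (250_000, 1_000_000):
        n = gpus * transceivers_per_gpu
        print(f"{gpus:>9,} GPUs: {n * watts_per_transceiver / 1e6:.0f} MW, "
              f"${n * cost_per_transceiver / 1e9:.1f}B in transceivers")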

Huang used this example to emphasize a core challenge facing hyperscale AI: traditional network architectures cannot scale efficiently alongside GPU density and power requirements. “We have to reinvent the network,” he said, teeing up one of the keynote’s major announcements—NVIDIA’s push into CPO designs. As noted above, by integrating optical interfaces directly alongside the switch silicon within a single package, CPO eliminates the need for high-power, pluggable transceivers and their associated SerDes, significantly reducing both latency and power consumption.

The implications for AI data centers are profound. CPO not only reduces the power overhead for interconnects—potentially saving tens or even hundreds of megawatts at scale—but also enables denser, more thermally efficient system designs. This is especially critical as AI factories move toward liquid-cooled racks and tighter integration of compute and networking.

Huang’s remarks, starting around the 1:35:00 mark in the keynote, underline a strategic inflection point: the transition from legacy optics to co-packaged designs is not just a performance upgrade, but a necessity for the sustainable scaling of AI infrastructure.

NVIDIA Silicon Photonics Ecosystem

Nvidia's announced silicon photonics ecosystem includes well-known players in optical connectivity and chip development, among them TSMC, whose CPO partnership is discussed above.

Two different families of CPO switches will become available from Nvidia later in 2025: the NVIDIA Spectrum-X Photonics Ethernet and NVIDIA Quantum-X Photonics InfiniBand platforms.

Though not yet available, Nvidia is adding CPO to its Ethernet switches with Spectrum-X. Two models have been publicly discussed: the first has 128 ports of 800 Gb/s or 512 ports of 200 Gb/s, delivering 100 Tb/s total bandwidth; the second has 512 ports of 800 Gb/s or 2,048 ports of 200 Gb/s, for a total throughput of 400 Tb/s.
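The quoted totals line up with the port counts, allowing for rounding, as this quick check shows:

    for ports, gbps in [(128, 800), (512, 200), (512, 800), (2048, 200)]:
        print(f"{ports:>5} ports x {gbps} Gb/s = {ports * gbps / 1000:.1f} Tb/s")

The raw products come to 102.4 Tb/s and 409.6 Tb/s, which Nvidia rounds to 100 and 400 Tb/s.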

Nvidia is also expanding its Quantum-X800 platform to include CPO-based switches. The first announced switch is the Q3450-LD, which offers 144 ports of 800 Gb/s InfiniBand. The switch uses liquid cooling to remove 85% of the heat generated, providing cost-effective and efficient cooling for the silicon photonics components. It can also scale to support over 10,000 GPUs in a non-blocking two-level fat-tree topology at 800 Gb/s.
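The 10,000-GPU figure follows from standard fat-tree math: in a non-blocking two-level (leaf/spine) design, each leaf switch splits its ports evenly between hosts and spine uplinks, so a switch radix of r supports r^2/2 endpoints. A quick check with the 144-port Q3450-LD:

    radix = 144                       # 800 Gb/s ports per Q3450-LD
    endpoints = radix ** 2 // 2       # leaf ports split 50:50 host/uplink
    print(f"Non-blocking two-level fat tree: {endpoints:,} endpoints")  # 10,368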

According to Nvidia, this configuration offers twice the speed and five times the scalability of its previous generation of InfiniBand switches.

This is Nvidia’s first CPO product, and it allows a single-mode optical fiber cable to connect directly to the package, eliminating both the pluggable transceiver and the need for DSP retimers, as well as the power and latency penalties that the additional hardware entails. The reduced component count is a large part of the reason for the system's improved resiliency.

Redefining Hyperscale

NVIDIA's GTC 2025 served as a clarion call for the next era of data center design—one defined not just by faster chips, but by sweeping innovation across the entire stack, from networking and systems architecture to software and power efficiency. With breakthroughs like co-packaged optics, the next-gen Blackwell Ultra and Vera Rubin superchips, and a vision for AI factories that scale to millions of GPUs, NVIDIA is setting the pace for what hyperscale infrastructure will require in the years ahead.

For the U.S. data center sector, these announcements underscore a growing urgency to modernize facilities for unprecedented density, power, and thermal loads. At the same time, NVIDIA’s investments in domestic manufacturing and partnerships with ecosystem players signal a pivot toward greater supply chain resilience—a theme increasingly echoed across the globe. For operators, developers, and utilities, the message is clear: the future will demand more integrated planning, more efficient interconnects, and infrastructure that is not just bigger, but fundamentally smarter.

 


About the Author

David Chernicoff

David Chernicoff is an experienced technologist and editorial content creator with the ability to see the connections between technology and business, to figure out how to get the most from both, and to explain the needs of business to IT and IT to business.
About the Author

Matt Vincent

A B2B technology journalist and editor with more than two decades of experience, Matt Vincent is Editor in Chief of Data Center Frontier.
