Technical computing heavyweight NVIDIA has played a pivotal role in accelerating the adoption of artificial intelligence (AI), providing powerful data-crunching hardware to turn AI’s potential into practical reality. This week NVIDIA unveiled a new generation of its GPU (graphics processing unit) technology designed to further shake up the world of AI computing.
NVIDIA’s announcements at Thursday’s Virtual GTC event herald the coming arrival of lots of power-hungry GPU hardware in data centers, offering new ways to harness AI in business and research. The new hardware announced at GTC –
That includes 14 data center operators who are part of NVIDIA’s “DGX-Ready” program to host customers using the latest version of the GPU-powered “supercomputer in a box.” They will be on the leading edge of a movement that will bring more high-density AI hardware into the data centers, including the latest offerings from NVIDIA and Intel as well as new chips from a group of ambitious semiconductor startups.
This AI arms race is one of DCF’s Eight Trends That Will Shape the Data Center in 2020.
“AI is a hardware-intensive computing technology that will analyze data both near and far,” we noted in our 2020 forecast. “That includes everything from algorithm training at cloud campuses to inference engines running on smartphones. AI can make products and services smarter. Every business yearns for that, which is why AI is emerging as a strategic priority.”
Power, Versatility and Lots of Density
NVIDIA says its DGX A100 system consolidates the power and capabilities of an entire data center into a single flexible platform. DGX A100 systems integrate eight of the new NVIDIA A100 Tensor Core GPUs, providing 320GB of memory for training AI datasets, as well as high-speed 200Gbps interconnects from Mellanox.
DGX A100 systems are available now and have begun shipping worldwide, with the first order going to the U.S. Department of Energy’s (DOE) Argonne National Laboratory, which will use the cluster’s AI and computing power to better understand and fight COVID-19.
“NVIDIA DGX A100 is the ultimate instrument for advancing AI,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA DGX is the first AI system built for the end-to-end machine learning workflow – from data analytics to training to inference. And with the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data.”
A key feature is the ability to partition the DGX A100 into as many as 56 instances per system, allowing multiple workloads to run in parallel on the system – including AI training and inference workloads that previously used dedicated hardware. Combining these capabilities enables enterprises to optimize computing power and applications on a single, fully integrated, software-defined platform.
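The partitioning figures can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it uses the numbers quoted above (eight GPUs, 320GB of memory, up to 56 instances per system), and the constant and function names are hypothetical. It assumes memory and instances are spread evenly across the GPUs, which is an assumption made here rather than something stated in the announcement.

```python
# Figures quoted in the article for one DGX A100 system.
GPUS_PER_SYSTEM = 8
TOTAL_GPU_MEMORY_GB = 320
MAX_INSTANCES_PER_SYSTEM = 56


def per_gpu_breakdown(gpus, total_memory_gb, max_instances):
    """Return (memory per GPU in GB, instances per GPU), assuming an even split."""
    memory_per_gpu = total_memory_gb / gpus
    instances_per_gpu = max_instances // gpus
    return memory_per_gpu, instances_per_gpu


memory_per_gpu, instances_per_gpu = per_gpu_breakdown(
    GPUS_PER_SYSTEM, TOTAL_GPU_MEMORY_GB, MAX_INSTANCES_PER_SYSTEM
)
print(f"{memory_per_gpu:.0f} GB per GPU, up to {instances_per_gpu} instances per GPU")
```

Under that even-split assumption, each A100 carries 40GB of memory and can be divided into as many as seven instances.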
NVIDIA DGX A100 systems start at $199,000. That’s not cheap, but the company argues that these powerful systems offer better economics in terms of both bang for the buck and the value of AI.
The DGX A100 system takes up 6 rack units, allowing users to put up to 5 in a single data center rack. Huang says that $1 million rack can do the work of a typical data center with 600 CPU systems costing over $11 million.
Each DGX A100 unit can require as much as 6.5 kilowatts (kW) of power, meaning that a five-unit rack would need to support between 28 kW and 32.5 kW of power density.
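The rack-level numbers follow directly from the per-system figures. The sketch below is a back-of-the-envelope calculation, not a sizing tool: the `RackEstimate` class and `estimate_rack` function are hypothetical names, the price is the quoted starting price (so the total is a floor), and the power figure is the quoted per-unit maximum.

```python
from dataclasses import dataclass

# Figures quoted in the article.
SYSTEMS_PER_RACK = 5
RACK_UNITS_PER_SYSTEM = 6
STARTING_PRICE_USD = 199_000
MAX_POWER_PER_SYSTEM_KW = 6.5


@dataclass
class RackEstimate:
    rack_units: int        # rack space occupied by the systems
    list_price_usd: int    # total at the quoted starting price
    peak_power_kw: float   # total if every system draws its maximum


def estimate_rack(systems, units_per_system, price_per_system_usd, peak_kw_per_system):
    """Scale the per-system figures up to a full rack."""
    return RackEstimate(
        rack_units=systems * units_per_system,
        list_price_usd=systems * price_per_system_usd,
        peak_power_kw=systems * peak_kw_per_system,
    )


rack = estimate_rack(
    SYSTEMS_PER_RACK,
    RACK_UNITS_PER_SYSTEM,
    STARTING_PRICE_USD,
    MAX_POWER_PER_SYSTEM_KW,
)
print(f"{rack.rack_units}U, ${rack.list_price_usd:,}, {rack.peak_power_kw} kW peak")
```

Five systems occupy 30 rack units, cost $995,000 at the starting price (in line with Huang’s “$1 million rack”), and draw up to 32.5 kW, the top of the density range cited above.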
The Data Center Implications
These eye-popping specs and capabilities have implications for the data center, especially since AI-driven higher rack densities can prompt a shift to liquid cooling. Not every data center can support these densities, especially on-premises facilities, which were likely designed and built in an era of much lower rack power densities.
“We routinely hear from executives who want to get away from physical IT infrastructure management and the associated capital expenditures, but they have concerns about the ability to securely leverage next-gen technologies,” said Avner Papouchado, CEO of ServerFarm.
Papouchado said that as part of NVIDIA’s DGX-Ready program, “we’re staying ahead of the curve and investing in the technologies needed to manage today and tomorrow’s workloads.”
The DGX-Ready Data Center partners program offers colocation services in more than 122 locations across 26 countries for customers seeking facilities to host their DGX A100 infrastructure in “validated, world-class data center facilities,” the company says.
“Our customers use AI to drive their business transformation, but don’t always have facilities designed to meet the unique demands of AI workloads in their data center,” said Tony Paikeday, director of product marketing for DGX systems at NVIDIA.
Data center participants include many of the largest service providers, as well as operators that specialize in supporting high-density workloads.
Here’s a look at the U.S. DGX-Ready roster:
Digital Realty provides an example of how data center operators are leveraging their partnerships with NVIDIA. Digital Realty created a Data Hub design optimized for AI hardware and machine learning workloads as part of its recent PlatformDIGITAL rollout. The Data Hub footprint is based on typical customer deployment scenarios for NVIDIA DGX configurations, and is designed to integrate seamlessly with enterprise storage solutions from class-leading providers.
“Customers today need AI infrastructure that can break through data barriers and power the digital enterprise,” said Chris Sharp, CTO, Digital Realty. “As an NVIDIA DGX-Ready Data Center Program partner, Digital Realty provides customers with globally focused, unified data center services, coupled with extensive design and construction expertise, giving them the launch pad needed to achieve their AI and digital transformation goals.”
CyrusOne is also seeking to optimize the process of deploying DGX A100 gear in its data centers, saying it “finds itself sitting at the epicenter of this data gravity.”
“As AI systems continue to grow in complexity, size and power, a new generation of power-efficient computing hardware and underlying data center cooling capabilities are required to handle this level of density and scale,” said John Gould, Chief Commercial Officer at CyrusOne. “As we move into this new decade, data centers will continue to be an essential infrastructure to enable AI, machine learning, and help companies unleash the full potential of this technology.”
The Next Chapter in NVIDIA’s Journey
NVIDIA’s GPU technology has been one of the biggest beneficiaries of the rise of specialized computing, gaining traction with workloads in supercomputing, artificial intelligence (AI) and connected cars. NVIDIA has been investing heavily in innovation in AI, which it sees as a pervasive technology trend that will bring its GPU technology into every area of the economy and society.
NVIDIA was founded in 1993, and its graphics processing units (GPUs) quickly became an essential tool for gamers yearning for more horsepower. The company’s GPUs worked with CPUs, but took a slightly different approach to processing data. A CPU consists of a few cores optimized for sequential processing, while a GPU has a parallel architecture consisting of hundreds or even thousands of smaller cores designed for handling multiple tasks simultaneously.
Underscoring the growing importance of data center networking, NVIDIA recently acquired networking specialist Mellanox in a $6.9 billion deal.
NVIDIA today announced new offerings for its EGX Edge AI platform, designed to offer distributed computing horsepower for hospitals, stores, farms and factories that must manage growing amounts of data streaming from edge sensors. The platform makes it possible to securely deploy, manage and update fleets of servers remotely. New products include the EGX A100 for larger commercial off-the-shelf servers and the tiny EGX Jetson Xavier NX for micro-edge servers.
The flagship early customer for the DGX A100 line is the Argonne National Laboratory, with a timely project illustrating the value of accelerating AI workloads.
“We’re using America’s most powerful supercomputers in the fight against COVID-19, running AI models and simulations on the latest technology available, like the NVIDIA DGX A100,” said Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne. “The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days.”