Intel, NVIDIA Roll Out New HPC Hardware at SC19

Intel and NVIDIA continue to compete for mindshare and share of the market for the world’s most powerful computers. Both companies rolled out new offerings around SC19 in Denver, the annual conference for the high-performance computing (HPC) community, which provides a showcase for cutting-edge hardware.

Here’s a look at the new offerings from Intel and NVIDIA.

Intel: New GPUs, Heterogeneous Architecture

Intel continues to hold a dominant position in the enterprise computing space, but the development of powerful new hardware optimized for specific workloads has been a major trend in the HPC sector, boosted by demand for data-crunching for artificial intelligence and other types of specialized workloads. Beneficiaries have included NVIDIA and AMD, as well as ARM specialist Ampere and a host of startups developing low-power chips to allow smartphones to run AI workloads on the device.

At Supercomputing 2019, Intel unveiled a new category of general-purpose GPUs based on Intel’s Xe architecture. Code-named “Ponte Vecchio,” this new high-performance, general-purpose GPU is architected for HPC modeling and simulation workloads and AI training. Ponte Vecchio will be manufactured on Intel’s 7nm technology and will be Intel’s first Xe-based GPU optimized for HPC and AI workloads. Intel did not provide specifics on a date for commercial availability.

Intel is the leader in computing with CPUs and x86 servers, and also offers a line of FPGAs (field-programmable gate arrays) to accelerate cloud and AI workloads. But in recent years, much of the hardware acceleration in the HPC sector has relied upon GPUs (graphics processing units) from NVIDIA. A CPU consists of a few cores optimized for sequential processing, while a GPU has a parallel architecture consisting of hundreds or even thousands of smaller cores designed for handling multiple tasks simultaneously.
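The contrast can be sketched in a few lines of Python. This is only a conceptual illustration, not GPU code: a thread pool stands in for the many small cores, and the function names (`scale_sequential`, `scale_data_parallel`) and the worker count are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_sequential(values, factor):
    """CPU-style: one worker walks the data element by element, in order."""
    result = []
    for v in values:
        result.append(v * factor)
    return result


def scale_data_parallel(values, factor, workers=8):
    """GPU-style decomposition: every element is an independent task,
    so many workers can process elements at the same time."""
    def kernel(v):
        # The same small function is applied to each element on its own.
        return v * factor

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even if tasks run concurrently.
        return list(pool.map(kernel, values))


data = list(range(10))
# Both approaches compute the same answer; only the structure of the work differs.
assert scale_sequential(data, 3) == scale_data_parallel(data, 3)
```

On a real GPU the per-element function would be spread across hundreds or thousands of hardware cores; the Python threads here only mimic how the work is divided.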

With the recent emphasis on parallel processing, the addition of a powerful GPU product is strategically significant for Intel, providing it with an integrated CPU-GPU offering as an alternative to accelerating workloads with NVIDIA GPUs.

Intel bills Ponte Vecchio as its first “exascale graphics card,” but Tom’s Hardware says that level of compute will require multiple cards working together over a fast and flexible fabric. The new card will debut in 2021 in the Aurora supercomputer at Argonne National Laboratory, which is slated to be the world’s first exascale-class supercomputer.

Intel also took steps to address the growing use of heterogeneous HPC architectures (i.e., customers combining gear from several vendors), launching the oneAPI industry initiative to deliver a unified programming model for application development across different processing architectures, including CPUs, GPUs, FPGAs and other accelerators. “The launch of oneAPI represents millions of Intel engineering hours in software development and marks a game-changing evolution from today’s limiting, proprietary programming approaches to an open standards-based model for cross-architecture developer engagement and innovation,” Intel said.

“HPC and AI workloads demand diverse architectures, ranging from CPUs, general-purpose GPUs and FPGAs, to more specialized deep-learning NNPs, which Intel demonstrated earlier this month,” said Raja Koduri, senior vice president, chief architect, and general manager of architecture, graphics and software at Intel. “Simplifying our customers’ ability to harness the power of diverse computing environments is paramount, and Intel is committed to taking a software-first approach that delivers a unified and scalable abstraction for heterogeneous architectures.”

Ars Technica notes that oneAPI is a set of libraries that tie hardware-agnostic API calls directly to heavily optimized, low-level code that drives the actual hardware available in the local environment.
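The pattern Ars Technica describes, a hardware-agnostic call routed to device-specific code, might look roughly like the sketch below. It is not oneAPI code and uses none of its libraries; the backend functions, the `detect_device` helper and the `vector_add` entry point are all hypothetical, invented purely for illustration.

```python
def _vector_add_cpu(a, b):
    # Placeholder for a CPU-tuned implementation.
    return [x + y for x, y in zip(a, b)]


def _vector_add_gpu(a, b):
    # Placeholder for a GPU-tuned implementation; it is plain Python here
    # so the sketch runs on any machine.
    return [x + y for x, y in zip(a, b)]


# Hypothetical registry mapping a device type to its optimized routine.
BACKENDS = {"cpu": _vector_add_cpu, "gpu": _vector_add_gpu}


def detect_device(available=("cpu",)):
    """Hypothetical stand-in for hardware discovery: prefer a GPU if listed."""
    return "gpu" if "gpu" in available else "cpu"


def vector_add(a, b, available=("cpu",)):
    """Hardware-agnostic entry point: the caller never names a device."""
    return BACKENDS[detect_device(available)](a, b)


# The calling code is identical whichever hardware is present.
print(vector_add([1, 2], [3, 4]))
print(vector_add([1, 2], [3, 4], available=("cpu", "gpu")))
```

The point of the design is that application code calls `vector_add` the same way on every machine, while the library decides which low-level routine to run.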

NVIDIA: A Beefier Azure GPU Cloud, Faster IO

NVIDIA was also thinking about heterogeneous architectures. But its focus at SC19 was how to get CPUs out of the way and allow GPUs to interact directly with other components.

NVIDIA introduced NVIDIA Magnum IO, a suite of software to dramatically accelerate data processing by eliminating storage and input/output bottlenecks. The Magnum IO suite is anchored by GPUDirect, which provides a path for data to bypass CPUs and travel on “open highways” offered by GPUs, storage and networking devices.

“Processing large amounts of collected or simulated data is at the heart of data-driven sciences like AI,” said Jensen Huang, founder and CEO of NVIDIA. “As the scale and velocity of data grow exponentially, processing it has become one of data centers’ great challenges and costs. Extreme compute needs extreme I/O. Magnum IO delivers this by bringing NVIDIA GPU acceleration, which has revolutionized computing, to I/O and storage. Now, AI researchers and data scientists can stop waiting on data and focus on doing their life’s work.”

NVIDIA said it developed Magnum IO in close collaboration with industry leaders in networking and storage, including DataDirect Networks, Excelero, IBM, Mellanox and WekaIO.

“Modern HPC and AI research relies upon an incredible amount of data, often more than a petabyte in scale, which requires a new level of technology leadership to best handle the challenge,” said Sven Oehme, DDN chief research officer. “DDN, by taking advantage of NVIDIA’s Magnum IO suite of software along with our parallel EXA5-enabled storage architecture, is paving the way to a new direct data path which makes petabyte-scale data stores directly accessible to the GPU at high bandwidth, an approach that was not previously possible.”

NVIDIA Magnum IO software is available now, with the exception of GPUDirect Storage, which is currently available to select early-access customers. Broader release of GPUDirect Storage is planned for the first half of 2020.

NVIDIA’s new Magnum IO software suite can run on any NVIDIA-powered system, including the DGX SuperPOD pictured above. (Photo: NVIDIA)

NVIDIA also announced a new kind of GPU-accelerated supercomputer in the cloud, available on Microsoft Azure. The companies said the new NDv2 Azure instance will bring advanced supercomputing power to a broader range of cloud customers, offering the capabilities of large-scale, on-premises supercomputers that can take months to deploy.

“Until now, access to supercomputers for AI and high-performance computing has been reserved for the world’s largest businesses and organizations,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Microsoft Azure’s new offering democratizes AI, giving wide access to an essential tool needed to solve some of the world’s biggest challenges.”

“As cloud computing gains momentum everywhere, customers are seeking more powerful services,” said Girish Bablani, corporate vice president of Azure Compute at Microsoft Corp. “Working with NVIDIA, Microsoft is giving customers instant access to a level of supercomputing power that was previously unimaginable, enabling a new era of innovation.”

Microsoft and NVIDIA engineers used 64 NDv2 instances on a pre-release version of the cluster to train BERT, a popular conversational AI model, in roughly three hours.

Other Server News from SC19

There were also hardware announcements from a number of other server vendors:

Dell Technologies

Dell Technologies unveiled the Dell EMC PowerSwitch Z9332F-ON, a 400GbE open networking switch designed for high-performance workloads. The switch is part of Dell’s focus on software-defined networking, making network operations more flexible, programmable and easier to manage. The Dell EMC PowerSwitch Z9332F-ON is purpose-built for cloud service provider data center networks with intensive compute and storage traffic, such as HPC, AI and streaming video.

The new switch also delivers four times the throughput, double the price-performance and nearly double the power efficiency of existing 100GbE platforms. Dell also added NVIDIA T4 Tensor Core GPUs as a new accelerator option for the Dell EMC DSS 8440 server. With up to 16 accelerators, this configuration offers high-capacity, high-performance machine learning inference with strong energy efficiency (70 watts per GPU).

Lenovo

Lenovo has teamed with Intel to beef up a supercomputer for Harvard University’s Faculty of Arts and Sciences Research Computing unit (FASRC). The Cannon cluster comprises more than 30,000 2nd Gen Intel Xeon Scalable processor cores and includes Lenovo’s Neptune liquid cooling technology, which lets critical server components operate at lower temperatures, allowing for greater performance and energy savings. The Cannon storage system is spread across multiple locations, but the primary compute is housed in the Massachusetts Green High Performance Computing Center, a LEED Platinum-certified data center in Holyoke, MA.

“With the increased compute performance and faster processing of the Cannon cluster, our researchers now have the opportunity to try something in their data experiment, fail, and try again,” said Scott Yockel, director of research computing at Harvard University’s Faculty of Arts and Sciences. “Allowing failure to be an option makes our researchers more competitive.”