How Data Centers Are Harnessing AI Workloads for Enhanced Cloud, LLM, and Inference Capabilities
If nothing else, October 2024 was a month in which AI-dominated industry announcements shaped the future of the data center. From infrastructure to storage designs, the common denominator was a focus on AI and how such services can be delivered to customers.
Building for AI has become the de facto standard for the latest and greatest releases across the industry, and supporting this AI expansion will leave its imprint on the data center industry for the foreseeable future.
To wit: As artificial intelligence demands evolve rapidly, both in supporting AI and in delivering it, cloud vendors and data infrastructure providers are ramping up capabilities to serve AI training workloads as well as the oncoming wave of inference.
While not a comprehensive list, here are some recent announcements from Oracle, Nvidia, Cerebras, DigitalOcean, and Lightbits Labs, each of which brings unique solutions to the table, creating flexible and scalable infrastructures for diverse AI applications.
Standardizing AI Infrastructure
To address the challenges of deploying AI clusters at scale, the Open Compute Project (OCP) has launched its Open Systems for AI initiative. This initiative fosters a collaborative, multi-vendor ecosystem aimed at developing standardized AI data center infrastructure.
Nvidia’s and Meta’s contributions to this project, such as Nvidia’s MGX-based GB200-NVL72 platform and Meta’s Catalina AI Rack architecture, are crucial for advancing common standards for AI computing clusters, reducing costs and operational silos for data centers.
Equipment vendors such as Vertiv are also announcing dedicated support for AI-intensive data center configurations, and Nvidia has announced its own reference architectures for enterprise-class AI deployments.
These collaborations aim to tackle key obstacles, including power density, cooling, and specialized compute hardware, with liquid-cooled racks and compute trays that support efficient, high-density operations.
By creating a multi-vendor, interoperable supply chain, OCP facilitates quicker adoption and a lower barrier to entry for organizations seeking to deploy AI infrastructure. Reference architectures, from the OCP and others, make these deployments more achievable in shorter time frames.
Scaling AI with Zettascale Superclusters
Oracle’s launch of its Oracle Cloud Infrastructure (OCI) Supercluster, in collaboration with Nvidia, represents a leap in scale and performance.
The new OCI Zettascale Cluster supports up to 131,072 Blackwell GPUs, reaching 2.4 zettaFLOPS of peak performance.
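For a sense of scale, here is a quick back-of-envelope check of what those headline figures imply per GPU (a sketch only; the precision and sparsity assumptions behind Oracle's peak number are not stated here):

```python
# Implied peak throughput per GPU from the headline cluster figures.
peak_flops = 2.4e21      # 2.4 zettaFLOPS, cluster-wide peak
gpus = 131_072
print(f"{peak_flops / gpus / 1e15:.1f} petaFLOPS per GPU")  # -> 18.3
```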
OCI’s Superclusters are tailored to offer high-performance computing capabilities, supporting workloads that demand extensive computational power, such as large language model (LLM) training and data-intensive simulations.
Key to OCI’s offering is flexibility in deployment, enabling customers to use AI infrastructure in their preferred locations while complying with data sovereignty requirements.
For instance, WideLabs in Brazil leverages OCI’s high-performance infrastructure to develop a Portuguese LLM, using OCI's Nvidia H100 GPUs and Kubernetes Engine for scalable, secure workloads within Brazil.
This capability is especially beneficial in regions with stringent data sovereignty requirements where data residency and security are prioritized.
By building worldwide infrastructure for the services available from OCI, Oracle enhances capabilities for customers that must adhere strictly to local laws and regulations.
Other notable uses of the service include Zoom using OCI’s generative AI inference capabilities to enhance its Zoom AI Companion, providing users with real-time drafting, summarizing, and brainstorming assistance.
Breaking Speed Barriers in AI Inference
With a focus specifically on AI inference, Cerebras Systems has set a new standard by delivering 2,100 tokens per second on the Llama 3.1 70B model, performance 16 times faster than current GPU-based solutions.
Leveraging its proprietary Wafer Scale Engine 3 (WSE-3), Cerebras Inference provides massive memory bandwidth, allowing it to handle large models without the latency challenges seen in other systems.
This capability is instrumental in real-time applications, where speed and responsiveness are critical.
The speed advantage that Cerebras offers has drawn clients such as GlaxoSmithKline (GSK), which is exploring AI-driven research agents to enhance drug discovery.
In fields such as voice AI, companies like LiveKit have benefited from the accelerated pipeline, enabling seamless speech-to-text and text-to-speech processes that operate faster than typical GPU-based inference alone.
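To put a figure like 2,100 tokens per second in context, here is a minimal, provider-agnostic sketch for measuring decode throughput against any OpenAI-compatible streaming endpoint. The base URL, API key, and model name are placeholders rather than Cerebras-specific values, and streamed chunks only approximate token counts.

```python
import time

from openai import OpenAI  # pip install openai

# Placeholders: point these at whichever OpenAI-compatible inference
# endpoint you want to benchmark; they are not Cerebras-specific values.
client = OpenAI(base_url="https://inference.example.com/v1", api_key="YOUR_KEY")

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="llama-3.1-70b",  # placeholder model identifier
    messages=[{"role": "user", "content": "Explain NVMe over TCP in 200 words."}],
    stream=True,
)
for chunk in stream:
    # Each streamed chunk carries roughly one decoded token's worth of text.
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1
elapsed = time.perf_counter() - start
print(f"{chunks} chunks in {elapsed:.2f}s, about {chunks / elapsed:.0f} tokens/s")
```

At roughly 2,100 tokens per second, a 500-token response streams in about a quarter of a second, which is the kind of latency budget that chained voice pipelines depend on.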
Simplifying AI Deployments
Addressing the complexities of getting AI/ML workloads set up and configured for specific use cases, DigitalOcean, in collaboration with the AI community hub Hugging Face, has introduced 1-Click Models, a tool that simplifies the deployment of AI models such as Llama 3 and Mistral on DigitalOcean GPU Droplets.
This new feature aims to streamline the otherwise complex process of setting up AI and ML models in the cloud, allowing developers to quickly deploy inference endpoints with minimal setup.
By eliminating the need for intricate configuration and security setups, DigitalOcean’s 1-Click Models democratize access to powerful AI models, with the goal of making them accessible to a broader audience.
Integrated with Hugging Face's Generative AI Services (HUGS), DigitalOcean's 1-Click Models provide continuous updates and optimizations, ensuring users have access to the latest improvements in AI model performance.
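As a rough sketch of what the workflow looks like once a 1-Click Model Droplet is running, assuming the deployment exposes an OpenAI-compatible chat endpoint as Hugging Face describes for HUGS; the address, token, and model name below are placeholders, not values from DigitalOcean's documentation:

```python
from openai import OpenAI  # pip install openai

# Placeholder values: the Droplet's address, an access token, and the
# model identifier all come from your own 1-Click Model deployment.
client = OpenAI(
    base_url="http://203.0.113.10:8080/v1",  # hypothetical Droplet endpoint
    api_key="YOUR_BEARER_TOKEN",
)

response = client.chat.completions.create(
    model="hugs-model",  # placeholder; use the name your deployment reports
    messages=[{"role": "user", "content": "Draft a short product update email."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```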
Climate-Aligned AI Cloud Solutions
Proving that the demands of AI infrastructure go well beyond AI/ML hardware performance, Lightbits Labs, a pioneer in software-defined NVMe over TCP storage, has partnered with Crusoe Energy Systems, self-described as "The World's Favorite AI-First Cloud," to expand high-performance, climate-conscious AI infrastructure.
Crusoe’s data centers are powered by a combination of stranded and clean energy sources, reducing the environmental impact of AI workloads.
Lightbits’ software-defined storage delivers high performance with low latency, ideal for AI workloads that demand consistent and high-speed access to data.
Crusoe’s expanded use of Lightbits storage meets the needs of AI developers by providing a flexible, scalable infrastructure that ensures high availability and durability.
The partnership allows Crusoe to offer its AI cloud users an optimized environment that includes storage that scales to meet demand, especially for applications such as LLM training and generative AI.
Each of these solutions contributes to a more robust, accessible AI ecosystem, addressing the challenges of scale, efficiency, and usability.
Such innovations pave the way for future advancements by building an infrastructure that will encourage widespread adoption of AI technologies across a variety of business sectors.