Executive Roundtable: Evolution of Data Center SLAs in the Age of AI

Sept. 18, 2024
Four seasoned data center industry leaders expound on how they see service level agreements (SLAs) evolving for data center equipment and expansion projects in the age of rapidly escalating AI, HPC and cloud computing demand.

For the second installment of our Executive Roundtable for the Third Quarter of 2024, we asked our four seasoned data center industry leaders how they see service level agreements (SLAs) evolving for data center equipment and expansion projects in the age of rapidly escalating AI, HPC and cloud computing demand.

When implementing data center colocation services, one of the most important elements to analyze has traditionally been the SLA a provider offers. This all-encompassing document defines the level of support and services customers can expect, establishing a concrete agreement between provider and client.

However, SLAs in the age of AI and cloud ubiquity stand to become more complex, and must be examined closely to ensure a firm understanding of what is being offered and what the customer will actually receive as part of the service.


Our Executive Roundtable for the Third Quarter of 2024 brings together data center industry leaders including:

Danielle Rossi, Global Director – Mission Critical Cooling, Trane
Sean Farney, VP of Data Center Strategy for the Americas, JLL
Harry Handlin, U.S. Data Center Segment Leader, ABB 
Josh Claman, CEO, Accelsius

Let's now look into the second question of the series for our Executive Roundtable for the Third Quarter of 2024.

Data Center Frontier:  How do you see service level agreements (SLAs) evolving for data center equipment and expansion projects in the age of rapidly escalating AI, HPC and cloud computing demand?


Danielle Rossi, Trane:  With the rise of AI, we are seeing drastic increases in density. 

Supporting high densities during outages and maintenance requires extra redundancy and planning. 

Going forward, we will see more stringent plans for servicing, including connected and virtual services, as well as long-term vendor-provided scheduled maintenance. 

Customers have already begun specifying longer-range service plans and more detailed spare parts kits. 

Along with these requirements, requests for vendor service capabilities are increasing. 

Global technician coverage and area-specific availability have always been a consideration but are increasing in priority. 


Sean Farney, JLL:  I've managed SLAs across all levels of infrastructure for products ranging from low-latency trading to natural gas to Happy Meal toys (seriously!), and it's quite operationally complex.  

On top of the complexity, appetite for technical downtime is plunging as connectedness becomes more pervasive and we trust digital infrastructure to deliver more mission-critical services related to health, safety, security, finance, autonomous vehicles and more.  

I think service level expectations and requirements will continue to creep upward. 

To satisfy this need and reduce the brand risk and costs of violation, I see a surge in reliability engineering. 

This means taking a systemic approach to designing, building and operating technical infrastructure to higher uptime levels instead of cobbling together services for individual mechanical, electrical and plumbing (MEP) and information technology (IT) components. 

JLL's Reliability Engineering practice is inundated with requests to solve these big, impactful problems.  

The long-term, holistic approach leads to better performance, lower cost and smoother operations.  


Harry Handlin, ABB:  Innovation will be the key driver in the evolution of service level agreements for infrastructure equipment.  

AI presents many challenges for service organizations.  

First, the rapid growth of the AI market coupled with the increased scale of data centers has created a shortage of qualified service engineers and technicians.  

In addition, AI data center locations are not constrained by latency requirements. This has resulted in many data centers being built in areas that are unlikely to be supported by a four-hour response time.  

For some remote sites, the location is more than four hours of travel from the closest field service location.


Josh Claman, Accelsius:  With AI and large language models, we frequently hear that it doesn’t really matter if there’s downtime. 

No one talks about five-nines or other traditional IT metrics of availability when it comes to LLMs. 

The problem, however, is that the assets you’ve deployed can’t sit idle; once idle, the ROI on those assets is muted.

Sure, for inference engines, high-frequency trading and HPCs, we should aim for that 99.999% availability. 
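For context on the "five-nines" figure mentioned here, 99.999% availability translates into a strikingly small annual downtime budget. A quick back-of-the-envelope sketch (the helper function below is illustrative, not from any vendor tooling):

```python
# Annual downtime budget implied by an availability SLA.
# "Five nines" (99.999%) permits only a few minutes of outage per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the annual downtime budget, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.999):
    # Roughly 525.6, 52.6, and 5.3 minutes per year, respectively.
    print(f"{pct}% availability -> {downtime_minutes_per_year(pct):.2f} min/year")
```

Each added "nine" shrinks the downtime budget tenfold, which is why five-nines commitments drive the redundancy and staffing costs discussed throughout this roundtable.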

But for large language models, the math needs to change. 

You need to ensure those assets are fully utilized, which is why a lot of our clients move to liquid cooling. 

They want to avoid the throttling events that occur with traditional air cooling. 


Futuristic AI data center rendering. Does this image envision some future prospect where AI travels holographically for SLA via the visible light spectrum? Probably not. And yet, gazing into the GPUs' illustrated self-conception, such questions arise.


Next:  Achieving Balance in AI Data Center Designs 


Keep pace with the fast-moving world of data centers and cloud computing by connecting with Data Center Frontier on LinkedIn, following us on X/Twitter and Facebook, and signing up for our weekly newsletters.

About the Author

Matt Vincent

A B2B technology journalist and editor with more than two decades of experience, Matt Vincent is Editor in Chief of Data Center Frontier.

