Data Center Evolution and the Road to Liquid Cooling
Last week we launched a special report series on how liquid cooling is allowing the data center industry to leverage density and space more effectively while still being able to scale critical resources. This week, we’ll take a look at how liquid cooling adoption has evolved over time.
Even though technology moves at a staggering pace, it’s critical to understand how the components of data center efficiency have evolved. To that end, let’s focus on data center cooling and airflow management. Believe it or not, real airflow best practices have only recently become established. Data center airflow management first became a conversation point in the mid-1990s, when the futility of the then-common practice of organizing computer rooms with all racks facing the same direction became apparent. Data center engineers implemented the first “intentional” hot- and cold-aisle organization of server racks. Having established the value of separating cold aisles from hot aisles, a science of best practices quickly emerged to optimize the benefits of that separation.
In 2005, Oracle and Intel reported on case study projects in which they had deployed server cabinets with vertical exhaust ducts (or chimneys) bridging from the cabinets to a suspended ceiling return air path, thereby wholly separating the return air from the rest of the data center. While both studies reported on the effectiveness of the cooling and the opportunity for higher rack density, what was most noteworthy was that they both cited measured evidence of reduced cooling energy costs.
Shortly after that, Lawrence Berkeley National Laboratory reported on a June 2006 study at the National Energy Research Scientific Computing Center in Oakland, California. The study included a cold-aisle containment experiment that produced measurable savings in cooling unit fan energy, reduced chiller plant energy at a higher set point, and increased economizer hours. From that point on, the conversation on data center airflow management shifted from a primary focus on effectiveness to a focus on efficiency.
Then, in 2010, ASHRAE 90.1, the Energy Standard for Buildings Except Low-Rise Residential Buildings, eliminated the process exemption for data centers and added prescriptions for economization, variable-flow fans, and restrictions on humidity management, reflecting evolving best practices for data center airflow management. As data center airflow management has reached mainstream status in recent years, the evolution of the field has focused on fine-tuning the developments of the preceding decade.
What’s changed? During the ‘90s and mid-2000s, designers and operators worried about the ability of air-cooling technologies to cool increasingly power-hungry servers. With design densities approaching or exceeding 5 kilowatts (kW) per cabinet, some believed that operators would have to resort to rear-door heat exchangers and other in-row cooling mechanisms to keep up with the increasing densities.
Still, for decades, computer rooms and data centers used raised-floor systems to deliver cold air to servers. Cold air from a computer room air conditioner (CRAC) or computer room air handler (CRAH) pressurized the space below the raised floor. Perforated tiles provided a path for the cold air to leave the plenum and enter the main space, ideally in front of server intakes. After passing through the servers, the heated air returned to the CRAC/CRAH to be cooled again, usually mixing with cold air along the way.
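To put rough numbers on that airflow path, the standard sensible-heat balance tells you how much air a rack needs for a given heat load and intake-to-exhaust temperature rise. The sketch below is a back-of-the-envelope illustration, not something taken from the report; the rack loads and the 10 °C rise are assumed values.

```python
# Rough airflow sizing for an air-cooled rack: Q = rho * cp * V_dot * dT
# (standard sensible-heat balance; rack loads below are illustrative only)

AIR_DENSITY = 1.2       # kg/m^3, air at roughly sea level and room temperature
AIR_CP = 1005.0         # J/(kg*K), specific heat of air

def required_airflow_m3_per_s(rack_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry rack_load_w watts of heat away
    at a given intake-to-exhaust temperature rise (delta_t_k, in kelvin)."""
    return rack_load_w / (AIR_DENSITY * AIR_CP * delta_t_k)

M3S_TO_CFM = 2118.88    # cubic meters per second -> cubic feet per minute

for load_kw in (5, 15, 30):             # hypothetical rack densities
    flow = required_airflow_m3_per_s(load_kw * 1000, delta_t_k=10.0)
    print(f"{load_kw:>2} kW rack @ 10 C rise: "
          f"{flow:.2f} m^3/s (~{flow * M3S_TO_CFM:,.0f} CFM)")
```

Even at a modest 10 °C rise, a 30 kW rack needs roughly 5,000 CFM of cold air delivered to its intakes, which hints at why raised-floor delivery strains as densities climb.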
This was the most common data center and server cooling design for many years, and it is still employed today. But how effective is it for next-generation workloads and server designs? Can these systems support things like HPC and supercomputing?
The Need for Better Data Center Cooling and Server Design
When it comes to server cooling, the concept is straightforward: heat must be removed from the electrical components of servers and IT equipment to keep them from overheating. Simply put, if a server gets too hot, onboard logic will shut it down to prevent damage.
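As a simple illustration of that protective behavior, the logic amounts to throttling as a component warms and powering off before it reaches a damaging temperature. The thresholds below are hypothetical; real limits come from the silicon vendor, not from the report.

```python
# Hypothetical sketch of server thermal-protection logic: throttle as the
# component warms up, shut down before it reaches a damaging temperature.
# Thresholds are illustrative; actual limits come from the chip vendor.

THROTTLE_C = 85.0    # start reducing clock speed here
SHUTDOWN_C = 100.0   # power off to prevent damage

def thermal_action(cpu_temp_c: float) -> str:
    if cpu_temp_c >= SHUTDOWN_C:
        return "shutdown"   # onboard logic powers the server off
    if cpu_temp_c >= THROTTLE_C:
        return "throttle"   # clocks are reduced, performance drops
    return "normal"

for temp in (65.0, 88.0, 101.0):
    print(f"{temp:5.1f} C -> {thermal_action(temp)}")
```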
But heat isn’t the only worry. Some big data and analytics servers are highly sensitive and can be impacted by particle contamination. And in addition to the threats posed by physical particulates, there are threats from gaseous contamination: certain gases can corrode electronic components.
These types of traditional cooling systems will certainly still have their place in the data center. However, new types of workloads simply require a new way to cool the servers on which they operate.
According to the special report, new types of server cooling and data center management systems are changing how we bring efficiency into our data centers. Specifically, liquid cooling has introduced new levels of efficiency, new capabilities around scale, and optimization of server workload delivery. Furthermore, liquid cooling helps remove the dangers of particulate and even gaseous contamination when working with sensitive server equipment.
With all of this in mind, let’s examine how next-generation liquid cooling impacts new and emerging use-cases.
New Use-Cases That Are Increasing Adoption of Liquid Cooling
In the past, liquid cooling was seen as a puzzle piece that often added complexity to the data center. With new design considerations and data center architectures, liquid cooling has taken on an entirely new form, making the solution far more consumable than ever before. We’ll discuss modern liquid cooling architecture in the next section. But it’s important to note that leaders are now working with liquid cooling systems designed to provide customers and solution providers with complete, holistic turnkey packages. These designs consist of purpose-built liquid cooling platforms, components, and software. Administrators are leveraging a liquid cooling plug-and-play architecture that seamlessly fits into modern data center architectures.
Here’s where these designs are being used:
- Machine Learning & Artificial Intelligence. As intelligent systems get smarter, the demand for high-density, purpose-built computing environments designed to meet their density and power needs continues to increase. New integrated and converged liquid cooling solutions provide heretofore unimaginable equipment densities, enabling more powerful clusters than can be achieved with conventional cooling and interconnection technologies.
- Edge & Smart City. The IoE (Internet of Everything) evolution continues to accelerate. These platforms demand the highest levels of flexibility in terms of deployment locations and overall architecture. Integrated liquid cooling solutions enable these platforms to provide the highest service levels, reliability, and value to their populations.
- Oil & Gas. Exploring the depths of exploitable resources is a highly demanding challenge. In a world where every second matters, the new liquid cooling platforms allow organizations to deploy the compute resources they require in the locations where they are most useful, maximizing the organization’s profitability.
- VDI, HPC, Application Delivery. With cloud computing and the repatriation of workloads, the lines between VDI, HPC, and new application delivery requirements are blurring. Further, the events of 2020 put in motion unique needs around virtual desktops, compute environments, and application delivery. Fully integrated liquid cooling platforms allow organizations to rethink how they deploy these critical enterprise resources to provide the maximum return on their investment and the highest levels of end-user experience. That said, it’s important to note what “integrated liquid cooling” really means. New solutions will come as fully packaged designs that integrate the entire IT stack while still supporting liquid cooling. In this design, you’ll see the network, storage, compute, and power working together with liquid cooling solutions. Further, these integrated designs are built for performance and come with Nvidia GPUs embedded. So, beyond simple desktops, technology leaders can leverage an all-in-one high-speed, high-capacity system with GPU, connectivity, storage, and compute all built in.
- Financial Services. When success depends on microseconds, there is nothing more important than ensuring that your systems can deliver a strong competitive advantage. The densities, simplicity, and capabilities of new liquid cooling platforms allow financial service organizations to rethink how they deploy their most critical infrastructure.
- Research & Education. Developing the next generation of cures, materials, and processes requires the next generation of high-performance computing infrastructure. Integrated liquid cooling solutions enable organizations to solve problems and develop solutions faster, more efficiently, and with less complexity.
- CAD, Modeling & Rendering. Turning raw video into a beautiful user experience can be a complex and costly process. It can be challenging to ensure that the source material will match the final display media without herculean efforts. By employing an integrated liquid cooling architecture, organizations can render faster and in places never before possible. Another critical point revolves around the level of integration between liquid cooling and server infrastructure. As an all-in-one system, the new liquid cooling architecture can come equipped with blade designs capable of supporting 16 Nvidia V100 GPUs with 32GB of RAM. These GPUs are interconnected in two eight-node NVLink clusters for rapid GPU-to-GPU communication. Remember, these are not standalone liquid cooling solutions that have to be integrated with storage and compute systems. Instead, they come purpose-built with key components already designed to work with liquid cooling. Furthermore, they’re far more easily integrated as pods into modern data center infrastructure.
- Gaming. E-sports, e-gaming, and e-entertainment are some of the fastest-growing industries in the world today. As with any new industry, finding technology solutions to meet never-before-seen challenges can be quite an undertaking. Integrated liquid cooling solutions have been designed from the ground up to meet the needs of industries that will benefit from the next generation of purpose-built, high-density computing solutions.
Data Center Design Considerations
Air cooling has long been the preferred method in data centers, and for the most part it is still a viable option. Given low electronic densities and affordable energy prices, blowing cold air across electronics generally works. However, data center and compute solutions have become more compact, and high equipment densities are more common, making better cooling methods imperative. Looking at data center design and integration with new systems, traditional computer room air conditioning is no longer enough for some new use-cases. Add in rising energy costs, and supporting the advanced use-cases mentioned above can become rather expensive.
When working with integrated liquid cooling solutions, it’s important to note that liquids conduct and carry heat far better than air. This means that even a room-temperature liquid can cool more effectively than cold air. Simply put, since specialized cooling liquids have between 50 and 1,000 times the heat-removal capacity of air, the option to move to liquid cooling to support emerging use-cases is there.
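A quick back-of-the-envelope comparison shows where claims like that come from: per unit volume of coolant moved at the same temperature rise, water absorbs thousands of times more heat than air. The sketch below uses textbook fluid properties and is not drawn from the report.

```python
# Back-of-the-envelope: heat carried per unit volume of coolant per degree
# of temperature rise (volumetric heat capacity = density * specific heat).
# Textbook values at roughly room temperature.

fluids = {
    "air":   {"density_kg_m3": 1.2,   "cp_j_per_kg_k": 1005.0},
    "water": {"density_kg_m3": 998.0, "cp_j_per_kg_k": 4186.0},
}

def volumetric_heat_capacity(f: dict) -> float:
    """Joules absorbed per m^3 of fluid per kelvin of temperature rise."""
    return f["density_kg_m3"] * f["cp_j_per_kg_k"]

air = volumetric_heat_capacity(fluids["air"])
water = volumetric_heat_capacity(fluids["water"])
print(f"air:   {air:>12,.0f} J/(m^3*K)")
print(f"water: {water:>12,.0f} J/(m^3*K)  (~{water / air:,.0f}x air)")
```

Dielectric coolants used in immersion systems sit well below water on this measure but remain orders of magnitude above air, which is consistent with the 50x to 1,000x range cited above.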
Understanding the Challenges in Air-Cooling Specific Workloads
Although there is nothing wrong with air-cooled solutions, it’s more important than ever to understand which workloads and use-cases are best served by a specific type of cooling. As it relates to air-cooled designs, the significant challenges in supporting emerging use-cases revolve around five key factors.
- Density and Capacity. At the data center level, improvements to individual systems entail the implementation of accelerator processors, more specifically graphics processing units (GPUs), application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs). Hardware accelerators are often used in machine learning, but industry trends suggest they’ll be used more often in IT services like analytics, data mining, engineering simulation, and even fraud detection. These new systems are highly dense. Depending on the design, traditional air-cooled solutions might simply not be able to handle the density you need, which may force you to spread the workload and consume more data center floor space.
- Scale. Working with accelerators helps cut down on problems like rack density, which drives cooling needs. For many workloads, a GPU paired with an Intel processor can provide more output at lower density than a CPU-only setup, which also helps avoid excess power consumption and heat. The challenge emerges when you try to scale these systems out. In a fully integrated system, liquid cooling is an effective method when using accelerators like GPUs. For example, integrated liquid cooling systems allow you to add blades designed from the ground up to enable the next generation of GPU-accelerated applications for virtually all use cases. These GPU-integrated systems come with cooling, racking, networking, and management all bundled together.
- Efficiency. The challenge with efficiency is saving energy without losing cooling power; everyone wants to be as energy-efficient as possible. Today, the demand for more powerful chips is creating new requirements around cooling and efficiency. During the evolution of liquid cooling, we saw direct-to-chip liquid cooling solutions. Today, high-density, modular two-phase immersion cooling solutions come with all of the compute, network, and storage components you need. As mentioned earlier, liquids are far more efficient at removing heat than air. For use-cases where efficiency is critical, purpose-built liquid cooling designs need to be considered.
- Performance. Because liquid can remove heat more efficiently than air, systems can operate at lower temperatures and higher clock speeds, allowing for higher-performing systems. Fully integrated liquid cooling solutions can be deployed with built-in blade and chassis systems delivering next-level performance. Again, you don’t have to retrofit liquid cooling into your chassis. Instead, the system comes with 12 sockets capable of supporting 288 cores of raw CPU power and 3 TB of memory, alongside support for both 10G and 100G Ethernet as well as 100G InfiniBand. All of this has liquid cooling built into the design.
- Cost. A recent study indicated that the unprecedented rise in data center power consumption has increased operational and power costs, and it has become a challenge for end-users to manage and conserve power in their data centers. With this in mind, price can vary substantially depending on the features you’re prioritizing. Generally speaking, air cooling systems cost less due to their more straightforward operation; however, as rack density increases, the conversation around cost and cooling efficiency changes. In a recent study by Schneider Electric, we see that for like densities (10 kW/rack), the cost of an air-cooled and a liquid-cooled data center is roughly equal. However, liquid cooling also enables IT compaction, and with compaction comes an opportunity for CapEx savings. Compared to the traditional data center at 10 kW/rack, a 2x compaction (20 kW/rack) results in a first-cost savings of 10%. When 4x compaction is assumed (40 kW/rack), savings rise to 14% (see the rough sketch after this list).
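To make that compaction math concrete, here is a rough sketch. The 1 MW IT load and $10 million baseline build cost are hypothetical placeholders; only the 10% and 14% savings figures come from the Schneider Electric study cited above.

```python
# Rough illustration of rack-count compaction and the study's quoted savings.
# The 1 MW load and $10M baseline build cost are hypothetical; only the
# savings percentages (10% at 2x, 14% at 4x) come from the cited study.

IT_LOAD_KW = 1000           # hypothetical total IT load
BASELINE_COST = 10_000_000  # hypothetical build cost at 10 kW/rack

scenarios = [
    # (kW per rack, reported first-cost savings vs. the 10 kW/rack baseline)
    (10, 0.00),
    (20, 0.10),
    (40, 0.14),
]

for kw_per_rack, savings in scenarios:
    racks = IT_LOAD_KW / kw_per_rack
    cost = BASELINE_COST * (1 - savings)
    print(f"{kw_per_rack:>2} kW/rack: {racks:>5.0f} racks, "
          f"estimated first cost ${cost:,.0f} ({savings:.0%} savings)")
```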
Download the full report, “The State of Data Center Cooling: A Key Point in Industry Evolution and Liquid Cooling” courtesy of TMGcore to learn how new data center and business requirements are shaping digital infrastructure. In our next article, we’ll explore new designs, standards, and liquid cooling systems. Catch up on the first article here.