PHOENIX – Delivering uptime has always been the prime directive for data centers. The industry was created to ensure that mission-critical applications never go offline. That goal has typically been achieved through layers of redundant electrical infrastructure, including uninterruptible power supply (UPS) systems and emergency backup generators.
But cloud computing is changing how companies approach uptime, introducing architectures that create resiliency using software and network connectivity. This strategy, pioneered by cloud providers, is creating new ways of designing applications.
“Software-defined everything allows you less redundancy,” said Carrie Goetz, Global Director of Technology for Paige DataCom Solutions, who spoke on the topic at the recent Data Center World 2019 conference in Phoenix. “As data centers become more diverse, we need to address redundancy. If we have applications that fail over properly, we don’t need all that redundancy in the facility itself.”
That’s a shift in thinking for an industry long accustomed to equating more infrastructure with more reliability. That’s the general thrust of the Tier system developed by The Uptime Institute, which has long served as the standard for data center reliability and focuses on topologies of redundant power and cooling infrastructure. In recent years Uptime has expanded its focus to include the growing role of operations in data center uptime.
Now even Uptime acknowledges the growing importance of software, and how it can be a game changer for data center design.
“Software enabled application resiliency is now playing a significant and increasing role in bolstering applications availability and reliability across the enterprise, reducing risk to the business,” writes Todd Traver, VP for IT Optimization and Strategy at The Uptime Institute, in a 2018 blog post. “No longer are clients solely reliant upon the stability provided by the electrical and mechanical systems in their data center. By utilizing new software techniques, enterprises are now able to deploy applications that span multiple instances from enterprise to co-location to cloud, that bolster the availability and reliability of their critical applications.”
The Urge to Add Infrastructure Persists
Even as Uptime acknowledges the new approaches to resiliency, there are signs that reduced redundancy will be a tough sell in many corners of the data center industry. Those tensions were on display at the Data Center World 2019 conference in Phoenix.
The keynote featured findings from the annual State of the Data Center survey from AFCOM, which noted that enterprise IT is becoming more cloud-like, with growing adoption of Linux containers and orchestration. “In the next 12 months, cloud will be the dominant model,” said Bill Kleyman, Executive Vice President of Digital Solutions at Switch, who summarized the key themes emerging from the survey. “There’s now a better level of maturity and understanding what cloud really is about.”
In theory, that trend should be accompanied by new thinking on application resiliency. Instead, some segments of the AFCOM membership appear to be trending in the opposite direction, and are contemplating additional redundancy.
Kleyman noted that the largest group of respondents (47 percent) are currently using an N+1 power configuration, as would be expected. But the share of AFCOM members using a more redundant N+2 configuration is expected to rise from the current 21 percent to 30 percent over the next three years.
A similar trend shows up in cooling, where N+1 (44 percent) is the predominant approach, but use of an N+2 design appears poised to rise from 24 percent today to 27 percent in three years.
How Much is Too Much?
A key issue is data center culture, according to Goetz, who led a DCW session on the topic following the State of the Data Center findings.
“We’re our own worst enemy,” said Goetz. “No one wants to be responsible for something not being redundant, so we make everything redundant. Power, network, storage. As we fail over and fail over and fail over, we see all this waste.
“If you start adding up the waste, from a capital expense and maintenance perspective, there’s a good chance that half of that (duplication) doesn’t need to be redundant anyway,” she added. “It’s a cycle that never stops.”
As applications shift to software-enabled resiliency, it will create many opportunities to slash the cost of data center infrastructure, said Goetz. She is a firm believer in the future of the enterprise data center, but not always in its current form.
“We have to stop thinking that one size fits all,” said Goetz. “As data centers become more diverse, we need to address redundancy. If we have applications that fail over properly, we don’t need all that redundancy in the facility itself.”
Failover Strategies Create Opportunity
At DCF, we’ve written about the growing trend to manage resiliency through the network, as well as multi-tenant data centers’ increased use of variable resiliency – housing some workloads with no generator or UPS support – as a means of reducing the need for redundant infrastructure (and the accompanying expense). Since 2016, we’ve tracked how versions of this trend have been implemented at providers like Verne Global, Vantage Data Centers, Digital Realty (DFT), Sentinel Data Centers and CyrusOne.
Much of recent thinking about resiliency has been influenced by the use of availability zones (AZs) by cloud platforms, especially Amazon Web Services. AZs are clusters of data centers within a region that allow customers to run instances of an application in several isolated locations to avoid a single point of failure. If customers distribute instances and data across multiple AZs and one instance fails, the application can be designed so that an instance in another availability zone can handle requests.
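The failover pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not any cloud provider's API: the `Instance` class, the `route` function and the zone names are all hypothetical, and a real deployment would rely on a load balancer and health checks rather than a simple loop.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical application instance running in one availability zone."""
    zone: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"instance in {self.zone} is unavailable")
        return f"{request} served from {self.zone}"


def route(request: str, instances: list[Instance]) -> str:
    """Try each zone in order and return the first successful response."""
    for instance in instances:
        try:
            return instance.handle(request)
        except ConnectionError:
            continue  # this zone failed; fall through to the next one
    raise RuntimeError("no healthy instance in any zone")


# Assumed setup: three isolated zones, with the first one failing.
fleet = [Instance("zone-a"), Instance("zone-b"), Instance("zone-c")]
fleet[0].healthy = False
result = route("GET /status", fleet)
print(result)
```

In this sketch the request still succeeds when one zone is lost, which is why the facility behind any single instance may not need the same level of redundancy.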
The rise of availability zones has influenced how cloud-centric companies design and build applications, as well as how wholesale data center providers develop properties for their hyperscale tenants. Companies like Cyxtera have discussed ways to apply these concepts in colocation environments.
Goetz agrees that the trend toward variable resiliency creates opportunities for service providers.
“I think what you’re going to see is a lot more Tier II colos at lower rent prices, because not everyone really needs Tier III,” she said. “You can shift applications to sites based on the value of downtime, matching lower resiliency sites to apps that can withstand downtime.” This could apply to research labs, scientific supercomputing, and cryptocurrency mining.
Goetz also noted that new accounting rules are influencing colocation lease agreements, and may place pressure on spending – including the cost of redundancy. “CFOs are having heart attacks because you have to disclose the full amount of the lease, regardless of how long it is or whether you use all of it,” said Goetz. Since lower resiliency colocation space typically costs less, the accounting standards could prompt additional discussion of application resiliency and the cost of overprovisioning.
Start With the Workload
So who needs less infrastructure, and how do they make these decisions? It’s all about the workload.
“We have to start with the application and work backwards,” said Goetz. “Downtime is expensive. I need to know what 15 minutes of downtime looks like in your world. We need to look at what the application needs, and then decide on redundancy. There’s a lot of economies to be had here.”
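One way to picture "starting with the application" is a rough comparison of rent plus expected downtime cost across sites. The sketch below is illustrative only: the function names, the rents and the annual downtime hours are hypothetical assumptions, not figures from Goetz, AFCOM or the Uptime Institute.

```python
def expected_downtime_cost(cost_per_15_min: float, downtime_hours: float) -> float:
    """Annual downtime cost, given what 15 minutes of downtime costs the business."""
    return cost_per_15_min * 4 * downtime_hours


def cheapest_site(cost_per_15_min: float, sites: dict) -> str:
    """Return the site with the lowest total of annual rent plus expected downtime cost."""
    totals = {
        name: rent + expected_downtime_cost(cost_per_15_min, hours)
        for name, (rent, hours) in sites.items()
    }
    return min(totals, key=totals.get)


# Hypothetical sites: (annual rent in dollars, expected downtime hours per year).
sites = {
    "lower-resiliency": (100_000, 20),
    "higher-resiliency": (160_000, 2),
}

# Assumed workloads: a batch job that tolerates outages, and a revenue-critical app.
batch_choice = cheapest_site(250, sites)
critical_choice = cheapest_site(5_000, sites)
print(batch_choice, critical_choice)
```

Under these assumed numbers the tolerant workload lands in the cheaper site and the critical one justifies the extra infrastructure. The inputs would have to come from each organization's own downtime analysis.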
Part of the challenge is that data center professionals specialize in risk. They apply critical thinking to imagine virtually every way a device or application can fail, and engineer ways to address the risk.
But thinking about risk and redundancy doesn’t apply only to facilities. In a world where downtime is expensive to companies, there is always the temptation to calculate the cost of downtime for careers.
Goetz says this is a complex problem, and needs to be addressed on multiple fronts. These include:
- Executive Support – Data center teams adopting new architectures need to know the C-suite has their back. “This is 100 percent a top-down decision,” said Goetz. “If not, these decisions get harder to make. Fear of failure and fear of job security have to go away. If we have the COO’s buy-in to make these decisions, we can bring great value.”
- Vendors – Partners who sell equipment are rarely going to advise you to buy less of it. “We have to be at the point where we can challenge our vendors, and have the confidence to do things differently,” said Goetz, who advocates for the value of a “disinterested third party” who can consult without the incentive of finder’s fees, referrals or commissions.
- Internal Business Siloes – The communications disconnect between facilities and IT teams is a long-standing challenge for the data center industry. “These siloes have got to go,” said Goetz. “Facilities and IT have got to get together. Thankfully, we don’t have these meetings anymore where I have to introduce them to one another.”
As Uptime notes, there are many variables to consider.
“There’s a big new world of options and approaches when it comes to applications resiliency design, with most enterprises still using a belt and suspenders approach of software and hardware to reduce risk and ensure resiliency and reliability,” Traver writes. “But with new cloud services providing increasingly more self-service capabilities, it’s becoming critically important for customers to clearly evaluate their modern digital business requirements which can then be used to map out a strategy that provides the highest level of availability and resiliency at a cost which is aligned with the business itself.”