
Data Center Frontier

Charting the future of data centers and cloud computing.


The Bathtub Curve and Data Center Equipment Reliability

By Voices of the Industry - May 11, 2020


One of the top priorities for today's IT professionals is strengthening security and privacy. (Photo: Service Express)


Jake Blough, Chief Technology Officer for Service Express, explores the Bathtub Curve theory, its limitations, and data center equipment reliability and maintenance.

Jake Blough, Chief Technology Officer, Service Express

When digging into reliability engineering theory, you will quickly encounter the widely used Bathtub Curve. According to this theory, a product has a high rate of early failures when it is new, commonly caused by errors in handling or installation. The failure rate then settles at a low, roughly constant level during the product's useful life. As the end of product life approaches, the rate rises again with a second and final wave of wear-out failures. Although the Bathtub Curve, pictured below, accurately reflects the failure behavior of many products, we have found that it does not universally apply to data center equipment.
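One common textbook way to express the three phases of the Bathtub Curve is to add a falling hazard (a Weibull term with shape below 1), a constant hazard and a rising hazard (a Weibull term with shape above 1). The sketch below assumes that model and uses purely hypothetical parameter values, with time in years; it illustrates the theory being tested, not Service Express's analysis.

```python
# Illustrative Bathtub Curve: total hazard = infant mortality + random + wear-out.
# This is a generic textbook model, and every parameter value is hypothetical.

def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (k / s) * (t / s) ** (k - 1), for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(
    t: float,
    infant_shape: float = 0.5,   # shape < 1: hazard falls with age (early failures)
    infant_scale: float = 2.0,
    random_rate: float = 0.002,  # constant hazard during the useful life
    wearout_shape: float = 4.0,  # shape > 1: hazard rises with age (wear-out)
    wearout_scale: float = 12.0,
) -> float:
    """Hazards of independent failure modes add, giving the overall hazard."""
    return (
        weibull_hazard(t, infant_shape, infant_scale)
        + random_rate
        + weibull_hazard(t, wearout_shape, wearout_scale)
    )


if __name__ == "__main__":
    for year in (0.1, 1, 5, 10, 15):
        print(f"age {year:>4} y: hazard {bathtub_hazard(year):.3f} per year")
```

With these made-up parameters the hazard is high at first, bottoms out in mid-life and climbs again late in life, which is the shape the data below fails to show.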

Examining Reliability Data

At Service Express, we’ve collected more than 15 years of equipment data from more than half a million devices. The data tracks when equipment breaks, how it breaks and how often it breaks. The common assumption is that these devices have a higher failure rate in their infancy and again toward the end of life. However, when we examined both critical and non-critical server and storage failures, our data showed that equipment failure rates do not follow the Bathtub Curve as expected.
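The article does not say how the failure rates are computed. A standard approach for turning break/fix records into an age-based failure rate is to divide failures by the device-time at risk in each age bucket. This hypothetical sketch assumes one record per device per month of service.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_rate_by_age(
    observations: Iterable[Tuple[int, bool]], bucket_months: int = 12
) -> Dict[int, float]:
    """Failures per device-month for each age bucket.

    Each observation is one device watched for one month, given as
    (age_in_months, failed_that_month). Bucket 0 covers ages
    0..bucket_months-1, bucket 1 the next span, and so on.
    """
    exposure: Dict[int, int] = defaultdict(int)  # device-months at risk
    failures: Dict[int, int] = defaultdict(int)  # failures observed
    for age_months, failed in observations:
        bucket = age_months // bucket_months
        exposure[bucket] += 1
        if failed:
            failures[bucket] += 1
    # Normalizing by exposure keeps buckets comparable even when far fewer
    # devices survive to old age than were installed.
    return {b: failures[b] / exposure[b] for b in sorted(exposure)}
```

Dividing raw failure counts by device count instead would mix the effect of age with the shrinking size of the older fleet.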


Critical Server Failures

A critical failure occurs when a component such as a CPU or system board fails. Critical server failures cause a loss of access to applications or data, which affects business productivity. In the graph below, most machines show a failure rate between 0% and 0.2%; one outlier, affected by an early production issue, reached 0.3%. These rates remain nearly constant over a 10- to 15-year life span.
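One simple way to check whether a rate-by-age series looks like a bathtub or stays flat, as described above, is to compare the early and late portions of life against the middle. The sketch below is a hypothetical heuristic: splitting the series into thirds and the 50% tolerance are assumptions, not a method from the article.

```python
from statistics import mean
from typing import Sequence


def curve_shape(rates: Sequence[float], tolerance: float = 0.5) -> str:
    """Classify a failure-rate-by-age series (hypothetical heuristic).

    The mean rate of the first and last thirds of life is compared with the
    middle third; a phase counts as elevated if it exceeds the middle by
    more than `tolerance` (0.5 means 50% higher).
    """
    if len(rates) < 3:
        raise ValueError("need at least three age buckets")
    n = len(rates) // 3
    early = rates[:n]
    middle = rates[n:len(rates) - n]
    late = rates[len(rates) - n:]
    base = mean(middle)
    early_high = mean(early) > base * (1 + tolerance)
    late_high = mean(late) > base * (1 + tolerance)
    if early_high and late_high:
        return "bathtub"
    if early_high:
        return "infant mortality only"
    if late_high:
        return "wear-out only"
    return "flat"
```

A series that stays between 0.1% and 0.2% throughout, like most of the critical server data, would be classified as flat by this check.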


Non-Critical Server Failures

A non-critical failure occurs when a component such as a disk drive or power supply fails. Modern data center equipment has built-in redundancy for these components, so such failures cause no loss of data or access. The graph below tracks non-critical server failures for several models over 13 years.


Non-critical failures increase only slightly over time and stay below 0.5%, a level consistent with the number of components installed in each system: the more components a system contains, the more parts there are that can fail. We attribute the increase in failures toward the end of life to the number of components in the system rather than to the wear-out effect associated with the Bathtub Curve. Systems in a blade form factor show far fewer non-critical failures than large 4U, four-CPU systems.
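The component-count argument can be made concrete: if each part has its own failure rate, the expected number of non-critical failures per system is the sum of rate times quantity over all part types. In this sketch the per-part rates and the blade and 4U parts lists are hypothetical, chosen only to show why more components means more failures.

```python
from typing import Mapping


def expected_component_failures(
    rates: Mapping[str, float], counts: Mapping[str, int]
) -> float:
    """Expected non-critical failures per system per period.

    Sums (per-unit failure rate x units installed) over component types;
    part types missing from `counts` are treated as absent (0 units).
    """
    return sum(rates[part] * counts.get(part, 0) for part in rates)


if __name__ == "__main__":
    # Hypothetical per-unit failure rates per period.
    rates = {"disk": 0.001, "psu": 0.0005, "fan": 0.0002}
    # Hypothetical parts lists: a blade shares power and fans with its chassis.
    blade = {"disk": 2}
    four_u = {"disk": 8, "psu": 4, "fan": 6}
    print("blade:", expected_component_failures(rates, blade))
    print("4U:   ", expected_component_failures(rates, four_u))
```

Even with identical per-part reliability, the system with more parts accumulates more expected failures, without any wear-out effect.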

Critical & Non-Critical Storage Failures 

Storage devices contain three types of components: critical parts, non-critical parts and disk drives. Critical parts typically include storage processors, whereas non-critical parts include cache batteries, power supplies and fans.

Storage systems are built to be highly resilient and to tolerate multiple failures before data is affected. We consider storage processors the most critical components, because the loss of a storage processor affects overall performance. In the graph above, you can see critical, non-critical and drive failures for a popular OEM storage system. Over five years, critical storage failures occur at a rate between 0.1% and 0.2% per month, or roughly one to two failures per 1,000 systems per month. Non-critical faults are typically caused by cache battery sets, which must be replaced every 3-5 years.
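The fleet-level arithmetic behind these storage figures is a simple multiplication, shown in the sketch below. It assumes every system shares the same constant monthly failure rate and that cache battery replacements are spread evenly across the fleet; the 1,000-system fleet size is illustrative.

```python
def expected_failures(monthly_rate: float, systems: int, months: int = 1) -> float:
    """Expected failures across a fleet, assuming every system has the same
    constant per-month failure probability."""
    return monthly_rate * systems * months


def battery_replacements_per_year(systems: int, interval_years: float) -> float:
    """Steady-state cache battery replacements per year when each system's
    battery set is swapped every `interval_years` years."""
    if interval_years <= 0:
        raise ValueError("interval_years must be positive")
    return systems / interval_years


if __name__ == "__main__":
    fleet = 1_000  # illustrative fleet size
    for rate in (0.001, 0.002):  # the 0.1%-0.2% monthly range for critical failures
        print(f"{rate:.1%} per month -> {expected_failures(rate, fleet):.1f} critical failures/month")
    for years in (3, 5):  # the 3-5 year battery replacement interval
        print(f"battery every {years} y -> {battery_replacements_per_year(fleet, years):.0f} replacements/year")
```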


Disk Drive Failures

The graph above shows all disk drive failures over six years. Disk drives experience a failure rate between 0.2% and 0.3%, which means that, over time, disk drives are far more resilient than “common knowledge” would have you believe.
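Drive reliability is often quoted as an annualized failure rate, so a per-period rate needs converting before comparison. The article does not state the period for the drive figures; this sketch assumes a monthly rate (the period stated for the storage figures) and independent, constant failure probability from month to month.

```python
def annualized_failure_rate(per_period_rate: float, periods_per_year: int = 12) -> float:
    """Chance a drive fails at least once in a year, assuming an independent,
    constant failure probability in every period."""
    if not 0.0 <= per_period_rate <= 1.0:
        raise ValueError("per_period_rate must be a probability")
    return 1.0 - (1.0 - per_period_rate) ** periods_per_year


if __name__ == "__main__":
    for monthly in (0.002, 0.003):  # assumed monthly reading of the 0.2%-0.3% range
        print(f"{monthly:.1%}/month -> {annualized_failure_rate(monthly):.2%}/year")
```

For small rates the result is close to, but slightly below, twelve times the monthly rate.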

The long-term reliability shown in this data is good news for IT departments. It counters the traditional recommendation to refresh hardware because failures are expected to increase as equipment ages. When planning the timing of your refresh, you can factor in this longer equipment life and the cost savings that come with it.

Your Next Data Center Refresh

Of course, there are valid reasons for taking on the cost and time of a hardware refresh. Primary factors that should determine when a hardware upgrade is needed include:

  • Software compatibility
  • Hardware compatibility between devices
  • Performance capacity has been exceeded

If your equipment is meeting your immediate needs, consider delaying your refresh instead. Delaying an unneeded refresh can help you reduce your CapEx spend and improve the value of your original investment.
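The value argument for delaying a refresh can be seen in a straight-line view of cost per year of service. This sketch is a simplification under stated assumptions: it ignores financing, accounting depreciation schedules, rising support costs for older gear and any efficiency gains from newer hardware, and the purchase price is hypothetical.

```python
def annual_hardware_cost(purchase_price: float, service_years: float,
                         annual_support: float = 0.0) -> float:
    """Straight-line cost per year of service: the purchase price spread
    evenly over the service life, plus any recurring yearly support cost."""
    if service_years <= 0:
        raise ValueError("service_years must be positive")
    return purchase_price / service_years + annual_support


if __name__ == "__main__":
    price = 50_000.0  # hypothetical purchase price
    for years in (5, 8):
        print(f"{years} years of service -> ${annual_hardware_cost(price, years):,.0f} per year")
```

Keeping the same equipment in service longer spreads the original CapEx over more years, as long as support costs stay reasonable.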

When it’s a question of spending tens of thousands of dollars on a refresh, you should evaluate your needs and assess the facts to make the right decision for your environment. Based on our reliability data, which shows stable failure rates over time for server and storage equipment, we recommend a refresh every 7-10 years. Your refresh cycle should always be driven by compatibility, capacity and reliability.
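The refresh criteria above can be written as a simple checklist. The field names, the rising-failure-rate flag and the 10-year cutoff taken from the upper end of the recommended window are hypothetical choices for illustration; real decisions would weigh these factors rather than apply them mechanically.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemStatus:
    """Hypothetical inputs an IT team might gather for one system."""
    software_compatible: bool   # runs the software you need to run
    hardware_compatible: bool   # works with the devices around it
    capacity_exceeded: bool     # performance capacity has been outgrown
    failure_rate_rising: bool   # your own failure data shows an upward trend
    age_years: float


def refresh_reasons(status: SystemStatus, max_age_years: float = 10.0) -> List[str]:
    """Return the reasons to refresh; an empty list suggests deferring."""
    reasons: List[str] = []
    if not status.software_compatible:
        reasons.append("software compatibility")
    if not status.hardware_compatible:
        reasons.append("hardware compatibility")
    if status.capacity_exceeded:
        reasons.append("capacity")
    if status.failure_rate_rising:
        reasons.append("reliability")
    if status.age_years >= max_age_years:
        reasons.append("age beyond the 7-10 year window")
    return reasons


if __name__ == "__main__":
    print(refresh_reasons(SystemStatus(True, True, False, False, 6.0)))
```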

Jake Blough is the Chief Technology Officer for Service Express. 


About Voices of the Industry

Our Voices of the Industry feature showcases guest articles on thought leadership from sponsors of Data Center Frontier. For more information, see our Voices of the Industry description and guidelines.

Comments

  1. Dr. J. Knezevic says

    May 21, 2020 at 3:54 am

    In the absence of anything else, practicing reliability and safety engineers in the 1960s created a model of system reliability that requires accepting the concept of an “alternative universe” where all components, and consequently systems, possess a constant, time-independent failure rate, leading to the following expression of reliability: $R(t) = e^{-\lambda t}$. This approach stems from neither science nor mathematics, but from a desperate need to make reliability and safety predictions based on the only existing information, which is failure statistics. Regrettably, these practices have been “legitimised” by numerous industrial and military standards, which are created to demonstrate contractual compliance in legally binding acquisition processes, as is the case in many industries even today.

    In summary, reliability and safety engineers, knowingly or unknowingly, adopted the existence of a parallel universe where well-known, physically observed natural phenomena such as corrosion, fatigue, creep and wear do not exist. They tried to rectify the situation by inventing the bathtub curve, which has never been incorporated into the quantitative modelling of reliability and safety measures.

Copyright Data Center Frontier LLC © 2022