Reliability, or uptime, stems from a combination of many factors. Some of the largest contributors to high availability and uptime are the people, processes, operations, maintenance, lifecycle, and risk mitigation strategies that a data center provider has in place. The list below covers what to consider when evaluating any data center provider to determine how well prepared it is to deliver maximum reliability.
1) Process Controls
Ensure your data center provider has developed documented operational process controls covering the entire organization and all operations of the data center. Ask to see the data center’s procedure library before making a decision; a provider with proper procedures in place should have no reason to deny you access to it. Documented, validated, and repeatable processes create a standardized approach to operations, service delivery, and maintenance while mitigating or eliminating the risks associated with human error.
2) Training/Qualification Programs
Ask your data center provider what training and qualification programs they have in place to ensure their staff are fully qualified and highly competent. Training that builds excellence and a consistent level of knowledge throughout the organization and across data center operations is crucial.
3) Infrastructure Management Standards
Make sure your data center provider has developed and implemented infrastructure identification and management standards. The use of infrastructure standards for the uniform identification of equipment throughout the electrical distribution, mechanical, connectivity, access, and fire/life safety systems is fundamental to an operational environment that has strict process control. There is no such thing as too much labeling or signage in a data center.
4) Frequent Data Center Inspections
Does your data center provider require frequent inspections of the entire data center? If so, how often? A disciplined approach to frequent inspections of the data center is vital for a proactive operational mindset. Remember, you will get what you inspect, not what you expect!
5) Change Management Procedures
Is there a formal procedure for implementing changes? Formal change management processes should be the norm and are one of several controls used to mitigate human error in a mission-critical environment.
6) Data Center Infrastructure Management
Are all critical infrastructure systems covered by comprehensive monitoring? Ask your data center provider how their data center infrastructure management (DCIM) systems are configured and how much visibility you will have into them. Properly configured monitoring systems can provide immediate notification of changes in critical infrastructure before those changes become problems, and they can be used to track and report on adherence to Service Level Agreements (SLAs).
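To illustrate what "notification before a change becomes a problem" can mean in practice, here is a minimal sketch of two-tier threshold alerting, where a warning fires well before an alarm condition is reached. The sensor names and threshold values are hypothetical and do not describe any particular DCIM product; real systems typically add rate-of-change checks, alarm suppression, and escalation rules on top of this.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor: str      # hypothetical sensor identifier
    value: float     # measured value in the sensor's own unit
    warn_at: float   # hypothetical early-warning threshold
    alarm_at: float  # hypothetical alarm threshold (higher values are worse)


def classify(reading: Reading) -> str:
    """Return 'ok', 'warning', or 'alarm' for a reading where higher is worse."""
    if reading.value >= reading.alarm_at:
        return "alarm"
    if reading.value >= reading.warn_at:
        return "warning"
    return "ok"


def notifications(readings: list[Reading]) -> list[str]:
    """Build one message for every reading that is not in the 'ok' state."""
    return [
        f"{r.sensor}: {classify(r)} ({r.value})"
        for r in readings
        if classify(r) != "ok"
    ]


if __name__ == "__main__":
    readings = [
        Reading("CRAH-3 supply air C", 24.5, warn_at=25.0, alarm_at=27.0),
        Reading("UPS-A1 load %", 82.0, warn_at=80.0, alarm_at=90.0),
    ]
    for message in notifications(readings):
        print(message)
```

Here the UPS load has crossed the warning threshold but not the alarm threshold, so operators are notified while there is still time to act.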
7) Infrastructure Capacity Management
Has your data center provider developed a method for managing infrastructure capacity? Improper capacity management is a common cause of outages in a data center and a leading cause of cascading events. At a minimum, their approach should include the following elements:
• Accurate measurement of all electrical and mechanical loads
• As close to real-time measurement as possible
• A model to be followed when provisioning capacity
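The three elements above can be combined into a simple provisioning check. The sketch below assumes a hypothetical planning rule: new load is approved only if measured load plus the request stays under a fixed fraction (80% here, chosen purely for illustration) of the capacity that remains usable once redundancy is accounted for. The system name and figures are illustrative, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class PowerSystem:
    name: str
    usable_kw: float              # capacity available to IT load after redundancy is accounted for
    measured_kw: float            # latest measured load, ideally close to real time
    max_utilization: float = 0.8  # hypothetical planning limit on usable capacity

    def headroom_kw(self) -> float:
        """Capacity still available under the planning limit."""
        return self.usable_kw * self.max_utilization - self.measured_kw

    def can_provision(self, request_kw: float) -> bool:
        """Approve a request only if it fits within the remaining headroom."""
        return request_kw <= self.headroom_kw()


if __name__ == "__main__":
    ups_a = PowerSystem("UPS-A", usable_kw=500.0, measured_kw=350.0)
    print(ups_a.headroom_kw())       # 50.0 kW left under the 80% limit
    print(ups_a.can_provision(40.0))
    print(ups_a.can_provision(60.0))
```

The check is only as good as its inputs, which is why accurate and near-real-time load measurement come first in the list.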
8) Lifecycle Strategy
A lifecycle strategy combines a preventive and predictive maintenance program with other strategies, all focused on extending the service life and increasing the mean time between failures (MTBF) of the systems, equipment, and components, and of the data center as a whole. It involves processes and strategies that include:
• A “replace before fail” strategy
• An equipment rotation strategy
• An equipment replacement strategy
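One simple way to put a "replace before fail" strategy into practice is an age-based trigger: flag equipment for replacement once it has used a set fraction of its rated service life. The sketch below assumes a hypothetical 80% trigger and illustrative equipment tags and rated lives; an actual program would also weigh predictive maintenance data such as test results and condition trends.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Component:
    tag: str                 # hypothetical equipment tag from the site's labeling standard
    installed: date
    rated_life_years: float  # hypothetical rated service life


def age_years(component: Component, today: date) -> float:
    """Approximate age in years, using an average year length of 365.25 days."""
    return (today - component.installed).days / 365.25


def due_for_replacement(components: list[Component], today: date,
                        replace_at: float = 0.8) -> list[str]:
    """Return tags of components that have used at least `replace_at` of their rated life."""
    return [
        c.tag
        for c in components
        if age_years(c, today) >= replace_at * c.rated_life_years
    ]


if __name__ == "__main__":
    fleet = [
        Component("UPS-A1 battery string", date(2019, 1, 1), rated_life_years=5),
        Component("CRAH-3 fan", date(2021, 1, 1), rated_life_years=10),
    ]
    print(due_for_replacement(fleet, today=date(2023, 6, 1)))
```

The battery string is about 4.4 years into a 5-year rated life, past the 4-year trigger, so it is flagged; the fan is not.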
9) Outsourcing of Operations/Facilities Teams
Make sure your data center provider does not outsource their operations and facilities teams. A provider that hands management of most of its maintenance program to a third party does not control the maintenance and lifecycle strategies that keep the data center running. Reliability will inevitably suffer when the operations and facilities teams are not managed in-house.
10) Strategic Geographic Locations
Data center sites are often selected based on what is readily available, convenient, or inexpensive to repurpose. These are not always the right priorities for data center site selection. Make the provider’s geographic location, and its exposure to natural disasters, a key factor in your decision.
Bryon Miller, Senior Vice President of Operations at FORTRUST.