The Data Center Tools that Improve Operations and Help Reduce Risk
We conclude our article series on designing data centers for observability, resiliency, and better operations. Last week, we examined how observability has changed to support digital infrastructure. This week, we’ll discuss the human element that needs to be considered when creating observability in digital infrastructure – and how tools can improve data center operations and reduce risk.
When working with critical infrastructure, keeping people and assets safe is essential. Reducing risk in key facilities means preventing incidents and, when they do occur, identifying and reacting to them quickly. Condition-based maintenance helps determine the risk of asset failure before it happens. Multi-level security applications help ensure that the site has the protection and control needed to minimize downtime and intrusion threats.
To improve operations and reduce risk, look for tools that include:
- Integrated asset and life safety protection — BMS, Fire, Security
- Situational awareness and threat management
- Physical security — multi-layered perimeter, access, and DVM
- OT cybersecurity assessment and solutions
- SPoG (single pane of glass) to manage alarms and critical incidents
The following examples are realistic scenarios and stressors affecting data center management, along with the tools that can be leveraged to overcome these challenges.
Cybersecurity Incident: Identification and Eradication of Potential Cyber Incidents
Observe
Enhance situational awareness to identify cyber threats proactively
- Conduct cybersecurity site assessment
- Monitor and use a remote management system to identify anomalies and send notifications about possible malware
Investigate
Optimize response time with predefined collaborative protocols
- Confirm possible malware
- Follow a predefined incident response protocol which includes notification to critical stakeholders
Resolve
Recover and minimize system downtime with predefined processes and procedures
- Preserve forensic data
- Contain affected systems from further damage or data loss
- Eradicate the malware
- Restore service
- Provide a detailed incident report consisting of lessons learned
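The observe, investigate, and resolve steps above can be sketched as a simple ordered playbook. This is a minimal illustration, not a description of any vendor's product: the `PLAYBOOK` structure and the `Incident` class are hypothetical names, and the step wording is paraphrased from the list above. The sketch only shows how a predefined protocol can enforce step order and keep a timestamped record for the final incident report.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical playbook: phases and steps paraphrased from the scenario above.
PLAYBOOK = {
    "observe": [
        "conduct cybersecurity site assessment",
        "monitor remotely and notify on anomalies",
    ],
    "investigate": [
        "confirm possible malware",
        "follow incident response protocol and notify stakeholders",
    ],
    "resolve": [
        "preserve forensic data",
        "contain affected systems",
        "eradicate the malware",
        "restore service",
        "provide incident report with lessons learned",
    ],
}


@dataclass
class Incident:
    """Tracks progress through the playbook and records each completed step."""
    title: str
    log: list = field(default_factory=list)

    def next_step(self):
        """Return the (phase, step) pair due next, or None when finished."""
        done = len(self.log)
        for phase, steps in PLAYBOOK.items():
            if done < len(steps):
                return (phase, steps[done])
            done -= len(steps)
        return None

    def complete(self, phase, step):
        """Record a step, rejecting anything that skips the predefined order."""
        expected = self.next_step()
        if expected != (phase, step):
            raise ValueError(f"out of order: expected {expected}, got {(phase, step)}")
        self.log.append((datetime.now(timezone.utc), phase, step))
```

Calling `complete()` with a step other than the one returned by `next_step()` raises an error, which mirrors the idea of a predefined protocol: for example, service cannot be marked as restored before forensic data has been preserved.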
Active Threat Situation: Multi-level Security Helps Promote Zero Tolerance on Security Issues
Observe
Enhance situational awareness to identify issues quickly
- Alert security operator to an active threat
- Observe the area using integrated maps and video
- Initiate SOP on incident workflow, including lockdown of the facility and informing the security manager
Investigate
Optimize response time with collaborative workflows
- Coordinate with the security team to investigate and confirm the threat level
- Initiate SOP with security officer and manager
- Call emergency services and evacuate the area as needed
Resolve
Enforce procedures to reduce the duration of the evacuation
- Drive collaboration among the emergency services and the security team to contain the threat
- Follow the SOP workflow to repopulate the area safely
- Monitor the environment to promote continued safety
A Fire Has Broken Out: Faster Response and Reporting
Observe
Identify risk and see the big picture faster
- Use advanced detection technology to help identify and prevent critical incidents early
- Initiate an incident workflow through a fire alarm
- Assess the situation with the video system and trigger an automated response – smoke extraction, pressurization
Investigate
Centralize collaborative command with record-keeping and traceability
- Coordinate incident activities through the response team
- Initiate specific workflows for shelter, evacuation and muster, as required
- Leverage centralized collaborative command with record-keeping and traceability
Resolve
Conduct remote monitoring and collaborative incident response
- Monitor the situation remotely
- Enable traceability and provide an auditable version of events for investigative purposes
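One common way to make a record of events auditable is to chain each log entry to the previous one with a cryptographic hash, so that later edits become detectable. The sketch below illustrates that general technique under stated assumptions; it is not taken from the paper or from any specific incident management tool, and the function names `append_event` and `verify` and the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """Hash an entry body deterministically (keys sorted before encoding)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log, timestamp, actor, action):
    """Append an event whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"timestamp": timestamp, "actor": actor, "action": action, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: entry[key] for key in ("timestamp", "actor", "action", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

With a log built this way, changing the text of an earlier event causes `verify()` to return `False`, which gives investigators a basic check that the sequence of events has not been modified after the fact. A production system would also need secure storage and trusted timestamps, which this sketch does not address.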
Reducing Stress, Improving Visibility and Management, Retaining Talent
Driving operational efficiency can reduce stress and make data center professionals more efficient. Running a data center in today’s world is an increasingly complex task. New technology, changing regulations and new demands have left many companies feeling like they’re not keeping up.
One good way to reduce employee stress and increase technicians’ sense of empowerment is to make troubleshooting fundamentally easier.
Final Thoughts and a Look into the Future
The future will only become more connected. Organizations that deliver digitally optimized occupant experiences will likely establish a long-term advantage in capturing and retaining customer and employee loyalty.
To keep up with this shift, it’s important to work with partners that have a digital-first mindset and deliver solutions that offer a true portfolio-level view of all digital assets. Whether providing new IoT solutions, building a new hyperscale data center, or investing in the edge, be sure to work with designs that help businesses grow without adding stress.
The people part of the equation is also essential: organizations need to hire and retain top talent. Digital infrastructure will require new tools to help people stay efficient, safe, empowered, and productive.
The good news is that data center professionals don’t have to embark on this journey alone. Leaders in digital infrastructure are already building new, data-driven designs to help overcome some of the most frustrating challenges. To begin the journey, start by asking the right reflective questions about the business’s digital infrastructure and how people and key digital assets are being managed today.
Download the entire paper, “The Data Center Human Element: Designing for observability, resiliency and better operations,” courtesy of Honeywell, for exclusive access to additional use cases, tools to improve data center operations, and tips for finding partners that enable visibility and observability.