Zero Touch: Maintaining Server Uptime in Remote Locations

What if you are a business that doesn’t have the luxury of a fully staffed data center with a large IT staff? Edge computing is moving servers into austere locations where it’s impractical to have an IT person on site due to the sheer number of sites involved. The world outside of the data center is a harsh and unforgiving place for functions that depend on the cloud, with businesses losing money on downtime due to loss of communications or equipment failure.

Distance is one factor driving high-availability solutions into mainstream usage. Flying an IT person out to a ship in the middle of the ocean is an expensive service call with hours of downtime accumulating before the problem is fixed, as Telford Offshore found in the management of its oil and gas construction fleet.

“If a power supply failed on a construction ship and a server went down, they put somebody on a helicopter and flew them out there to replace the power supply, which is $30,” said Jeff Ready, CEO of Scale Computing. “But it cost $100,000 to get that guy out there and back.”

Scale Computing’s solution for Telford’s ships was the HC3 virtualization platform, which clusters multiple commercial off-the-shelf servers and switches so that they behave as a single machine. If one part of the cluster fails, HC3 “self-heals” by automatically shifting functions to the remaining resources in the cluster, so applications continue to run without human intervention or an expensive onsite visit on the high seas.
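The self-healing behavior described above can be illustrated with a minimal sketch. This is not Scale Computing’s implementation: the `Node` class, the `fail_over` function, the capacity units, and the "least-loaded node first" placement rule are all hypothetical, chosen only to show the general idea of moving workloads off a failed node and raising an alert for later repair.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member; capacity is the number of workloads it can host."""
    name: str
    capacity: int
    healthy: bool = True
    workloads: list = field(default_factory=list)


def fail_over(nodes, failed_name):
    """Mark a node as failed and move its workloads to the least-loaded healthy nodes.

    Returns a list of alert strings that a central dashboard could display.
    """
    failed = next(n for n in nodes if n.name == failed_name)
    failed.healthy = False
    orphaned, failed.workloads = failed.workloads, []
    alerts = [f"hardware failure on {failed.name}: schedule repair"]

    for workload in orphaned:
        # Only healthy nodes with spare capacity may take the workload.
        candidates = [n for n in nodes if n.healthy and len(n.workloads) < n.capacity]
        if not candidates:
            alerts.append(f"no capacity left for {workload}")
            continue
        target = min(candidates, key=lambda n: len(n.workloads))
        target.workloads.append(workload)
    return alerts


# Assumed three-node cluster with made-up workload names.
cluster = [
    Node("node-a", 3, workloads=["pos", "inventory"]),
    Node("node-b", 3, workloads=["email"]),
    Node("node-c", 3, workloads=["cameras"]),
]
alerts = fail_over(cluster, "node-a")
for node in cluster:
    print(node.name, node.healthy, node.workloads)
print(alerts)
```

In this sketch the applications keep running on the two surviving nodes, and the only human action required is the repair named in the alert, which can wait until someone is on site.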

With HC3 clusters deployed across its fleet of ships, a failure notice now goes to a centralized dashboard at a Telford Offshore facility, and a standard repair order is generated. “When the ship comes back to port, somebody walks on board, puts in a new power supply, and they’re done,” said Ready.

Distributed Retail Environments

Telford Offshore is an extreme example of distance between a centralized IT organization and distributed computing resources, but there is just as much interest in, and need for, high-reliability solutions on dry land in retail, banking, manufacturing, and health care. These businesses can have numerous distributed offices located at the edge of the network, sometimes in areas without reliable broadband connectivity for cloud operations.

A Scale Computing enclosure for remote IT. (Image: Scale Computing)

Jerry’s Foods, a national grocery store chain, used a legacy virtualization system for its IT infrastructure before switching over to Scale Computing. With 50 retail, grocery, liquor, and hardware store locations in Minnesota and Florida and only five full-time IT staff members, Jerry’s Foods needed an on-site solution that didn’t require an IT person at every store. The gear had to be plug-and-play, remotely manageable by IT staff, and highly reliable, with near-100% uptime. If the EBT food stamp transaction system goes down at a grocery store, the business impact to Jerry’s is measured in dollars per minute, with the potential for tens of thousands of dollars in losses during an extended outage.

A computer failure at a Jerry’s location outside normal office hours is no longer an emergency. The onsite HC3 cluster simply shuffles processes across the remaining servers and notifies Jerry’s IT department about the hardware issue, which can be dealt with during normal business hours.

“If one or two bits go down, you can wait until Monday morning or whenever someone rolls in,” Ready said. “Or you can do break-fix in the same way that Apple would if your phone dies, you send out a new (server) to the store, plug it in, it will talk to the configuration portal, download its configuration, and join what’s already there. You can take the dead one and ship it back or hit it with a hammer and throw it in the trash, but you don’t have to do a truck roll. That’s a tremendous cost savings.”

Scale’s solution has resulted in an estimated 50% decrease in the time Jerry’s spends managing IT infrastructure and a reduction in cost per site of more than 50% over a five-year period, compared with the legacy virtualization systems it previously used. Jerry’s has moved 20 of its 50 locations to Scale HC3 clusters and plans to move all of its stores to Scale in the future.

Moving from Satellites to Restaurant Kitchens

Florida-based OrbitsEdge started its business around a customized satellite solution to put off-the-shelf servers into space, but the company is finding opportunities closer to Earth in the oil and gas, shipping and transportation, mining, and defense sectors.

“We’re creating the SatFrame solution for the harsh environment of space,” OrbitsEdge CEO Sylvia France said. “We’re being asked if we can take SatFrame and modify it to work in deserts, mountains, the Arctic, and the bottom of the sea.”

OrbitsEdge SatFrame is a zero-touch, zero-maintenance solution – there won’t be Geek Squad service calls on a satellite zipping through space hundreds of miles above the Earth. The company is also building in a zero-trust architecture security model, a concept very attractive to customers planning to deploy compute resources far away from the physical and electronic protection of a traditional data center.

Derivative versions of SatFrame will be tailored for field deployments on the ground and in the sea, utilizing the same general design of a standardized enclosure holding multiple servers sealed to protect electronics from dust and particulates, with radiator fins on the outside surfaces to dissipate heat. However, the company is also examining more mundane applications of its technology.

“One market for hardened servers brought to us is fast food kitchens,” France said. “There’s no space for computer equipment, it’s hot, it’s oily. One firm we talked to put the computer into the air conditioning ducts.”

Microsoft’s Undersea Hands-off Extreme

The quest for highly resilient computing resources isn’t limited to edge-style deployments. Microsoft’s undersea data center off the coast of Scotland, Project Natick, successfully operated hands-off for two years at 117 feet below the ocean’s surface.

“If we are sealing these things up and putting them on the ocean floor, we can’t really service them the way we do land-based data centers, where if there’s a server with a fault in it we go and replace that server,” said Mark Russinovich, chief technology officer of Microsoft Azure. “We’ve got to go to a fail-in-place model, where the servers degrade as they fail over time. Once the number of servers in a container has dropped to a certain threshold, we need to replace that whole container. So we’ll deploy a new one alongside the old one, migrate the data and workloads, and then pull up the old one, refurbish it, and then get ready to go again as another replacement.”

The 864 servers had one-eighth the failure rate of their land-based counterparts, an outcome Microsoft attributes to a combination of filling the data center container with dry nitrogen and the absence of people, which means less jostling of equipment. Nitrogen is less corrosive than oxygen and is commonly used to preserve food for long periods of time.
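The fail-in-place model Russinovich describes comes down to a threshold check on how many servers in a sealed container are still working. The sketch below is illustrative only: the function name `needs_replacement` and the 80% healthy threshold are assumptions, since the article gives no actual figure; only the 864-server count comes from Project Natick.

```python
def needs_replacement(total_servers, failed_servers, min_healthy_fraction):
    """Return True when too few servers remain healthy to keep the container in service.

    min_healthy_fraction is a hypothetical policy value, not a Microsoft figure.
    """
    if not 0 <= failed_servers <= total_servers:
        raise ValueError("failed_servers must be between 0 and total_servers")
    healthy = total_servers - failed_servers
    return healthy / total_servers < min_healthy_fraction


TOTAL_SERVERS = 864          # server count reported for Project Natick
ASSUMED_THRESHOLD = 0.80     # assumption: swap the container below 80% healthy

for failed in (8, 100, 200):
    decision = needs_replacement(TOTAL_SERVERS, failed, ASSUMED_THRESHOLD)
    print(f"{failed} failed -> replace container: {decision}")
```

Under these assumed numbers no one touches an individual server; the container stays in place until enough servers have failed, and only then is a replacement deployed alongside it and the workloads migrated.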

Microsoft is examining how the combination of a nitrogen-rich atmosphere and no people could be applied to land-based data centers. A hands-off data center with more reliable equipment would translate to lower hardware costs and fewer IT people on payroll. It’s reasonable to speculate that a future version of the Microsoft Azure Modular Data Center (MDC) might be delivered as a factory-sealed unit filled with nitrogen, with external plugs for power and data as its only user-serviceable parts.