Marking Time To Recovery: No Mean Feat
June 27, 2012
MB Foster led users through 45 minutes of MTTRO fundamentals in a webinar this afternoon. MTTRO stands for Mean Time To Recovery of Operations: the time and effort it takes to get an IT operation back online after a disaster such as a hurricane. Here in Texas, coastal cities including Houston had been bracing for Hurricane Debby, which forecasters predicted would make landfall later this week, before the storm turned back out toward the Atlantic.
MTTRO "really has to do with what it takes to get back in operation after the disaster occurs," said MB Foster's CEO Birket Foster. "Also, what the skill sets are for building the new environment." Communications between team members are one issue to consider, now that company operations are often spread out geographically.
"One of my favorite stories about a disaster recovery team was the one that was getting on a plane to fly from New Jersey to their Colorado disaster recovery site," Foster said. "On check-in, the communications specialist was told that the test scenario was 'You're on vacation in Mexico and unavailable.' So he was told to go home, and the cross-training was then put to the test."
With HP 3000s often running in mission-critical mode, plans for DR are crucial. There are many items to track, starting with an estimate of what it will cost to recover. A good MTTRO plan calculates how long each business unit can survive without a system. In other words, it estimates the growing pain and cost of disruption at each of these checkpoints: the first hour offline; 4, 8 and 12 hours offline; one full day offline; and one week offline.
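The checkpoint exercise above lends itself to a small calculation. The Python below is a minimal, hypothetical sketch, not MB Foster's method: the business units, dollar figures and tolerable-loss thresholds are invented, and it assumes each unit supplies a cumulative cost estimate for every checkpoint. It reports the longest checkpoint each unit can survive and sorts units so the least tolerant come first in the recovery queue.

```python
from __future__ import annotations

from dataclasses import dataclass

# Checkpoints from the MTTRO discussion, in hours: first hour, 4, 8 and
# 12 hours, one full day, and one week.
CHECKPOINTS_HOURS = (1, 4, 8, 12, 24, 168)


@dataclass
class BusinessUnit:
    name: str
    # Hypothetical cumulative cost (dollars) of being offline at each
    # checkpoint, in the same order as CHECKPOINTS_HOURS.
    cumulative_cost: tuple[float, ...]
    # Hypothetical largest loss the unit says it can absorb.
    tolerable_loss: float


def max_tolerable_downtime(unit: BusinessUnit) -> int:
    """Longest checkpoint (hours) whose cumulative cost stays within the
    unit's tolerable loss; 0 if even the first hour exceeds it."""
    if len(unit.cumulative_cost) != len(CHECKPOINTS_HOURS):
        raise ValueError(f"{unit.name}: need one estimate per checkpoint")
    survivable = 0
    for hours, cost in zip(CHECKPOINTS_HOURS, unit.cumulative_cost):
        if cost > unit.tolerable_loss:
            break
        survivable = hours
    return survivable


def recovery_priorities(units):
    """(name, hours) pairs, least downtime-tolerant units first."""
    return sorted(
        ((u.name, max_tolerable_downtime(u)) for u in units),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    units = [
        BusinessUnit("Order entry", (5_000, 25_000, 60_000, 100_000, 250_000, 2_000_000), 50_000),
        BusinessUnit("Payroll", (0, 500, 1_000, 2_000, 5_000, 80_000), 20_000),
        BusinessUnit("Claims", (20_000, 90_000, 200_000, 300_000, 700_000, 5_000_000), 10_000),
    ]
    for name, hours in recovery_priorities(units):
        print(f"{name}: survives about {hours} hours offline")
```

With these invented numbers, Claims cannot absorb even the first hour, Order entry lasts about 4 hours and Payroll about a day, so Claims would be first in line for recovery.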
- Equipment (computers, phones, payment devices)
- Vendors – Hardware & Software – specs and versions, license keys
- Hot and cold standbys
- Keep user procedures in a current document
- Each recovery scenario depends on the event
- A communications plan is everything
- Know who needs to be notified on the System Management Team
- Who declares the emergency, and who executes the plan?
- What is the phone tree process for staff notification?
- Who is the media contact?
- What other vendors, customers, and service agencies need to be notified?
- Where will the recovery site be – the same or different for each scenario?
- What is integrated with each application?
- Are the interfaces real time or batch (asynchronous)?
- Can the application be made operational without the other apps (standalone)?
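The application questions at the end of that checklist (what is integrated, real time or batch, standalone or not) amount to a dependency map, which can suggest a restore order. The Python below is a hypothetical sketch: the application names and interfaces are invented, and it assumes that only real-time interfaces block a restart, while batch (asynchronous) feeds can be replayed once the upstream system returns. It uses Kahn's topological sort over the real-time links.

```python
from collections import deque

# Hypothetical inventory: application -> {upstream application: interface type}.
# Assumption: "realtime" means the app cannot run without the upstream system;
# "batch" feeds can be caught up later.
INTERFACES = {
    "general_ledger": {},
    "order_entry": {"inventory": "realtime", "general_ledger": "batch"},
    "inventory": {},
    "billing": {"order_entry": "realtime", "general_ledger": "batch"},
    "reporting": {"billing": "batch", "general_ledger": "batch"},
}


def standalone_apps(interfaces):
    """Apps with no real-time dependencies can be brought up on their own."""
    return sorted(
        app for app, deps in interfaces.items()
        if all(kind != "realtime" for kind in deps.values())
    )


def recovery_order(interfaces):
    """Kahn's topological sort over real-time dependencies only.

    Raises ValueError for an unknown upstream app, or for a real-time cycle
    (those apps would have to be restored together)."""
    remaining = {
        app: {dep for dep, kind in deps.items() if kind == "realtime"}
        for app, deps in interfaces.items()
    }
    dependents = {app: [] for app in interfaces}
    for app, deps in remaining.items():
        for dep in deps:
            if dep not in dependents:
                raise ValueError(f"{app} depends on unknown app {dep}")
            dependents[dep].append(app)
    ready = deque(sorted(app for app, deps in remaining.items() if not deps))
    order = []
    while ready:
        app = ready.popleft()
        order.append(app)
        for child in sorted(dependents[app]):
            remaining[child].discard(app)
            if not remaining[child]:
                ready.append(child)
    if len(order) != len(interfaces):
        stuck = sorted(set(interfaces) - set(order))
        raise ValueError("real-time dependency cycle among: " + ", ".join(stuck))
    return order


if __name__ == "__main__":
    print("Standalone:", standalone_apps(INTERFACES))
    print("Restore order:", recovery_order(INTERFACES))
```

Here reporting counts as standalone because all of its feeds are batch, while billing has to wait until order entry, and therefore inventory, is back.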
Foster's company, a services provider as well as a software vendor, works through all of these issues with clients. It's a timely issue here in the US during storm season. Unlike Debby, it's not a subject that's going to blow away, so to speak.
For one manager attending the webinar, one of the biggest hurdles was keeping information current. "We have to research everything, to make sure it's current from the last DR test," said Wendy Durupan at Harvard Pilgrim Health. "We test twice a year."