How infrastructure survives heated times

Over the past 24 hours I feel like I've been living the work life of a 3000 IT manager. We've had telecomm outages here, the kind that can mean lost business if it were not for backup strategies. Unlike the best of you, we don't have a formal plan to pass along in a disaster. Today's not really a disaster, unless you count the after-hours pleasure we hope to savor from Spurs basketball.

In a locked-down IT design, a written plan captures what to do when a telecomm service winks out. Our broadband provider is ATT, with an 800-number repair line to call. We poked at that twice today for one of our landlines, now without a dial tone since yesterday afternoon. There's a different repair number for the Uverse Internet service -- and also the world of IP everything else, since our downed data line means not only no fast Web, but no San Antonio Spurs NBA Finals basketball in about two hours.

Consolidation to a single provider promises savings, but also a single point of failure. Coordinating service between two arms of the same company? Well, that's not an automatic thing anymore. Meanwhile, the cloud-based IT promised by HP and others just pulls all of this recovery farther away from your affected IT shop.

About 10 days ago, MB Foster gave a thorough primer on the issues any company faces in keeping its disaster recovery process up to date. There's old tech (phone trees to spread the word on outages) as well as new elements like measuring the Mean Time To Recovery of Operations. MTTRO can help you decide where to put the effort first in a downtime event. Foster can help you get ready for the calamity with a thorough inventory of what's running, something that CEO Birket Foster says too many companies just don't have up to date.
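
As a rough illustration of the inventory idea, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the system names, the MTTRO figures (treated here as a target number of hours to restore operations), and the `last_verified` dates are hypothetical, and this is not MB Foster's method. It only shows how an up-to-date inventory could be sorted to decide where effort goes first, and how stale entries could be flagged.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    mttro_hours: float  # assumed target: hours until operations must be restored
    last_verified: str  # date the inventory entry was last checked (ISO format)


# Hypothetical inventory entries, for illustration only.
INVENTORY = [
    System("reporting", 72, "2013-01-15"),
    System("order-entry", 4, "2013-05-30"),
    System("email", 8, "2012-11-02"),
]


def recovery_order(inventory):
    """Return systems sorted so the shortest MTTRO target comes first."""
    return sorted(inventory, key=lambda s: s.mttro_hours)


def stale_entries(inventory, cutoff):
    """Return names of entries last verified before the cutoff date.

    ISO-format date strings compare in calendar order, so plain string
    comparison is enough here.
    """
    return [s.name for s in inventory if s.last_verified < cutoff]
```

Calling `recovery_order(INVENTORY)` puts the system with the tightest target at the top of the list, and `stale_entries(INVENTORY, "2013-01-01")` names the entries nobody has checked since before that date.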

"You look at the different processes in your company and figure what's critical to keeping the business alive," Foster said in a June 5 Wednesday Webinar. "It comes down to understanding if there's a cluster of applications which work together, so you have to bring them all up together at the same time," he said. A DR plan must identify key users -- old tech, like keeping up to date with user cell phone numbers, so they can be notified.

"Hardware is usually not the problem here," Foster said. "That said, there was a vendor in the HP 3000 community who had a board go bad on their 3000. It took them 13 days to get the other board in and back up, and then into recovery. It was mostly about sourcing the right part. They didn't have good connections in that area." Then there was also the matter of getting competent resources to install the board.

Tomorrow MB Foster offers another Webinar, since it's a Wednesday. Gods of Data Quality examines Master Data Management (register for free), the MDM that "ensures your company does not use multiple – or potentially inconsistent - versions of data in different parts of its operations; understanding the concept of 'one version of the truth.' "

Each one of these Webinars gives me plenty to think about and try to plan for.

We're feeling some pain today in our little micro-sized shop, but it hasn't cost us business up to now. We've done what Foster advises: knowing what is running in your system lineup through an inventory. But that knowledge is in my head today, and if I swerved to avoid a texting driver and got myself to the ER, my partner or a backup helper wouldn't know how to deliver this news story to you, even if I'd written it in advance. What do you do when your broadband pipe goes down and stays down for a while?

"This is a business problem, not an IT problem," Foster explained. The trenches-level repairs are on the IT lines, but the stakes are up at the boardroom level and in the finance officer's purview. That increases the pressure on IT, especially if the economies of curtailing support have been demanded from the CIO or CEO. In a personal example, just last week I toted up savings of dropping a hot-spot wireless feature on my mobile phone account. It's there when Wi-Fi can't do the job. It seemed costly at $25 monthly on a micro-business budget. A hot-spot I'd only used outside our offices on travel could be cut out, right? To pay for more crucial IT services, like website renovations. There's always something.

Except that for the last 24 hours, that hotspot off an iPhone 4GS has kept the Newswire's email and Web blog services online, right here in our offices. (It's not effective to have to go to a coffee shop to do secure Web work, but it's better than nothing.) Have you been forced to economize, debating over dropping a service contract or support agreement you rarely use? Or been told to drop one? The finesse is in keeping these DR lifelines intact, ready for the day of disaster. The more you know in a formal plan, the more professional your response looks to the executives in charge.

ATT brings everything into our offices now. 25 percent of our email, and all through their lines. 100 percent of the bandwidth for everything on a wire, including the TV. Our landline numbers, the ones which rarely ring anymore in the era of email but always can open our door to new business. 512-331-0075 has been in the public eye so long that a transition to a cell-only number seems unthinkable. We pay for extra support and maintenance on these relics -- our headquarters is smack in the middle of some of the oldest and messiest copper in Northwest Austin.

As I write, the second ATT truck of the afternoon cruises our street. Matt (they all have names you should use) is unsnarling and fixing a network pedestal at the property next door. This hub controls our telecomm and that of a half-dozen other addresses in the area.

I'd call these residential issues -- our office is in the midst of a stately 40-year-old neighborhood in one of Austin's oldest high tech corridors. But when I register our trouble ticket for the phone line, ATT says in its recording we are a Major Business Account. I don't question that designation, because it gets us to the head of the line with a human being. Broadband service, sadly, doesn't enjoy this distinction. ATT considers us consumer-grade customers, even as we work with an 18GBit download speed.

Take this checklist and answer honestly to see how much you must do to survive calamity.

  • Did you recently cancel support for software still crucial to the business, but now on a "declining" platform like the 3000?
  • Is your support provider working within a Service Level Agreement -- so you know how much the increasing impact of a down system costs after an outage of one hour, or four, or 8 or 12, or a day or a week? What's the pain and cost of each of these downtime periods?
  • When you place a support call, how soon do you talk with an agent, a human being or an expert on your system?
  • Do you have redundant hardware in place for when a computer goes offline -- and is it hot-standby, or not?
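
The Service Level Agreement question in the checklist asks what each downtime period costs. A minimal sketch of that arithmetic in Python follows. The model is an assumption made purely for illustration: a hypothetical base hourly loss that steps up by a fixed fraction for each full day the outage drags on. Your own numbers and escalation pattern will differ, and nothing here comes from Foster's Webinar.

```python
# Outage lengths from the checklist, in hours: 1, 4, 8, 12, a day, a week.
OUTAGE_HOURS = [1, 4, 8, 12, 24, 168]


def downtime_cost(hours, base_hourly_loss, daily_escalation=0.5):
    """Estimate the cumulative cost of an outage lasting `hours` whole hours.

    Assumed model: each hour costs `base_hourly_loss`, and that hourly
    figure rises by `daily_escalation` (50 percent by default) for every
    full day the outage has already lasted.
    """
    total = 0.0
    for hour in range(hours):
        days_elapsed = hour // 24
        total += base_hourly_loss * (1 + daily_escalation * days_elapsed)
    return total


if __name__ == "__main__":
    # Hypothetical figure: $200 lost per hour on the first day.
    for hours in OUTAGE_HOURS:
        print(f"{hours:>4} hours down: ${downtime_cost(hours, 200):,.0f}")
```

Even with a made-up rate, a table like this gives the finance side a number to weigh against the price of the support contract or standby hardware under debate.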

Perhaps most importantly, how long has it been since your DR plan was tested? By a test, I don't mean the last time you needed it to work. Those reports are costly. A test is a controlled event that yields a lot of documentation on the success of your DR-MTTRO plans, as Foster pointed out.

Here at the Newswire we're light on our documentation. I could write out for my partner how we recover from calamity internally -- the locations of our backups, the process to restore, the way to transfer a full backup onto reserve hardware. Who we call when we cannot resolve it ourselves. How the telecomm is supposed to work. We have religion to do that today, but you can't just drop that kind of information into the hands of your best sales person, chief muse and dreamer, or even a veteran office manager who's unfamiliar with the fundamentals of problem resolution.

This can happen inside a 3000 shop, one with other environments like Linux and Windows at work. Our partner and friend Alan Yeo had a UPS calamity with his power last month, and it was five days before the affected 3000 went back into service. This is an organization with more than 30 years of 3000 and IT background that presumed a UPS could keep a system online -- instead of permitting the server to be fried, while other computers all around escaped that fate.

And so, Alan is preparing an article entitled, "Do you want fries with that?" in his set of cautions. Electricity is about the only essential service that hasn't rolled over on us over the last week. Without it there's the coffee shop, alternative business allies nearby (like our friend Candace's personal coaching service). We called her as a backup to the Spurs game tonight, too -- just before ATT's broadband repair succeeded after six hours of heroic effort.