
February 13, 2019

Why a UPS FAIL let down a 3000's shield

Previously, when a pair of HP 3000s were felled in the aftermath of a windstorm which clipped out the power at Alan Yeo's shop, his Uninterruptible Power Supply in the mix failed as well. After a couple of glasses of merlot, our intrepid developer and founder of ScreenJet continued to reach for answers to his HP 3000 datacenter dilemma. Why did that UPS that was supposed to be protecting his 3000s and Windows servers FAIL once the power died?

By Alan Yeo
Second in a series

Feeling mellower and with nothing I really wanted to watch on the TV, I decided to take a prod at the servers and see what the problems were. I decided I'd need input to diagnose the Windows Server problem, so that could wait until the morning. Power-cycled the 917 to watch the self-test cycle and got the error, did it again. (Well sometimes these things fix themselves, don't they?) Nope, it was dead!

Google turned up nothing on the error. Nothing on the 3000-L newsgroup archives, either. I'd tell you the 3000 error code, but I've thrown away the piece of paper I had with all the scribbles from that weekend.

Where's a guru when you want one?

I really wanted to get my 917 back up and running over the weekend, as it had all our Transact test software on it. Dave Dummer (the original author of Transact) was doing some enhancements to TransAction (our any-platform replacement for Transact) and we had planned to get some testing done for early the following week, to help a major customer.  

So it's 11:30 PM UK time, but it's only 3:30 PM PDT. I wonder who's still around at Allegro? A quick Skype gets hold of Steve Cooper, who with the other Allegroids diagnoses within five minutes that the 3000 has got a memory error. The last digit of the error indicates which memory bank slot has the problem.

Okay, I'm not going to start climbing around the back of the rack at this time of night. I leave it until the morning, but at least I know what the problem is.

False Dawn

Pulling the 3000's memory card is no problem. Working out which of the five banks is bad takes a bit more work, but a bit of plug engineering and a couple of reboots shows that we have 64MB (2x32) of bad memory. No problem, plenty left, so remove it and reboot. Great, get to the ISL prompt, do a START NORECOVERY and go get a cup of coffee and a cigarette, and I’ll soon have this system back up.

SYSTEM ABORT from SUBSYS 143

SYSHALT 7,$0267

FLT DEAD

Oh, hell.  

Long Story Short (or another one bites the dust)

Okay, it's about time we cut this story short — although I am certain you want to read about someone else's trials and tribulations, even as I suspect you’re only reading to find out why your UPS is useless. Suffice it to say that the 3000's LDEV 2 had also been fried, which we replaced, then the DAT drive was dead, which was replaced, but was still dead.

So in the end, we decided our fastest recovery solution was to scrap the 917 and merge its data with a 918 that had a clone in the shop. That choice makes disaster recovery a lot simpler, and it leaves one less piece of kit burning electricity, which should help save the ice caps!

So what got Fried? HP 3000, Dell Intel Server, one modem, one DTC 16 -- and of course the two APC UPS's that were supposed to be protecting everything.

Why? Given that the APC “Smart” UPS's had done such a wonderful job of protecting everything, the conundrum was why they hadn't protected everything. It was time to do some research on UPS's.  

It turns out there is a little bit of a clue in the three-letter acronym. The “U” stands for “Uninterruptible,” not “Clean.” I discovered that there are two main types of UPS. First there's the normal Line-Interactive type: everyone makes them, and everyone's got one, like the APC Smart UPS. Then there are the “On-line” ones. The major difference is that standard “Smart” UPS's (most of the time) feed the mains supply straight out to everything plugged into them. In contrast, the on-line versions feed everything from an inverter 100 percent of the time.

But I hear you say (and as I thought) “My APC UPS filters the power, chopping down over voltage, boosting under voltage, and supplying power if the mains fails.” Well the answer in classic 3000-L mode is, “Yes, but it depends.” Now I'm no electrical expert, but I’ve worked up a layman's interpretation.

There’s something in the mix called Dirty Transfers.

Line Interactive UPS's do AVR, Automatic Voltage Regulation. Instead of going to battery during low or high input voltages, this sort of unit will use an Autotransformer to increase or reduce the voltage to a safe operating range without running on the battery. Within their stated tolerances, they can run almost indefinitely doing a number of things.

  • AVR Boost, where the UPS is compensating for a low utility voltage;
  • AVR Trim, when it is compensating for a high utility voltage.
  • If the voltage fluctuates outside a set range, or on some of them if the rate of change of the voltage exceeds a given threshold, then they will Transfer, using the battery power via an inverter. The UPS then monitors the AC supply and when it deems it is back within tolerance it transfers back to the mains supply.  

It is this Transfer Time (TT) that can cause some problems. Such as those at our shop.
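
The line-interactive behaviour described above can be sketched as a simple mode selector. This is only an illustration of the layman's description, not APC's firmware logic: the voltage thresholds, the rate-of-change limit, and the names `Mode` and `select_mode` are all hypothetical, chosen for a 240V supply.

```python
from enum import Enum


class Mode(Enum):
    PASS_THROUGH = "mains passed through"
    AVR_BOOST = "AVR boost"
    AVR_TRIM = "AVR trim"
    ON_BATTERY = "transfer to battery/inverter"


# Hypothetical thresholds for a 240V nominal supply (assumptions, not APC specs).
BOOST_BELOW_V = 216.0       # below this, the autotransformer boosts
TRIM_ABOVE_V = 264.0        # above this, the autotransformer trims
TRANSFER_BELOW_V = 180.0    # below this, AVR can no longer cope
TRANSFER_ABOVE_V = 290.0    # above this, AVR can no longer cope
MAX_RATE_V_PER_S = 500.0    # some units also transfer on fast voltage swings


def select_mode(volts: float, rate_v_per_s: float = 0.0) -> Mode:
    """Pick the operating mode for one voltage sample."""
    # Outside the regulation range, or changing too quickly: go to battery.
    if volts < TRANSFER_BELOW_V or volts > TRANSFER_ABOVE_V:
        return Mode.ON_BATTERY
    if abs(rate_v_per_s) > MAX_RATE_V_PER_S:
        return Mode.ON_BATTERY
    # Inside the regulation range: correct the voltage without using the battery.
    if volts < BOOST_BELOW_V:
        return Mode.AVR_BOOST
    if volts > TRIM_ABOVE_V:
        return Mode.AVR_TRIM
    return Mode.PASS_THROUGH


for sample in (240.0, 205.0, 270.0, 0.0):
    print(sample, select_mode(sample).value)
```

Every change into or out of `ON_BATTERY` in this sketch corresponds to a Transfer, which is where the Transfer Time comes into play.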

05:29 PM in Homesteading, User Reports

February 11, 2019

Making a UPS Light Up a 3000

Editor's note: A recent message thread on the 3000-L mailing list and newsgroup reported on attaching an Uninterruptible Power Supply (UPS) to a 3000. The question came up when an MPE/iX manager asked about hooking up a UPS to an emulated 3000. While that is proof enough that the Charon emulator is working in the field, the question still covered HP's MPE hardware. More than five years ago Alan Yeo covered this ground for us in a lively and informative two-part feature.

Intrepid veteran developer Yeo of ScreenJet in the UK had a pair of HP 3000s felled, despite his sound strategy of using an Uninterruptible Power Supply in his IT mix (or "kit," as it's called in England). Here is Yeo's first installment of the rescue of the 3000s which logic said were UPS-protected. As Yeo said in offering the article, "We're pretty experienced here, and even we learned things through this about UPS." We hope you will as well.

New UPS, sir! or "Would you like fries with that?"

By Alan Yeo
First of two parts

"Smart UPS" now has a new meaning to me. "You're going to smart, if you're dumb enough to buy one." I guess this is one of those stories where if you don't laugh you'd cry, so on with the laughs.

By the end of this tale, you should know why your UPS may be a pile of junk that should be thrown in the trash. And what you should replace it with.

A Friday in early June and it was incredibly windy. Apparently we were getting the fag end of a large storm that had traversed the Atlantic after hitting the US the week before. Sort of reverse of the saying "America sneezes, and Europe catches a cold." This time we were getting the last snorts of the storm.

Anyway, with our offices being rurally located, strong winds normally mean that we are going to get a few power problems. The odd power blip and the very occasional outage as trees gently tap the overhead power lines. Always worst in the summer, as the trees are heavily laden with leaf and drooping closer to the lines than they are in the winter, when they come round and check them.

So this situation is not normally something we worry about. We are fairly well-protected (or so we thought) with a number of APC UPS units to keep our servers and comms kit safe from the blips and surges. The UPS units are big enough so that if the power does go out, we can keep running long enough for either the power to come back -- or if we find out from the power company that it's likely to be a while, for us to shut down the servers.

We keep all the comms kit, routers, switches, firewalls and so forth on a separate UPS. This UPS will keep them running nearly all day, so that way we still have Internet access, Web, email and more, so can keep functioning, as long as the laptop batteries hold out.

The wind picked up during the morning and we had the expected flick of the lights, and the odd bong, ping, and beep from the computer room as the UPS's responded to the odd voltage fluctuations and the momentary outages. Around 12:30 we had a quick sequence of power blips, followed by a couple of minutes of power gone, at which point the UPS's started bleeping loudly as they took the load. This is normally the trigger for me to wander in there and just do a visual glance at battery levels. I was stood in there as the power came back and was watching as the server's UPS came back normally. Then the comm's UPS flashed all its lights, beeped and went dead!

It's not dead, it's just sleeping after a long squawk!

Humm… First I thought it must be the overload switch, so disconnected all the load, grovelled around behind it and pressed the reset switch. Nothing. So I disconnect from the mains, reset, power it back on, nothing. Check the fuse in the plug, all okay, it's still dead. Dig out the APC manual, whose symptoms say "don't use, return to your supplier for service."

At this point the power goes completely for 10 minutes, and I can see that the server UPS batteries are already half empty (or half-full if you're an optimist). "They must have been taking more of a load during the morning than I thought," I say to myself. I decided it was time for a controlled shutdown of the servers, which I did. Now I was going to have to rejig the power cables, so that we could feed power to the comm's kit (which was now on a dead UPS) from the server's UPS. A couple of minutes of work commenced, to move their supplies to spare outlets on the APC Switched Rack PDU that is fed by the UPS. The PDU is a network-addressable Power Distribution Unit, one that can power up/down individual power outlets, and thus we can remotely shut down or reset the servers if needs be.

So at this point the power comes back, and I power up the comm's kit, leaving the servers off. Decide I'll go for lunch, let the batteries recharge a bit, and make sure that the power is staying on before I restart the Servers.

Lunch passes, with a glass of Merlot. 

Now the power seems to be stable, so it's back to the computer room to bring up just the essential servers: our main HP 3000 test server, a Windows mail server, and a Windows file server that also handles our VPN connections (because everyone works remotely now).

I'm in the middle of this when the power goes out again. I look at the PDU which tells me that we are drawing 3 amps (240v * 3 = 720 watts) = about 10 minutes worth on a half-charged 2200VA UPS.  Not worth it, so I shut the servers down (but I don't throw their power switches).
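
The back-of-the-envelope sum in that paragraph can be sketched as follows. The 240V and 3 amp figures come from the PDU reading above; the usable battery capacity is not given in the story, so `ASSUMED_FULL_CHARGE_WH` is a hypothetical value picked only so that the "about 10 minutes" estimate can be reproduced. The sketch also treats the load as purely resistive and ignores inverter losses.

```python
def load_watts(volts: float, amps: float) -> float:
    """Approximate real power, treating the load as resistive (power factor 1)."""
    return volts * amps


def runtime_minutes(full_charge_wh: float, state_of_charge: float, load_w: float) -> float:
    """Estimated minutes of runtime, ignoring inverter losses and battery ageing."""
    if load_w <= 0:
        raise ValueError("load must be positive")
    return full_charge_wh * state_of_charge * 60 / load_w


# Hypothetical usable capacity at full charge; not a figure from the article or from APC.
ASSUMED_FULL_CHARGE_WH = 240.0

load = load_watts(240, 3)  # the PDU reading: 3 amps at 240V
print(load, "watts")
print(runtime_minutes(ASSUMED_FULL_CHARGE_WH, 0.5, load), "minutes at half charge")
```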

Fireworks!

At this point the power comes back and stays on for about five minutes. There's me standing there trying to decide what to do, when the power goes off again, and then comes back. At which point the sole remaining UPS goes BANG! It flashes its lights a bit whilst beeping manically, and then goes dead. The room fills with the smell of over-heated insulation, so I pull the UPS power plug.

Okay, "Sod this for a bunch of Soldiers," thinks I. Was going to finish early that day to help some friends set up for a weekend Charity Clay Shoot. "I'll go now and come back later -- when hopefully the wind has died down and the power is back to normal -- and then pick up the pieces."

Back in the datacentre at 8 p.m. and the wind is gone, with power back to normal. Okay, should just have time to get everything working before dinner. Play with the UPS for 10 minutes, but it's dead. So we are going to have to "walk the tight rope without safety harness or net" and run everything direct from the mains. 

Not exactly completely unprotected computing, because when we had had the new office wired 18 months ago, we installed surge protection on the mains supply. It's like a couple of cartridges that sit next to the distribution panel that absorb a surge, decaying in the process, until the point they need replacing. They have a status indicator on them telling you if they need changing, but they were showing green, so I thought I'd risk it for a few days, until we could source a new UPS.

Why do these things always hit at a weekend?

Comms come back okay, although I noticed that an old dial-up modem was dead that was still hooked up for dire emergency remote access if Internet access failed. Okay, now for the servers: power up the Series 917 and let it start its self-test check (which takes ages, and lots of memory); power up the Series 918 (it does its memory tests much quicker); power up the Windows 2008 file server and a Windows mail database server. Plus, an older Windows 2003 server that still ran the SMTP software, which should have been moved to the 2008 server, but hadn't because we had never got around to it.

The HP 3000 918 comes up clean, the Windows 2008 server comes up, the Windows mail database server comes up. But the HP 3000 917 is downed with an FLT error, and the Windows 2003 Server is looping around boot start-up into Windows launch, then straight back to boot start-up. Wonderful! Sod it, go and have dinner and decide if I'm coming back later.

08:33 PM in Homesteading, User Reports

February 08, 2019

What can a 3000 do to talk to a modern UPS?

Michel Adam asks, "How can I install and configure a reasonably modern UPS with a 3000? I'd like to use something like an APC SmartUPS or BackUPS, for example. What type of signaling connection would be the easiest, network or serial?"

Jim Maher says

First you need to find out what model 3000. Listed on the back will be the power rating. Some of the older ones use 220V. Then you can match that with a proper UPS.

Michel Adam explains in reply

This HP 3000 is an emulator, i.e. a 9x8 equivalent or A-Class. I guess a regular "emulated" RS-232, or actual Ethernet port would be the most likely type of connection. In that sense, the actual voltage is of no consequence; I only need to understand the means of communicating from the UPS to the virtual 3000.

Tracy Johnson reports

While we have three "modern" APC units each with battery racks four high, they also serve the rest of the racks in our computer room. Our HP 3000 is just a bigger server in one of those racks. Each APC services only one of the three power outlets on that N-Class. Their purpose is not to keep the servers "up" for extended periods, but to cover for the few seconds lapse before our building generator kicks in in case of a complete power loss.

As far as the UPS talking to our HP 3000 serial port, we didn't bother. Our APC units are on the network so they have more important things to do, like send emails to some triage guy in Mumbai should they kick in.

Enhanced, or not?

In the history department, Hewlett-Packard had its labbie heart in the right place just weeks before the vendor canceled its 3000 plans. We reported the following in October of 2001

HP 3000s will say more to UPS units

HP's 3000 labs will be enhancing the platform to better communicate with Uninterruptible Power Supply systems in the coming months. HP's Jeff Vance reports that the system will gain the ability to know the remaining time on the UPS, so system managers can know that the UPS will last long enough to shut down their applications and databases and let the system crash. Vance said that HP has scheduled to begin its work on this improvement, voted Number 8 on the last System Improvement Ballot, in late fall.

Late fall of 2001 was not a great time to be managing future enhancements for the 3000 and MPE/iX. The shortfall of hardware improvements and availability has been bridged by Charon. Adjustments to MPE/iX for UPS communication have not been confirmed.
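
The idea behind that 2001 enhancement request, knowing whether the UPS will last long enough for an orderly shutdown, can be sketched in a few lines. This is a generic illustration, not MPE/iX code or any UPS vendor's API: the function name, its parameters, and the safety margin are all hypothetical.

```python
def should_start_shutdown(ups_minutes_remaining: float,
                          shutdown_minutes_needed: float,
                          safety_margin_minutes: float = 2.0) -> bool:
    """Return True once the UPS runtime left is no more than the time an
    orderly shutdown needs, plus a safety margin."""
    return ups_minutes_remaining <= shutdown_minutes_needed + safety_margin_minutes


# Example: applications and databases need about 4 minutes to close cleanly.
for remaining in (30.0, 10.0, 6.0, 3.0):
    print(remaining, should_start_shutdown(remaining, shutdown_minutes_needed=4.0))
```

A manager's job would call something like this each time the UPS reports its remaining runtime, and begin closing applications and databases the first time it returns true.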

05:53 PM in Hidden Value, User Reports

February 06, 2019

Wayback: MPE's Computer Scientist Expires

Kick Butt Poster

Wirt Atmar conceived and led The World's Largest Poster Project (shown above) with the help of hundreds of volunteers on a Southern California football field.

Ten years ago this week the 3000 community was reminded of its mortality. Wirt Atmar, founder of AICS Research and the greatest scientist to practice MPE development, died in his New Mexico home. Wirt was only 63 and demonstrated enough experience in the 3000's life to seem like he'd been alive much longer.

Atmar died of a heart attack in his hometown in Las Cruces, NM. It was a place where he invited everyone to enjoy a free enchilada dinner when they visited him there. He once quipped that it was interesting to live in a state where the omnipresent question was about sauce: "Green or red?" He gravitated to new ideas and concepts and products quickly. Less than a month after Apple introduced the iPhone, he bought and tested one, praising its promise even as he exposed its failures from the unripened state of its software to the cell signal unavailability.

If I go outside and stand under one specific tree, I can talk to anyone I want. In only one week, I have felt on multiple occasions like just heaving the phone as far as I could throw it -- if it weren’t so damnably expensive. The iPhone currently resembles the most beautiful cruise liner you’ve ever seen. It’s only that they haven’t yet installed the bed or the toilet in your stateroom, and you have to go outside to use the “facilities” — and that’s irritating even if the rest of the ship is beautiful. But you can certainly see the promise of what it could become.

He was not alone in predicting how the iPhone would change things, but being a scientist, he was also waiting on proof. The postings on the 3000-L mailing list were funny and insightful, cut sharp with honesty, and complete in needed details. A cruise through his postings on the 3000 newsgroup stands as an extraordinary epitaph of his passions, from space exploration to environmental science to politics to evolution and so much more. He was a mensch and a brilliant polymath, an extraordinary combination in any human.

Less than 24 hours before he died, Wirt posted a lively report on migration performance gains he recorded after moving an MPE/iX program to faster hardware running Linux. It was a factual observation only he could have presented so well, an example of the scientific practice the community loses with his passing.

One of the 3000 founders who was best known by his first name, Wirt was respected in the community for his honest and pragmatic vision of the 3000's history and potential, expressed in his countless e-mails and postings to the 3000 newsgroup. But alongside that calculating drive he carried an ardor for the platform.

Wirt was essential in sparking HP's inclusion of SQL in IMAGE, a feature so integrated that HP renamed the database IMAGE/SQL. In 1996 he led an inspired publicity effort that brimmed with a passion for possibility, conceiving and leading The World's Largest Poster Project (shown above) with the help of hundreds of volunteers on a Southern California football field. He quipped that after printing the hundreds of four-foot rolls of paper needed for the poster, loading them into a van for the trip to California represented "the summer corporate fitness program for AICS Research."

09:25 PM in History, Homesteading

February 04, 2019

Long-time MPE licensees leave dates in dust

I went to a birthday celebration for Terry Floyd yesterday as part of a Super Bowl party. You may begrudge them the kudos, but congrats to the Pats, who once again executed like the MPE applications still running this week in businesses around the world. Not flashy, like MPE, but every day brings no surprises. That's a very good thing for enterprise computing, and always has been.

Floyd's turned 70 -- he’s the guy who started The Support Group here in Austin to serve MANMAN 3000 customers. One of those customers was in town to celebrate. Ed Stein spent years managing MANMAN at MagicAire, a Carrier subsidiary.

That corporation is still using MPE, even after Ed has gone. He’s moved into the interesting fields of independent support and consulting on MPE. He mentioned he's available to the community's 3000 owners looking for MPE talent. Along the way he's developed his experience on the prospects for keeping dates nine years from now in MPE.

It was Stein's intention to prepare for the 2027 date-keeping changes that led several companies to spin up services and strategies for date-keeping in 2028 and beyond. What was mumbled about in private became more public offerings and strategies. During a conference call among MANMAN managers late in 2017, Floyd and others talked about how much work it will be to keep dates straight in an era HP never planned for.

Stein says that in his travels through the community he’s still running into many a 3000 user who’s got no idea their OS will stop making accurate dates in less than nine years. He also made reference to Beechglen and its 2028 patch service. Like everyone else who's using HP's MPE source code licenses, Beechglen cannot sell a product to patch MPE/iX. HP was never going to sell permission to create patched versions of MPE/iX.

Seven companies paid HP $10,000 each to become the source code licensees about nine years ago. At the time, the 3000's operating environment felt like a long shot to feel its age and forget its date-keeping skills. The server was 18 years away from a date that no working MPE server would ever see, right?

Don't look now, but 2027 is gaining on the community. Floyd was one of several developers who identified the scope of the work to make an app like MANMAN ready for the year 2028.
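
Why 2027 and not some other year? The article doesn't spell it out, so treat the following as a sketch under a stated assumption: the 16-bit date word returned by MPE's CALENDAR intrinsic is commonly described as a 7-bit year offset from 1900 followed by a 9-bit day of year. The helper names `pack_calendar` and `unpack_calendar` are hypothetical and only illustrate that layout.

```python
from datetime import date, timedelta

# Assumed layout of the 16-bit date word (not stated in the article):
# high 7 bits = years since 1900, low 9 bits = day of year.
YEAR_BITS = 7
DAY_BITS = 9
BASE_YEAR = 1900
LAST_YEAR = BASE_YEAR + (1 << YEAR_BITS) - 1  # largest year that fits in 7 bits


def pack_calendar(d: date) -> int:
    """Pack a date into the assumed 16-bit layout, or fail if the year won't fit."""
    offset = d.year - BASE_YEAR
    if not 0 <= offset < (1 << YEAR_BITS):
        raise ValueError(f"year {d.year} does not fit in {YEAR_BITS} bits")
    return (offset << DAY_BITS) | d.timetuple().tm_yday


def unpack_calendar(word: int) -> date:
    """Recover the date from a packed 16-bit word."""
    offset = (word >> DAY_BITS) & ((1 << YEAR_BITS) - 1)
    day_of_year = word & ((1 << DAY_BITS) - 1)
    return date(BASE_YEAR + offset, 1, 1) + timedelta(days=day_of_year - 1)


print(LAST_YEAR)
print(unpack_calendar(pack_calendar(date(2027, 12, 31))))
try:
    pack_calendar(date(2028, 1, 1))
except ValueError as err:
    print("cannot represent:", err)
```

Under that assumption, December 31, 2027 is the last date the packed word can hold, which is why applications that store dates this way need work before 2028.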

Some customers will get readiness for 2028 by becoming 3000 support customers. Any support company using the MPE source must package the repairs and improvements they develop as support offerings. There are a half-dozen more companies with source capabilities for MPE/iX. Getting a relationship in place with them will be on some to-do lists for 2019. Even the companies without a clue about date keeping will eventually catch on to where the correct tomorrows are going to come from: solutions off the support bench.

09:45 AM in Homesteading, News Outta HP