Article revised Nov. 2, based on HP responses
Yesterday's report on a critical patch for HP 3000s sparked an immediate response from the user community, especially from experts in the 3000's internals. These experts believe the nature of the problem might require more than binary repair patches to serve the long-term needs of 3000 customers.
[HP disagrees, and its response is available in a Nov. 2 report]
HP intends to create these binary patches through 2010, as it said yesterday. OpenMPE advocates say they are concerned that these fixes will present a challenge to application developers who will need to integrate them into MPE/iX in the future. OpenMPE wants to do this work.
"We've done as much testing as we could get done," said HP's community liaison Craig Fairchild. "There has been some field testing, and a lot of in-house testing." He added that HP scanned its internal 3000 applications to test the abilities of FILECHEK, which finds Large Files.
Our report of yesterday may have been too cheery about the odds of hitting this corruption bug. Our Oct. 31 story estimated that, in one case, users faced just a 1-in-800-million chance of hitting one of five bytes, at the end of a sort of a 4-billion-byte file, that could corrupt data. Stan Sieler of Allegro Consultants, a Resource 3000 partner, said it's not that rare:
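The 800-million-to-1 figure is simple division: five risky byte positions out of about 4 billion. The sketch below reproduces it, assuming "4-billion-byte" means exactly 4,000,000,000 bytes and that every byte position is equally likely to be hit, an assumption that Sieler's comments call into question.

```python
# Reproduce the Oct. 31 back-of-envelope odds: a handful of risky byte
# positions within one large file, each position assumed equally likely.
risky_bytes = 5              # byte positions that could trigger corruption
file_bytes = 4_000_000_000   # assumption: "4-billion-byte" taken literally

odds_against = file_bytes // risky_bytes   # the "1 in N" figure
print(f"1 in {odds_against:,}")            # prints "1 in 800,000,000"
```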
We found it fairly easily. 133 file sizes from 2 to 32,766 bytes per record can directly encounter the problem ... but only for Large Files. (As it happens, 256 bytes isn’t one of them, nor is any power of 2.) We stumbled over one while doing a large sort.
Files of about 2 GB or more, and of any record size, can encounter the problem while being sorted, because HPSORT creates a scratch file whose record size isn’t identical to that of the input file. If it happens to create a file with one of those 133 file sizes, and that scratch file is 4 GB or more, then it can run into the problem.
Most people don’t have Large Files ... they could be affected only if they happen to sort files that are bigger than about 2 to 3 GB. (And, even then, there are only about 152 file sizes out of 32,766 that might trigger the problem.)
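Sieler's description of the sort case comes down to two conditions: the scratch file's record size must be one of the triggering sizes, and the scratch file must reach 4 GB or more. A minimal Python sketch of that rule follows. The function name `sort_may_hit_bug` is hypothetical, the list of triggering record sizes was not published in this report (so it is passed in as a parameter), and "4 GB" is assumed to mean 2^32 bytes.

```python
# Hypothetical sketch of the exposure rule described by Stan Sieler.
# This is not HP or Allegro code, and it does not identify the real
# triggering record sizes; the caller must supply them.

FOUR_GB = 2**32  # assumption: "4GB" read as 2**32 bytes


def sort_may_hit_bug(scratch_record_size: int,
                     scratch_file_bytes: int,
                     trigger_sizes: set[int]) -> bool:
    """Return True if an HPSORT scratch file meets both exposure conditions.

    1. Its record size is one of the triggering sizes.
    2. The scratch file is 4 GB or larger.
    """
    return (scratch_record_size in trigger_sizes
            and scratch_file_bytes >= FOUR_GB)


# Usage with a placeholder trigger set (999 is an arbitrary stand-in,
# not a known triggering size).
placeholder_triggers = {999}
print(sort_may_hit_bug(999, 5 * 2**30, placeholder_triggers))  # True
print(sort_may_hit_bug(999, 3 * 2**30, placeholder_triggers))  # False
print(sort_may_hit_bug(256, 5 * 2**30, placeholder_triggers))  # False
```

Because the scratch file's record size can differ from the input's, an input of about 2 to 3 GB can still produce a scratch file past the threshold, which is why Sieler warns about sorting files of that size.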
Systems without Large Files are safe. Adager's Alfredo Rego posted to the HP 3000 newsgroup about an hour after HP announced the critical patches. Rego said his lab found the problem in August, after an earlier HPSORT patch introduced a new problem while not solving the initial bug.
Rego's exploration also raised a more serious question about the nature of HP's repairs, a concern echoed by MPE/iX veteran and MPE-education.com co-founder Paul Edwards.
"There's some strong issues here," Edwards said. "I'm concerned that these binary patches were not done in the source code. So HP didn't generate a General Release patch, which means it may or may not be tested on all three of the supported versions of MPE/iX. There was no beta patch, or anything."
[HP said in its Nov. 2 reply that the patches have both been beta-tested and General Released.]
While Rego congratulated HP on the speed of its "tremendous efforts" in creating the patches, he also noted that HP did not acknowledge the third parties who helped isolate the problems:
It appears that HP continues its policy of not acknowledging any of the help it has obtained from its hard-working partners. That’s okay, because I have reason to believe that this is forced upon our vCSY friends by HP’s corporate legal department. Boy, how I miss the days when I used to walk the floors of HP’s buildings while chatting with Bill Hewlett and Dave Packard in the early 1970s.
The Adager team described the problem with move_fast_64 to HP on August 8, 2007, after we discovered that the first HPSORT patch not only did not solve the problem but introduced a new one by using move_fast_64.
Craig Fairchild implies that only non-HP applications might be affected. What about HP applications? I believe that the “move_fast_64” procedure is used extensively by HP within MPE/iX. Has HP taken care of all the possible problems within MPE/iX?
Adager calls HPSORT programmatically. We detected, analyzed, documented, and reported the initial problems to HP during tests on our own HP 3000 systems as well as on some large customers’ systems.
The problems encountered by Adager customers were not caused by Adager code, but by HP’s code. Once the “latest” HPSORT patch was applied, the problems went away. The patch ID we tested is MPENX06, which is different from the patches mentioned by Craig. It seems that patch MPENX11 includes the fix for HPSORT.
Craig does not mention that the problem can exist while using simple calls using FREADDIR on files with bug-exposing characteristics. He does provide a nice technical description under the section on “MPE/iX OS millicode handling of long pointer access to large files.”
It’s not clear (to me, at least) whether the patches identified by Craig are already available as General Release or still in beta. We have not received either one of them, even though we generated the original bug report. Is HP confident that the problem has been completely resolved without any user testing?
Adager users are safe. Even when they may experience the problems that Adager exposes when calling HPSORT without the patch, Adager will leave the databases unchanged after reporting the problem.