Debugging the diagnostics

Photo by Mikhail Vasilyev on Unsplash

The Command Support Tools Manager (CSTM) replaced SYSDIAG as of MPE/iX 6.5. Managers who are keeping MPE/iX working here in 2019 rely on CSTM, just as they did SYSDIAG before it.

There's evidence that CSTM has problems running on MPE/iX 6.5 systems. One well-schooled developer recently reported that when he tried to run CSTM on his MPE/iX system, the diagnostic told him on startup, "an error dialog could not be built to display an error."

The developer community suggested a few fixes for this problem with the diagnostic software. CSTM was ported onto the HP 3000 from HP-UX, so the repairs that CSTM itself suggested regarding memory (increasing it, removing processes, reconfiguring kernel memory limits) probably don't fit. CSTM has a special page on the Hewlett Packard Enterprise website devoted to the problem.

The developer at least had another 3000 running the same version of MPE/iX, a system where CSTM was starting up without a problem. One bit of advice suggests that while using console debug, "check out what your working system looks like at the CSTM prompt when idle. Use pseudomap “XL” to get symbols from the libraries and program. Attempt to set some breakpoints near initial program launch."

Using DEBUG, the open heart surgery of HP 3000 management, is sometimes a required diagnosis. When your diagnostics software requires diagnosis, nothing but DEBUG will get the job done.

Much more detail followed on using DEBUG to discover what's failing in CSTM.

Stan Sieler had explicit instructions on how to run DEBUG to find a CSTM bug.

Run both programs side-by-side, using Debug, to find where they deviate. I'd avoid doing something easy like "s 10000; tr" over and over, because that can land you in the middle of sensitive kernel stuff. Instead, I'd do smaller leaps (perhaps "s 100; tr") and when a new routine is entered, set a temp breakpoint at the exit and "c" to run for a while. It's a tad more tedious, but safer.
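The side-by-side comparison can be pictured as finding the first point where two recorded traces disagree. What follows is only a minimal sketch in C, not anything Debug itself provides: it assumes you have noted the routine names by hand from each session's "tr" output, and the function name first_divergence and the routine names mentioned afterward are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare two hand-recorded call traces, one from the working system and
 * one from the failing system. Returns the index of the first entry where
 * they differ, or -1 if both traces are identical.
 * If one trace is a strict prefix of the other, the divergence is reported
 * at the point where the shorter trace ends.
 */
long first_divergence(const char *const good[], size_t good_len,
                      const char *const bad[], size_t bad_len)
{
    assert(good != NULL && bad != NULL);

    size_t shared = good_len < bad_len ? good_len : bad_len;

    for (size_t i = 0; i < shared; i++) {
        if (strcmp(good[i], bad[i]) != 0) {
            return (long)i;   /* routines differ at this step */
        }
    }

    if (good_len != bad_len) {
        return (long)shared;  /* one trace stopped early */
    }
    return -1;                /* no deviation found */
}
```

If the working trace reads main, init_ui, open_catalog and the failing trace reads main, init_ui, report_error (all made-up names), the function returns 2, which is the routine worth a closer look with smaller steps.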

You could also put a breakpoint at traphandler in each, and (with luck) only the failing one might hit it (e.g., if a file system problem occurred).  Sadly, the file system was not well designed with respect to error handling, and they use non-local escapes in many places, which are expensive and potentially risky.

Sometimes you can get the best/quickest idea by using Debug to put a breakpoint at 'printf' or 'fprintf' (since it's in C), or possibly FWRITE ... trying to catch the program where it displays the error message.

Sometimes, you'll find that a 'tr' will show you the calling function and it's clear that's the one that failed (if the output was generated inside the failing function).

E.g.:
    function this_is_the_one_with_problems (...)
        ... stuff ...
        if an error was found then
            printf ("oops")

Other times, you'll have to look at the caller (or back a few frames) and backtrack in the code from that point.

E.g.:
    rslt = this_is_the_one_with_problems (...);
    ... (possibly no extra code, possibly something like "if rslt < 0 goto got_error")
    if rslt < 0 then
        printf ("oops");
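For readers who want to see the two cases in compilable form, here is a small C sketch of the same idea. It is an illustration only and says nothing about how CSTM is actually written: the names load_config_reports_itself, load_config_silent and startup are hypothetical, and the message goes to a caller-supplied stream so the behavior is easy to check.

```c
#include <stdio.h>

/*
 * Case 1: the failing routine prints its own error.
 * A breakpoint on fprintf followed by a stack trace would show this
 * function as the immediate caller, so the culprit is obvious.
 */
int load_config_reports_itself(const char *path, FILE *out)
{
    if (path == NULL) {
        fprintf(out, "oops\n");
        return -1;
    }
    return 0;
}

/*
 * Case 2: the failing routine stays quiet and only returns a status.
 */
int load_config_silent(const char *path)
{
    return (path == NULL) ? -1 : 0;
}

/*
 * The caller prints the message. A stack trace taken at fprintf would
 * show startup(), not load_config_silent(), so you have to read backward
 * from the fprintf call to find which earlier call set rslt.
 */
int startup(const char *path, FILE *out)
{
    int rslt = load_config_silent(path);

    if (rslt < 0) {
        fprintf(out, "oops\n");
        return -1;
    }
    return 0;
}
```

Calling startup(NULL, stderr) prints the message from the caller's frame, which is the situation where backtracking from the trace is needed.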