Heartbeat at the center of CPU boost
June 15, 2018
Newswire Classic
By Gilles Schipper
The activity light on the 3000's LDEV 1 was abnormally high, and we noticed very sluggish response time, even though only the console was signed on and no batch jobs were executing. Having no idea what the problem was — and absent any tools such as Glance to shine a light on the situation — we began to revert to the previous configuration, software and hardware.
Only a week later, after some analysis of the NM log files, were we able to establish what was going on. The performance problem was related to the 3000's transceivers. SQE heartbeat was disabled on all of them. As a result, the CPU was being inundated with an overwhelming number of IO requests, each one logging a missing heartbeat event to the NM log file.
This voluminous, unnecessary IO was enough to bring the system to its knees, even with no other activity. In today's HP 3000 environment this serious waste of CPU can be overlooked, because faster CPUs make the problem less noticeable. But I would venture to guess that this kind of wasted IO is affecting a large number of HP 3000s out there.
Fortunately, there is a very simple way to recognize whether the problem exists, and also a simple cure. To determine if you have this problem, simply type the following command and look at the reply that follows:
:listf h@.pub.sys,2
ACCOUNT=  SYS         GROUP=  PUB
FILENAME  CODE  ------------LOGICAL RECORD-------     ----SPACE----
                  SIZE  TYP       EOF      LIMIT R/B  SECTORS #X MX
H000000A*           1W  FB          5      66010   1      256  1  *
H000000B*           1W  FB          0      66010   1        0  0  *
H0909A5A*           1W  FB          5      66010   1      256  1  *
H0909A5B*           1W  FB          0      66010   1        0  0  *
H13ECEEA*           1W  FB          5      66010   1      256  1  *
H13ECEEB*           1W  FB          0      66010   1        0  0  *
H15F669A            1W  FB          5      66010   1      256  1  *
H15F669B            1W  FB          0      66010   1        0  0  *
HASTAT    NMPRG   128W  FB        347        347   1      352  1  8
HAUTIL    NMPRG   128W  FB        424        424   1      432  1  8
HP32209B  PROG    128W  FB         15         15   1       16  1  1
Notice the open files (the ones with an asterisk after the file name) that are 1W in size. There are two such files for each configured DTC. Each file name starts with the letter H, followed by six characters that represent the last six characters of the DTC's MAC address, followed by the letter A or B. The EOF for these files should be 0 and 5 for the respective "A" and "B" files.
Otherwise, your CPU is being subjected to high-volume, unnecessary IO that demands CPU attention. The solution is simply to enable SQE heartbeat on each transceiver attached to each DTC. This is done via a small white jumper switch that you should see on the side of each transceiver.
Voila, you've just achieved a significant no-cost CPU upgrade.
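For anyone who captures the LISTF reply to a text file on another machine, the check described above can be sketched in a few lines of Python. This is only an illustration, not a tool that ships with MPE/iX: the function names, the whitespace-separated row layout and the sample MAC address are all assumptions made for the example. It treats 0 and 5 as the only healthy EOF values, as stated above, and does not check which file of a pair holds which value.

```python
import re
from typing import List, NamedTuple, Tuple

# Assumed name layout, taken from the description above: "H", six hex
# characters (the tail of the DTC MAC address), then "A" or "B".
# A trailing "*" marks a file that is currently open.
NAME_RE = re.compile(r"^H([0-9A-F]{6})([AB])(\*?)$")


class DtcFile(NamedTuple):
    mac_tail: str   # last six characters of the DTC MAC address
    suffix: str     # "A" or "B"
    is_open: bool   # True when LISTF showed an asterisk after the name
    eof: int        # EOF column of the LISTF,2 reply


def dtc_file_names(mac: str) -> Tuple[str, str]:
    """Build the two file names expected for one DTC MAC address."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    tail = digits[-6:]
    return "H" + tail + "A", "H" + tail + "B"


def parse_listf(text: str) -> List[DtcFile]:
    """Pick the 1W DTC files out of a captured LISTF,2 reply."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # Expected row shape (an assumption): NAME 1W FB EOF LIMIT ...
        if len(fields) < 4 or fields[1] != "1W" or not fields[3].isdigit():
            continue
        match = NAME_RE.match(fields[0])
        if match is None:
            continue
        found.append(DtcFile(match.group(1), match.group(2),
                             bool(match.group(3)), int(fields[3])))
    return found


def unexpected_eof(files: List[DtcFile],
                   healthy: Tuple[int, ...] = (0, 5)) -> List[DtcFile]:
    """Return the files whose EOF is not one of the healthy values."""
    return [f for f in files if f.eof not in healthy]


# Rows copied from the listing in this article.
SAMPLE = """\
H13ECEEA* 1W FB 5 66010 1 256 1 *
H13ECEEB* 1W FB 0 66010 1 0 0 *
H15F669A 1W FB 5 66010 1 256 1 *
H15F669B 1W FB 0 66010 1 0 0 *
HASTAT NMPRG 128W FB 347 347 1 352 1 8
"""

if __name__ == "__main__":
    for f in unexpected_eof(parse_listf(SAMPLE)):
        print("check transceiver heartbeat for DTC", f.mac_tail)
```

Any file the script reports belongs to a DTC whose transceivers are worth inspecting. The `dtc_file_names` helper goes the other way: given a DTC's MAC address, it returns the two file names to look for in the listing.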
Another method of eliminating this excessive CPU overhead is to use NMMGR to uncheck as many logging events as you can for each DTC, then revalidate and reboot.
But enabling SQE heartbeat is a surer bet.