
Fine-tune transceivers and their heartbeats

Is MPE/iX sensitive to heartbeat signals generated by network transceivers in LANs? We think it's having a performance impact on our 9x7. What can we do?

These extra heartbeats can be a drain of up to 15 percent on CPU.

If DTCs are involved, flipping a switch on the transceivers can resolve the CPU drain. After flipping the SQE switch to on, excessive IO activity stops — and with it, the excess CPU activity it causes.

If the SQE heartbeat on the 10BaseT transceiver is not on, you can get a high level of disk IO, because the system wants to log each of these events. The IO can be significant, up to a continuous 70-80 IOs per second. Running LINKCONTROL @;STATUS=ALL shows the heartbeat losses recorded since the last reset. Turning on the transceiver's SQE heartbeat corrects the problem.

Somewhat randomly, we get a handful of heartbeat losses, carrier losses and transmit errors (same number of each). We can go for days without seeing any. We replaced the MIO card but it had no effect on these occasional glitches. I’d like to replace the transceiver because we see no other problems anywhere on our network. What are my chances of successfully doing this hot?

You can swap transceivers hot. In fact, replacing the transceiver solves the problem.

What diagnostics and network reports should I trace to discover a transceiver's heartbeat problems?

SQE heartbeat loss can lead to all sorts of network and system performance problems. It's usually caused by a defective transceiver or a transceiver that has not been configured correctly. The first thing to do is check for heartbeat losses on the LAN card. Heartbeat losses on the system card cause slow network throughput, most noticeable in large file transfers. The LINKCONTROL command can show you if the transceiver is not providing SQE heartbeat.

As shown below, you should see heartbeat losses of 0 or very close to 0.

:linkcontrol @;status=all
Linkname: DTSLINK Linktype: IEEE8023 Linkstate: CONNECTED
Physical Path: 56/56
Current Station Address: 08-00-09-98-18-D3
...

Trans late collision 0 Size range errors 0
802 chip restarts 0 Receives dropped 0
Heartbeat losses 0 Receives broadcast 6605
Receives multicast 0
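
If you capture that output to a text file and move it to another machine, a short script can do the check for you. The following is a minimal sketch in Python, not an HP-supplied tool. It assumes the captured text contains "Linkname:" and "Heartbeat losses" labels as in the sample above; the function names and the threshold of 0 are made up for illustration, and real output may be spaced differently.

```python
import re

# Sample text copied from the LINKCONTROL output shown above. Real output
# may be spaced differently, so the patterns accept any run of whitespace.
SAMPLE = """\
Linkname: DTSLINK Linktype: IEEE8023 Linkstate: CONNECTED
Trans late collision 0 Size range errors 0
802 chip restarts 0 Receives dropped 0
Heartbeat losses 0 Receives broadcast 6605
Receives multicast 0
"""


def heartbeat_losses(status_text):
    """Return {linkname: heartbeat loss count} found in captured output."""
    results = {}
    current_link = None
    for line in status_text.splitlines():
        link = re.search(r"Linkname:\s*(\S+)", line)
        if link:
            current_link = link.group(1)
        losses = re.search(r"Heartbeat losses\s+(\d+)", line)
        if losses and current_link is not None:
            results[current_link] = int(losses.group(1))
    return results


def links_needing_attention(status_text, threshold=0):
    """List links whose loss count exceeds the (assumed) threshold."""
    counts = heartbeat_losses(status_text)
    return [name for name, count in counts.items() if count > threshold]


print(heartbeat_losses(SAMPLE))
print(links_needing_attention(SAMPLE))
```

Any link the second function returns is a candidate for a transceiver check. Since the counter accumulates from the last reset, compare two captures taken some time apart before deciding the losses are ongoing.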

Lack of SQE heartbeat on DTCs can cause system performance problems and is not reported by the LINKCONTROL command. A DTC ‘complains’ to the host system that it is missing SQE. The HP 3000 will log the heartbeat loss events to special log files stored on LDEV 1. These log events occur continuously, resulting in an IO bottleneck on the system disk. On some systems you can actually hear the system disk getting constant usage.

How do you diagnose if you are subject to this problem? Frequently the process that is logging the errors appears as the top DISC consumer in SOS or Glance/iX. Or a system process will continually appear in a list of active processes as seen in the :SHOWQ command:

:showq;active

DORMANT RUNNING

Q PIN JOBNUM Q PIN JOBNUM

A 39
C M163 #S9136
C M183 #S9140
D U189 #J6036

A stack trace of PIN 39 would point to a performance problem. Not only is this process logging the heartbeat loss events, it is forcing a post of the records to disk immediately via FCONTROL. This is where the performance problem lies.
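
To see why posting every record immediately is so expensive, here is a rough analogy in Python. It is not MPE/iX code: os.fsync stands in for the forced post that FCONTROL performs, and the function name, file names and event text are all hypothetical. The sketch only counts how many times the disk is forced to write, once per event in one case and not at all in the other.

```python
import os
import tempfile


def log_events(path, events, force_post):
    """Append events to a log file; return how many forced posts were issued.

    force_post=True imitates posting every record to disk immediately.
    os.fsync is a stand-in here, not the MPE/iX FCONTROL intrinsic.
    """
    forced = 0
    with open(path, "a") as log:
        for event in events:
            log.write(event + "\n")
            if force_post:
                log.flush()             # push Python's buffer to the OS
                os.fsync(log.fileno())  # ask the OS to write to disk now
                forced += 1
    return forced


# 80 made-up events, roughly one second's worth at the rate quoted earlier.
events = ["heartbeat loss %d" % n for n in range(80)]

with tempfile.TemporaryDirectory() as tmp:
    forced = log_events(os.path.join(tmp, "forced.log"), events, True)
    buffered = log_events(os.path.join(tmp, "buffered.log"), events, False)

print(forced, buffered)
```

With the forced post, every event costs its own trip to the disk. With buffering, the same events can be written together in a few larger writes, which is the relief the system disk never gets while the heartbeat loss events keep arriving.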
