Putting Job Queues to Good Use
December 3, 2021
By Neil Armstrong
Our development environment at Robelle was quite unusual, in that we had a single job stream, which launched all of the necessary compiling and testing steps for each product. So if I wanted to compile and run the entire test suite for Suprtool, I just have to issue a single stream command, :stream jrelease.suprtool
If I wanted run the Qedit test suite, then I just have to stream the qedit jrelease with the command :stream jrelease.qedit
There’s a problem though: the individual job streams that test each product can only run single threaded. Each job must complete before the next one begins, so we have to keep the job limit perfect all the time. This also means that we couldn't run the Qedit and Suprtool test suites simultaneously.
This has always been a problem, as sometimes people or jobs alter the limit incorrectly, which means that multiple jobs would stream at the same time, causing test jobs to fail and results to be incorrect. This could mean losing a night’s “productive” testing, as the job streams are generally streamed in the early hours of the morning, after the nightly backup. So if you had just made a major change to a module of Suprtool, you wouldn’t know the impact of that change for another day.
Mike Shumko suggested that we try to implement jobq’s for our environment, to address this problem. Without knowing what jobq’s really were, I naturally volunteered for the job in the hope that I could alleviate this dependency we had on the job limit.
What I hoped jobq’s would do
Without even reading about jobq’s I thought they were a way to have job streams operate in specified queues, and thus be independent of jobs in other queues and the main job queue.
To start using jobq’s immediately, you need only do two things:
- create the jobq with the newjobq command, and
- change the !job card in your job stream to use the jobq that you built.
So to create the jobq for the Suprtool job streams I used the newjobq command. I only wanted one job to run at at time in this jobq so I specified a limit of 1:
Then I changed all of the job cards to have the “;jobq=suprtool” at the end:
Since I have MPEX, I used the MPEX %qedit command to make a global change to all jobs:
%qedit [email protected]@,append “;jobq=suprtool” “!job”
This will open each file that qualifies, and append to the job card line the jobq specification. Voila, Suprtool now could run in its own jobq independently!
There are some other useful commands associated with jobq’s:
:listjobq JOBQ LIMIT EXEC TOTAL HPSYSJQ 3500 8 10 SUPRTOOL 1 0 0 QEDIT 1 0 0 RANDMISC 1 0 0
The purgejobq command (:purgejobq suprtool) will allow you to remove any jobq’s that you’ve defined.
Showjob ;jobq will show you the jobs with the jobq in which they are running in. A detail line from that output would look as follows:
:showjob ;jobq JOBNUM STATE IPRI JIN JLIST JOBQ INTRODUCED JOB NAME #J4567 EXEC 10 S LP SUPRTOOL WED 11:48A JTEST01,USER.ACCT
You can also alter the jobq that a job is running in with the altjob command, by typing :altjob #j4567;jobq=qedit
Since it took me only a few commands to create and change all the jobq’s on our development system, I had everything changed to take advantage of the jobq’s in short order.
So I started testing running jobs associated with the Qedit and Suprtool test suites at the same time. I quickly discovered that each jobq requires that a slot be open in the global jobq in order for the job to run. I found an explanation in the newjobq command documentation:
“The global limit takes precedence over individual queue limits. That is, even if a jobqueue has a slot available, if the overall limit has been reached, jobs have to wait till one of the jobs finish or the global limit is increased. When a global slot becomes available, the next job is picked from among the eligible jobqueues (those which haven’t yet reached their individual limits).”
I hadn’t expected this; however, it merely meant that I needed to expand the job limit to some huge value. In retrospect, I should probably have created a separate jobq for our regular background jobs, like inetd or the apache webserver. That way I could create a small job to periodically check that queue, to ensure that all the required background jobs are “up,” and take appropriate action if a job has failed.
To my surprise, I found that after a “start norecovery” on this system, all my jobqs were missing. Again, the newjobq command help revealed what had happened:
“The job queues persist across reboots, provided a START RECOVERY is done. Any other system starts will cause the job queues to be deleted and they will have to be created again.This command is available in a session, job, or in BREAK. Pressing [Break] has no effect on this command. This command is not allowed in the SYSSTART file.”
We have a command called Startall, which is used to start all of the system jobs, so I put the newjobq commands in this command file to insure all of the jobq’s were built with the proper names and limits:
setvar hpmsgfence 2
This way I am assured that the jobq’s always exist when we restart the system.
Personally, I have found no unexplainable problems with the new jobq feature; however recent traffic on the 3000-L mailing list did showcase this query from David Knispel:
“We ran out of disk space over the weekend. Now my JOBQs are screwed up. When I do LISTJOBQ for HPSYSJQ, it shows 6 EXEC but only a limit of 3 and total of 3. When I do a SHOWJOB, only 3 jobs show for this queue. I’m having the same problem with other queues also. Any way to fix this without bouncing the system?”
To which Richard Bayly from HP responded:
“The patch you are after is: MPELXC2B - 6.5; MPELXC2D - 7.0; MPELXC2C - 6.0 but superseded by MPELXL7A.”
Are jobq’s for you?
In conclusion, I do find the new jobq feature quite valuable, as it gives me a tool to break up my jobq’s logically if not physically. I am able to manage some groups of jobs more easily and have more jobs working concurrently, getting more out of my HP e3000. If you have problems with concurrency, or having jobs run when they shouldn’t, then perhaps implementing jobq’s is the way to go.