Friday Fine-Tune: Cleaning Up Correctly

Classic 3000 Advice
By John Burke

Good intentions about maintenance sometimes stumble in their implementation. As an example, here’s a request for help on cleaning up.

“We have a 989/650 system. Every weekend we identify about 70,000 files to delete off the system. I build a jobstream that basically executes a file that has about 70 thousand lines. Each line says ‘PURGE file.group.account’. This job has become a real hog. It launches at 6 AM on Sunday morning, but by 7 PM on Sunday night it has only purged about 20,000 files. While this job is running, logons take upwards of 30 seconds. What can I do?”

This reminds me of the old joke where the guy goes to the doctor and complains, “Gee, doc, my arm hurts like hell when I move it like this. What can I do?” The doctor looks at him and says, “Stop moving it like that.” But seriously, the user above is lucky the files are not all in the same group, or he would be experiencing system failures like the poor user two years ago who was only trying to purge 40,000 files.

In either case, the advice is the same: purge the files in reverse alphabetical order. This will avoid a system failure if you already have too many files in a group or HFS directory, and it will dramatically improve system performance in all cases. However, several people on the 3000-L list have pointed out that if you find you need to purge 70,000 files per week, you should consider altering your procedures to use temporary files. Or, if that will not work, purge the files as soon as you no longer need them rather than waiting until it becomes a huge task.
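As a rough illustration of the reverse-order advice, here is a minimal Python sketch that takes a list of fully qualified names and emits the PURGE lines with each group's files in reverse alphabetical order. The function names and the sample file names are hypothetical, and it assumes that plain string ordering of uppercase names matches the order MPE keeps in its directory, which may not hold exactly on a real system.

```python
from collections import defaultdict


def reverse_purge_order(names):
    """Group names by group.account and return each group's files
    in reverse alphabetical order (assumes file.group.account names)."""
    by_group = defaultdict(list)
    for raw in names:
        name = raw.strip().upper()
        if not name:
            continue  # skip blank lines in the input list
        # Split off the file part; the rest is "group.account".
        file_part, _, group_account = name.partition(".")
        by_group[group_account].append(file_part)

    ordered = []
    for group_account in sorted(by_group):
        # Reverse order within a group is what matters for the directory.
        for file_part in sorted(by_group[group_account], reverse=True):
            ordered.append(f"{file_part}.{group_account}")
    return ordered


def purge_commands(names):
    """Build one PURGE line per file, ready to write into a job stream."""
    return [f"PURGE {name}" for name in reverse_purge_order(names)]


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    sample = [
        "AFILE.DATA.PROD",
        "ZFILE.DATA.PROD",
        "MFILE.DATA.PROD",
        "B1.LOGS.PROD",
        "B2.LOGS.PROD",
    ]
    for line in purge_commands(sample):
        print(line)
```

The output could be written to the same kind of command file the weekly job already uses; only the ordering of the lines changes.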

If all the files are in one group and you want to purge only a subset of the files in the group, you have to purge the files in reverse alphabetical order to avoid the System Abort (probably SA2200). PURGEGROUP and PURGEACCT will be successful, but at the expense of having to recreate the accounting structure and restore the files you want to keep. Note that if you log onto the group and then do PURGEGROUP, you will not have to recreate the group.

Craig Fairchild, MPE/iX File System Architect, explained what is going on. “Your system abort [or performance issues] stem from the fact that the system is trying desperately to make sure that all the changes to your directory are permanently recorded. To do this, MPE uses its Transaction Management (XM) facility on all directory operations.

“To make sure that the directories are not corrupted, XM takes a beginning image of the area of the directory being changed, and after the directory operation is complete, it takes an after image. In this way, should the system ever crash in the middle of a directory operation, XM can always recover the directory to a consistent state - either before or after the operation, but not in a corrupted in-between state.

“On MPE, directories are actually just special files with records for each other file or directory that is contained in them. They are stored in sorted alphabetical order, with the disk address of the file label for that file. Because we must keep this list of files in alphabetical order, if you add or delete a file, the remaining contents of the file need to be ‘shifted’ to make room, or to compact the directory. So if you purge the first file alphabetically, XM must record the entire contents of the directory file as the before image, and the entire remaining file as the after image.

“So purging from the top of the directory causes us to log data equal to twice the size of the directory. Purging from the bottom of the directory causes XM to log much less data, since most of the records stay in the same place and their contents don’t change. The system abort comes from the fact that more data is being logged to XM than it can reliably record. When its logs fill completely and it can no longer provide protection for the transactions that have been initiated, XM will crash the system to ensure data integrity.”
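The shifting that Fairchild describes can be illustrated with a toy model. The Python sketch below treats a directory as a sorted list and counts how many entries have to move each time one is removed. It is only an assumption-laden approximation: the function name and generated file names are hypothetical, and it counts shifted records rather than the bytes XM actually logs.

```python
import bisect


def records_shifted(names, purge_order):
    """Toy model: the directory is a sorted list, and removing the entry
    at index i moves every later entry up one slot. Returns the total
    number of entries moved while purging in the given order."""
    directory = sorted(names)
    shifted = 0
    for name in purge_order:
        i = bisect.bisect_left(directory, name)
        if i == len(directory) or directory[i] != name:
            raise ValueError(f"{name} is not in the directory")
        shifted += len(directory) - i - 1  # entries that sit after this one
        del directory[i]
    return shifted


if __name__ == "__main__":
    # 1,000 hypothetical file names, F00000 through F00999.
    names = [f"F{i:05d}" for i in range(1000)]
    top_down = records_shifted(names, sorted(names))
    bottom_up = records_shifted(names, sorted(names, reverse=True))
    print("alphabetical order:", top_down, "entries shifted")
    print("reverse order:     ", bottom_up, "entries shifted")
```

In this model, purging alphabetically moves every remaining entry on every purge, so the work grows roughly with the square of the number of files, while purging in reverse order moves nothing. That matches the direction of the explanation above, though the real logging cost depends on XM internals not modeled here.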

Goetz Neumann added, “PURGEGROUP (and PURGEACCT) do not cause a SA2200 risk, since they actually traverse the directory in reverse alphabetical order internally. This is useful to know for performance reasons. Since these commands cause much smaller XM transactions, it is faster to empty a group by logging into it and then PURGEGROUP it, instead of using PURGE @.

“There is a little-known tool to help prevent you from running into these situations in the first place: DIRLIMIT.MPEXL.TELESUP. A suggested (soft) limit for directory files would be 2MB. This would limit MPE to no more than 50,000 files in one group, and (depending very much on the filenames) far fewer than 50,000 files per HFS directory. (These are XM protected just as well, and tens of thousands of files in an HFS directory is not a good idea from a performance standpoint, either.)

“Another way to reduce the risk of SA2200 in these situations would be to increase the size of the XM system log file (on the volume set that holds the group with the large number of files), which can be done with a VOLUTIL command.”
