Friday Fine-Tune: Going Beyond JBOD
March 10, 2017
By Gilles Schipper
One of the most cost-effective ways of improving the reliability of your legacy system may be to replace your existing “JBOD” disk system with a much more reliable one. MOD20 units, still a better deal than individual disks, can provide a good starting point for implementing RAID.
JBOD is an acronym meaning “just a bunch of disks,” which characterizes the majority of HP 3000 systems as they were initially sold. A JBOD disk system comprises a set of independent, typically SCSI-connected disks, each of which is seen by the HP 3000 as a single logical device number, or LDEV. Each disk LDEV is associated with a “volume set,” and the failure of a single disk renders the volume set to which it belongs inoperable and inaccessible.
Traditionally, most 3000 systems have comprised a single volume set (specifically, the required SYSTEM volume set, with the brevity-challenged label “MPEXL_SYSTEM_VOLUME_SET”).
Systems comprising a large number of “JBOD” LDEVs increased the likelihood of system downtime, since the failure of a single, old disk effectively resulted in a “down” system — requiring a time-consuming disk replacement and system reload before the system could properly function once again.
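To put rough numbers on that exposure, here is a minimal sketch of how the odds of an outage grow with the number of JBOD disks in a volume set. It assumes that disks fail independently and that every disk has the same chance of failing over the period of interest. The 5 percent figure is purely hypothetical and is not a measured failure rate for any HP 3000 disk.

```python
def prob_any_disk_fails(num_disks: int, per_disk_failure_prob: float) -> float:
    """Chance that at least one of num_disks fails, assuming independent failures.

    In a JBOD volume set, one failed disk takes the whole volume set down,
    so this is also the chance of losing the volume set.
    """
    if num_disks < 0:
        raise ValueError("num_disks must be non-negative")
    if not 0.0 <= per_disk_failure_prob <= 1.0:
        raise ValueError("per_disk_failure_prob must be between 0 and 1")
    # The volume set survives only if every single disk survives.
    prob_all_survive = (1.0 - per_disk_failure_prob) ** num_disks
    return 1.0 - prob_all_survive


if __name__ == "__main__":
    hypothetical_failure_prob = 0.05  # assumption for illustration only
    for disks in (1, 4, 8, 16):
        risk = prob_any_disk_fails(disks, hypothetical_failure_prob)
        print(f"{disks:2d} JBOD disks: {risk:.1%} chance of a volume set outage")
```

Whatever per-disk figure you plug in, the trend is the same: every LDEV added to a JBOD volume set raises the chance that the whole set goes down.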
To mitigate such delicate exposure to a single disk failure, many installations implemented the “User Volume Set” feature built into MPE/iX, then constructed multiple volume sets so that the failure of a single disk affected only the volume set to which it belonged.
For practical purposes, the only real benefit of this approach was to reduce the time required for recovery, since only the failed disk had to be replaced and only the data residing on the affected volume set had to be reloaded. (In reality, it was rare for a system to continue normal, or even minimal, operation with even a single unavailable volume set.)
Another option was software disk mirroring under MPE/iX. This enabled the system administrator to associate each disk drive in a non-system volume set with an identical “mirror” disk. The software was responsible for dynamically keeping the contents of the two mirrored drives identical, such that the failure of one of the two could be tolerated without affecting the continuous operation of the system. The damaged disk could then be replaced and the dynamic disk duplication would resume.
Only if both disks of a mirrored pair failed would there be a corresponding system outage and data loss. However, software mirroring was still far from ideal. Since it was unavailable for the MPEXL_SYSTEM_VOLUME_SET, the failure of a system disk, unprotected by mirroring software, would result in certain system downtime.
Further, software mirroring exacted a price in CPU and I/O overhead, consuming resources that could otherwise be used for actual “useful” processing.
And, as a wise person once said, given a choice, a feature is almost always better implemented in hardware than in software. This certainly applies to disk mirroring, and it aligns nicely with the Nike MOD20 RAID disk system, which is one of the HP 3000’s solutions to the compromises associated with software mirroring.
The MOD20 features dual controllers, duplicated (even triplicated) power supplies, and up to 20 disk drives, all housed in a single enclosure. It provides significant improvements over the MPE/iX software mirroring functionality.
Each MOD20 allows a maximum of 8 logical units (LUNs) to be configured, each of which appears to the HP 3000 as a single logical device (LDEV) number. A maximally and optimally configured MOD20 will include 20 disk drives and be configured as follows:
14 disks are defined as type RAID 1, using up 7 LUNs, since each LUN comprises two separate mirrored disks. RAID level 1 is equivalent to simple mirroring, whereby the contents of one disk are dynamically maintained as a duplicate image on its mirrored twin, which must be of identical size and model.
If one disk of the mirrored pair fails, the other disk takes over the responsibility of handling the requisite data I/O to and from the host system with no perceived performance degradation. The remaining 6 disks can be configured as a single LUN comprising 4 RAID 1/0 disks and 2 hot spares.
A RAID 1/0 configuration takes an even number of disks and duplicates the contents of half of them (as a group) onto the other half.
The hot spares would act as dynamic replacements for any disk in the MOD20 that fails, such that even the failure of one or two disks would not prevent the entire disk subsystem from maintaining its fail-safe mirroring capability. Without the hot-spare feature, failure of a single disk would allow normal system activity to continue but without further fail-safe capability for the failing LUN only.
Chances of both disks in the same LUN failing are extremely remote. That is why I advise you to forgo the hot spare capability. Utilize a 6-disk RAID 1/0 LUN instead of a 4-disk RAID 1/0 LUN, giving you additional usable disk space overall.
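The trade-off between the two layouts can be tallied with a short sketch. It uses only the figures quoted above (20 disk slots, 8 LUNs, 2 disks per RAID 1 LUN) and counts usable space in units of one disk, assuming all disks are the same size. The Mod20Layout class and its field names are hypothetical, invented for this illustration, and are not part of any HP configuration tool.

```python
from dataclasses import dataclass

MAX_DISKS = 20  # disk slots in one MOD20, per the article
MAX_LUNS = 8    # LUN limit per MOD20, per the article


@dataclass(frozen=True)
class Mod20Layout:
    """Hypothetical description of one MOD20 configuration."""
    raid1_luns: int    # each RAID 1 LUN is a mirrored pair (2 disks)
    raid10_disks: int  # disks in the single RAID 1/0 LUN (must be even)
    hot_spares: int    # disks held in reserve, holding no data

    def total_disks(self) -> int:
        return 2 * self.raid1_luns + self.raid10_disks + self.hot_spares

    def total_luns(self) -> int:
        return self.raid1_luns + (1 if self.raid10_disks else 0)

    def usable_disks(self) -> int:
        # Mirroring halves capacity: one disk's worth per RAID 1 pair,
        # and half of the disks in the RAID 1/0 group.
        return self.raid1_luns + self.raid10_disks // 2

    def validate(self) -> None:
        if self.raid10_disks % 2 != 0:
            raise ValueError("RAID 1/0 needs an even number of disks")
        if self.total_disks() > MAX_DISKS:
            raise ValueError("layout exceeds the 20 disk slots")
        if self.total_luns() > MAX_LUNS:
            raise ValueError("layout exceeds the 8 LUN limit")


# The two layouts discussed in the article.
WITH_HOT_SPARES = Mod20Layout(raid1_luns=7, raid10_disks=4, hot_spares=2)
WITHOUT_HOT_SPARES = Mod20Layout(raid1_luns=7, raid10_disks=6, hot_spares=0)

if __name__ == "__main__":
    for name, layout in (("with hot spares", WITH_HOT_SPARES),
                         ("without hot spares", WITHOUT_HOT_SPARES)):
        layout.validate()
        print(f"{name}: {layout.total_disks()} disks, {layout.total_luns()} LUNs, "
              f"{layout.usable_disks()} disks' worth of usable space")
```

Both layouts fill all 20 slots and use all 8 LUNs. Giving up the two hot spares buys one more disk's worth of usable space, because the two reclaimed disks form one additional mirrored pair inside the RAID 1/0 LUN.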