Consider low-cost RAIDing for reliability
July 2, 2007
By Gilles Schipper
Homesteading Editor, 3000 NewsWire
In my last column, I listed the cost-effective options available to the HP 3000 homesteader to enhance the performance and reliability of their aging 3000 server. Various opportunities related to backup, disaster planning, performance optimization, security and reliability were briefly described.
One of the most cost-effective ways of advancing the reliability of your legacy system may be to replace your existing “JBOD” disk system with a much more reliable disk system, commonly referred to as RAID (Redundant Array of Inexpensive Disks). MOD20 units, now less expensive than ever, can provide a good starting point to implement RAID.
In contrast, JBOD is an acronym meaning “just a bunch of disks” — which would characterize the majority of HP 3000 systems as they were initially sold. JBOD disk systems comprise a set of independent — typically SCSI-connected — disks, which are each seen by the HP 3000 as a single logical device number or LDEV.
Each disk LDEV is associated with a “volume set,” and the failure of a single disk renders the volume set to which it belongs inoperable and inaccessible.
Traditionally, most 3000 systems comprised a single volume set (specifically, the required SYSTEM volume set, with the brevity-challenged label “MPEXL_SYSTEM_VOLUME_SET”).
Systems comprising a large number of “JBOD” LDEVs increased the likelihood of system down time, since the failure of a single (old) disk effectively resulted in a “down” system — requiring a time-consuming disk replacement and system reload before the system could properly function once again.
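The effect of disk count on exposure can be sketched with a little arithmetic. The following is only an illustration: it assumes every disk fails independently with the same probability over some period, and both the function name and the 5 percent figure are hypothetical, not measured values for any HP 3000 drive.

```python
def prob_any_disk_fails(n_disks: int, p_fail: float) -> float:
    """Chance that at least one of n_disks fails during the period.

    Assumes independent failures with identical probability p_fail,
    which is a simplification made for illustration only.
    """
    if n_disks < 0:
        raise ValueError("n_disks must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    # On a single JBOD volume set, any one failure takes the set down,
    # so the set survives only if every disk survives.
    return 1.0 - (1.0 - p_fail) ** n_disks


if __name__ == "__main__":
    # Hypothetical 5% per-disk failure probability.
    for n in (1, 4, 8, 16):
        print(n, "disks:", round(prob_any_disk_fails(n, 0.05), 3))
```

Under those assumptions, an eight-disk JBOD system has roughly a one-in-three chance of losing at least one disk in the period, compared with one in twenty for a single disk.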
To mitigate such delicate exposure to a single disk failure, many installations implemented the “User Volume Set” feature built in to MPE/iX, then constructed multiple volume sets so that the failure of a single disk affected only the volume set to which it belonged.
For practical purposes, the only real benefit of this approach was to reduce the time required to replace the disk, since only the data residing on the affected volume set had to be reloaded. (In reality, it was quite unusual for a system to continue normal, or even minimal, operation with even a single unavailable volume set.)
To further improve system reliability and minimize downtime, an optional, additional-cost software product was available in the form of software mirroring, also known as “MPE/iX mirroring.”
This enabled the system administrator to pair each non-system volume set disk drive with an identical “mirror” disk. The software dynamically kept the contents of both drives in the pair identical, such that the failure of one of the two mirror drives could be tolerated without affecting the continuous operation of the system. The damaged disk could then be replaced and the dynamic disk duplication would resume.
Only if both disks of a mirrored pair failed would there be a corresponding system outage and data loss.
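To see why losing both halves of a pair is so much less likely than losing one disk, here is a minimal sketch. It assumes independent failures with the same hypothetical per-disk probability, and it ignores the fact that a failed disk would normally be replaced, so if anything it overstates the risk. The function names are illustrative only.

```python
def prob_pair_lost(p_fail: float) -> float:
    """Chance that both disks of one mirrored pair fail in the period.

    Assumes the two failures are independent and that the first failed
    disk is never replaced, which is a pessimistic simplification.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return p_fail ** 2


def prob_any_pair_lost(n_pairs: int, p_fail: float) -> float:
    """Chance that at least one of n_pairs mirrored pairs is lost."""
    if n_pairs < 0:
        raise ValueError("n_pairs must be non-negative")
    return 1.0 - (1.0 - prob_pair_lost(p_fail)) ** n_pairs


if __name__ == "__main__":
    p = 0.05  # hypothetical per-disk failure probability
    print("one unmirrored disk:", p)
    print("one mirrored pair:  ", prob_pair_lost(p))
    print("seven mirrored pairs:", round(prob_any_pair_lost(7, p), 4))
```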
However, software mirroring was still far from ideal.
Since it was unavailable for the “MPEXL_SYSTEM_VOLUME_SET,” the failure of a system disk, unprotected by mirroring software, would result in certain system downtime.
Further, software mirroring exacted a price in CPU and I/O overhead, consuming resources that could otherwise be used for actual “useful” processing.
And, as a wise person once said, given a choice, a feature is almost always better implemented in hardware than in software. This certainly applies to disk mirroring and segues nicely to the Nike MOD20 RAID disk system, which is one of the HP 3000’s solutions to the compromises associated with software mirroring.
The MOD20 features dual controllers, duplicated (even triplicated) power supplies, and up to 20 disk drives housed in a single frame/enclosure that provides significant improvements over the MPE/iX software mirroring functionality.
Each MOD20 provides for a maximum of 8 logical units (LUNs) to be configured, each of which appears as a single logical device number (LDEV) to the HP 3000. A maximally and optimally configured MOD20 will include 20 disk drives and be configured as follows:
14 disks defined as type RAID1, using up 7 LUNs, since each LUN comprises two separate mirrored disks.
RAID level 1 (or RAID1) is equivalent to simple mirroring, whereby the contents of one disk are dynamically maintained as a duplicate image on its mirrored twin disk, which must be of identical size and model.
If one disk of the mirrored pair fails, the other disk can take over the responsibility of presenting the requisite data I/O to and from the host system with no perceived performance degradation.
The remaining 6 disks can be configured as a single LUN comprising 4 RAID 1/0 disks, plus 2 hot spares.
A RAID 1/0 configuration takes an even number of disks and duplicates the contents of half of them (as a group) onto the other half.
The hot spares would act as dynamic replacements for any disk in the MOD20 that fails, such that even the failure of one or two disks would not prevent the entire disk subsystem from maintaining its fail-safe mirroring capability.
Without the hot-spare feature, failure of a single disk would allow normal system activity to continue but without further fail-safe capability for the failing LUN only.
Chances of both disks in the same LUN failing are extremely remote, which is why I advise some to forgo the hot-spare capability and use a 6-disk RAID 1/0 LUN instead of a 4-disk RAID 1/0 LUN, giving additional usable disk space overall.
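The trade-off between the two layouts can be tallied with a short sketch. The 20-disk and 8-LUN limits come from the description above. The class and its method names are hypothetical, capacity is counted in "disks' worth" because no disk sizes are given here, and the sketch assumes hot spares do not consume a LUN.

```python
from dataclasses import dataclass

# Limits stated above for a single MOD20 enclosure.
MAX_DISKS = 20
MAX_LUNS = 8


@dataclass(frozen=True)
class Mod20Layout:
    """Hypothetical helper describing one way to carve up a MOD20."""
    raid1_luns: int    # each RAID1 LUN is a mirrored pair (2 disks)
    raid10_disks: int  # disks in the single RAID 1/0 LUN (must be even)
    hot_spares: int    # assumed not to consume a LUN

    def disks_used(self) -> int:
        return 2 * self.raid1_luns + self.raid10_disks + self.hot_spares

    def lun_count(self) -> int:
        return self.raid1_luns + (1 if self.raid10_disks > 0 else 0)

    def usable_disks(self) -> int:
        # Mirroring halves raw capacity; hot spares hold no data.
        return self.raid1_luns + self.raid10_disks // 2

    def is_valid(self) -> bool:
        return (
            self.raid10_disks % 2 == 0
            and self.disks_used() <= MAX_DISKS
            and self.lun_count() <= MAX_LUNS
        )


with_spares = Mod20Layout(raid1_luns=7, raid10_disks=4, hot_spares=2)
without_spares = Mod20Layout(raid1_luns=7, raid10_disks=6, hot_spares=0)

if __name__ == "__main__":
    for name, layout in (("with hot spares", with_spares),
                         ("without hot spares", without_spares)):
        print(name, "-> disks:", layout.disks_used(),
              "LUNs:", layout.lun_count(),
              "usable disks' worth:", layout.usable_disks())
```

Both layouts fill all 20 slots and all 8 LUNs. Giving up the two hot spares raises usable space from nine disks' worth to ten, at the cost of automatic replacement when a disk fails.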
Tomorrow I'll talk about a few other things to consider if you will be acquiring a MOD20.