IMAGE's Large File DataSets: The Problem and How to Fix It
February 14, 2006
[Editor’s note: HP has been working on repairing IMAGE/SQL’s Large File DataSets (LFDS) since 2004. Using LFDS, the default in MPE/iX 7.5, can result in corrupt HP 3000 databases.]
By Stan Sieler
The basic LFDS problem: The engineers who implemented LFDS in IMAGE/SQL were unaware that all native mode languages supporting 64-bit pointers supported them only in a “single 4 GB space” mode. Note that nearly every engineer inside and outside of HP was probably also unaware of this compiler limitation.
Prior to the introduction of Large Files in MPE/iX 6.5, the biggest virtual address range you could get was 4 GB, by doing a “long mapped” open of a 4 GB file. The resulting address consisted of a 32-bit Space ID and a 32-bit offset. No compiler needed to worry about the Space ID changing while manipulating a pointer, because no address range could ever start in one space and continue into the next higher space (i.e., the Space ID would never change when doing address arithmetic).
When Large Files were added to MPE/iX, that changed.
A 9 GB Large File is assigned three consecutive Space IDs... so you have 9 GB of consecutive virtual addresses associated with the 9 GB of the file.
But that introduced difficulties. Imagine an 8-byte record in a file whose starting virtual address is $ff00.$fffffffc; that is, the record starts 4 bytes before the end of a 4 GB chunk (in our example, the Space ID is $ff00 and the offset is $fffffffc). The first four bytes are at $ff00.$fffffffc, $ff00.$fffffffd, $ff00.$fffffffe, and $ff00.$ffffffff. The next four bytes are at $ff01.$00000000, $ff01.$00000001, $ff01.$00000002, and $ff01.$00000003, so the record crosses a 4 GB boundary. (The “$” indicates a hexadecimal value; the format of a 64-bit address is a 32-bit Space ID (e.g., $ff00) followed by a 32-bit offset (e.g., $0 or $ffffffff).)
Unfortunately, in all of our compilers, ordinary attempts to access those 8 bytes fetch the first four bytes correctly, but then the addressing “wraps” and the wrong second four bytes are fetched: $ff00.$00000000, $ff00.$00000001, $ff00.$00000002, and $ff00.$00000003. Note the incorrect Space ID!
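To make the wrap concrete, here is a minimal sketch in ordinary Pascal (a simulation of the address arithmetic, not HP compiler code); the Space ID, offset, and record size come from the example above, and everything else is illustrative:

program addr_wrap_demo;
{ A minimal sketch of the arithmetic described above.  A 64-bit MPE/iX
  virtual address is a 32-bit Space ID plus a 32-bit offset; stepping past
  the end of a 4 GB chunk should carry into the Space ID, but the affected
  compilers simply wrapped the 32-bit offset. }
uses SysUtils;

const
   four_gb = $100000000;                  { 2^32 bytes per Space ID }

var
   space_id, offset, advance : int64;
   good_space, good_offset   : int64;
   bad_space, bad_offset     : int64;

begin
   space_id := $ff00;
   offset   := $fffffffc;                 { 4 bytes before the end of the chunk }
   advance  := 4;                         { step to the 2nd half of the 8-byte record }

   { Correct 64-bit arithmetic: the carry moves into the next Space ID }
   good_space  := space_id + (offset + advance) div four_gb;
   good_offset := (offset + advance) mod four_gb;

   { Broken arithmetic: only the 32-bit offset wraps; the Space ID never changes }
   bad_space  := space_id;
   bad_offset := (offset + advance) mod four_gb;

   writeln ('correct: $', IntToHex (good_space, 4), '.$', IntToHex (good_offset, 8));
   writeln ('broken : $', IntToHex (bad_space, 4),  '.$', IntToHex (bad_offset, 8));
end.

Run, the sketch prints $FF01.$00000000 for the correct address of the record’s fifth byte and $FF00.$00000000 for the wrapped one: exactly the incorrect Space ID described above.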
Some people might ask: wouldn’t the “record split over a 4 GB boundary” problem already have been familiar, having been encountered during the implementation of Large Files for MPE/iX 6.5? It was.
The most common problem, and the easiest to fix, was spotting code like:
move_bytes (num_bytes, source, destination);
The “move_bytes” call would fail because it did not understand the 4 GB boundary problem. The remediation for such calls was simple:
move_fast_64 (num_bytes, source, destination);
where move_fast_64 was a new MPE/iX routine that properly handled 64-bit addresses.
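The internals of move_fast_64 are HP’s, but the essential difference is easy to sketch: a 64-bit-aware move copies in chunks, never letting a single chunk run off the end of a 4 GB space, and carries into the next Space ID between chunks. The following is an illustration only (the routine name, trace output, and standard-Pascal framing are not HP’s code):

program split_move_sketch;
{ Hedged sketch only: move_fast_64's real implementation is HP's.  This shows
  the chunking a 64-bit-aware move has to do for the source address; a real
  routine would do the same bookkeeping for the destination address. }
uses SysUtils;

const
   four_gb = $100000000;

procedure sketch_move_64 (src_space, src_offset, num_bytes : int64);
var
   chunk : int64;
begin
   while num_bytes > 0 do
      begin
      { Never move past the end of the current 4 GB space in one piece }
      chunk := four_gb - src_offset;
      if chunk > num_bytes then
         chunk := num_bytes;

      writeln ('move ', chunk, ' byte(s) starting at $',
               IntToHex (src_space, 4), '.$', IntToHex (src_offset, 8));

      { Advance; this carry into the Space ID is the step the broken code missed }
      src_offset := src_offset + chunk;
      if src_offset = four_gb then
         begin
         src_offset := 0;
         src_space  := src_space + 1;
         end;
      num_bytes := num_bytes - chunk;
      end;
end;

begin
   { The 8-byte record from the example above splits into 4 bytes + 4 bytes }
   sketch_move_64 ($ff00, $fffffffc, 8);
end.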
When I reported the LFDS data corruption problem, I suggested a solution along the above lines. Far less common within the operating system was code that accessed items within records that crossed 4 GB boundaries. Code like:
type
   buffer_type = array [0..127] of integer;
var
   ptr  : ^ $extnaddr$ buffer_type;
   i, j : integer;

   { ...code to set ptr to point into a Large File... }

   i := ptr^ [0];
   j := ptr^ [1];
If “ptr” is pointing into a Large File, then the Pascal compiler’s address calculations for dereferencing “ptr^” would be incorrect ... the code would work almost all of the time, but when the first byte of a range lies in one Space ID and the last byte falls in the following Space ID (as in the $ff00.$fffffffc example above), the wrong data would be accessed.
The remediation in that case would be something like:
var
   temp_buf : buffer_type;

   { ...code to set ptr to point into a Large File... }

   move_fast_64 (sizeof (buffer_type), ptr, temp_buf);
   i := temp_buf [0];
   j := temp_buf [1];
Luckily, the majority of pointer use in any PA-RISC program is of 32-bit pointers (which don’t have this problem), not 64-bit pointers.
Still, it’s tedious to track down all uses of 64-bit pointers and then determine whether they need remediation (most didn’t).
The first attempt at fixing the LFDS corruption problem presumably involved changing most “move_bytes” calls to “move_fast_64” calls. Unfortunately, I suspect that IMAGE had within-a-record pointer accesses in some less frequently executed places (not caught until mid-2005 testing of the first patch, as reported by the 3000 Newswire), and that those uses proved too difficult to patch, resulting in a different approach to fixing the problem in the next patch.
Around mid-2005, I suggested a simpler fix: change the “allocate a new entry” code in IMAGE to simply not allocate any entry that crosses a 4 GB boundary. It would have zero effect on tools like Suprtool (i.e., code that uses the file system to read IMAGE datasets), and marginal effect on database tool vendors (or anyone using the file system to write into a dataset). I received no response from HP — perhaps they realized, as I did recently, that my proposal would limit the “hashing area” of LFDS master datasets to 4 GB.
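For illustration only, the boundary test such an allocator needs is small. The sketch below is not HP’s code; the routine names and the policy of bumping a straddling candidate to the next 4 GB boundary are assumptions about one way the suggestion could be implemented:

program allocation_sketch;
{ Hedged sketch of the suggested allocation rule, not HP's actual fix: when
  choosing where a new entry goes, never use a position where the entry would
  start in one 4 GB chunk of the file and end in the next. }
uses SysUtils;

const
   four_gb = $100000000;

{ True if a record of rec_len bytes at byte offset file_offset would straddle
  a 4 GB boundary within the file }
function crosses_boundary (file_offset, rec_len : int64) : boolean;
begin
   crosses_boundary := (file_offset div four_gb) <>
                       ((file_offset + rec_len - 1) div four_gb);
end;

{ Move a candidate position forward to the next 4 GB boundary if the entry
  would otherwise straddle one }
function safe_offset (file_offset, rec_len : int64) : int64;
begin
   if crosses_boundary (file_offset, rec_len) then
      safe_offset := ((file_offset div four_gb) + 1) * four_gb
   else
      safe_offset := file_offset;
end;

begin
   writeln ('$', IntToHex (safe_offset ($fffffffc, 8), 9));  { straddles: pushed to $100000000 }
   writeln ('$', IntToHex (safe_offset ($fffffff8, 8), 9));  { fits: left at $0FFFFFFF8 }
end.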
Status, and Future
The currently shipping LFDS has a data corruption problem, and should not be used.
HP is working on a patch, one which approaches the problem quite differently. At present, it looks like the patch would require new releases of most database tools, both those that modify databases and those that extract (and/or update) information. HP is discussing this with a number of vendors.
Should you switch to LFDS in the future? Probably not.
Oddly enough, Jumbo datasets still allow a larger dataset than a Large File would. Jumbo’s current limit is 400 GB, compared to a maximum file size of 128 GB for a Large File. A one-line change to IMAGE, transparent to users and vendors, would increase the Jumbo dataset limit to 4 TB (a five-line change would increase the limit to more than 4 PB (petabytes)). Of course, those figures ignore other IMAGE limits that further constrain dataset sizes (e.g., a maximum of 16,777,215 blocks and a maximum block size of 5,120 bytes, resulting in a limit of 80 GB).
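As a quick check on that last figure, here is the block arithmetic, using the limits quoted above:

program dataset_limit_check;
{ Quick arithmetic check: the block-count and block-size limits cap a dataset
  just under 80 GB, well below the Jumbo or Large File ceilings. }

const
   max_blocks     = 16777215;             { maximum blocks per dataset }
   max_block_size = 5120;                 { maximum block size, in bytes }

var
   limit_bytes : int64;

begin
   limit_bytes := int64 (max_blocks) * max_block_size;
   writeln ('block-limit cap: ', limit_bytes, ' bytes');   { 85,899,340,800 bytes, just under 80 GB }
end.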
A second advantage of Jumbo datasets is enhanced exportability. Some foreign operating systems do not allow you to create files larger than 4 GB, which means you might not be able to transfer (e.g., via FTP) an LFDS to a particular Unix site. With Jumbo datasets, you would be transferring a series of files of up to 4 GB each, which is much easier (and has better error recovery capability).
The sole real advantage of a properly working LFDS over Jumbo datasets would be the ability to enable DDX or MDX on such a dataset, an advantage rendered less interesting in light of existing database tools.
Stan Sieler, who led the design of major enhancements to IMAGE/3000 while working at HP, is executive vice president of Allegro Consultants, Inc., an HP User Group Hall of Fame member, developer of dozens of HP 3000 commercial and freeware products, and a co-author of “Beyond RISC! An Essential Guide to HP Precision Architecture.”