Thursday Throwback: IMAGE vs. Relational
November 13, 2014
As a precocious 18-year-old, Eugene Volokh wrote deep technical papers for HP 3000 users who were two or three times his age. While we pointed to the distinctions between IMAGE master and automatic datasets recently, Eugene's dad Vladimir reminded us about a Eugene paper. It was published in the fall of 1986, a time when debate was raging over the genuine value of relational databases.
While the relational database is as certain in our current firmament as the position of any planet, the concept was pushing aside proven technology 28 years ago. IMAGE, created by Fred White and Jon Bale at HP, was not relational. Or was it? Eugene offered the paper below to explore what all the relative fuss was about. Vladimir pointed us to the page on the fine Adager website where the paper lives in its original formatting.
The relationships between master and automatic and detail datasets pointed the way to how IMAGE would remain viable even during the onslaught of relational databases. Soon enough, even Structured Query Language would enter the toolbox of IMAGE. But even in the year this paper emerged, while the 3000 still didn't have a PA-RISC model or MPE/XL to drive it, there was a correlation between relational DBs and IMAGE. Relational databases rely on indexes, "which is what most relational systems use in the same way that IMAGE uses automatic masters," Eugene wrote in his paper presented at COBO Hall in Detroit (above). QUERY/3000 was a relational query language, he added, albeit one less easy to use.
Vladimir admits that very few IT professionals are building IMAGE/SQL databases anymore. "But they do look at them, and they should know what they're looking at," he explained.
Relational Databases Vs. IMAGE:
What The Fuss Is All About
By Eugene Volokh, VESOFT
What are "relational databases" anyway? Are they more powerful than IMAGE? Less powerful? Faster? Slower? Slogans abound, but facts are hard to come by. It seems like HP will finally have its own relational system out for Spectrum (or whatever they call it these days). I hope that this paper will clear up some of the confusion that surrounds relational databases, and will point out the substantive advantages and disadvantages that relational databases have over network systems like IMAGE.
What is a relational database? Let's think for a while about a database design problem.
We want to build a parts requisition system. We have many possible suppliers, and many different parts. Each supplier can sell us several kinds of parts, and each part can be bought from one of several suppliers.
Easy, right? We just have a supplier master, a parts master, and a supplier/parts cross-reference detail:
Every supplier has a record in the SUPPLIERS master, every part has a record in the PARTS master, and each (supplier, part-supplied) pair has a record in the SUPPLIER-XREF dataset.
Now, why did we set things up this way? We could have, for instance, made the SUPPLIER-XREF dataset a master, with a key of SUPPLIERS#+PART#. Or, we could have made all three datasets stand-alone details, with no masters at all. The point is that the proof of a database is in the using. The design we showed -- two masters and a detail -- allows us to very efficiently do the following things:
- Look up supplier information by the unique supplier #.
- Look up parts information by the unique part #.
- For each part, look up all its suppliers (by using the cross-reference detail dataset).
- For each supplier, look up all the parts it sells (by using the cross-reference detail dataset).
This is what IMAGE is good at -- allowing quick retrieval from a master using the master's unique key and allowing quick retrieval from a detail chain using one of the detail's search items.
PART# <-- unique key item
DESCRIPTION
SHAPE
COLOR
...
What if we want to find all the suppliers that can sell us a "framastat"? A "framastat", you see, is not a part number -- it's a part description. We want to be able to look up parts not only by their part number, but also by their descriptions. The functions supported by our design are:
- Look up PART by PART#.
- Look up SUPPLIERS by SUPPLIERS#.
- Look up PARTs by SUPPLIERS#.
- Look up SUPPLIERs by PART#.
What we want is the ability to
- Look up PART by DESCRIPTION.
The sad thing is that the PARTS dataset is a master, and a master dataset supports lookup by ONLY ONE FIELD (the key). We can't make DESCRIPTION the key item, since we want PART# to be the key item; we can't make DESCRIPTION a search item, since PARTS isn't a detail. By making PARTS a master, we got fast lookup by PART# (on the order of 1 or 2 I/Os to do the DBGET), but we forfeited any power to look things up quickly by any other item.
And so, dispirited and dejected, we get drunk and go to bed. And, deep in the night, a dream comes. "Make it a detail!" the voice shouts. "Make it a detail, and then you can have as many paths as you want to."
We awaken elated! This is it! Make PARTS a detail dataset, and then have two search items, PART# and DESCRIPTION. Each search item can have an automatic master dataset hanging off of it, to wit:
What's more, if we ever, say, want to find all the parts of a certain color or shape, we can easily add a new search item to the PARTS dataset. Sure, it may be a bit slower (to get a part we need to first find it in PART#S and then follow the chain to PARTS, two IOs instead of one), and also the uniqueness of part numbers isn't enforced; still, the flexibility advantages are pretty nice.
So, now we can put any number of search items in PARTS. What about SUPPLIERS? What if we want to find a supplier by his name, or city, or any other field? Again, if we use master datasets, we're locked into having only one key item per dataset. Just like we restructured PARTS, we can restructure SUPPLIES, and come up with:
Note what we have done in our quest for flexibility. All the real data has been put into detail datasets; every data item which we're likely to retrieve on has an automatic master attached to it.
Believe it or not, this is a relational database.
If this is a relational database, I'm a Hottentot
Surely, you say, there is more to a relational database than just an IMAGE database without any master datasets. Isn't there? Of course, there is. But all the wonderful things you've been hearing about relational databases may have more to do with the features of a specific system that happens to be relational than with the virtues of relational as a whole.
Consider for a moment network databases. IMAGE is one example, in fact an example of a rather restricted kind of network database (having only two levels, master and detail). Let's look at some of the major features of IMAGE:
- IMAGE supports unique-key MASTERS and non-unique-key DETAILS.
- IMAGE does HASHING on master dataset records.
- IMAGE has QUERY, an interactive query language.
Which of these features are actually network database features? In other words, which features would be present in any network database, and which are specific to the IMAGE implementation? Of the three listed above, only the first -- masters and details -- must actually be present in all databases that want to call themselves "network." On the other hand, a network database might very well use B-trees or ISAM as its access method instead of hashing; or, it might not provide an interactive query language. It would still be a network database -- it just wouldn't be IMAGE.
Why is all this relevant? Well, let's say that somebody said "Network databases are bad because they use hashing instead of B-trees." This statement is wrong because the network database model is silent on the question of B-trees vs. hashing. It is incorrect to generalize from the fact that IMAGE happens to use hashing to the theory that all network databases use hashing. If we get into the habit of making such generalizations, we are liable to get very inaccurate ideas about network databases in general or other network implementations in particular.
The same goes for relational databases. The reason that so many people are so keen on relational databases isn't because they have any particularly novel form of data representation (actually, it's much like a bunch of old-fashioned KSAM/ISAM-like files with the possibility of multiple keys); nor is it because of some fancy new access methods (hashing, B-trees, and ISAM are all that relational databases support). Rather, it's because the designers of many of the modern relational databases did a good job in providing people with lots of useful features (ones that might have been just as handy in network databases).
What are relational databases: functionality
The major reason for many of the differences between relational databases and network databases is simple: age. Remember the good old days when people hacked FORTRAN code, spending days or weeks on optimizing out an instruction or two, or saving 1000 bytes of memory (they had only 8K back then) ? Well, those are the days in which many of today's network databases were first designed; maximum effort was placed on making slow hardware run as fast as possible and getting the most out of every byte of disk.
Relational databases, children of the late '70s and early '80s had the benefit of perspective. Their designers saw that much desirable functionality and flexibility was missing in the older systems, and they were willing to include it in relational databases even if it meant some wasted storage and performance slow-down. The bad part of this is that, to some extent, modern relational databases are still hurting from slightly decreased performance; however, this seems to be at most a temporary problem, and the functionality and flexibility advantages are quite great.
For even more IMAGE education, like the advantages of IMAGE over relational databases, and a tour of the flexibility that automatic masters provide, see the remainder of the paper on the Adager website.