Use SED to automate text processing
December 6, 2005
Up on the 3000-L mailing list, a lively tutorial broke out yesterday on using SED, a stream editor built in the open source community. Since 2001 SED has worked on the HP 3000, thanks to Lars Appel, a former HP support engineer who ported Samba to the platform in the 1990s.
SED's main MPE page is on Appel's site at www.editcorp.com/Personal/Lars_Appel/sed/. (Editcorp is a company that consults in the HP 3000 community, among other places. It also relays the 3000-L postings to the comp.sys.hp.mpe newsgroup.) It's an at-your-own-risk download, but support is available through the 3000 community. Yesterday's 13-message volley proved that; the community heard from Appel on one of SED's blind spots, along with a workaround.
Dan Barnes, working on a problem he had to solve in his MM 3000 environment, asked:
The issue is incoming data from another platform that is being fed into MM 3000. This data occasionally has some unprintable characters, which of course wreaks havoc on the MM application when it is encountered. To address this, the user, using a cygwin (Unix-like) environment on their Windows PC, developed a SED script. When they test the script in the cygwin environment it works just fine. But when done on the target HP3000 (7.0 pp2) it gets an undesirable result.
Barnes added, "The user thought that because MPE/iX is Posix-compliant, that this should work." He explained that his user created the expression
sed -e 's/[\x7F-\xFE]/*/g' < COMSHD > COMSHD1
But Appel noted that the hex 7F through hex FE portion of the expression doesn't appear to be supported in the MPE/iX version of SED. It's a limitation on MPE/iX, but there's a workaround:
Not sure if the regular expression usage here matches Posix or GNU specs, but my guess is the "\xNN" format, that seems to indicate a char by hex code, doesn't work.
How about something like sed -e 's/[^ -~]/*/g' instead, i.e. map the chars outside the range space through tilde?
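Appel's suggestion can be tried on any Unix-like shell with a small test line. What follows is only a sketch: the function name mask_unprintable and the sample input are made up for illustration and do not come from the thread. Setting LC_ALL=C is an assumption that suits typical Linux or cygwin setups, where it makes sed treat each byte as one character; whether it is needed on MPE/iX was not discussed.

```shell
# Hypothetical helper (name is illustrative): replace every byte outside
# the printable ASCII range, space (0x20) through tilde (0x7E), with '*'.
# LC_ALL=C is an assumption so that sed works byte by byte.
mask_unprintable() {
  LC_ALL=C sed -e 's/[^ -~]/*/g'
}

# Sample line with two unprintable bytes: octal 200 (hex 80) and octal 351 (hex E9).
printf 'PART\200NO 12\351\n' | mask_unprintable
# prints: PART*NO 12*
```

With the file names from Barnes' example, the call would be mask_unprintable < COMSHD > COMSHD1. Note that the negated range is broader than the original hex 7F through hex FE range: it also replaces control characters below the space, such as tabs, as well as hex FF.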
Appel then noted a nifty Web reference that documents the man pages (manual pages) for the 3000's Posix features:
http://invent3k.external.hp.com/~MGR.LROM3K/man3k/
This resource includes the documentation on how the 3000 handles regular expressions: the rules, if you will, for forming compliant SED expressions.
Oh, and the solution took only four hours to deliver to Barnes. Not bad for an "unsupported" part of the 3000 experience.