Thursday, December 9, 1999
For much of the day I found myself suffering at the hands of a perfectly legitimate copy of Office 2000 installed on my workplace workstation. In general I like the version of Word included in Office 2000, but there is one ludicrous set of features against which someone badly needs to speak out. I'm talking specifically about the "save as web page" option, Microsoft's response to the clamour for a means to export ubiquitous Word documents to the Web (that thing which was supposed to end up being a poor cousin to MSN, remember?).
I remember the "save as web page" feature from a version of Microsoft Word I used back in 1996. It wasn't all that great; the way it dealt with smart quotes was enough to make me cry, especially on a Macintosh. And for some reason it felt the need to freshly declare font tags at every new paragraph. But it was usable. You can still see relics of its quirky markup in the source of the Big Fun Glossary, which began its life as a Macintosh Microsoft Word 5.0 document.
Today, though, I found myself wading through the mind-bogglingly complex source output by Office 2000's HTML converter. I just wanted some simple markup for one of my Dad's papers, which included lots of italicized scientific names that I didn't want to have to redo manually. But Office 2000 saw fit to include all sorts of unnecessary additional markup, including such bizarre extras as the dozen or so tab positions set when the document was composed. These positions were declared with each and every paragraph, even if they were the same as for the last paragraph. No italic or bold tag was complete in and of itself; each included its own style declaration of 80 or so characters. Several font faces and styles were mentioned in nearly all the tags as well. And that wasn't all; each file began with a lengthy header (consisting partly of XML) that included such vaguely privacy-breaching information as the name of the registered owner of the converting copy of Microsoft Word. All these extras added up to something like three times the size of the plain text version of the document, all in the name of... who knows?
The style declarations were made repetitively, making it impossible to easily use a text editor to change fonts, styles or sizes, yet the declarations were different enough from instance to instance to rule out simple search and replace. The generated document, even in HTML form, was locked in Microsoft's clutches, completely un-editable without Microsoft tools. I was infuriated. The situation was so absurd that it felt like a parody of stereotypical corporate zeal for absolute control, but there was absolutely nothing funny about it. My ability to make web pages from Microsoft documents, the most common documents to be found, was severely hampered.
I went out on the web looking for other options, but none of them were satisfactory. So I ended up writing an ASP script that strips out nearly all of the superfluous Microsoft crap from an HTML document. It works rather well and I anticipate I'll be using it a lot in the future.
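For the curious, here is roughly what that kind of cleanup looks like. This is not the ASP script itself, just a minimal Python sketch of the same idea. The function name strip_office_markup and the particular list of tags and attributes it throws away are my own assumptions for illustration, and since it works with regular expressions rather than a real HTML parser, it will probably trip over unusual input (an attribute value containing ">", for instance).

```python
import re

# Attributes assumed to be clutter: style, class, lang and xmlns declarations.
_ATTR_RE = re.compile(
    r"""\s+(?:style|class|lang|xmlns(?::\w+)?)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def _clean_tag(match):
    """Remove the unwanted attributes from a single tag."""
    return _ATTR_RE.sub("", match.group(0))


def strip_office_markup(html):
    """Strip Office-style clutter, keeping simple tags such as <p>, <i> and <b>."""
    # Drop comments, including conditional comments that wrap embedded XML.
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Drop <xml> and <style> blocks along with everything inside them.
    html = re.sub(r"<(xml|style)\b.*?</\1\s*>", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Drop <meta> tags, which may carry generator or author details.
    html = re.sub(r"<meta\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop namespaced tags such as <o:p>, keeping any text between them.
    html = re.sub(r"</?\w+:\w+\b[^>]*>", "", html)
    # Drop <span> and <font> wrappers, keeping the text they enclose.
    html = re.sub(r"</?(?:span|font)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Strip the unwanted attributes from whatever tags remain.
    return re.sub(r"<[^>]+>", _clean_tag, html)


if __name__ == "__main__":
    sample = (
        "<p class=MsoNormal style='tab-stops:.5in 1.0in'>"
        "<i style='mso-bidi-font-style:normal'>"
        "<span style='font-family:Arial'>Quercus alba</span></i><o:p></o:p></p>"
    )
    print(strip_office_markup(sample))  # prints <p><i>Quercus alba</i></p>
```

The point of doing it this way is that the italics around the scientific names survive while the per-paragraph tab stops, font declarations and header junk all disappear.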

Lately I've been drinking brandy, not vodkatea. Lately Kim and I have been waking up in the middle of the night and having sex while not fully conscious. Lately I've been getting along fairly well with my two new workstation neighbors at work. "Channel Developers" they're called, and they're about as skilled as I was at this time last year.

