Use of non-plaintext in USENET news

There is, of course, some debate about the use of non-plaintext in USENET news. Some feel the 80-column monospace format commonly used is perfectly adequate for the job. Others feel that USENET articles have structure -- paragraphs, links, headers, signatures, included text and so on -- that is useful and should have been represented from the start, if only to allow displays that are not 80 columns wide.

Beyond the debate over whether it should be done, there is also debate over how it should be done, both with existing tools and in the future using tools designed with such features in mind.

The designers of MIME, the standard for non-plaintext mail messages, also defined a format for messages that contain both plain text for older tools and rich text for tools capable of handling it. Their solution, called multipart/alternative, puts both versions in the message, with headers so that MIME-aware tools display only one: the one they judge best suited to their capabilities.

You can read the MIME documents for more details. They require the plainest version to come first, so that even tools with no understanding of MIME present something people can read.
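A minimal sketch of such an article, using Python's standard `email.mime` classes. The newsgroup, subject, and body text are placeholders, and a real posting agent would add further headers (From, Message-ID and so on). The point is the part order: plain text first, HTML last.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternative_article(plain: str, html: str,
                              subject: str, newsgroups: str) -> MIMEMultipart:
    """Build a multipart/alternative article with the plainest part first."""
    msg = MIMEMultipart("alternative")
    msg["Newsgroups"] = newsgroups
    msg["Subject"] = subject
    # Text shown by readers that don't understand MIME at all.
    msg.preamble = "This is a multi-part message in MIME format."
    # Order matters: plainest first, richest last.
    msg.attach(MIMEText(plain, "plain"))
    msg.attach(MIMEText(html, "html"))
    return msg


article = build_alternative_article(
    plain="Hello, USENET.\n\nA second paragraph.\n",
    html="<p>Hello, <b>USENET</b>.</p>\n<p>A second paragraph.</p>\n",
    subject="Test of multipart/alternative",
    newsgroups="misc.test",
)
print(article.as_string())
```

Printing the article shows the cost discussed next: the same content appears twice, separated by MIME boundary lines.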

This is their method for migrating from one format to another while older tools that can't handle the new format remain in use. However, it is very inefficient, since it stores two copies of the message. And for users of the very oldest tools, it clutters the message with MIME boundary lines and part headers, followed by a second, raw-HTML copy at the end.

MIME has now been around for almost a decade, so most newsreaders at least know the basic parsing and can handle it. Only the oldest, unmaintained readers have trouble. (Oddly enough, we have the alt.binaries.erotica groups to thank for that.)

MIME also defined a simple rich text format, but for better or worse, HTML has won as the rich text interchange format of the Internet. (HTML actually started simple and pure, then got corrupted with lots of presentational markup, and the newer versions are reversing that trend, so this is not all bad news.)

Out of Band HTML

Several years ago I designed what I think is a better system, which I called "Out of Band HTML" (or SGML). The idea is to provide a plain text article, but in another stream (in the header) provide encoded instructions for turning it into HTML or another marked-up form. That is, the out-of-band instructions say things like, "Down 4 lines, insert paragraph break, down 5 more lines, insert break, over 20 characters, start bold, over 10, stop bold" and so on.

The value of this design is that old tools, and users who want plain text, see plain text, while tools that understand the encoding can show more. It's also very compact, adding just 2-3% to the size of articles.
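The cursor-style instructions described above can be sketched as follows. This is only an illustration under stated assumptions: the instruction names (`down`, `over`, `insert`), the function name `apply_out_of_band`, and representing the header as a Python list are all hypothetical, not the actual Out of Band HTML encoding. It assumes the cursor only moves forward, and it shows the key property: the plain text is never altered, only escaped and interleaved with markup.

```python
import html


def apply_out_of_band(text: str, instructions: list) -> str:
    """Turn plain text into HTML using cursor-movement instructions.

    Hypothetical instruction set (not the real Out of Band HTML format):
      ("down", n)      move the cursor n lines down, to the start of that line
      ("over", n)      move the cursor n characters right on the current line
      ("insert", tag)  insert markup at the cursor
    """
    # Offset of the first character of each line within `text`.
    line_starts, pos = [], 0
    for line in text.split("\n"):
        line_starts.append(pos)
        pos += len(line) + 1

    line_no, col = 0, 0
    inserts = []  # (offset into text, markup)
    for op, arg in instructions:
        if op == "down":
            line_no, col = line_no + arg, 0
        elif op == "over":
            col += arg
        elif op == "insert":
            inserts.append((line_starts[line_no] + col, arg))
        else:
            raise ValueError(f"unknown instruction: {op!r}")

    # Stable sort keeps instruction order for markup at the same offset.
    inserts.sort(key=lambda item: item[0])
    out, prev = [], 0
    for offset, markup in inserts:
        out.append(html.escape(text[prev:offset], quote=False))
        out.append(markup)
        prev = offset
    out.append(html.escape(text[prev:], quote=False))
    return "".join(out)


article = "First paragraph.\n\nSecond has a bold word.\n"
header = [("insert", "<p>"), ("down", 2), ("insert", "<p>"),
          ("over", 13), ("insert", "<b>"), ("over", 4), ("insert", "</b>")]
print(apply_out_of_band(article, header))
```

A real deployment would need a compact text serialization of these instructions to fit in an article header; the list form here is just for readability.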

Alas it has yet to be adopted.

Another system I designed, ProleText, encodes markup in truly invisible trailing whitespace. ClariNet still deploys it, and it works pretty well, but it is limited in the markup it can encode.
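ProleText's actual encoding isn't described here, so the following is only a made-up illustration of the general idea: the number of trailing spaces on a line selects a per-line tag, which plain-text readers never display. The `LINE_CODES` table and both function names are hypothetical. The sketch also hints at why such schemes are limited: one small code per line carries far less than full markup, and any software that strips trailing whitespace silently destroys it.

```python
from typing import List, Optional, Tuple

# Hypothetical code table: number of trailing spaces -> tag for that line.
# This is NOT ProleText's real encoding.
LINE_CODES = {1: "p", 2: "h2", 3: "blockquote"}
TAG_TO_CODE = {tag: n for n, tag in LINE_CODES.items()}


def hide_line_tags(lines: List[Tuple[str, Optional[str]]]) -> str:
    """Encode each line's tag as invisible trailing spaces."""
    out = []
    for text, tag in lines:
        # Drop existing trailing whitespace so it can't be misread as a code.
        text = text.rstrip()
        out.append(text + " " * (TAG_TO_CODE[tag] if tag else 0))
    return "\n".join(out)


def reveal_line_tags(article: str) -> List[Tuple[str, Optional[str]]]:
    """Recover (text, tag) pairs; an unknown space count means no tag."""
    result = []
    for line in article.split("\n"):
        text = line.rstrip(" ")
        result.append((text, LINE_CODES.get(len(line) - len(text))))
    return result


lines = [("Announcement", "h2"), ("The meeting moved to Friday.", "p"), ("Thanks.", None)]
encoded = hide_line_tags(lines)
print(repr(encoded))
```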

Prettyprinted HTML

Another idea is to write articles in HTML, but make sure they are also readable as plain text. That means "hiding" the tags, for example by putting them in the far right margin.

It mostly works, but readers with only partial understanding of MIME don't handle it well. Seeing that the article is HTML, they may launch Netscape to display it when that isn't really needed. And the little blips in the margin are not totally invisible.
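A minimal sketch of the right-margin trick, assuming paragraph tags are the only markup needed. Each paragraph is preceded by a separator line that looks blank in a plain-text reader but carries a `<p>` pushed out to column 76, so it stays within an 80-column screen. The function name and the column are assumptions. Because HTML collapses whitespace, the padding costs nothing when rendered. The body text still has to be escaped, so a literal `&` or `<` shows up as an entity in plain-text view, one more small blip.

```python
import html
from typing import List


def prettyprint_paragraphs(paragraphs: List[List[str]], margin_col: int = 76) -> str:
    """Emit HTML that also reads as plain text, with <p> tags in the right margin."""
    out = []
    for lines in paragraphs:
        # Looks like a blank separator line; the tag sits far to the right.
        out.append(" " * margin_col + "<p>")
        for line in lines:
            # Escaping is needed for valid HTML, but alters '&' and '<' in plain view.
            out.append(html.escape(line, quote=False))
    return "\n".join(out) + "\n"


article = prettyprint_paragraphs([
    ["Here is the first paragraph,", "wrapped over two lines."],
    ["And a second one: Q&A at noon."],
])
print(article)
```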

Fear of HTML

Of course, many people fear HTML because it is so often used badly on the web: pointless layout that bloats pages, blinking text, and graphics added just to look pretty. All these concerns are valid, but they should not blind us to the truly useful things in the language. There is structure in USENET postings, at a minimum (and most importantly) paragraphs.

Images

If you want even more controversy, bring up the issue of images in USENET postings. Many feel they should be segregated into their own special newsgroups, and they commonly are. It is, however, more complex than that.

Newsgroups are meant to be formed around areas of interest, not around the medium of expression. Some feel that rec.humor really means "rec.humor.text_only", but in truth that restriction exists only for technological reasons. People wanting to read humour are not against cartoons, and unless there is enough volume for a cartoon-only group, it is pointless to post them to a generic binaries group that has nothing to do with humour.

Still, this issue will remain one of debate. People tend to create oversized images when they post to the net (or the web), which is naturally a problem for users with slow links. The truth is that just about any image useful on USENET can fit in 40 to 60KB as a JPEG, or less as a line-art GIF. In other words, it is about the size of 5 to 30 typical text articles. Large indeed, but if done in moderation, not an overwhelming increase in volume -- even if posted inline. (More on inline images later.)

There are also some users who download entire groups for offline reading and who are concerned about inline images wasting their resources. Users who pick articles one at a time can easily skip or killfile images, as long as they are marked. Such articles are no different from the thousands of other articles, large and small, that one skips over in a day's reading.

MIME-inline vs. links

As noted, images can be provided inline, meaning MIME-encoded within the posting itself. Most users have a newsreader that will display such images (thanks again to the porn groups). Many also display the obsolete uuencode format.
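To make "MIME-encoded in the posting" concrete, here is a minimal sketch using Python's standard `email.mime` classes. The newsgroup, subject, filename, and image bytes are placeholders (the bytes are not a real JPEG). It also shows part of why inline images are bulky: base64 turns every 3 bytes into 4 characters, so the posting grows by about a third over the raw image size, plus line breaks.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_inline_image_article(caption: str, jpeg_bytes: bytes,
                               filename: str = "cartoon.jpg") -> MIMEMultipart:
    """Post a text caption plus an inline JPEG, MIME-encoded in the article body."""
    msg = MIMEMultipart("mixed")
    msg["Newsgroups"] = "misc.test"      # placeholder group
    msg["Subject"] = "Today's cartoon"   # placeholder subject
    msg.attach(MIMEText(caption, "plain"))
    # MIMEImage base64-encodes the bytes by default.
    img = MIMEImage(jpeg_bytes, _subtype="jpeg")
    img.add_header("Content-Disposition", "inline", filename=filename)
    msg.attach(img)
    return msg


# Placeholder bytes standing in for a ~50KB JPEG; not a real image.
fake_jpeg = bytes(range(256)) * 200
article = build_inline_image_article("Today's cartoon, inline.", fake_jpeg)
encoded_body = article.get_payload()[1].get_payload()
print(len(fake_jpeg), "bytes ->", len(encoded_body), "characters on the wire")
```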

This turns out to be the format most bothersome to those who don't want the image (because it is large) and the most useful to those who do. In spite of some preconceptions, when it comes to what USENET is good at, bits are bits. If people want to read or see something, and USENET delivers it to their local server in advance, that's good no matter what its size. If they don't want to see it -- text or image -- it's a waste.

There is even an argument that images belong on USENET more than text does: text is not slowed down much by remote access, while images gain a great deal from local (especially LAN) access.

Using links (either for automatic inclusion or to click on) of course eliminates the size issue, but links are unusable for those without a browser and a live connection. Presumably, if such articles are marked, those users will skip them. You also get the downsides of the web: access is slower, the web server can get swamped, and if it's down (or your connection is down) at reading time, you don't see the image.

It's a conundrum -- including the image in the posting allows offline users to see the image, but offline users who don't want the image get the most upset at the downloaded bytes.

However, my rec.humor.funny experiment shows that the volume of viewing is not likely to swamp a decently provisioned server.

Links in plain text

Most, though not all, newsreaders that can browse the web or interact with a browser will notice a URL in a plain text article and turn it into a clickable link. Even users of X Windows with a plain text newsreader can sweep out a URL with the mouse and then middle-click on top of Netscape to visit it. This makes plain text with URLs another viable alternative, though admittedly a kludge.
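The exact rules newsreaders use to spot URLs vary. Here is a minimal sketch of the idea, assuming a deliberately simple regular expression that recognizes http, https and ftp URLs and drops trailing punctuation; the pattern and the function name `linkify` are illustrative assumptions, not any particular newsreader's behaviour. A URL on a line by itself is the easiest case for any such heuristic.

```python
import html
import re

# Assumed, simplified pattern: scheme followed by non-space, non-markup characters.
URL_RE = re.compile(r"""\b(?:https?|ftp)://[^\s<>"]+""")


def linkify(plain: str) -> str:
    """Convert plain text to HTML, turning recognizable URLs into links."""
    out, prev = [], 0
    for m in URL_RE.finditer(plain):
        # Sentence punctuation after a URL is usually not part of it.
        url = m.group(0).rstrip(".,;:!?)")
        end = m.start() + len(url)
        out.append(html.escape(plain[prev:m.start()], quote=False))
        out.append(f'<a href="{html.escape(url)}">{html.escape(url, quote=False)}</a>')
        prev = end
    out.append(html.escape(plain[prev:], quote=False))
    return "".join(out)


print(linkify("See http://www.example.com/cartoon.gif."))
```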

Conclusion

Leaving aside the issue of whether images should appear, it seems that at present the best way to include them is either a plain text article with a URL on a line by itself, or a small multipart/alternative. The multipart/alternative is larger, but almost all users can handle it today, and it's much cleaner for the majority of users with an HTML-capable newsreader. It also allows automatic inclusion of the graphic.

The graphic should be small (typically no more than 20 to 60KB; even the latter takes under 20 seconds for a modem user, happens in the background, and can easily be interrupted) and served from a reliable, high-capacity web server.
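A quick check of the modem arithmetic, as an idealized calculation that assumes 1KB = 1024 bytes and ignores protocol overhead and modem compression; the modem speeds are common rates of the era, chosen for illustration.

```python
def download_seconds(size_kb: float, modem_bps: int) -> float:
    """Idealized transfer time: no protocol overhead, no modem compression."""
    bits = size_kb * 1024 * 8
    return bits / modem_bps


for bps in (14_400, 28_800, 33_600):
    print(f"60KB at {bps} bps: {download_seconds(60, bps):.1f} s")
```

At 28.8 kbps a 60KB image takes about 17 seconds, so the "under 20 seconds" figure holds for 28.8 kbps and faster modems; a 14.4 kbps modem would need about 34 seconds.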

As software evolves, these choices will change, and I still prefer an out-of-band system.