If I were designing USENET from scratch

USENET is now (2011) over 30 years old, and the B news format was mostly codified over 25 years ago, with only a few changes since then -- References lines, Supersedes and MIME, all in the 1980s. (This article was primarily written in 2004.)

Some wonder whether USENET is still relevant in the age of the web and the always-connected internet. They are partly right, but there is still much value in USENET. The core strengths it retains are:

  • Articles you read are fed to your local machine before you read them. That means when you do read them, it is done over your local link or LAN, at blinding speed the web can't come close to duplicating. That is not just a quantitative difference in response time; it enables better access, just as a PC improves on a terminal. It's also better for large files you don't have the bandwidth to stream: send them in advance and you can play them instantly -- or seek arbitrarily within them.
  • It is distributed. This makes it harder for an outsider to take control of it (but also harder for it to undergo change, evolution or innovation.) The distributed approach is also vastly more efficient and robust. The idea of USENET "grinding to a halt" from heavy use (as has happened to popular web sites during big events) is effectively unheard of. This also enhances privacy -- you can read without anybody having any indication of what you read.
  • It can still be used offline, for those parts of the world where dedicated broadband is not yet an option; however, this is no longer easy because it is full of links.

USENET also has some non-inherent characteristics that people still value:

  • Because it has remained plain text in most newsgroups, this has encouraged a focus on substance (such as it is) over the flash of the design-centric HTML world of the web. For threaded discussion, plain text is mostly what you want.
  • For reasons I can't fathom, USENET newsreaders remain, a decade after the dawn of web message boards, vastly more powerful. Most message boards don't even seem to deal with elementary concepts like knowing what you have read and not read, or your subscription profile. They are also better at thread handling and presentation, filtering and many other features, though they suck at some other elements of UI.
  • USENET supports crossposting, putting a message in more than one group in such a way that it will not be seen more than once by people who read both groups. Highly useful, but also perhaps the most abused feature of the net.

Finally, let me quickly list the things that are no longer special about USENET.

  • There are lots of other online communities and message boards, with web boards and mailing lists dominant. They are accessible to all.
  • Some of these do implement the powerful features of USENET like threading, subscriptions, and readership tracking. Some even do more.
  • Mailing lists are now much faster and more efficient than before, and for the many who use combined newsreader/mailer programs, they look about the same. Most mailing lists now have web interfaces, with web searching, browsing and subscription control. Through Topica and Yahoo Groups, people have centralized control of subscriptions to many lists.
  • USENET was the center of community for the net in the past. It lost that status, and in fact most users of the internet don't even know what it is.

There are also areas where its design now has clear failings:

  • The net is so large that no one place can be the center of its community. The idea of having "the" newsgroup on a topic fit a smaller world better.
  • Its hard-to-control nature has made it a magnet for porn and copyright infringements; they now make up the vast majority of the net traffic.
  • Pre-feeding articles no longer makes sense for a lot of groups and articles, both because the average article may not get any readers on a small to medium site, and because internet latency and bandwidth are more than adequate for remote fetching. Indeed, the pre-feeding may be a waste of bandwidth.
  • It's become more and more rare for sites to host their own news servers. It's far more common for users to rely on centralized servers, often thousands of miles away, canceling out the merits of the old system.
  • The lack of ownership and control makes it harder for USENET to deal with spam other than via vigilante approaches. These work but have serious issues.
  • It's not a plain text world any more. While a reliance on plain text does focus the mind, rich text is useful if done properly. Even Twitter has links.

Redesigning it

If I were designing a distributed, efficient, pre-delivered message system today, I wouldn't do it the way we did 30 years ago. We know a lot more and could do it right.

Authenticate

Today we know how to do digital signatures. So we should sign articles and control messages to help deal with spam and abuse. That doesn't mean we should force everybody to reveal their real name, but it does mean we should have some basic security on administrative procedures, moderated groups and letting people cancel their own articles. Different groups would have different policies on identity.

The authentication would come with a certificate system, the scope of which is too grand to be covered here. This system would allow, as a degenerate case, the administrative structure we use today, such as it is, but would also allow arbitrary delegation of abilities and permissions, as finely grained as people like.

Note that authentication doesn't mean the end of anonymity or pseudonyms; it just gives the ability to tune where they are used. People want posting to alt.sex to be anonymous and should have that; other groups may not. Many levels of identity and authentication are possible.
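
As a minimal sketch (not a spec), here is roughly what signing and verifying an article might look like, using Ed25519 keys via Python's "cryptography" package. The choice of algorithm, and which headers the signature covers, are illustrative assumptions.

    # Sketch: sign an article's canonical headers + body with Ed25519.
    # The algorithm choice and canonical form are assumptions, not a standard.
    import base64
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    def sign_article(private_key, headers: str, body: str) -> str:
        payload = (headers + "\n\n" + body).encode("utf-8")
        return base64.b64encode(private_key.sign(payload)).decode("ascii")

    def verify_article(public_key, headers: str, body: str, sig_b64: str) -> bool:
        payload = (headers + "\n\n" + body).encode("utf-8")
        try:
            public_key.verify(base64.b64decode(sig_b64), payload)
            return True
        except InvalidSignature:
            return False

    # A poster generates (or is issued) a keypair once; any relay can verify.
    key = Ed25519PrivateKey.generate()
    sig = sign_article(key, "Newsgroups: Science Fiction Books", "Anyone read Vinge?")
    assert verify_article(key.public_key(), "Newsgroups: Science Fiction Books",
                          "Anyone read Vinge?", sig)

A real system would presumably carry the signer's certificate chain in a header, so a relay could verify without a directory lookup.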

Abandon injectors and multicast it

Once you have a signing system, you can abandon the "USENET site as fiefdom" approach we now use, where you can only insert an article through the official injector of the site by which you get access. Instead, you can insert a signed article anywhere -- authentication of what you are doing is done through the digital signature, not by your IP address. There is no reason individual users could not post their own articles directly.

Indeed, if they have IP multicast access, they could multicast their articles, so that every site on the net that wishes to receive them could get them simultaneously, with super-high efficiency. Those who can't multicast could find any insertion point on the net willing to gateway them to a multicast sender.
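
As a rough socket-level illustration (the group address, port, and one-datagram-per-article framing are placeholder assumptions; large articles would need a reliable multicast protocol, as discussed below):

    # Sketch: push an article onto a multicast feed channel, and join one.
    # The group address and port are hypothetical; a single UDP datagram
    # per article only works for small articles.
    import socket

    GROUP, PORT = "239.255.42.42", 11119  # hypothetical feed channel

    def multicast_article(raw_article: bytes) -> None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 32)  # regional scope
        sock.sendto(raw_article, (GROUP, PORT))
        sock.close()

    def join_feed() -> socket.socket:
        # A receiving site subscribes to the same channel.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", PORT))
        mreq = socket.inet_aton(GROUP) + socket.inet_aton("0.0.0.0")
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        return sock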

The old "flood fill" algorithm that involved sites sending articles to all their neighbours, who then send to all their neighbours etc. would still be used, but it would be the backup, for those who can't get multicast or for articles dropped by multicast.

The concept of an "injector" would become optional, and even the remaining injectors would be simple tools that accept an article by TCP and send it out the feed stream by multicast and other methods.

It is worth noting that the current permission and injector structure of USENET can be implemented with the above tools as a simple case, but many more options are also possible. One could have blessed injectors, as we do today, and only allow postings through them by users trusted by the injectors based on IP address or login, just as today. Though why we would want such a rigid system, I don't know. I suspect most sites are tired of being responsible for their users. With a switch to certifiers taking responsibility for those they certify, the sites can become the certifiers if they wish, but it is no longer a requirement. Such decentralization is what USENET's about.

Make it reliable

Part One -- no dropped articles

USENET is in some ways a robust design, but in fact it regularly discards articles and nobody knows it. Systems should regularly gather digests of all the known messages being seen on the net (with newsgroup and distribution information) and post those. Then sites could scan them, see any articles they missed (that they wanted) and fetch them from known sources.

With redundant postings of these lists, every site would have all the articles it wants. No missing articles, nothing lost.
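
A minimal sketch of the digest comparison, assuming a digest is just a set of message-ids and leaving the actual fetch (say, an NNTP ARTICLE request to a known source) abstract:

    # Sketch: compare a published digest against the local spool and
    # fetch whatever was missed. The ids below are made up.
    def find_missing(digest_ids: set, local_ids: set) -> set:
        return digest_ids - local_ids

    digest = {"<a1@example.com>", "<b2@example.org>", "<c3@example.net>"}
    spool = {"<a1@example.com>", "<c3@example.net>"}

    for msgid in sorted(find_missing(digest, spool)):
        print("missing, fetching:", msgid)  # e.g. NNTP ARTICLE <b2@example.org>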

Part Two -- report errors

USENET systems, when they encounter problems with articles, don't report the errors back to the poster, or often to anybody at all. It turns out there is a way to do this without flooding people with errors. Most USENET systems just drop problem articles on the floor. That's horrible software engineering. No error should occur without being reported to the person who cares about it.
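
As a sketch of what "report it" could mean (the field names are hypothetical; the point is only that a rejection always produces a notice addressed back to the poster):

    # Sketch: build a failure notice instead of silently dropping.
    def failure_notice(article: dict, reason: str) -> dict:
        return {
            "To": article.get("Reply-To", article.get("From", "<unknown>")),
            "Subject": "Article %s not propagated" % article["Message-ID"],
            "Body": "Your article was rejected at this relay: " + reason,
        }

    notice = failure_notice(
        {"From": "alice@example.com", "Message-ID": "<a1@example.com>"},
        "newsgroup does not accept inlined graphics",
    )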

Make it dynamic

USENET feeds articles in advance, so access to them is LAN fast. But that's no good if you feed a lot of articles in advance to places they will never be read. You want to minimize that, though still expect to pay some cost for the great benefit of instant response.

Reading systems should always know what is being read by their local users, and make sure they take in advance the most popular stuff -- as much as they can. For the rest, they should fetch all the headers in advance if there is any chance the messages will be read, and fetch the bodies on demand when the first user requests them. As users subscribe and unsubscribe, as well as change the fraction of articles in a newsgroup they read and don't read, this should change on an hourly basis.

This way each site gets in advance all the stuff people care about, and still has slightly delayed access (for the first reader) to all the material on the whole net if desired.
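
A toy sketch of such a feed policy; the thresholds are arbitrary illustrations of the idea, not recommendations:

    # Sketch: pick a per-group fetch strategy from local readership stats.
    def feed_policy(readers: int, fraction_read: float) -> str:
        if readers >= 5 and fraction_read > 0.5:
            return "prefetch-bodies"   # popular: take everything in advance
        if readers >= 1:
            return "prefetch-headers"  # headers now, bodies on first request
        return "on-demand"             # nothing local until somebody asks

    assert feed_policy(20, 0.8) == "prefetch-bodies"
    assert feed_policy(2, 0.1) == "prefetch-headers"
    assert feed_policy(0, 0.0) == "on-demand"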

Make it binary

With so much of the volume of USENET going to binaries, for better or for worse, it should support articles that are binary. Forget the kludge of uuencode and MIME base64.

In addition, support for full international character sets is a must, to give access to all the languages of the world.

Make it huge

Today large files are posted to USENET broken up into thousands of articles, often posted with PAR files for error correction due to the number of articles that will be lost. Large files should be sent as a single unit (in binary). Each individual site can decide if it wants to accept or handle large articles by setting its own article size limit. If a transport mechanism can't handle a large file, it can worry about breaking it up and reassembling it; this is not something the posting software should be doing to get around ancient limits.

A large article should arrive all or nothing, and the NNTP mechanism should provide the size of an article before it is sent, so that articles larger than the desired size need not be transmitted at all.

(Note that even if a site does not want to receive large articles, they can still be made available for fetching if another site is willing to serve them.)
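
Here is a sketch of the size pre-check; the reply codes are loosely modeled on NNTP's streaming replies, but the size-in-the-offer exchange itself is invented for illustration:

    # Sketch: decline an oversize article before any bytes are moved.
    MAX_ARTICLE = 50 * 1024 * 1024  # this site's own limit (50 MB, arbitrary)

    def handle_offer(msgid: str, size: int, have_already) -> str:
        if have_already(msgid):
            return "435 duplicate"   # already have it
        if size > MAX_ARTICLE:
            return "438 too large"   # never transmitted at all
        return "238 send it"         # stream exactly `size` bytes, all or nothing

    print(handle_offer("<big@example.com>", 2 * 1024**3, lambda m: False))
    # -> 438 too large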

Embrace the Web

USENET people have thought of the web and HTML as their enemy. There are good reasons for this. Many web sites emphasize form over substance, and serve a different purpose. But the virtues of HTML have been ignored due to this fear.

HTML should be embraced in the right way. Article structure should be remembered, such as where the paragraph breaks are, and likewise for blocks of included text and signatures -- the old ideal of structure-only HTML. Hypertext links are already very common in USENET and should be supported.

Every newsgroup should have an associated web site, with a home page, FAQ page, search links, and help page for users. Forget weekly posting of the FAQ; it should be a single menu click in the newsreader.

Some newsgroups may even choose to fully embrace HTML with all its bad and good, its tables and graphics and the rest. There should be a way for newsgroups which are created to choose just what level of this they want, sticking with a focus just on the basics like paragraphs, headings and included text, or going for the whole ball of wax if they want it.

Likewise, get the web to embrace USENET. Make the "news:" URL work reliably again, through the use of archive servers and a more reliable convergence of the two. Make that news: URL fetch locally if the user has local access, and remotely if they don't, possibly from a chain of sites to try, but make it always work.

Indeed, promote the idea that for a message with a message-ID of string@domain, it can always be fetched as a last resort from http://newsart.domain/usenet/string or similar, and encourage this convention on all posting servers that generate message-ids. In addition, consider also trying a URL like http://newsgroup.usenet-archives.org/messageid, where people can volunteer to maintain the permanent archive of any given newsgroup. Alternately, one could imagine a cluster of machines (with round robin) which take a URL with the message-id and some flags in it, and do a web redirect to a server which will serve that article. (I.e. these redirect servers figure out who is likely to be hosting the message and point you to it; they don't host messages on their own. A browser or reader might turn news: URLs into such URLs.)

Note that newsreaders don't need to change to use this. The local NNTP server they read from can just do the fetch if the user requests a message-id it no longer has a copy of. As such it will be possible to follow any thread tree back years in the past even in an old newsreader.
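
A sketch of that last-resort fetch, assuming the hypothetical host conventions above:

    # Sketch: turn string@domain into candidate archive URLs, try in order.
    from urllib.error import URLError
    from urllib.request import urlopen

    def candidate_urls(msgid: str, newsgroup: str) -> list:
        local, domain = msgid.strip("<>").split("@", 1)
        return [
            "http://newsart.%s/usenet/%s" % (domain, local),
            "http://%s.usenet-archives.org/%s@%s" % (newsgroup, local, domain),
        ]

    def fetch_article(msgid: str, newsgroup: str):
        for url in candidate_urls(msgid, newsgroup):
            try:
                with urlopen(url, timeout=10) as resp:
                    return resp.read()
            except URLError:
                continue  # try the next host in the chain
        return None       # caller falls back to asking feed peers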

Embrace other protocols

There should be easy tools to convert USENET feeds to RSS-style feeds, inefficient as they are. Sites like Google Groups should be welcomed and made use of.

Totally replace newsgroup names

Forget the silly hierarchy with the names full of dots. Why should a group exist in only one place in some committee-designed hierarchy, argued over in meaningless votes? Give newsgroups meaningful names -- with spaces, even, and non-ASCII characters. Let the grouping of newsgroups for searching, browsing, feeding and policy be done by other mechanisms that collect them. Let a newsgroup appear in as many hierarchies as make sense.

Let people own newsgroups

When nobody owns something, the tragedy of the commons is we collectively let it go to pot. In addition, nobody puts effort into innovation and experimentation.

Web sites flourish because one person owns the web site, and gets to try what they want there. The web sites compete, and the good ideas thrive and the bad ones fail. People even make money.

Let people own newsgroups, and let them compete. If you don't like what they do, start your own or try somebody else's. That's what we do with web sites, web message boards, newspapers, music and most of our media. Forget the "Highlander" idea that there can be only one. Who says? With the dynamic feeding scheme that only feeds groups people are reading, there is no waste in this.

Likewise, make creation of a newsgroup easy. Forget the bureaucracy, bickering and voting. You don't need somebody's permission to set up a web site. You just do it, and within a week it's in Google and people looking for your topic can find it.

Group owners need not be despots, in fact if they are they may not get much participation. Most should prefer to be custodians, like the operators of web boards and BBSs and mailing lists, not pre-approving postings or even stopping flamewars. Let a full spectrum of different policies be tried, including fully uncontrolled groups and tightly-edited ones.

Not all groups would have these custodians, but many might benefit.

Allow limits to crossposting

Relegate crossposting primarily to announcements, and only crosspost the first message, not the entire thread. In other words, require any followups to a crossposted message to go to one specific newsgroup, rather than having the audience of one newsgroup suddenly appear in another.
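
A sketch of how posting software might enforce this, reusing the existing Followup-To header; steering followups to the first listed group is just one possible policy:

    # Sketch: crosspost the announcement, but force the thread into one group.
    def apply_crosspost_policy(headers: dict) -> dict:
        groups = [g.strip() for g in headers["Newsgroups"].split(",")]
        if len(groups) > 1 and "Followup-To" not in headers:
            headers["Followup-To"] = groups[0]  # whole thread continues here
        return headers

    h = apply_crosspost_policy({"Newsgroups": "Science Fiction Books, Book News"})
    assert h["Followup-To"] == "Science Fiction Books"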

Allow the creation of topics

For over 15 years, almost every BBS, online service and many web boards have supported the idea of "topics" -- a categorization halfway between a newsgroup (which covers a subject area and the people interested in it) and threads, which are individual discussions with associated postings.

Let topics be created either by users, or semi-trusted users or the group custodians, as people wish. But let users easily see the list of topics, choose them for postings, and filter according to them.

You need support for these topics, ways to distribute the lists of them, descriptions of them, help for them, etc.
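
A tiny sketch of topic filtering, assuming a hypothetical "Topic" header and a per-group registry of known topics:

    # Sketch: per-user topic filtering. The header name and registry
    # are hypothetical.
    GROUP_TOPICS = {"Authors/Vernor Vinge", "Authors/Iain Banks", "Awards"}

    def visible(headers: dict, ignored: set) -> bool:
        topic = headers.get("Topic", "")
        if topic and topic not in GROUP_TOPICS:
            return False  # unknown topic; a real system might route to a default
        return topic not in ignored

    art = {"Subject": "A Fire Upon the Deep", "Topic": "Authors/Vernor Vinge"}
    assert visible(art, ignored={"Awards"})
    assert not visible(art, ignored={"Authors/Vernor Vinge"})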

Allow easy subsetting of the net

To encourage innovation and experimentation, make it easy, using a header like the old "Distribution" header, to create subsets of the net where experiments can take place but also quickly get a large audience. This is another area the web did well, since anybody who wanted to extend it could write a plugin, and any user in the world could download the plugin to try a whole new feature. The plugin concept should be extended to newsreaders.

Compress

While not a vital need, it makes sense to compress articles when they are generated, leaving only a few special headers uncompressed, and to save decompression for the newsreader. For the text component of the net, it would cut traffic volume to less than half. A specialized compressor that understands USENET's headers would do even better, possibly cutting the flow of non-binaries to 1/3rd.
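
A sketch with zlib: compress the body plus most headers at posting time, keep a few routing headers plain so relays never have to inflate anything, and decompress only in the newsreader. Which headers stay plain is an assumption here.

    # Sketch: compress all but the routing headers; only readers inflate.
    import zlib

    PLAIN = ("Message-ID", "Newsgroups", "Distribution", "Path")

    def compress_article(headers: dict, body: str):
        plain = {k: v for k, v in headers.items() if k in PLAIN}
        rest = "\n".join("%s: %s" % (k, v) for k, v in headers.items()
                         if k not in PLAIN)
        blob = zlib.compress((rest + "\n\n" + body).encode("utf-8"), 9)
        return plain, blob

    def decompress_article(blob: bytes) -> str:
        return zlib.decompress(blob).decode("utf-8")  # done in the newsreader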

Putting it all together

Here's how the process might go for a user of the new USENET above, using a newsgroup called "Science Fiction Books" rather than "rec.arts.sf.written".

Alice, our user, finds this newsgroup using a text search (search engine style) rather than browsing a hierarchy. She joins it, and the moment she does, she is pointed to web pages which introduce the group, its topic areas, its social norms and its FAQ. She browses a bit and decides to ask a question about a book by Vernor Vinge.

She looks at the list of topics for the newsgroup and sees there is a category for authors and a topic for Vernor Vinge, since his books are discussed frequently. She puts her question there.

She composes a message in a structured text editor. She decides to put in some inlined HTML graphics. This group requires postings go through a policy checker, so her posting program connects to it over TCP/IP and it reports to her that inlined graphics and font changes are not permitted in this newsgroup -- the users there prefer sticking to basic text.

She fixes this and posts again. As before, her software composes an article and signs it with a key and certificate she got from one of a number of known certificate issuers. It doesn't use her real name (and this group does not require it to) but it does verify her spam-filtered E-mail address and that she's not a spammer.

The filter approves her article for meeting the technical specs of the newsgroup, and it issues the article with IP multicasting. The overview headers would go out on one channel, the body and other headers on another, so that sites could independently subscribe to just headers, or header+body together. (There are some complex issues to be resolved on the allocation of multicast channels.)

Sites which get the article would then feed it out via the normal NNTP-style mechanism, except in most cases the other party would already have gotten the article via the multicast. However, some would get it via NNTP, and some others might miss it entirely. Central sites with highly reliable feeds would publish digests of the message-ids of all articles they have received, along with newsgroup and Distribution information. Sites that want the SF Books newsgroup and notice they missed the article would connect and do a TCP transactional fetch of it from some major site they have a fetch-feed arrangement with.

The article goes into the stream. Readers see it categorized under Vernor Vinge, a topic they may find of interest or may have marked to ignore.

The newsgroup "Sex and Bondage" has no restrictions on posting. Bob is able to post to it by multicasting his article himself if he has the tools to do that, or he can contact any relay willing to handle articles in this newsgroup. His tools query the group's web site to find servers willing to do that.

Notes on the key issues

USENET has a permission system today, it's just very limited. Sites only take in articles via their injectors, which only accept articles from people they have authenticated based on IP address (they are users of the ISP or site running the injector) or userid and password. Then the sites must have feed partners, which authenticate them, usually based on IP address, and they accept articles directly from nobody else. The injectors tend to strip the users of privacy and insert various bits of identifying information into the articles to help in the spam fight.

A certificate system lets you delegate authority as you like it. You can follow exactly the above pattern if you wish to, but you also have the option of doing decentralized delegation. Thus an injector, if the concept is still to exist, can accept a post from anybody who has an appropriate certificate delegating that trust, regardless of IP address. This is also true of sites.

The multicast system is not perfect because of the limited number of multicast groups available. You can't get one for every newsgroup, and you would not want to implement it that way, so you need groupings of newsgroups designed to match popular desires. Any article posted to more than one grouping would be sent twice, which is somewhat wasteful. Very large articles are not easily multicast due to packet loss, though there are reliable multicast protocols which could be used. In the end not all articles can be readily multicast; some must go by the point-to-point flood algorithm of NNTP. Multicasting is ideal when there is a readily identifiable set of articles that everybody in a readily identifiable group of users will want to get. It is also good for sending only the overview headers of articles, so most sites can keep all of those and fetch bodies on demand.