If I were designing USENET from scratch
USENET is now (2004) 24 years old, and the B news format was mostly codified 22
years ago, with few changes since then -- References lines, Supersedes and
MIME mostly.
Some wonder whether USENET has become irrelevant in the age of the web and the
always-connected internet. They are partly right, but there is still much
value in USENET. The core utilities it retains are:
- Articles you read are fed to your local machine before you read them.
That means when you do read them it is done over your local link or LAN,
at blinding speed the web can't come close to duplicating. That is not
just a quantitative difference in response time; it enables a better kind
of access, just as a PC can do things a terminal cannot. It's also better for
large files you don't have the bandwidth to stream. Send them in advance
and you can play them instantaneously -- or seek arbitrarily within
them.
- It is distributed, so people can read without contact with the poster. It's
harder for an outsider to take control of it (but also harder for it to
undergo change, evolution or innovation.) The distributed approach is also
vastly more efficient and robust. The idea of USENET "grinding to a halt"
from heavy use (as has happened to popular web sites during big events)
is effectively unheard of. This also enhances privacy -- you can read
without anybody having any indication of what you read.
- It can still be used offline, in those parts of the world where dedicated
broadband is not yet an option, though this is no longer easy.
USENET also has some non-inherent characteristics that people still value:
- Because it has remained plain text in most newsgroups, it has encouraged
a focus on substance (such as it is) over the flash of the design-centric
HTML world of the web. For threaded discussion, plain text is mostly what
you want.
- For reasons I can't fathom, USENET newsreaders remain, a decade after the
dawn of web message boards, vastly more powerful. Most message boards don't
even seem to deal with elementary concepts like knowing what you have read
and not read, or your subscription profile. They are also better at thread
handling and presentation, filtering and many other features, though they
suck at some other elements of UI.
- USENET supports crossposting: putting a message in more than one group in
such a way that it will not be seen more than once by people who read both groups.
Highly useful, but also perhaps the most abused feature of the net.
Finally, let me quickly list the things that are no longer special about
USENET.
- There are lots of other online communities and message boards, with web
boards and mailing lists dominant. They are accessible to all.
- Some of these do implement the powerful features of USENET like threading,
subscriptions, and readership tracking. Some even do more.
- Mailing lists are now much faster and more efficient than before, and for the
many who use combined newsreader/mailer programs, they look about the same.
Most mailing lists now have web interfaces, with web searching, browsing and
subscription control. Through Topica and Yahoo Groups, people have
centralized control of subscription to many lists.
- USENET was the center of community for the net in the past. It lost that
status, and in fact most users of the internet don't even know what it is.
Redesigning it
If I were designing a distributed, efficient, pre-delivered message system
today, I wouldn't do it the way we did 24 years ago. We know a lot more
and could do it right.
Authenticate
Today we know how to do digital signatures. So we should sign articles and
control messages to help deal with spam and abuse. That doesn't mean we
should force everybody to reveal their real name, but it does mean we
should have some basic security on administrative procedures, moderated
groups and letting people cancel their own articles. Different groups
would have different policies on identity.
The authentication would come with a certificate system, the scope of
which is too grand to be covered here. This system would allow, as a
degenerate case, the administrative structure we use today, such as it is,
but would allow arbitrary delegation of abilities and permissions as people
like it, as finely grained as they like it.
Note that authentication doesn't mean the end of anonymity or pseudonyms;
it just gives the ability to tune where they are used. People want posting to
alt.sex to be anonymous and should have that; other groups may not. Many
levels of identity and authentication are possible.
Abandon injectors and multicast it
Once you have a signing system, you can abandon the "USENET site as fiefdom"
approach we now use, where you can only insert an article through the
official injector of the site by which you get access. Instead, you can
insert a signed article anywhere -- authentication of what you are doing is
done through the digital signature, not by your IP address. There is no
reason individual users could not post their own articles directly.
Indeed, if they have IP multicast access, they could multicast their articles,
so that every site on the net that wishes to receive them could get them
simultaneously, with super-high efficiency. Those who can't multicast
could find any insertion point on the net willing to gateway them to
a multicast sender.
The old "flood fill" algorithm that involved sites
sending articles to all their neighbours, who then send to all their
neighbours etc. would still be used, but it would be the backup, for those
who can't get multicast or for articles dropped by multicast.
The concept of an "injector" would become optional, and even the remaining
injectors would be simple tools that accept an article by TCP and send it
out the feed stream by multicast and other methods.
It is worth noting that the current permission and injector structure of
USENET can be implemented with the above tools as a simple case, but many
more options are also possible. One could have blessed injectors, as we
do today, and only allow postings through them by users trusted by the
injectors based on IP address or login, just as today. Though why we would
want such a rigid system, I don't know. I suspect most sites are tired of
being responsible for their users. With a switch to certifiers being
the path to those they certify, the sites can become the certifiers if
they wish, but it is no longer a requirement. Such decentralization is
what USENET's about.
Make it reliable
Part One -- no dropped articles
USENET is a robust design but in fact it regularly discards articles and
nobody knows it. Systems should regularly gather digests of all the
known messages being seen on the net (with newsgroup and distribution
information) and post those. Then sites could scan, see any articles they
missed (that they wanted) and fetch them from known sources.
With redundant postings of these lists, every site would have all the
articles it wants. No missing articles, nothing lost.
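The scan itself is little more than a set difference. Here is a minimal Python sketch, assuming a digest is a list of entries carrying message-id, newsgroups and distribution. The names DigestEntry and missing_articles are invented for illustration and are not part of any existing news software.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class DigestEntry:
    """One line of a (hypothetical) published digest."""
    message_id: str
    newsgroups: FrozenSet[str]
    distribution: str = "world"


def missing_articles(
    digest: Iterable[DigestEntry],
    have: Set[str],
    wanted_groups: Set[str],
    wanted_distributions: FrozenSet[str] = frozenset({"world"}),
) -> List[str]:
    """Return message-ids listed in the digest that this site wants but lacks."""
    missing = []
    for entry in digest:
        if entry.message_id in have:
            continue  # already received by multicast or flood fill
        if entry.distribution not in wanted_distributions:
            continue  # outside the distributions this site carries
        if entry.newsgroups & wanted_groups:
            missing.append(entry.message_id)  # wanted and absent: fetch it
    return missing
```

A site would run this against each digest it receives and then fetch the returned message-ids from a known source.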
Part Two -- report errors
USENET systems, when they encounter problems with articles, don't report
the errors back to the poster, or often to anybody at all. There is a way
to do this without flooding people with errors, as it turns out. Most
USENET systems just drop articles with a problem on the floor and discard
them. That's horrible software engineering. No error should occur without
being reported to the person who cares about it.
Make it dynamic
USENET feeds articles in advance, so access to them is LAN fast. But that's
no good if you feed a lot of articles in advance to places they will never
be read. You want to minimize that, though still expect to pay some cost
for the great benefit of instant response.
Reading systems should always know what is being read by their local users,
and make sure they take in advance the most popular stuff -- as much as
they can. For the rest, they should fetch all the headers in advance if
there is any chance the messages will be read, and fetch the bodies on
demand when the first user requests them. As users subscribe and
unsubscribe, as well as change the fraction of articles in a newsgroup
they read and don't read, this should change on an hourly basis.
This way each site gets in advance all the stuff people care about, and
still has slightly delayed access (for the first reader) to all the material
on the whole net if desired.
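One way to picture the hourly decision is a simple ranking against a storage or bandwidth budget. The Python sketch below is only an illustration under assumed inputs: GroupStats, plan_feeds, and the idea of a single byte budget are all invented here, and a real site would likely weigh more factors.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class GroupStats:
    name: str
    subscribers: int       # local users subscribed to the group
    read_fraction: float   # share of the group's articles those users read
    daily_bytes: int       # expected volume of article bodies per day

    @property
    def expected_reads(self) -> float:
        return self.subscribers * self.read_fraction


def plan_feeds(stats: Iterable[GroupStats], body_budget_bytes: int) -> Dict[str, str]:
    """Decide per group: 'full' (bodies in advance), 'headers', or 'none'."""
    plan = {}
    remaining = body_budget_bytes
    # Most-read groups get first claim on the budget for pre-fetched bodies.
    for group in sorted(stats, key=lambda g: g.expected_reads, reverse=True):
        if group.subscribers == 0:
            plan[group.name] = "none"      # nobody here will read it
        elif group.daily_bytes <= remaining:
            plan[group.name] = "full"      # popular and affordable
            remaining -= group.daily_bytes
        else:
            plan[group.name] = "headers"   # bodies fetched on first request
    return plan
```

Re-running plan_feeds every hour with fresh statistics gives the changing feed described above.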
Make it binary
With so much of the volume of USENET going to binaries, for better or for
worse, it should support articles that are binary. Forget the kludge of
uuencode and MIME base64.
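The cost of the kludge is easy to measure: base64 turns every 3 bytes of input into 4 characters, and the MIME form adds line breaks on top. A small Python sketch using the standard base64 module (the function name encoded_sizes is just for illustration):

```python
import base64


def encoded_sizes(raw: bytes) -> dict:
    """Compare the size of a binary payload with its base64 encodings."""
    return {
        "raw": len(raw),
        # Plain base64: 4 output characters per 3 input bytes.
        "base64": len(base64.b64encode(raw)),
        # MIME-style base64: the same data, split into lines with newlines.
        "mime_base64": len(base64.encodebytes(raw)),
    }
```

For a 3000-byte payload the plain encoding is 4000 bytes, a one-third increase before any line breaks are counted.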
In addition, support for full international character sets is a must, to
give access to all the languages of the world.
Embrace the Web
USENET people have thought of the web and HTML as their enemy. There are
good reasons for this. Many web sites emphasize form over substance,
and serve a different purpose. But the virtues of HTML have been ignored
due to this fear.
HTML should be embraced in the right way. Article structure should be
remembered, such as where the paragraph breaks are. Ditto for the blocks
of included text and signatures. These are the old ideals of structure-only HTML.
Hypertext links are very common in USENET and should be supported.
Every newsgroup should have an associated web site, with a home page, FAQ
page, search links, and help page for users. Forget weekly posting
of the FAQ, it should be a single menu click in the newsreader.
Some newsgroups may even wish to fully embrace HTML with all its bad and
good, its tables and graphics and the rest. There should be a way for
newsgroups which are created to choose just what level of this they want,
sticking with a focus just on the basics like paragraphs, headings and
included text, or going for the whole ball of wax if they want it.
Likewise, get the web to embrace USENET. Make the "news:" URL work
again, through the use of archive servers and a more reliable convergence
of the two. Make that URL fetch locally if the user has local access,
and remotely if they don't, but make it always work.
Totally replace newsgroup names
Forget the silly hierarchy with the names full of dots. Why should a
group exist in only one place in some committee-designed hierarchy argued
over in meaningless votes? Give newsgroups meaningful names. With spaces,
even, and non-ASCII characters. Let the grouping of newsgroups for searching,
browsing, feeding and policy be done by other mechanisms that collect them.
Let a newsgroup appear in as many hierarchies as make sense.
Let people own newsgroups
When nobody owns something, the tragedy of the commons is that we
collectively let it go to pot. In addition, nobody puts effort into innovation
and experimentation.
Web sites flourish because one person owns the web site, and gets to
try what they want there. The web sites compete, and the good ideas thrive
and the bad ones fail. People even make money.
Let people own newsgroups, and let them compete.
If you don't like what they do, start your own or try somebody else's. That's
what we do with web sites, web message boards, newspapers, music and most
of our media. Forget the "Highlander" idea that there can be only one.
Who says? With the dynamic feeding scheme that only feeds a group if
people are reading it, there is no waste in this.
Likewise, make creation of a newsgroup easy. Forget the bureaucracy,
bickering and voting. You don't need somebody's permission to set up a web
site. You just do it, and within a week it's in Google and people looking
for your topic can find it.
Group owners need not be despots; in fact, if they are, they may not get
much participation. Most should prefer to be custodians, like the operators
of web boards and BBSs and mailing lists, not pre-approving postings or
even stopping flamewars. Let a full spectrum of different policies be
tried, including fully uncontrolled groups and tightly-edited ones.
Allow limits to crossposting
Relegate crossposting primarily to announcements, and only crosspost the
first message, not the entire thread. In other words, require any followups
to a crossposted message to go to one specific newsgroup, rather than having
the audience of one newsgroup suddenly appear in another.
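In a posting program, the rule could be enforced with a check like the following Python sketch. It assumes the poster (or the original article) names the one group that followups go to. The function name and arguments are hypothetical.

```python
from typing import List, Optional


def followup_groups(parent_groups: List[str], chosen: Optional[str] = None) -> List[str]:
    """Return the single newsgroup a followup is allowed to go to."""
    if not parent_groups:
        raise ValueError("parent article has no newsgroups")
    if len(parent_groups) == 1:
        # Not a crosspost: the followup simply stays in the same group.
        return [parent_groups[0]]
    if chosen is None:
        raise ValueError("followup to a crossposted article must pick one group")
    if chosen not in parent_groups:
        raise ValueError("chosen group was not one the parent was posted to")
    return [chosen]
```

Only the first message of a thread ever carries more than one group; every reply is narrowed to exactly one.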
Allow the creation of topics
Almost every BBS, online service and many web boards for over 15 years have
supported the idea of "topics" -- a categorization halfway between a
newsgroup (which covers a subject area and the people interested in it)
and threads, which are individual discussions with associated postings.
Let topics be created either by users, or semi-trusted users or the group
custodians, as people wish. But let users easily see the list of topics,
choose them for postings, and filter according to them.
You need support for these topics, ways to distribute the lists of them,
descriptions of them, help for them, etc.
Allow easy subsetting of the net
To encourage innovation and experimentation, make it easy, using a header
like the old "Distribution" header, to create subsets of the net where
experiments can take place but also quickly get a large audience. This
is another area the web did well, since anybody who wanted to extend it
could write a plugin, and any user in the world could download the plugin
to try a whole new feature. The plugin concept should be extended to
newsreaders.
Compress
While not a vital need, it makes sense to compress articles when they are
generated, leaving only a few special headers uncompressed, and to save
decompression for the newsreader. For the text component of the net,
it would cut traffic volume to less than half. A specialized compressor
that understands USENET's headers would do even better, possibly cutting
the flow of non-binaries to 1/3rd.
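As a rough illustration, here is a Python sketch that compresses an article with the standard zlib module while leaving a few routing headers readable. The choice of which headers stay uncompressed is an assumption made here, folded (continuation) header lines are not handled, and the actual savings depend entirely on the text being compressed.

```python
import zlib

# Assumption: headers a relay needs to read without decompressing the article.
UNCOMPRESSED_HEADERS = ("message-id", "newsgroups", "distribution")


def pack_article(article: bytes) -> bytes:
    """Compress everything except the routing headers."""
    head, _, body = article.partition(b"\n\n")
    kept, squeezed = [], []
    for line in head.split(b"\n"):
        name = line.split(b":", 1)[0].strip().lower().decode("ascii", "replace")
        (kept if name in UNCOMPRESSED_HEADERS else squeezed).append(line)
    payload = zlib.compress(b"\n".join(squeezed) + b"\n\n" + body, 9)
    return b"\n".join(kept) + b"\n\n" + payload


def unpack_article(packed: bytes) -> bytes:
    """Reverse pack_article; routing headers come first in the result."""
    kept, _, payload = packed.partition(b"\n\n")
    squeezed, _, body = zlib.decompress(payload).partition(b"\n\n")
    headers = [line for line in kept.split(b"\n") + squeezed.split(b"\n") if line]
    return b"\n".join(headers) + b"\n\n" + body
```

Relays would pass the packed form along untouched, and only the newsreader would call unpack_article.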
Putting it all together
Here's how the process might go for a user of the new USENET above, using
a newsgroup called "Science Fiction Books" rather than "rec.arts.sf.written".
Alice, our user, found this newsgroup using a text search (search engine
style) rather than browsing a hierarchy. She joined it, and the moment
she did, she was pointed to web pages which introduced the group, its
topic areas, its social norms and its FAQ. She browsed a bit and decided
to ask a question about a book by Vernor Vinge.
She looked at the list of topics for the newsgroup and saw there was a
category for authors and a topic for Vernor Vinge, since his books are
discussed frequently. She puts it there.
She composes a message in a structured text editor. She decides to put
in some inlined HTML graphics. This group requires postings go through a
policy checker, so her posting program connects to it over TCP/IP and it
reports to her that inlined graphics and font changes are not permitted
in this newsgroup -- the users there prefer sticking to basic text.
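A policy checker of this kind could be quite small. The Python sketch below uses the standard html.parser module to list the tags in a posting that a group does not allow. The allowed set shown is only an example of what a "basic text" group might pick, and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import FrozenSet, List

# Assumption: a structure-only policy chosen by this particular newsgroup.
ALLOWED_TAGS = frozenset({"p", "blockquote", "h1", "h2", "h3", "a"})


class _TagCollector(HTMLParser):
    """Record the name of every tag opened in the posting."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.tags.append(tag)  # self-closing form such as <img/>


def check_policy(body: str, allowed: FrozenSet[str] = ALLOWED_TAGS) -> List[str]:
    """Return the sorted list of tags in body that the group does not permit."""
    collector = _TagCollector()
    collector.feed(body)
    collector.close()
    return sorted({tag for tag in collector.tags if tag not in allowed})
```

An empty result means the posting meets the group's technical rules; otherwise the list is what gets reported back to the poster.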
She fixes this and posts again. As before, her software composes an
article and signs it with a key and certificate she got from one of a number
of known certificate issuers. It doesn't use her real name (and this group
does not require it to) but it does verify her spam-filtered E-mail address
and that she's not a spammer.
The filter approves her article for meeting the technical specs of the
newsgroup, and it issues the article with IP multicasting. The overview
headers would go out on one channel, the body and other headers on another,
so that sites could independently subscribe to just headers, or header+body
together. (There are some complex issues to be resolved on the allocation
of multicast channels.)
Sites which get the article would then feed out via the normal NNTP style
mechanism, except in most cases the other party would already have gotten
the article via the multicast. However, some would get it via NNTP.
Some others might miss it. However, central sites with highly reliable
feeds would publish digests of the message-ids of all articles they have
received along with newsgroup and Distribution. Sites that want the SF
Books newsgroup and notice they lack the article would connect and do a TCP
transactional fetch of the article from some major site they have an
arrangement with for fetch-feeds.
The article goes into the stream. Readers see it categorized under
Vernor Vinge, a topic they may find of interest or may have marked to ignore.
The newsgroup "Sex and Bondage" has no restrictions on posting. Bob
is able to post to it by multicasting his article himself if he has tools
to do that, or he can contact any relay willing to handle articles in
this newsgroup. His tools make a query to find servers willing to do that
on the web site for the group.
Notes on the key issues
USENET has a permission system today; it's just very limited. Sites only
take in articles via their injectors, which only accept articles from
people they have authenticated based on IP address (they are users of the
ISP or site running the injector) or userid and password. Then the
sites must have feed partners, which authenticate them, usually based on
IP address, and they accept articles directly from nobody else. The
injectors tend to strip the users of privacy and insert various bits of
identifying information into the articles to help in the spam fight.
A certificate system lets you delegate authority as you like it. You can
follow exactly the above pattern if you wish to, but you also have the
option of doing decentralized delegation. Thus an injector, if the concept
is still to exist, can accept a post from anybody who has an appropriate
certificate delegating that trust, regardless of IP address. This is
also true of sites.
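The delegation idea can be shown with a toy model. The Python sketch below walks a chain of delegations from a trusted root down to a poster and checks that no link hands out more than it received. It deliberately leaves out the cryptography: in a real system each link would be a signed certificate and every signature would be verified. All names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Delegation:
    """issuer grants subject some permissions (signature omitted in this toy)."""
    issuer: str
    subject: str
    permissions: FrozenSet[str]


def is_authorized(chain: Sequence[Delegation], root: str, subject: str, permission: str) -> bool:
    """Check that chain leads from root to subject and carries permission throughout."""
    holder = root
    granted = None  # the root implicitly holds every permission
    for link in chain:
        if link.issuer != holder:
            return False  # chain is broken: wrong issuer for this step
        if granted is not None and not link.permissions <= granted:
            return False  # a link may not grant more than it was given
        if permission not in link.permissions:
            return False  # the permission was dropped along the way
        holder, granted = link.subject, link.permissions
    return holder == subject
```

Today's structure is the simple case of a one-link chain from a site to each of its users; longer chains give the decentralized delegation described above.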
The multicast system is not perfect because of the limited number of
multicast groups available. You can't get one for every newsgroup and
you would not want to implement it that way, so you need groupings of
newsgroups designed to match popular desires. Any article posted to more
than one grouping would be sent twice, which is somewhat wasteful.
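The duplicate-send cost is easy to see in a sketch. In the Python below, grouping_of is an assumed table mapping each newsgroup to the multicast grouping that carries it; groups missing from the table fall back to the point-to-point feed. The function name and table are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def plan_delivery(
    newsgroups: Iterable[str], grouping_of: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (multicast groupings to send on, groups needing the flood feed)."""
    groups = list(newsgroups)
    # One transmission per distinct grouping, however many groups share it.
    channels = sorted({grouping_of[g] for g in groups if g in grouping_of})
    fallback = sorted(g for g in groups if g not in grouping_of)
    return channels, fallback
```

An article crossposted within one grouping is sent once; one that spans two groupings is sent twice, which is the waste noted above.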
Very large articles are not easily multicast due to packet loss, though
there are reliable multicasting protocols which could be used. In the
end not all articles can be readily multicast; some must go by the
point-to-point flood algorithm of NNTP. Multicasting is ideal when there
is a readily identifiable set of articles that everybody in a readily
identifiable group of users will want to get. Multicasting is also good
for sending only the overview headers of articles, so most sites can keep
all of those and fetch bodies on demand.