Bulk E-mail protocols and tagging

Please note: This is an old essay from several years ago, and while it is up because it contains some useful food for thought, I have revised my views in this area but not yet the essay.

The "Spam" problem has arisen because parties wishing to push a message or product are creating mailing lists and cheaply doing bulk mailing to them without asking (and usually against the will of) the people put on the lists.

One proposed solution is to define a way to describe attributes of bulk mail, and mail from unknown parties. This method can be used to define a protocol to let mail servers express what sort of mail they will accept, or tags to go on mail describing how it is sent.

This is a long term solution since it requires changing mail agents and mail sending tools as well. Other short term solutions will be tried first.

In particular the attributes proposed here for tagging are:

The number of expected recipients for the message.
The sender-recipient relationship; namely, whether the sender is a stranger to the recipient, or personally known, or the message was solicited.

These two attributes say nothing about the content of the message. Tags about content (as proposed in some laws) are a poor idea. The problem is unwanted bulk mail. It is bulk mail that should be identified.

Other things might be tagged, but these two attributes are enough to identify unsolicited bulk mail from strangers, and thus provide the tools to deal with the problem.

How to use them

Preferably the tags would be used in a protocol as described below, allowing mail servers to filter mail according to the attributes above. However, because most mail is relayed through several sites with the "MX" system, it is not usually possible for the mail server that first receives a message to know the policies of the final recipient. As such, the tags would also need to exist within E-mail headers directly.

There already is, in fact, an E-mail header called "precedence" which can have values like "junk" and "bulk." This is primarily to assist mail servers in assigning priority to messages. Its use is becoming more rare, and it is not sufficient, even when correctly assigned, to tell apart the types of mail one wants to tell apart.

Thus a tag of the form:

Bulk-Tags: Recipients=300 By=stranger

might be appropriate on a message.
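As a rough illustration of how a mail tool might read such a tag, here is a minimal Python sketch. It assumes the header carries space-separated name=value pairs exactly as in the example above. The function name parse_bulk_tags and the lower-cased dictionary keys are my own choices for illustration, not part of any defined format.

```python
from email import message_from_string


def parse_bulk_tags(value):
    """Parse a Bulk-Tags header value such as 'Recipients=300 By=stranger'."""
    raw = {}
    for token in (value or "").split():
        name, sep, val = token.partition("=")
        if sep:  # skip tokens that are not name=value pairs
            raw[name.lower()] = val
    tags = {}
    if raw.get("recipients", "").isdigit():
        tags["recipients"] = int(raw["recipients"])
    if "by" in raw:
        tags["by"] = raw["by"].lower()
    return tags


# Example: read the tag from a complete message.
msg = message_from_string(
    "From: sender@example.com\n"
    "Bulk-Tags: Recipients=300 By=stranger\n"
    "\n"
    "Message body\n"
)
print(parse_bulk_tags(msg["Bulk-Tags"]))  # {'recipients': 300, 'by': 'stranger'}
```

A missing or malformed tag simply yields an empty dictionary, leaving it to the recipient's filter to decide what untagged mail deserves.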

In their most basic form, mail transport agents and clients might be programmed to filter or redirect mail that had certain attributes. Mail would go into different destination mailboxes or "mail folders" based on these and other attributes.

This ability is already common in mail tools. Today in fact many mail tools have put in pattern based filters that can redirect or delete mail that contains certain "spam signature" strings in the text or subject. Should such protocols be defined and come into use, we can expect mail tools to very quickly adopt them with nice user interfaces.

Opt-out Protocols

Preferably as noted, these tags might form the basis of an "opt-out" protocol. Just as today people can place a "no soliciting" sign on their gate, some wish to put up such a "sign" on their mail server. To do that they need a protocol, to say in formal computerese, just what is welcome and what is not.

The mail protocol, known as ESMTP, could be extended to allow the recipient to express desires before mail is even sent. This is primarily in the interests of efficiency, to avoid having the recipient's mail server take the load of receiving messages that are undesired and will be discarded anyway. It's also a favour to the sender from an efficiency standpoint.

(Because with today's protocols the mail server receiving a message may be a "mail exchanger" that will relay to the actual recipient, header tags would still be needed for when the filtering decision needs to be made further down the chain. Eventually systems would be made smarter so it can always be made on the spot.)

It's also probable that servers would "bounce" mail that was blocked due to tagging decisions. This bounce might be taken as an order to send no more mail (for a named period of time [TTL]) of the type bounced. Any mailer who constantly attempted to send such mail after being notified might be guilty of a denial of service attack.

Making them work

There are several levels at which a tagging system could be applied. The central question, of course, is how to make people who are already abusing E-mail put tags on their messages that may cause their messages to be declared 2nd class. With paper mail, senders gladly put bulk-rate postage on their mail, thus tagging it as bulk, because they save money. This incentive is not present for E-mail tags.

There are several answers to this problem, from voluntary systems all the way to laws. Here are some.

Voluntary Use

Surprisingly, many spammers would be happy to comply. In fact, several of them, in an effort to gain some respect, have proposed tagging systems of their own, even content-based ones. Tagging will indeed give them an answer to their detractors. Many of them would even promote it, and push users to get filter software by declaring in their messages, "Don't want to get these postings? We put on Bulk-Tags! Get mail software that uses them."

They'll do this because some want to make a serious business out of bulk E-mail, and they figure this will help them. And indeed, it will, because at first (and perhaps for some time) many people will not have filtering tools. And some of them are quite willing to lose as audience those who would block their mail. Not all, however, as we know there are some who will go to lengths to get around blocking tools.

Universal Use

Tags might succeed if they become popular enough that almost all people put them on all their mail. In this event, people will be able to bounce or redirect mail that comes without a tag.

To make this happen, a simple tag would be defined that would be permitted for "small volume" mail, which is to say the person-to-person and small-group E-mail that is all that 99% of mail users ever send. Mail clients, such as Eudora and Netscape, could be programmed to add this special tag without any effort by the user. Only bulk mailing programs -- programs that explode to mailing lists, not commonly used by ordinary users -- would need to worry about the fine points of the attributes being tagged.

The EMA, which represents the vendors of E-mail clients, has expressed interest in this approach.

This does not address, however, people who would deliberately lie on a Bulk-Tag.

Trademark Protection

The Bulk-Tag could be given a name which can be (or already is) trademarked, like the TRUSTe mark, whose use indicates compliance with a code of privacy ethics, or the "Good Housekeeping Seal of Approval," which requires endorsement from a magazine.

This mark might be licenced for use to all members of the public who agree to follow the rules of the tag. Use without following the rules would be a violation, which could cause a lawsuit.

Fraud law

It may be that deliberately lying in a tag in order to push a message through to people who would not otherwise read it is a form of fraud, if it is done for commercial gain.

If this turns out to be the case there would be strong tools to use to attack those who would abuse the system.

Tagging laws

Already some legislatures have proposed laws requiring more drastic tags, like the word "advertisement" in the Subject line. These are a poor idea, especially if they require tagging about the meaning of a message rather than the "time and manner" (ie. bulk) of its sending. And clearly governments shouldn't be defining mail headers or protocols.

However, it might be considered that, if all other methods fail, a law which requires an honest bulk-tag on bulk mail from strangers, or which simply clarifies that lying on such a tag is indeed fraudulent, could be effective.

It's also worth noting that, for those considering other legal attacks on junk E-mail, almost anything that can be made unlawful can be made compulsorily tagged. Such a tagging law would clearly be less restrictive than an outright ban, as it simply gives recipients the tools to decide for themselves what goes in their mailbox. As such it should be noted almost any non-tagging law is unconstitutional in the USA, because less restrictive laws are possible.

Definitions of Tag Terms

Recipient Counting

The number of recipients would be the total planned lifetime recipients for a message. That would of course include those in the current mailing, plus any mailings the sender is already planning to do. This doesn't stop people from changing their plans. If one sends a message to 100 people one week, it goes out with a count of 100 on it. Later, if one decides to send to 1000 more, it goes out with a count of 1100 on it. However, if the sender always planned both mailings, both go out with 1100 on it.

While intent is a matter for the law and not computers, the number of recipients is a factual matter, and that's good. Tags should deal only with matters of fact.

There is one subjective matter to deal with in recipient counting, and that's messages that are varied slightly. This requires human judgement in many cases, but it's still a factual determination. Computers may not spot when a message is subtly rewritten to look different to a computer but have the same result to a human, but a human can of course spot this, if the matter is ever brought before a court.

The number need only be a "best estimate" accurate within as wide a range as 50%. It need not include list-exploding addresses unknown to the sender.
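The counting rule can be worked through with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, and reading "within 50%" as "the declared count differs from the actual count by no more than half the actual count" is my assumption, since the tolerance is not defined more precisely here.

```python
def declared_count(already_sent, this_mailing, planned_later=0):
    """Lifetime recipient count to put on a mailing."""
    return already_sent + this_mailing + planned_later


def within_estimate(declared, actual, tolerance=0.5):
    """True if the declared count is within `tolerance` of the actual count.

    Assumption: the tolerance is measured relative to the actual count.
    """
    return abs(declared - actual) <= tolerance * actual


# Plans changed: 100 now, then an unplanned 1000 later.
print(declared_count(0, 100), declared_count(100, 1000))  # 100 1100
# Both mailings planned from the start.
print(declared_count(0, 100, planned_later=1000), declared_count(100, 1000))  # 1100 1100
```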

Relationship

There are four possible values for the tag about the sender, called "By".

Known -- the sender is personally known to the recipient. This means the recipient has had personal interaction with the sender. For example, a past business relationship. Or stopping by a booth at a trade show. Or mailing the sender. This does not mean simply that the sender is famous and as such the recipient is expected to know them. The relationship must be voluntary on the part of the recipient.
Stranger -- the sender is not personally known to the recipient.
Solicited -- the recipient solicited the mail. This may mean they joined a mailing list, or asked for more information, or made a public solicitation for this sort of mail. In this latter case, the content of the mail matters, since most recipients would not make a blanket solicitation for all mail. For solicited mail, it is not relevant whether the sender is a stranger or not.
Simple -- this special class is for use by ordinary users not doing bulk mail. It would be added by the typical mail clients of most users, after they had, upon installation, clicked on a dialog box agreeing they would not be doing bulk mail (at least not with this tag.) For simple mail, the relationship of sender and recipient is unimportant, and the recipient count is also unimportant, though the mail client can and should certainly insert the number of people the message was sent to.

Now you can see where this is going. Most people would elect to receive simple mail, mail from known parties, solicited mail, and even mail from strangers sent to a small number of recipients. That number would be up to them -- most people might be quite willing to get junk mail so finely targeted that it only goes to 100 people.

Other mail would be redirected to a lower priority folder. Or senders might develop their own, entirely voluntary tags, to help mail programs filter mail for the recipients. There might be formal keywords, not unlike the subject, to say that the mail is about computers, or sex sites, or comes with an offer of cash. I would leave that to the industry to decide and regulate.
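The sort of rule most people would pick might be sketched as follows. This is only a sketch under assumptions: the tags are taken to be already parsed into a dictionary with "by" and "recipients" keys (my own representation), the folder names are made up, and sending untagged mail to the low-priority folder is one possible choice among several.

```python
def choose_folder(tags, max_stranger_recipients=100):
    """Pick a destination folder from parsed bulk tags.

    `tags` is a dict such as {"by": "stranger", "recipients": 300}.
    """
    by = tags.get("by")
    if by in ("simple", "known", "solicited"):
        return "inbox"
    if by == "stranger":
        count = tags.get("recipients")
        # Finely targeted mail from strangers is still welcome.
        if count is not None and count <= max_stranger_recipients:
            return "inbox"
    # Assumption: anything else, including untagged mail, is low priority.
    return "low-priority"


print(choose_folder({"by": "stranger", "recipients": 300}))  # low-priority
print(choose_folder({"by": "stranger", "recipients": 40}))   # inbox
```

The threshold is a parameter because, as noted above, that number would be up to each recipient.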

One additional tag might be defined, "abuser." This tag would be used by parties barred, due to past infractions, from using the other tags.

Mailing Lists

It is often asked what mailing lists should do. To use this system, programs that do bulk mail need to add the tags. An ordinary mailing list only sends to people who ask to be on it. They would use the "solicited" relationship tag. They would also know how many people are directly on the list, and be able to add that number. They would not need to worry about forwarding exploders and their counts. Exploders forwarding mail that has a count on it would add to that count.
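What an exploder does to the count could look something like this. The sketch assumes the Bulk-Tags header value format shown earlier and simply adds the exploder's own list size to whatever count arrived. The function name is hypothetical, and whether the exploder's own address should first be subtracted from the count is a detail I leave open.

```python
import re


def add_exploder_count(header_value, local_recipients):
    """Return the Bulk-Tags value with this exploder's recipients added."""
    def bump(match):
        return "Recipients=%d" % (int(match.group(1)) + local_recipients)

    # Only the first Recipients= term is rewritten; a value with no count
    # on it is passed through unchanged.
    new_value, replaced = re.subn(r"Recipients=(\d+)", bump, header_value, count=1)
    return new_value if replaced else header_value


print(add_exploder_count("Recipients=300 By=solicited", 25))
# Recipients=325 By=solicited
```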

The only people who would have a complex tagging issue, needing special user interfaces, would be the vendors of bulk mailing software. These are few.

Overloaded mail servers

Many people complain that a big problem with junk E-mail today is that the volume is so high it's overloading mail servers and putting a significant cost on the ISPs that run these servers, notably at places like AOL. Tags would not cut the mail transmission volume down, and so this problem would persist.

This is where the opt-out protocol system would come into play. It's not really in the interest of junk mailers to mail people they know will discard it, even though internet mailing is very cheap.

As tagging reduces the utility of carpet-bomb junk E-mail, the volume of it should naturally decrease as it becomes in the interests of senders to more tightly target their mail. However, if they don't, other things such as throttles on server use, and opt-out protocols can address this problem.

This legitimizes junk E-mail?

One criticism of these plans is that they add some legitimacy to junk E-mail. Right now, only the bottom feeders of the net business world participate in junk E-mail, and tags might cause real companies to participate. My prediction is that they won't. Real businesses still don't want to tick off their customers, and they don't want to be found sending vast volumes of mail that are just rejected.

Some people, who don't elect to filter or get software that can filter, might indeed get more junk mail. If the alternative is outlawing certain forms of communication, this may be a price that must be paid. We have no duty to protect those who will not use tools. If tags can solve your E-mail problem, they should be enough for you.

They'll lie, they'll cheat, they'll steal

Some will. It is hard to predict if the total sociopaths will stay around or not, if their methods become less effective. If they do, then I concede that some other methods, including possibly laws, would be necessary. We have a duty to try less intrusive methods first.

Protocol Formalities

Here are some notes on how this might be implemented.

The two attributes (recipient count and relationship) need to be encoded both to describe a single message (as shown above) and also to describe a policy. To describe a policy they would be written as a simple boolean expression, using comparative (less than, greater than) operators and the "or" and "&" symbols.

Stage 1

In the long term, policy should and must be set by the individual, not the site. While sites have the right to set policy, this should not be encouraged or be the norm. However, at this immediate date, software is not set to allow easy individual declaration of policy.

Major software packages can today easily allow system admins to control the "SMTP Greeting" banner used when mail systems talk to one another. It is proposed that, for a limited period of 2-3 years, policy be settable for a site in this banner. After that time, MTAs (Mail Transfer Agents, the mail server software) would be expected to express and control policy on a per-recipient basis.

Of course, this would not stop sites from setting a default policy for all users, or forcing a policy on their users. However, they would need to have protocol-aware software, and we don't wish to design this protocol to encourage sites to make decisions for their users about what E-mail they get.

(Sites that really want to set policy are free to even charge users extra not to have a bulk-blocked mailbox. What matters is that the decision about what to receive and read remains with the user.)

As such, the SMTP or ESMTP greeting would contain the policy as a term, delimited by semicolons, of the form UBE=(terms). For example:

UBE=(Recipients > 20 & By=stranger)

The term states what mail is not permitted at the site. In this case, unsolicited e-mail with over 20 recipients where the sender is a stranger to the recipient.
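To make the idea of a policy expression concrete, here is a minimal Python evaluator. The grammar is not pinned down here, so this sketch makes its own assumptions: terms are Recipients compared with a number (using <, > or =) or By=value, "&" binds tighter than "or", and nested parentheses are not handled. The function names are hypothetical.

```python
import re

TERM = re.compile(r"^\s*(Recipients|By)\s*(<|>|=)\s*(\w+)\s*$", re.IGNORECASE)


def term_matches(term, tags):
    """Evaluate one term, e.g. 'Recipients > 20', against parsed tags."""
    m = TERM.match(term)
    if not m:
        raise ValueError("unrecognised term: %r" % term)
    name, op, value = m.group(1).lower(), m.group(2), m.group(3)
    if name == "by":
        if op != "=":
            raise ValueError("By only supports '='")
        return tags.get("by", "").lower() == value.lower()
    count = tags.get("recipients")
    if count is None:
        return False
    limit = int(value)
    return {"<": count < limit, ">": count > limit, "=": count == limit}[op]


def policy_blocks(policy, tags):
    """True if mail described by `tags` is NOT permitted by `policy`."""
    inner = policy.strip()
    if inner.upper().startswith("UBE="):
        inner = inner[4:].strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Assumed precedence: "or" separates groups of "&"-joined terms.
    alternatives = re.split(r"\s+or\s+", inner)
    return any(all(term_matches(t, tags) for t in alt.split("&"))
               for alt in alternatives)


policy = "UBE=(Recipients > 20 & By=stranger)"
print(policy_blocks(policy, {"recipients": 300, "by": "stranger"}))  # True
print(policy_blocks(policy, {"recipients": 10, "by": "stranger"}))   # False
```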

In full form it looks like:

220 main.templetons.com ESMTP Sendmail 8.8.8/8.8.8; Mon, 15 Jun 1998 13:15:37 -0700 (PDT); UBE=(Recipients>20 & By=stranger);#No unsolicited bulk E-mail from strangers with over 20 lifetime recipients

The comment in English is there to liken this to the physical "sign on the gate" that one might have on one's house. This protocol is the E-mail analog of that. It is not expected that, in normal mail, the mailer ever sees the SMTP/ESMTP greeting or would read it.
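A sending program could pull the policy term out of such a banner with very little code. The sketch below assumes the greeting is split on semicolons and that the policy field has the UBE=(...) shape shown above. The function name is hypothetical.

```python
import re


def ube_policy_from_greeting(greeting):
    """Return the text inside UBE=(...) from an SMTP greeting, or None."""
    for field in greeting.split(";"):
        m = re.match(r"UBE\s*=\s*\((.*)\)$", field.strip())
        if m:
            return m.group(1).strip()
    return None


greeting = (
    "220 main.templetons.com ESMTP Sendmail 8.8.8/8.8.8; "
    "Mon, 15 Jun 1998 13:15:37 -0700 (PDT); "
    "UBE=(Recipients>20 & By=stranger);"
    "#No unsolicited bulk E-mail from strangers with over 20 lifetime recipients"
)
print(ube_policy_from_greeting(greeting))  # Recipients>20 & By=stranger
```

A greeting with no UBE field returns None, which a sender would treat as "no site policy declared."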

Stage 2

Per-recipient bulk mail policy would be done with an ESMTP extension, using the extension framework defined in RFC 1869. The server would respond to EHLO with a keyword indicating it supports per-recipient bulk E-mail policy.

If the server does not support the enhancement, the sending program MUST insert the parameters as an ordinary E-mail header in the message, in the "Bulk-Tags" format. This allows a client or relaying program down the line to still make filtering decisions based on the bulk mail attributes of the mail. Indeed a smart relaying agent could extract these tags, and when talking to another smart recipient, provide them on the RCPT TO command line.

In fact, the Bulk-Tags should be added to all unsolicited bulk E-mail, even if the protocol is handled, to assure they are present at the end for analysis and further filtering.
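The sending side of this might be sketched as below. No EHLO keyword is named here, so X-BULKPOLICY is a purely hypothetical placeholder, as is the function name. The sketch always adds the Bulk-Tags header and returns the parameters to append to RCPT TO only when the server advertised the keyword.

```python
from email.message import EmailMessage

# Hypothetical EHLO keyword; no real keyword has been assigned.
POLICY_KEYWORD = "X-BULKPOLICY"


def prepare_bulk_message(msg, ehlo_keywords, recipients, by):
    """Tag `msg` and return RCPT TO parameters, or None if unsupported."""
    tags = "Recipients=%d By=%s" % (recipients, by)
    # The header goes on regardless, so it survives to the final recipient.
    if "Bulk-Tags" not in msg:
        msg["Bulk-Tags"] = tags
    advertised = {keyword.upper() for keyword in ehlo_keywords}
    return tags if POLICY_KEYWORD in advertised else None


msg = EmailMessage()
msg["From"] = "list@example.com"
msg["To"] = "user@site.domain"
msg.set_content("Hello")

params = prepare_bulk_message(msg, ["8BITMIME", "SIZE"], 40, "stranger")
print(params)            # None: fall back to the header alone
print(msg["Bulk-Tags"])  # Recipients=40 By=stranger
```

With Python's standard smtplib, the advertised keywords could be taken from the keys of the connection's esmtp_features dictionary after calling ehlo().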

When attempting to deliver mail, a sending program identifies a recipient with the command RCPT TO. In this case, parameters would be added to the line, following the protocol above:

RCPT TO: user@site.domain Recipients=40 By=stranger

The receiving mail server could respond with the ordinary "OK" code to indicate the user will accept the mail. The mail should be sent including the Bulk-Tags.

The receiving server may also respond with an error code. A special error code would be assigned to indicate that the mailbox and other parameters are valid, but the mail is rejected because it is not the policy of the given address to receive mail matching the named parameters, or a greater number.
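On the receiving side, the decision might be sketched like this. The special error code has not been assigned, so 559 below is a hypothetical placeholder, and the function name and the fixed "strangers over N recipients" policy are likewise only for illustration. The sketch parses the RCPT TO line in the form shown above, without the angle brackets real SMTP puts around the address.

```python
POLICY_REJECT_CODE = 559  # hypothetical placeholder; no real code is assigned


def handle_rcpt(line, max_stranger_recipients=20):
    """Return an (SMTP code, text) reply for one extended RCPT TO line."""
    command, sep, rest = line.partition(":")
    parts = rest.split()
    if not sep or command.strip().upper() != "RCPT TO" or not parts:
        return (501, "Syntax error in parameters")
    params = {}
    for token in parts[1:]:  # parts[0] is the recipient address
        name, eq, value = token.partition("=")
        if eq:
            params[name.lower()] = value
    count = params.get("recipients", "")
    too_many = count.isdigit() and int(count) > max_stranger_recipients
    if params.get("by", "").lower() == "stranger" and too_many:
        # Reply with the mailbox's policy so the sender can record it.
        policy = "UBE=(Recipients>%d & By=stranger)" % max_stranger_recipients
        return (POLICY_REJECT_CODE, policy)
    return (250, "OK")


print(handle_rcpt("RCPT TO: user@site.domain Recipients=40 By=stranger"))
# (559, 'UBE=(Recipients>20 & By=stranger)')
```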

The error message may contain a statement of the mailbox's UBE policy, specifying messages that will not be accepted. If it does not, it should be assumed that, for a period of the next 6 months, the address does not accept mail matching the provided parameters, or with a higher number of recipients.

Policies are expected to last for 6 months, unless otherwise specified by an "Expires" tag which provides a date, in RFC822 format.

The error should be received by the sending server, or turned into a bounce message back to the originator as need be. This error should be considered as a demand not to send E-mail matching the parameters of rejection again, until the policy expires. Senders which continue to send mail to such addresses that violate such requests may be guilty of a "denial of service" attack if the volume of such requests produces a noticeable load on servers and their resources.
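A well-behaved bulk mailer would need to remember these rejections. Here is one way that might look. The class and method names are hypothetical, "6 months" is approximated as 183 days, and the Expires value is assumed to arrive as a plain RFC 822 date string.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

DEFAULT_LIFETIME = timedelta(days=183)  # "6 months", approximated


class RejectionLog:
    """Remembers policy rejections so the same mail is not sent again."""

    def __init__(self):
        self._entries = {}  # address -> (by, recipients, expires)

    def record(self, address, by, recipients, now, expires_header=None):
        if expires_header:
            expires = parsedate_to_datetime(expires_header)
        else:
            expires = now + DEFAULT_LIFETIME
        self._entries[address.lower()] = (by.lower(), recipients, expires)

    def may_send(self, address, by, recipients, now):
        entry = self._entries.get(address.lower())
        if entry is None:
            return True
        blocked_by, blocked_count, expires = entry
        if now >= expires:
            return True  # the policy has lapsed
        # Blocked: the same relationship with the same or a higher count.
        return not (by.lower() == blocked_by and recipients >= blocked_count)


now = datetime(1998, 6, 15, tzinfo=timezone.utc)
log = RejectionLog()
log.record("user@site.domain", "stranger", 40, now)
print(log.may_send("user@site.domain", "stranger", 50, now))  # False
print(log.may_send("user@site.domain", "stranger", 30, now))  # True
```

The timestamps must be timezone-aware, since a parsed Expires date carries its own offset.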

Multiple recipients

If none of the recipients of a message accept the message, it should not be delivered. If one or more accept it, it should be delivered. However, responses from addresses who rejected the message based on policy matters should be recorded and used to prevent future mailings which violate those policies.
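The rule for multiple recipients is simple enough to state in a few lines. In this sketch the responses are assumed to be collected as a mapping from address to reply code. 559 again stands in, hypothetically, for the unassigned policy-rejection code, and the function name is my own.

```python
OK = 250
POLICY_REJECT = 559  # hypothetical placeholder for the unassigned code


def plan_delivery(responses):
    """Decide whether to send, to whom, and which rejections to remember.

    `responses` maps each recipient address to the code its RCPT TO drew.
    """
    accepted = sorted(a for a, code in responses.items() if code == OK)
    # Only policy rejections are remembered; other errors are not policy.
    remember = sorted(a for a, code in responses.items() if code == POLICY_REJECT)
    return bool(accepted), accepted, remember


deliver, accepted, remember = plan_delivery({
    "ann@site.domain": 250,
    "bob@site.domain": 559,
    "nobody@site.domain": 550,
})
print(deliver, accepted, remember)
# True ['ann@site.domain'] ['bob@site.domain']
```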

Stage 3

Some users would like to set policy against not just unsolicited bulk mail, but also general unsolicited mail or unsolicited commercial mail. However, it would be an unfair burden to require ordinary mail sending clients to have to understand this protocol just to send non-bulk mail.

As written, this protocol need only be understood by bulk-mailing programs, and used only by those using bulk-mailing programs to send unsolicited bulk mail to parties who do not know them. This is not a great burden on the mail system.

However, if in the future almost all mail clients and servers implement this protocol, it might be possible to extend the tags to support more policy determinations. The author does not recommend this unless a significant problem of E-mail abuse not involving bulk mail arises.