Best way to end spam

After many years of consideration, writing of tools and examination of other tools, I have devised what I believe would be the best current approach to solving the spam problem which does the least damage to the open E-mail system.

It is a combination of a variety of technological methods and also non-technological methods. That's because in our society, we actually rarely solve problems by adding extra security or building our fences higher. Instead, we use systems of accountability to enforce the rules we decide to embrace.

Executive Summary

The plan is to divide the network into two camps, those who can be held accountable for spam, and those whose status is unknown. Mail would flow unimpeded for those on the accountable list, since by definition, we would have other ways to deter or deal with spam from such networks.

For the rest, mail would be redirected through special relay servers whose job it is to "throttle" or rate-limit the amount of mail any party can send. As such, single person to person mail would normally be unimpeded, but mass mailing (regardless of content) from untrusted addresses would be impossible. In effect, mass mailing becomes a slightly privileged operation open to those who can be held accountable if they abuse it by sending such mailings to people who don't know the sender.

Medium volume mailing (small lists, sudden bursts) that did not have some sign of accountability would be slowed rather than blocked, mainly to detect if the volume is getting to spam levels. Small mailings would still be delivered at a slower rate, but injurious spam campaigns would not work.

In effect the only limitation put on E-mail is that those who wish to host mailing lists must get on the list of those who will be accountable for the abuse of lists. It's even possible to run open relays again. All other mail is delivered.

Step 1 -- Whitelist those who will be accountable for abuse

What really matters is not whether you can block a spam attack, but whether there is some deterrence to inhibit the attack in the first place. Without dictating what rules you want to lay down, the first thing that must be established is whether others can be held accountable for abuses that break those rules.

In an open, end to end network, you have no means to hold all other parties accountable for their actions. However, you can know that some will be accountable, and that others are of unknown accountability.

Accountability simply means if they break the rules you have some assurance that a negative consequence can be enforced. That might mean they live in a jurisdiction with a law that will punish them, and you can apply that law. It might mean that they have agreed to a contract which both promises that they will follow the rules and spells out consequences if they don't. It might mean they are part of a collective that will hold them accountable and thus makes that promise to the world. They might have posted a bond that they will lose if they violate the rules. You might know them personally and trust them.

The exact mechanism of accountability is not the key issue. What matters is whether it is judged strong enough to deter the antisocial behaviour, in this case spamming, that we're trying to get rid of.

A well known corporation signing a contract that its employees won't spam or a penalty will be paid would certainly be worthy of trust. An ISP stating it has and will enforce a contract with its dedicated line customers that forbids spamming will also be enough, since these customers at the very least can lose a costly connection and more likely can be subject to lawsuit. Multi-year trusted customers might be considered accountable but brand new users with dialup accounts would not normally be considered accountable.

So step one is to build a whitelist of the IP addresses whose users can be held accountable by non-technical means for any abuses of the network. All mail from those addresses should be accepted without filtering or blocking of any kind.
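
As a rough illustration of that whitelist lookup, here is a minimal sketch using Python's standard ipaddress module. The address ranges and the name is_whitelisted are assumptions made for the example (the ranges are reserved documentation blocks, not real accountable networks); the plan itself does not say how the list would be stored.

```python
import ipaddress

# Hypothetical whitelist entries. These are reserved documentation ranges,
# standing in for blocks whose users can be held accountable.
WHITELIST = [
    ipaddress.ip_network("192.0.2.0/24"),      # e.g. a corporation under contract
    ipaddress.ip_network("198.51.100.16/28"),  # e.g. an ISP's long-standing customers
]

def is_whitelisted(sender_ip: str) -> bool:
    """Return True if the sender's address falls inside any accountable block."""
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in net for net in WHITELIST)
```

A receiving mail server would call something like is_whitelisted on the connecting address and, on a match, accept the mail with no further checks.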

Some spam will come from these addresses of course. Some of the users will have poor security and spammers will abuse that. Some spammers will use tricks or lies to get access to these trusted addresses, at least for short periods. Some whole networks will join and attempt to hide in jurisdictions with less accountability than expected until that lesson is learned. Some will simply promise not to spam and break that promise, figuring they can handle the cost. Sometimes the standard of accountability won't be high enough at first to deter spam.

However, if the system is done well, the amount of this spam will be quite small, and thus quite manageable. It will be a minor annoyance, but no more than spam was in the very earliest days. Not pleasant, but not worth extreme measures either. Over time, the standards of accountability would be fine tuned to the right level to reach the balance we desire.

This is the normal way systems of governance work. We always tune our systems to let a few bad deeds go by unpunished, as long as the level stays where we can tolerate it. We can never stop everybody -- but we can solve the problem. This is not just more efficient, it assures that our system is not going too far, stepping over the line into punishing the innocent or chilling legitimate communication.

At a typical ISP, some IP addresses would go on the whitelist, and others not. As noted, new dial-up accounts would be assigned addresses not on the whitelist (or as some ISPs do today, addresses that can't do outgoing SMTP at all.) Well known and trusted customers would get addresses on the whitelist. Most users might end up directed through a relaying server which includes tools to limit the volume of mail through it from any given user or set of users, allowing the relay to be on the whitelist even if those who use it are not.

As you'll see below, however, while this whitelist allows unfiltered traffic for those on it, only minor impediments are placed on those not on it. In particular, the only ability they lose is the ability to send bulk E-mail and in some cases instant E-mail. You only need to get on the whitelist if you want to host a mailing list. (You don't need to be on it to run the list, just to host it and do the actual delivery.)

Throttle non-whitelisted addresses

The addresses not on the whitelist don't get blocked. Instead they go through some spam-blocking checks. In particular they can be resource-limited or "throttled." This is a process similar to what is used in many computer systems to control access to limited resources. The more of a resource a given address takes, the more limits are put on to cut back the flow.

To accomplish this, a collective of major mail recipients (ie. several of the larger ISPs and companies) would be formed. This collective would operate a network of "throttling mail relays." This network would be a large collection of old, slow PCs running a free OS and some special mail software. The machines would be independently operated. What makes them a network is they would share data on incoming email volumes from untrusted network blocks.

Members of this collective would configure their mail addressing so that servers from the network of throttle relays appear to handle all their incoming mail. In technical terms, this is done by listing this network as the MX, or "Mail Exchanger," with the lowest preference number, which is the one senders try first. This means that somebody sending mail to such a system would normally try to send it by connecting to one of the throttling relays.

However, using a technical trick, if the sender is on the whitelist, this won't actually happen. If the sender has a mail tool designed to use this system, it will know it is on the whitelist and simply ignore any "MX" that uses the special domain of the throttle-relays. If it doesn't know to do this, the throttle-relays will refuse any connection from a whitelisted address. In either case, the sender will skip the relays and move to the next highest priority MX, which will be the real mail servers of the mail's target.

So, in proper operation, mail from whitelisted parties is delivered directly, just as it is today, and the throttle-relays are not involved. There may be a delay of a few seconds if the connection-refusal trick is used.

Non-whitelisted parties would connect to the throttle relays. If they were to violate the E-mail standards and bypass them the way a whitelisted party can, it would do them no good. The next level servers would only accept connections from whitelisted addresses, and from the throttle-relays themselves.

It's also possible to have the throttle-relays be last, rather than first, on the MX chain. This means the real servers will be tried first in a chain, but connections will be refused from non-whitelisted systems.
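
The relays-first arrangement can be sketched as follows. This is only an illustration: the domain throttle-relays.example and the function names are hypothetical, and a real mailer would get its MX records from DNS rather than from a list passed in by hand.

```python
from typing import List, Tuple

# Hypothetical domain under which all throttle relays are named.
RELAY_DOMAIN = "throttle-relays.example"

def delivery_order(mx_records: List[Tuple[int, str]],
                   sender_whitelisted: bool) -> List[str]:
    """Order MX hosts the way a system-aware sending mailer might try them.

    Lower preference numbers are tried first. A whitelisted sender ignores
    any host in the relay domain and so goes straight to the real servers.
    """
    hosts = [host for _, host in sorted(mx_records)]
    if sender_whitelisted:
        hosts = [h for h in hosts if not h.endswith("." + RELAY_DOMAIN)]
    return hosts

def real_server_accepts(peer_whitelisted: bool, peer_is_relay: bool) -> bool:
    """Real mail servers take connections only from the whitelist or the relays."""
    return peer_whitelisted or peer_is_relay
```

With records (10, "r1.throttle-relays.example") and (20, "mail.isp.example"), a whitelisted sender gets only the real server, while anyone else is sent to the relay first and would be refused if it tried the real server directly.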

The job of the throttle relays is to slow down the mail from non-whitelisted addresses if the volume gets too high. They accept incoming mail, and do not deliver it immediately. They place it in a waiting area for a short delay. They also count the number of messages (counting the number of recipients for each message) for each IP address, and each associated group of IP addresses (known as a network block.)
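
The counting step might look something like the sketch below. It assumes IPv4 and treats a "network block" as the enclosing /24, which is a simplification chosen for the example; the class name VolumeCounter is likewise hypothetical.

```python
import ipaddress
from collections import Counter

# Assumption: approximate a "network block" as the enclosing IPv4 /24.
BLOCK_PREFIX = 24

class VolumeCounter:
    """Tallies recipients per sending address and per network block."""

    def __init__(self) -> None:
        self.by_address: Counter = Counter()
        self.by_block: Counter = Counter()

    def record(self, sender_ip: str, recipient_count: int) -> None:
        # A message to five recipients counts as five, as the plan describes.
        block = ipaddress.ip_network(f"{sender_ip}/{BLOCK_PREFIX}", strict=False)
        self.by_address[sender_ip] += recipient_count
        self.by_block[str(block)] += recipient_count
```

Counting by block as well as by address matters because a spammer with many addresses in one block would otherwise stay under the per-address limit.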

On a regular basis, they share these counts with all the other throttle servers. They can have their own internal network of dedicated connections to share this data, or broadcast it regularly using IP multicasting. The USENET system might actually work well to broadcast this data.

Together the systems would track the volumes of mail per unit time from all addresses and blocks. They would know both typical volumes of non-list mail for an average address, and for the addresses in question.
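
One simple way to combine the shared counts is sketched here. The use of a median as the "typical" volume is an assumption of this example, not something the plan specifies, and the function names are made up for illustration.

```python
from collections import Counter
from statistics import median
from typing import Iterable

def merge_reports(reports: Iterable[Counter]) -> Counter:
    """Add up the per-address counts reported by each relay for one interval."""
    total: Counter = Counter()
    for report in reports:
        total.update(report)  # Counter.update adds counts rather than replacing
    return total

def typical_volume(totals: Counter) -> float:
    """Assumed baseline: the median volume across all addresses seen."""
    return median(totals.values()) if totals else 0.0
```

An address whose merged total sits far above typical_volume would be a candidate for throttling, even if no single relay saw much of its mail.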

If an address is sending out a volume of mail within range of the typical volumes, the mail is relayed on immediately to its proper destination. However, if the volume grows, the mail is kept in a queue or "spool", and only some of the mail is forwarded on, while the rest remains in a queue as the volume is tracked.

If the volume goes back down again (ie. a brief spike) the mail from the spike will be delivered eventually, just not instantly. If the volume goes up, then the queue will get longer.
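
A minimal sketch of that queueing behaviour for a single sending address follows. The fixed per-interval allowance and the class name ThrottledSender are assumptions for illustration; a real relay would derive the allowance from the tracked typical volumes.

```python
from collections import deque
from typing import Deque, List

class ThrottledSender:
    """Holds back mail from one address once it exceeds its per-interval allowance."""

    def __init__(self, allowance: int) -> None:
        self.allowance = allowance        # hypothetical messages allowed per interval
        self.sent_this_interval = 0
        self.queue: Deque[str] = deque()

    def submit(self, message: str) -> List[str]:
        """Accept a message and return whatever can be forwarded right now."""
        self.queue.append(message)
        return self._drain()

    def tick(self) -> List[str]:
        """Start a new interval and forward as much of the backlog as allowed."""
        self.sent_this_interval = 0
        return self._drain()

    def _drain(self) -> List[str]:
        out = []
        while self.queue and self.sent_this_interval < self.allowance:
            out.append(self.queue.popleft())
            self.sent_this_interval += 1
        return out
```

With an allowance of 2, a burst of four messages sees two forwarded at once and the other two on the next tick: delayed, not lost. A sustained flood simply makes the queue grow.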

Note that mail which is delivered to the throttle relays with some other form of accountability on it (such as a digital signature with certificate, a CPU-stamp or anything else) would not be throttled and would be delivered directly. This is detailed below.

Spam-Scoring

Ideally, the system would attempt to sort the delayed mail in the queue, using any number of sophisticated spam-scoring algorithms and filters people have developed to stop spam at the endpoints. If these tools do their jobs well, the ordinary, low-scoring mails would move to the front of the queue for fast delivery, while the spams would move to the back.
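
The reordering could be done with an ordinary priority queue, as in this sketch. The spam score is taken as a given number, with lower meaning less spam-like; how it is computed is left to whichever filter is plugged in, and the class name ScoredQueue is hypothetical.

```python
import heapq
import itertools
from typing import List, Tuple

class ScoredQueue:
    """Delayed mail ordered so the least spam-like messages leave first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker: earlier mail first

    def add(self, message: str, spam_score: float) -> None:
        heapq.heappush(self._heap, (spam_score, next(self._arrival), message))

    def release(self, n: int) -> List[str]:
        """Forward up to n messages, lowest spam score first."""
        out = []
        while self._heap and len(out) < n:
            out.append(heapq.heappop(self._heap)[2])
        return out
```

Each time the throttle allows a few more messages through, release hands over the ones most likely to be real mail, leaving the high scorers at the back.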

If the surge was very large -- a full fledged spam attack -- then the messages in the queue would be rejected and removed from the queue, ideally with a bounce message to the sender. (If there is just one sender, this bounce will be sent detailing all the undelivered addresses. If each message has a different sender -- in a sudden surge that's almost always a spam attack -- best efforts would be made to send a bounce to all senders that exist.) In addition, targets that had mail blocked would get access to a digest of all blocked mail via the web (or a single summary E-mail sent at some requested interval) which would allow them to fetch any blocked message. Since low scoring messages would get through, this should need to be called upon only rarely. Even so, in the strict sense, no mail would go undelivered, though some might not get looked at by user choice.

Some of the spam scoring systems are getting quite sophisticated, and in fact many people find them adequate on their own, though they cost resources and run a risk of false positives. Such systems, along with spam-traps and a fast system for user complaint would stop any spam attack dead in its tracks.

You will notice that, in the interests of not slowing down the mail much, the first few messages of a spam-attack might be delivered. However, if you realize that is just a few messages out of 100,000, the odds are low that you would see them. (Some random factor would have to be added to save those sad few who are at the start of all the spammer's lists.)

Note as well that with this system in place, it is possible to run an "open relay" mail server off the whitelist, and even on the whitelist if it has its own techniques to throttle spam attacks. During a spam attack, the users of the mail server would find themselves throttled along with the spammer, but if their mail had low spam-scores, it still could be delivered while the spam is held up. However, after the attack, their mail would work again. This is much preferable to the current systems that ended up permanently blacklisting people for running a mail server in what had been, in the pre-spam era, the traditional manner. Many were uncomfortable with the idea of punishing not the spammers, but the spammer's victims, with blacklisting. In a throttle system, open relay users may see problems during a spam attack, but the problems are brought on them directly by the attacker who is tying up their resources (including their mail quota) and not so much by the anti-spam system.

E-stamps and CPU coins

There can be other means of mailing to assure your mail is not blocked or delayed, even if you are not on the whitelist. Long ago I proposed the idea of "E-Stamps" which are effectively micropayment cheques placed on E-mails that say, "If this E-mail annoys you, you have the option to collect the following amount of money."

The idea is the recipient normally does not collect, and in fact it's rude to collect unless the mail was a spam. Needless to say, any mail with such a promise on it could be delivered immediately. It is suggested Paypal could host the money for such a system.

Another idea related to this is CPU-stamps. In this system, any sender of mail includes a special number which takes 5-10 seconds of CPU to calculate, specific to the individual mail. It delays regular mail slightly but uses a resource that's free to most senders (spare CPU.) However, a bulk mailing of a million messages is not possible.

Again, any mail with such a CPU-coin on it could be delivered without delay -- even if there's a spam attack underway from the same IP address.
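
To make the CPU-stamp idea concrete, here is a sketch in the style of hashcash-type proof of work. The choice of SHA-256, the stamp layout and the function names are all assumptions of this example. The difficulty is a parameter: a real deployment would set it high enough to cost seconds of CPU, while a small value is enough to try the code out.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def _stamp_hash(message_id: str, recipient: str, nonce: int) -> bytes:
    # The stamp is tied to one mail and one recipient, so it cannot be reused.
    data = f"{message_id}:{recipient}:{nonce}".encode()
    return hashlib.sha256(data).digest()

def mint_stamp(message_id: str, recipient: str, bits: int) -> int:
    """Search for a nonce whose hash starts with at least `bits` zero bits.

    Expected work doubles with each extra bit of difficulty.
    """
    for nonce in count():
        if leading_zero_bits(_stamp_hash(message_id, recipient, nonce)) >= bits:
            return nonce

def verify_stamp(message_id: str, recipient: str, nonce: int, bits: int) -> bool:
    """Checking costs a single hash, however long the stamp took to mint."""
    return leading_zero_bits(_stamp_hash(message_id, recipient, nonce)) >= bits
```

The asymmetry is the point: the relay verifies a stamp almost for free, while a sender of a million messages has to pay the minting cost a million times.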

A system called Habeas, for example, offers a special trademarked and copyrighted string set that accountable mailers can insert in their mail to show their accountability. The company plans to sue any non-licenced party that uses the marks. If that works, such senders could be passed through quickly without delay.

Some want to work out ways to sign their E-mails, using digital signatures and certificates. While we don't want to demand that everybody have such a signature, mail with a certificate that indicates accountability for abuse could again be let through without throttling. This allows an easy decentralization of getting yourself unthrottled, though at a higher CPU cost unless the mailers know to avoid the MX throttle relays.

In fact, any major spam blocking idea from the past or future can be used on the throttle servers to prioritize the mail when the volume gets over the threshold, or to allow certain types of E-mail to get through without delay.

Some projects are advocating forcing everybody to sign all mail. We don't want a world like that. This system allows signing (which has other virtues if it is voluntary) to be just one way to show mail isn't spam.

Windows into the system

I believe this system would be remarkably effective. The main security window would be one that exists for all problems. A determined spammer could break into the systems of whitelisted users and spam from there. Like a DDOS attack, a spammer could compromise many PCs (even non whitelisted ones) and spam from them, at slow per-PC rates. However, both of these are criminal activities, with much greater punishments than any anti-spam-law could provide, and the number of spammers who could pull this off would be very few.

Spammers may also attempt to DOS attack the relay servers, simply out of spite. This is a problem when any identifiable computers are involved in the anti-spam fight. These servers would need to use whatever techniques are available to deal with DOS and DDOS -- though admittedly these are not great. The spammers don't gain any ability to spam by doing this, they just stab at their enemies.

Spam can also come when spammers mail to an existing, whitelisted mailing list which redistributes to all its subscribers. Such mailing lists must use other techniques, such as filters which direct some or all suspect mail to their moderators. This is outside the scope of this plan.

It is more likely spam would come from people inside the system. This must be dealt with by non-technical means.

If it turns out it can't be dealt with in this way, there is a final technical answer -- the site that can't handle its internal spammers and hold them accountable leaves the whitelist until it can. It is not blacklisted and unable to mail, for it can still send its ordinary levels of person-to-person mail. Only bulk mail is throttled. Even bulk mailers on the system who have a certificate and digital signature code can still bulk mail from the address.

It is important to note that even if the scoring systems do a bad job on a batch of mail, and let spam through rather than real mail, the total volume of mail is never more than the expected volume from person to person mail. As such, it's easy to show that the overall volume of spam into the system as a whole is not more than the average volume of person to person mail, and thus it is easy to handle. While it may bring cries of anguish to hear what has been called the spam-apologist's phrase of "just hit delete," I think this is a meaningful solution to spam that comes at such extremely low volumes.

Because of the volume limiting, even a spammer who manages to beat the spam scores is not attacking the targets so much as stealing quota from the legitimate users on the spammer's own network. This is not a pleasant thing, but it does isolate the problem.

Operating a mailing list

As indicated, if you wish to host a mailing list that is more than a few people, you need to be on the whitelist. That's not hard, just promise not to spam and show something to back up that promise. That may be nothing more than a good reputation -- which is, after all, how we often do things in the real world. If you wish to run a mailing list from a non-whitelisted ISP, but have a static IP, you can get that IP whitelisted in most cases. If you have no reputation, and no way to assure your accountability, a system could be set up where bonds are posted with escrow agents, to be collected to help pay for the anti-spam system in the event that you're found guilty of spamming.

Of course, if you want to run a mailing list you don't have to be whitelisted yourself. You can find somebody else who is whitelisted to host it. You must convince them you will be accountable of course. Or they may use technological means, such as confirmed (double) opt-in for new subscribers, a limit on new subscribers per day, etc.

You can't host a mailing list anonymously (without posting a large bond) but you could run one. And you can mail anonymously, either in low volumes, or even in large volumes if you can find somebody to front for you or you can post a bond.

During early introduction of the system, it might be necessary to take special steps to deal with mailing lists, as mailing list hosts won't work to get whitelisted until the system is used by many users, and those users won't want to miss their mailing list mail.

One way to do this (at a temporary cost in labour) is to have humans look over mail surges that are large enough to be blocked but that get lower-end scores from the various spam-detecting algorithms. The lowest scoring items can be given to humans who can quickly approve the mailing, place a temporary whitelist on the address and contact the list administrator about getting whitelisted. If they won't do it, a specific test for their list's specific attributes can be put in place to pass the mail, though such tests would have to be taken down once spammers decide to forge mail as though it is coming from that list. Fortunately that should take a while, and list owners can be convinced to whitelist before that.

The role of spam law

While I don't hold much hope for anti-spam laws on their own -- so far the ones passed have been completely ineffective, and none will work well over international boundaries -- they can be used inside such a system.

Simply, if a jurisdiction can be shown to have an effective anti-spam law that really holds spammers accountable, then people in that jurisdiction can be immediately whitelisted. If they spam, the law can be used to hold them accountable. People who don't live in such jurisdictions must choose some of the many other means to be accountable.

The anti-spam collective

The anti-spam collective would run the throttle relays of course, though probably not physically. There would be large numbers of them, scattered in ISP rooms all over the world. Any ISP joining the system would throw some old servers on their network and devote them to the anti-spam collective. It is a curious engineering goal that the job of these servers is to slow down E-mail. Rarely are engineers charged with making their systems slower. As such, old, effectively free equipment running free software can do the job.

In addition, some colleagues of mine -- Landon Curt Noll and Mel Pleasant -- have recently devised an interesting plan that would make the spam load on even the throttle servers remain quite low. During a spam attack (high volume of e-mail from a given network) the system would be programmed to return a "temporarily unavailable" status to any mail attempts from those addresses. It turns out most spammer bulk-mail programs don't even waste their time retransmitting a spam if they get such a status, they just move on to the next address. But even if they do, this diverts the load away from the server and allows more time for information sharing, scoring and spam identification.
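
In SMTP terms, a "temporarily unavailable" status is a reply code in the 4xx range, which tells a well-behaved sender to try again later. A sketch of the decision, with hypothetical reply text and function name, and assuming the relay already knows which network blocks are currently surging:

```python
from typing import Set

def rcpt_reply(sender_block: str, blocks_under_attack: Set[str]) -> str:
    """Pick the SMTP reply for a recipient command from the given network block.

    4xx codes are transient failures: legitimate mailers queue the message
    and retry, while bulk-mail tools that never retry simply lose it.
    """
    if sender_block in blocks_under_attack:
        return "451 Temporarily unavailable, please try again later"
    return "250 OK"
```

Legitimate mail caught in the same block is only delayed until the sender's next retry, which fits the rest of the plan: slow down, don't block.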

In addition, the anti-spam collective would also help ISPs enforce anti-spam terms of service. Most ISPs already demand that in their contracts with their customers. Should a customer violate these terms, the ISP wants to sue them. It's expensive and only teaches a lesson within the ISP.

The anti-spam collective, on the other hand, would be given the power to sue under the ISP's TOS, and it would have the motive and means to set examples of people who violate these contracts. Spammers would realize that if they spam through an ISP that is a member, the chances they will be nailed to the wall are high. In addition, this would be true over international boundaries, in a way that national laws can't be.

Rare Blacklisting

In a few rare events, and only after significant proof, some addresses could be blacklisted in the name of efficiency. These would be addresses that belong purely to unrepentant spammers who send out large volumes of spam. Their spewings should be considered a denial of service attack as they fill up the throttle-relay's queues, and eventually they should be blacklisted and connections from them refused, though with a proper system of proof and appeal.

Tool of censorship

One of the main concerns over blacklists has been their ability to be used as a tool of censorship. They often lack adequate checks and balances on their own abuses. The anti-spam collective described here must have checks and balances on itself as well. The network of servers would be distributed multinationally, and the systems should be open for inspection by auditors to assure they are only throttling, and not blocking mail except from a very few blacklisted addresses.

While they all must share the common whitelist and the common data on current mail volumes, otherwise they should be run by different and disparate groups, so they are not under the control of one entity. When beginning, there may be one single organization, but eventually it should split. There could even be competing organizations, which share mail volume data but otherwise differ in the quality of their spam scoring and other algorithms. Each could compete to serve different ISPs for it is not necessary that all ISPs use the same relays -- indeed, most ISPs would want to use a set of relays that were close-by on the network. In all cases, however, there must be multiple relays to assure that they don't all go down at once. They must also be protected from DOS attacks by spammers who would like to see that happen.

Tool of surveillance

Over time, it is necessary that a good fraction of networks get on the whitelist. Otherwise, mail from the networks left off it would be concentrated at too few sites, making those sites a ripe target for surveillance. Of course, there should not be just one collection of throttling relay servers, there should be many of them. In effect, a large enough pool that one can be assured that they don't all go down at once (which would make mail undeliverable) but small enough not to be a tool of surveillance.

The pools still need to share data, but it's very aggregate statistical data, at least until such time as a spam attack is detected, in which case they might share more data such as header values, or spam similarity data. Thus only spammers would have their data centralized, and one could not tap the entire email infrastructure at one single point.

Definition of Spam

As I have argued extensively elsewhere, the definition of spam used in such a system must be narrow. In fact, it should be approximately the intersection of the definitions favoured by the major players using the system. As long as that intersection is large enough to get 99% of spam, it is not worth blocking mail that some people think is not spam (and thus not getting their participation) to get at the last 1% that some people think is.

An intersection is something that all players can point at and say, "If it meets this definition, it's spam and should be throttled." There may be other things that they also think are spam, but that set would by definition be different for each player. Members are free to take extra steps on their own networks to block things they think are spam but which are not included in this intersection definition.

The definition I have arrived at that I think makes a reasonable intersection and still covers the vast majority of unwanted mail is as follows.

It is a mass mailing, where some party has ordered that a quantity of mails (not necessarily the same) be sent to a group of people.
The recipients have never had voluntary communication with the sender.

This definition is simple and factual, and not based on the content of the mail. Yet it covers almost every junk e-mail I have received, with one special exception to be noted below. It is worth pointing out that a form letter that changes for each recipient is still a "mass mailing."

It's also fairly easy to detect. I don't know if I've ever gotten a form letter that I didn't know was a form letter. Most people can instantly tell a message that was meant for them vs. one that's part of a mass mailing. Since the accountability is based on human systems and not technological ones, human complaints work fine as a basis of the system.

(I'll add one additional note, though I think it is implied. If a user consents to mail from parties they don't know, ie. they knowingly and voluntarily offer their E-mail address to somebody for resale to others, that mail is not spam. We should not interfere with any mail between truly consenting parties.)

Mailings from companies you know

The one type of mail this definition allows that will be controversial is mail from companies you have done business with. Currently the volume of this is quite small, but it could grow. Nonetheless, it is my belief we have no business interfering with or filtering mail between two parties that know one another, even if one is annoying the other. I think the market, harassment law and personal filters are the best places to take care of that.

However, one additional rule can solve this problem. First of all, I have noticed that reputable companies that do mailings to their customers always put in a "remove me from the list" system that is reasonably easy to use and which, unlike the ones claimed by ordinary spammers, actually works. You only deal with so many companies in your life. If they can only send you one mail before you require removal, the volume can never get high enough to be a big pain. In fact it would be much lower than the paper junk mail volume. I think it's reasonable that the system require people doing mass mailings to recipients that know them include an easy, working "remove me" mechanism. If they don't do that, their activities could be promoted to being called "spam."

List sharing would not be allowed, since somebody "buying a name" would then be mailing people who don't know them. It's worth noting that many people report that reputable companies seem to be taking this to heart. I give a different address out every time I give my E-mail on a web form, and only once have I seen one leak, and that was to just one party. Others report the same thing.

It's also possible (in a limited way) to devise a global opt-out list for such corporate mailings, though there are some problems that stop this from being a perfect solution. In addition, companies still need to mail actual customers about transactions, recalls, security fixes etc. even if they have opted out in this fashion, which makes this complex.

Why going after volume is the answer

Spam is a phenomenon that results from how cheap bulk E-mail is. Many anti-spam efforts attack not this root cause of spam, but correlated symptoms if you will. Filters effectively say, "Messages that mention Viagra are spam, thus block messages that mention Viagra." Blacklists say, "This address sent some spam, so block all the mail from that address." Even my own challenge/response system said "Spam is from strangers, so challenge mail from strangers" though at least it did not block it. Some proposed laws say, "Most spam is advertising, so ban advertising."

But there can be no spam without high volume. Not all high volume mail is spam of course, there are many legit mailing lists as discussed above. But it's aiming at the right target.