Proper principles for Challenge/Response anti-spam systems

Back in 1997 I wrote one of the very earliest challenge/response (C/R) spam-blocking systems. (*) These are systems that, when they see a possible spam E-mail from somebody you've never corresponded with before, hold the mail and e-mail back a "challenge" to confirm that the person is a real sender and not a mailing robot, in particular a spammer. The other person gets the challenge, and responds to it in some way. If they do this properly, your system releases the mail that was held, and from then on they can mail without challenge.

There are a number of these systems springing up -- it's a very effective system and a fairly obvious idea -- but not everybody is doing it right, so I thought I would lay out some "best practices" based on my 6 years of experience. I don't even do all of these things, because I wrote my system before they became necessary, but if I were writing a new version, I would.

I've seen a number of people express anger at challenge/response systems because they don't follow these principles -- in particular because they challenge mailing list mail and annoy list managers and list participants. However, these are bugs in those programs, not a flaw in the C/R concept.

In answer to the other, non-bug issues, I have also prepared an essay on whether challenge/response anti-spam systems are good or bad.

C/R is in fact a likely companion to most anti-spam systems. Almost all anti-spam systems have "false positives" -- they mistake real mail for spam and block it. One way to address that is to challenge mail that might be spam rather than blocking it. Ideally mail you know not to be spam should go right through. For efficiency, mail you are sure is spam can be rejected. But mail in the intermediate zone is a good candidate for C/R. After all, wouldn't you rather get a challenge than to have your mail dropped on the floor by somebody's anti-spam system?

Executive Summary

A challenge/response system should:

Be combined with other anti-spam systems, so that challenges are sent only to new correspondents sending mail that cannot be reliably classified as either surely spam or surely not spam. Ideally that's a very small fraction of your incoming mail.
Include an Auto-Submitted header.
Include an In-Reply-to header with the message-id of the source message.
Never challenge replies to your mail, even if they come from a different address than your mail was sent to.
Never, ever challenge list mail, other automated mail, errors, or other challenges.
Avoid challenging replies even to public messages you (the user) send to mailing lists and newsgroups.
Offer unfiltered addresses that are not public and are never challenged, along with public addresses that are subject to more filtering and challenges.
Look for loops.
Include diagnostic information to help track forgers and bugs.
Summarize blocked mail for the user every so often.
Make the challenge as easy as possible.
Whitelist mailing lists as automatically as possible.
Don't challenge mail with binary attachments or other likely viruses.

New Principles for Autoresponders

All autoresponders, including C/R systems, are coming under fire because they autorespond to spams and viruses that contain forged From headers, reflecting mail onto 3rd parties. I have written an essay outlining some new proposals for autoresponders to help solve these problems. Everything in there, at least once standardized, should be done by any C/R system. In particular, C/Rs should follow the RFC3834 rule to add the Auto-Submitted header, the In-Reply-To header and the suggestion about Received lines.

Never challenge any mail that's a reply to a private message you sent.

If you send somebody private mail (from any address you have), and they reply to you with any mailer, you should accept their mail and not send them a challenge. This is true even if they reply from a different address than you sent the mail to. Many people have mail aliases, and receive mail on one address and send on another. Some people use other anti-spam systems that generate new addresses every time they mail.

What this means is that simply whitelisting all addresses you mail to is not nearly enough, though it is of course an important thing to do.

One of the easiest ways to do this, by the way, is to have multiple addresses yourself. Send out private mail with an address that does not do challenges: an old-fashioned unfiltered E-mail box (though you may want to note the addresses on incoming mail to whitelist them). However, you must be sure this address won't get out to spammers, or you will have to switch it to another. (You must be prepared to do this.)

In general, you should probably put an un-challenged address on business cards too. Save filtered addresses for posting to the net, including postings to mailing lists, listings on web pages, listings in conference directories, etc.

C/R must be combined with other anti-spam algorithms

Due to the risk of responding to mail with forged return addresses and annoying innocent parties, a C/R system should be combined with other spam-detection algorithms so that mail which can be reliably tagged as certainly spam or virus is never challenged.

Mail with very low spam scores should just be delivered without challenge. Mail with extremely high spam scores (certain to be spam) can be discarded, and this may be necessary for efficiency. Challenges should primarily be used on mail from unknowns with more moderate spam scores, whose status can't be decided with sufficient certainty.

C/R should be your last line of defence against spam, to give a chance to new senders who might otherwise have their mail blocked by errors in your first-line defences.
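A minimal Python sketch of this three-way split follows. The 0-to-1 score scale, the two thresholds, and the names triage and Action are assumptions made for illustration; substitute whatever your own spam scorer produces.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    CHALLENGE = "challenge"
    DISCARD = "discard"


# Hypothetical thresholds on a 0.0-1.0 spam score; tune them for your filter.
DELIVER_BELOW = 0.2
DISCARD_ABOVE = 0.95


def triage(sender, spam_score, whitelist):
    """Decide what to do with one incoming message."""
    if sender.lower() in whitelist:
        return Action.DELIVER      # known correspondent: never challenge
    if spam_score < DELIVER_BELOW:
        return Action.DELIVER      # almost certainly real mail
    if spam_score > DISCARD_ABOVE:
        return Action.DISCARD      # almost certainly spam: send nothing back
    return Action.CHALLENGE        # uncertain middle zone
```

Only the last branch ever generates outgoing mail, which keeps challenges to forged addresses to a minimum.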

Avoid challenging replies to public messages

If you can do it, avoid challenging replies to your public messages to mailing lists and newsgroups. With private mailing lists (not archived in public) you can of course accept any replies with reasonable safety based on subject line and in-reply-to. With public postings, consider accepting replies unchallenged for a few days to weeks after postings, then add a challenge for late replies which are more likely to be spammers.

Use multiple addresses

Any good spam filtering system will support giving the user multiple aliases under which to receive mail. This has two functions. One, you can filter some aliases more than others. For example, you might have "public" addresses used in newsgroup postings and on a web site, and private addresses used only in mails to private parties, replies etc. You would use less filtering, and perhaps no challenge/response, on private addresses.

It's also handy to provide a gamut of addresses to use so that you can use a different address every time you give out an address. For example, if entering data on a web page that asks for your E-mail address, use a different one each time. That way if any address gets on spammer's lists, you can delete it or give it very high spam filtering with minimal risk to mail from others.

The best plan is to have your own subdomain for mail, allowing an infinite space of addresses. However, if that is not available to you, sendmail treats mail to "userid+anything" as mail to the given userid. For example, if a sendmail user has the address fbaggins@shire.org, then fbaggins+ring@shire.org and fbaggins+bagend@shire.org and all other such addresses will be delivered to the main address. Qmail does a similar system, using a dash instead of a plus. That's better, since unfortunately there are huge numbers of badly coded web forms that, because they map "+" to a space, don't accept fully legal e-mail addresses with a plus in them.

The personal domain is also best because spammers can easily guess the root address on a plus-sign based address. If you use this, you must have a filter on the base address, and have unfiltered addresses use the plus.
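To illustrate the "userid+anything" convention, here is a small Python sketch that splits such an address into its base address and tag. The function name split_subaddress is hypothetical, and the separator is a parameter because a dash can also appear in ordinary user names.

```python
def split_subaddress(address, separator="+"):
    """Return (base_address, tag); tag is "" when there is no sub-address."""
    local, _, domain = address.rpartition("@")
    user, found, tag = local.partition(separator)
    if not found:
        return address, ""
    return f"{user}@{domain}", tag
```

For example, split_subaddress("fbaggins+ring@shire.org") gives ("fbaggins@shire.org", "ring"). A filter could then apply strict rules when the tag is empty and lighter rules for tags the user has handed out privately.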

Some systems generate a new address for every mail sent, using a special random string in the address itself. Some use a cryptographically secure hash to generate the string so they can immediately identify any address they generated without having to remember them.
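One way such a self-verifying address could work is sketched below in Python, using a keyed hash (HMAC) over a random string. This is an assumption about how a system of this kind might be built, not a description of any particular product; SECRET_KEY, make_address and is_our_address are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical key; in practice a long random value known only to your system.
SECRET_KEY = b"replace-with-a-private-random-key"


def _tag(nonce):
    """Keyed hash of the random part, truncated to 10 hex characters."""
    digest = hmac.new(SECRET_KEY, nonce.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:10]


def make_address(user, domain):
    """Generate a fresh address of the form user+<nonce><tag>@domain."""
    nonce = secrets.token_hex(4)  # 8 random hex characters
    return f"{user}+{nonce}{_tag(nonce)}@{domain}"


def is_our_address(address):
    """Check an address without consulting any stored list."""
    local = address.rpartition("@")[0]
    _, found, suffix = local.partition("+")
    if not found or len(suffix) != 18:
        return False
    nonce, tag = suffix[:8], suffix[8:]
    return hmac.compare_digest(tag, _tag(nonce))
```

Because the tag can be recomputed from the random part and the key, nothing needs to be stored per address, and an address a spammer makes up will almost never verify.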

Be aware, however, that in generating many addresses, you may mesh badly with other whitelist systems expecting your mail to come from the same address. One option is to use the base address in the "From:" and put any generated address, especially an unfiltered one, in the Reply-to. Beware that there are mailers that botch Reply-to out there.

Never challenge mailing list mail

For decades, all good mail responders (including Vacation programs) have known not to respond to mailing list mail. An unofficial standard has indicated that bulk mail of various forms would have a header like "Precedence: bulk" or sometimes "Precedence: list" to mark it as bulk. "Precedence: junk" is rarely used for it would declare things to be spam!

You can also test to see that none of the addresses in the "To" and "CC" lines is an address for the person getting the mail, though that does present a maintenance problem since there is no automatic way to know all those addresses. However, you definitely should not challenge any mail with the above precedence headers.
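Both tests can be sketched with Python's standard email package. The function name looks_like_list_mail and the my_addresses set are assumptions; as noted above, keeping that set complete is the hard part.

```python
from email import message_from_string
from email.utils import getaddresses

BULK_PRECEDENCE = {"bulk", "list", "junk"}


def looks_like_list_mail(msg, my_addresses):
    """True if the message should be treated as list or bulk mail."""
    precedence = (msg.get("Precedence") or "").strip().lower()
    if precedence in BULK_PRECEDENCE:
        return True
    # Second test: none of the To/Cc addresses belongs to this user.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return not any(addr.lower() in my_addresses for _, addr in recipients)


raw = (
    "From: owner@lists.example.org\n"
    "To: hobbits@lists.example.org\n"
    "Precedence: bulk\n"
    "Subject: Party planning\n\nSee you there.\n"
)
print(looks_like_list_mail(message_from_string(raw), {"fbaggins@shire.org"}))
```

Note that the second test also fires on personal mail sent by blind copy, so mail caught only by it is better held for the summary than silently dropped.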

Who to challenge?

There are three possible addresses you can challenge. They are the "Envelope From", the regular "From:" and the address in a "Reply-to:" header.

Most of the merit lies in challenging the Envelope From, which is the address you would send bounce errors to. The "From:" is the person who wrote the message (and thus in most cases, though not all, the person you are trying to confirm is a human being.) The "Reply-to" is the address to which the sender expected actual replies to the mail to go.

Unfortunately, you definitely should not challenge more than one of these.

A challenge is similar to a bounce error, but unfortunately in many cases it is not handled by a human -- it was in fact designed not to be handled by a human. Most such cases are list mail, which you should not be challenging at all. In the case of list mail, the Envelope-From always identifies the list manager itself, not the particular poster to the mailing list. Sometimes it is a unique address, so that programs can automate detecting bounces without having to parse them to try to figure out what mail bounced.

The From is often the actual person who posted to the mailing list, or the real sender of a person to person mail. Some lists have all mail come "from" the list manager, however. Some lists have the list address be in the Reply-to.

You must not challenge individual mailers to a list, so only challenge the From or Reply-to when you are sure it is not list mail. If you challenge individual mailers you'll get bounced off the list very quickly.

The answer here thus depends on how good your detection of list mail is. If it's reliable, you may decide to challenge the From or Reply-to, since that is more assured to be a human. On the other hand, challenging the Envelope-From has many merits. The worst case is that it's not a human (or it's a list that is not tagging list mail as such) and this mail will appear in the digest, hopefully near the top.

This is complicated by the many web-based systems that send mail on your behalf (e.g. "Email this page to a friend" and so on). If you challenge the envelope from, you will probably challenge the mailing system, which probably won't answer.

(Though they could use a unique address there to know to forward challenges and bounces to the user, and some of them already do this.)

Never challenge a challenge!

The other person might have a C/R program or a whitelist.

Make the "From" on your challenge match the address mailed to

When they send out their mail they will have whitelisted the address they sent to, so any challenge whose From is that address should get through.

Since a challenge will share many qualities with a reply, including an In-Reply-To header, all the steps you take to avoid challenging replies will assure you don't challenge a challenge.
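As a rough sketch, a challenge carrying these headers could be built with Python's standard email package as below. The function name build_challenge, its parameters, and the wording of the body are hypothetical; "auto-replied" is the Auto-Submitted value RFC 3834 defines for automatic replies.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_challenge(original, mailed_to, challenge_target, token):
    """Build a challenge for `original`, which was sent to `mailed_to`."""
    challenge = EmailMessage()
    # From matches the address the sender mailed, so their whitelist accepts it.
    challenge["From"] = mailed_to
    challenge["To"] = challenge_target
    subject = " ".join(str(original.get("Subject", "")).split())
    challenge["Subject"] = f"Re: {subject} [confirm {token}]"
    # Embed the token in our own Message-ID so replies carry it back.
    domain = mailed_to.rpartition("@")[2]
    challenge["Message-ID"] = make_msgid(idstring=token, domain=domain)
    challenge["Auto-Submitted"] = "auto-replied"
    if original.get("Message-ID"):
        challenge["In-Reply-To"] = original["Message-ID"]
    challenge.set_content(
        "Your message is being held. Reply to this mail, leaving the\n"
        "subject line intact, and it will be delivered.\n"
    )
    return challenge
```

Which address to pass as challenge_target is the decision discussed under "Who to challenge?" above.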

Don't challenge errors

You must not challenge error messages or other robot-sent mail. It would be nice if all such mail were tagged appropriately. Errors on messages you sent out are very similar to replies to messages you sent out (and, like replies, they may come from addresses other than the one you mailed, such as mailer-daemon.) The same heuristics apply for detecting errors on your own messages.

Errors that are not replies to your messages should not be challenged; however, they should also probably not be let through. They can be grouped in a special area of the summary. These can be errors generated from messages where somebody forged your return address, for example.

Deal with multiple messages well

If you get multiple messages from the same sender (particularly over a short period of time) don't send identical challenges. You may consider sending only one challenge and not sending another for at least a day. Any additional challenge should contain text to indicate the system knows it is challenging again, and should contain a summary of the currently held e-mails awaiting release.

However, two similar messages from different senders should get two challenges. The later copies can include a challenge that says, "by the way, I already got a copy of that message from N other senders, so to get on my whitelist without delivering that message, here's an alternate procedure."
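The once-a-day rule for repeat senders might be tracked as in the following Python sketch. The class name ChallengeThrottle and the in-memory dictionaries are assumptions; a real system would persist this state.

```python
import time

CHALLENGE_INTERVAL = 24 * 60 * 60  # at most one challenge per sender per day


class ChallengeThrottle:
    def __init__(self):
        self._last_sent = {}  # sender -> time the last challenge went out
        self._held = {}       # sender -> subjects of mail awaiting release

    def hold(self, sender, subject, now=None):
        """Record a held message; return True if a challenge should go out."""
        now = time.time() if now is None else now
        sender = sender.lower()
        self._held.setdefault(sender, []).append(subject)
        last = self._last_sent.get(sender)
        if last is not None and now - last < CHALLENGE_INTERVAL:
            return False  # challenged recently; just hold the mail
        self._last_sent[sender] = now
        return True

    def held_summary(self, sender):
        """Subjects currently held for this sender, for a repeat challenge."""
        return list(self._held.get(sender.lower(), []))
```

When hold() returns True for a sender who already has mail waiting, the new challenge can include held_summary() so the sender sees everything awaiting release.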

As a failsafe, look for loops

While you are doing everything to avoid challenging challenges or list mail or bots, expect other software to be broken. Use your magic tokens to detect possible loops and stop them. In particular, if you detect any sudden high volume of mail with similarities, consider holding challenges on such mail and putting them in a special summary digest.

Include traceback info in the challenge

The challenge should embed (perhaps in an unobtrusive place) traceback information on the message being challenged, including its envelope From, and possibly its Received headers. This is because if you respond to a message with a forged From address (e.g. from a virus) in spite of your efforts not to do so, you can help the other person figure out who did the forgery with this data.

Put an In-reply-to header on your challenge

The challenge should refer to the message-id of the mail being challenged. A good whitelist program should remember the message-id of every mail the user sends out, and every challenge sent out. If a challenge comes back with an in-reply-to, you can identify it as a valid challenge. In the end, this may become the main technique, once spammers try to guess the names of your friends and send spam disguised as challenges. They can't fake this message-id.

The other reason to record the outgoing message-id is to be sure you never challenge anybody replying to mail you sent out. If mail has an in-reply-to that matches an outgoing message-id of a private mail of yours, you let it in.
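A minimal Python sketch of this bookkeeping is below. OutgoingLog and should_challenge are hypothetical names, and the log is kept in memory only for brevity.

```python
from email import message_from_string


class OutgoingLog:
    """Remembers Message-IDs of mail and challenges this user has sent."""

    def __init__(self):
        self._kinds = {}  # message-id -> "private", "public" or "challenge"

    def record(self, message_id, kind):
        self._kinds[message_id.strip()] = kind

    def match(self, msg):
        """Return the kind of outgoing message this mail refers to, or None."""
        for ref in (msg.get("In-Reply-To") or "").split():
            kind = self._kinds.get(ref)
            if kind is not None:
                return kind
        return None


def should_challenge(msg, log):
    # Replies to private mail and responses to our own challenges go in.
    return log.match(msg) not in {"private", "challenge"}


log = OutgoingLog()
log.record("<m1@shire.org>", "private")
reply = message_from_string(
    "From: gandalf@istari.example\nIn-Reply-To: <m1@shire.org>\n\nI will come.\n"
)
print(should_challenge(reply, log))
```

Replies to public postings are deliberately not let in here; they can be handled by the time-limited rule described earlier.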

Include the subject of the original message in the challenge

C/R programs should also log outgoing subjects, so that they can detect replies (and challenges) to the user's messages.

Present a regular summary of all blocked mail

No system is perfect, so the system must present a summary, at some reasonable interval, of mail that was blocked by the system. This would include mailing list mail that was unchallenged, and mail for which the challenge was never answered.

This should be presented as a summary digest, which allows a quick scan of all these messages. The summary should show a minimal set of relevant headers (From, To, Subject, CC etc.) and a few lines from the body. It should also show a "spam score" calculated for the message, and the digest should be sorted by spam-score, so the lowest scores appear at the top.
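The ordering rule is simple to express in code. The sketch below, in Python, uses a hypothetical HeldMessage record and build_digest function, and shows only sender, subject and a one-line preview for brevity.

```python
from dataclasses import dataclass


@dataclass
class HeldMessage:
    sender: str
    subject: str
    spam_score: float
    preview: str


def build_digest(held):
    """Plain-text digest with the least spam-like messages listed first."""
    lines = []
    for m in sorted(held, key=lambda m: m.spam_score):
        lines.append(f"[{m.spam_score:.2f}] From: {m.sender}  Subject: {m.subject}")
        lines.append(f"    {m.preview[:70]}")
    return "\n".join(lines)
```

Putting the lowest scores first means a user who reads only the top of the digest still sees the mail most likely to be real.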

With each message in the digest, the user should be able to select the message to define what to do with it, including delivering it, whitelisting the sender, whitelisting the mailing list it came from, and combinations. It can also offer options like blacklisting the sender, tuning the spam-score, and reporting the spam to collaborative filters.

Any existing spam scoring system can be used. The fact that the challenged address did not exist or the mail to it bounced may give a high spam-score, but one should be wary of the effect of this on anonymous mail.

The summary can be e-mailed every so often (once a day typically, or less frequently for people who read mail less frequently) or a web option should be available to see the latest summary. Normally messages would not appear in the summary until they have had some period of time to get a response to the challenge -- typically a daily digest will have the prior day's messages in it.

This step is vital. If this is not done, users will miss mail for mailing lists they joined, mail from people who decide not to answer challenges, and mail from people whose mail software is incompatible with the challenge.

Understand mail/postings to public vs. private addresses

As noted, the best practice is to use an address that does not have C/R on mail to private parties. It is important however to use a C/R filtered address if the mail/posting will go out in public. This includes all newsgroup postings, and any mail to mailing lists which have public archives. An ideal system would modify outgoing mail, using a non-filtered address on private mail, and a public address on mail that may be exposed in public.

Make the challenge as easy as you can until spammers automate it

Spammers are not currently trying to automate fake responses to spam challenges, but they will. Until they do, asking for any reply at all actually works well as a challenge. Once they do, challenges must require some special action from the responder, something to prove they are human. Even so, try to make it as easy as possible, and provide several means of responding to the challenge.

For example, send your challenge as plain text, or as a multipart/alternative with plain text and HTML. In both, include a link the user can click on to make their response via a browser. However, since many people read mail offline or without a browser handy, always allow the response to come in E-mail.

Don't require the user to be online to see the challenge; that is, don't use inlined image files unless absolutely necessary. If you use them, have an alternative. When I'm on the road, my practice is often a quick mail sync-up at a wireless hotspot with no time online while reading the mail.

While the challenge must come "From:" the address that was mailed, it can have a Reply-to that sends the response to a specific handler with a unique address that lets you know what challenge is being answered. Since some users will not deal properly with the Reply-to, it is advised you also detect responses at the address which was in the From: of the challenge. In your challenge, put a magic token in the Subject line, Message-id and body, and if that token appears in any part of the response -- Subject, In-Reply-To or body, you will be able to identify the response, no matter what address it comes from.
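Token matching of this kind could look like the Python sketch below. The token format "CR-" plus 16 hex digits and the names new_token and find_token are assumptions for illustration.

```python
import re
import secrets

# Hypothetical token format: "CR-" followed by 16 hex digits.
TOKEN_RE = re.compile(r"CR-[0-9a-f]{16}")


def new_token():
    return "CR-" + secrets.token_hex(8)


def _text_parts(msg):
    """Yield the decoded text of every text/* part of the message."""
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            yield payload.decode("utf-8", errors="replace")


def find_token(msg, pending):
    """Return the pending token found in Subject, In-Reply-To or body."""
    searchable = [str(msg.get("Subject") or ""), str(msg.get("In-Reply-To") or "")]
    searchable.extend(_text_parts(msg))
    for text in searchable:
        for candidate in TOKEN_RE.findall(text):
            if candidate in pending:
                return candidate
    return None
```

Because the match does not look at the sender or recipient address at all, a response sent from an alias, or to the From rather than the Reply-to, is still recognized.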

If you ask the user to answer a question, be as forgiving as possible when finding it in the body or subject of the response. If the user makes a bad response, give them an error to know their mail is not yet delivered.

Be sure the visually impaired and non-English-speaking can respond to your challenge

Don't make the only way to respond to your challenge be a graphic with text in it that the user must type in. This blocks all access by the blind. If you use this method (which is frankly overkill at this point) you must provide an alternate method (for example an audio file that reads the string they must type in.) But again, this method is overkill, no spammer is even remotely close to trying to guess the answers to even the simplest challenge questions as yet, and there is no need to put people through hoops before they do, if they ever do.

If writing general software, be sure the challenge is understandable in all languages that might be used by people writing to the recipient. Thus if your recipient is bilingual, the challenge should be.

Don't force users to re-send mail

Some challenges indicate the original mail was not delivered, and ask the user to send it again. Users will balk at this, and if they felt they were doing the recipient a favour (such as answering a question they asked in a public forum) they often will not bother to jump through any hoops to respond to challenges or re-send mail. You must make it as easy as possible.

Detect all attempts to subscribe to mailing lists

Watch outgoing mail and look for any attempts by the user to subscribe to a mailing list. This includes mail to "-subscribe" or "-request" addresses especially with "subscribe" in the subject or at the start of a line in the body. Try to understand the subscribe requests of most major mailing list systems, such as majordomo, listserv, topica, yahoo egroups, etc.

When the user subscribes to a list, you need to identify the list and whitelist it.
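A first approximation of this detection, in Python, is sketched below. The function name and the exact rules are assumptions; real list managers vary, and the "majordomo" and "listserv" command addresses are included only as common conventions.

```python
import re

SUBSCRIBE_WORD = re.compile(r"\bsubscribe\b", re.IGNORECASE)
SUBSCRIBE_LINE = re.compile(r"^\s*subscribe\b", re.IGNORECASE | re.MULTILINE)
COMMAND_MAILBOXES = {"majordomo", "listserv"}


def looks_like_subscription(to_addr, subject, body):
    """Guess whether an outgoing message is a list subscription request."""
    local = to_addr.rpartition("@")[0].lower()
    if local.endswith("-subscribe"):
        return True
    if local.endswith("-request") or local in COMMAND_MAILBOXES:
        # \b keeps "unsubscribe" from matching.
        return bool(SUBSCRIBE_WORD.search(subject) or SUBSCRIBE_LINE.search(body))
    return False
```

When this returns True, the domain and list name taken from the address give a starting point for whitelisting the list's mail.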

You can subscribe to lists via the web, though many then do a 2nd confirmation of the subscribe -- usually also by web -- which you may be able to look for. You must also avoid challenging these confirmations, even though they will not come with a "Precedence: bulk" header. In some cases users may have to avoid signing up for lists via the web without telling the C/R system.

Detect mailing lists subscribed to in the user's mail archives

Most C/R systems do a pre-scan of the user's archived mail folders, outgoing and incoming, as well as address books, to whitelist all proper correspondents in advance. Detect the presence of mailing lists in these archives to whitelist them in advance. You can't challenge mailing list mail so this is important. You will need to extract the Envelope From, as opposed to the "From:" header, in many cases, to properly spot mailing lists. Of course, you must avoid scanning spam to avoid whitelisting it.

Detect patterns of possible incoming mailing lists

Fortunately most spammers don't actually maintain real mailing lists that send multiple mailings to a user with the same Envelope From, and they don't use Precedence headers. You should, however, look for patterns in these headers on incoming list mail. (List mail to be identified by Precedence header and lack of the user's address in To/Cc headers.)

For example, if you get a sudden surge of messages, all with the same Envelope-From for the target user, this may be a mailing list the user has subscribed to. This is especially true if the messages have low spam scores.
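Spotting such a surge can be as simple as counting held mail by envelope sender, as in this Python sketch. The function name probable_lists and both thresholds are hypothetical.

```python
from collections import Counter


def probable_lists(held, min_messages=5, max_score=0.3):
    """Map envelope sender -> count for senders that look like lists.

    `held` is an iterable of (envelope_from, spam_score) pairs for mail
    that is currently being held for one user.
    """
    counts = Counter(
        env_from.lower() for env_from, score in held if score <= max_score
    )
    return {sender: n for sender, n in counts.items() if n >= min_messages}
```

Lists that put a per-message token in the envelope sender will not group this way; keying on a List-ID header instead is one option for those.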

In this event, consider placing a special note at the top of the digest summary, or in a special message, saying something like, "You have recently received 6 mailing list messages from a list identifying itself as XYZ" and provide a means to say they wish to whitelist the list or perhaps blacklist it. If they whitelist it, deliver the mail. Give them a way to examine the potential list mail.

This is needed because you won't catch every mailing list subscription they do. Especially since in many cases you can subscribe to lists via the web.

Be warned, however, that some mailing list managers put magic tokens in the envelope-from, to more easily track bounces. Fortunately, many popular list managers also put in special "list" headers that help you identify the list. These include headers like List-ID, and a "Sender" header.

Avoid challenging virus mail and other forged mail

There are some annoying virus programs out there that breed by sending mail to one party with a forged "from" of another party. When you get such mail, you will send a challenge to the poor oblivious party who got forged. This is a very hard problem to solve, unless your program is also a virus detector.

There is some argument that nobody should be sending an executable program to somebody they have never mailed before, and you can reject, unchallenged, any such mail. This would be a burden for those who change their E-mail address. Though in general, viruses have severely limited our ability to send legitimate executables, and it may not be much of a burden to suggest you must first establish yourself on a whitelist before doing that.

Include tracebacks in challenges

If you send a challenge, include in it the traceback information you have on the source of the original, such as Received lines (with server IP addresses) in the original mail. This allows tracking of forgers. It is not a privacy violation, since in theory you are sending back the traceback to the person it identifies.

Make use of other authentication tools

To attain the above, consider the use of other authentication tools that confirm the mail is from a trusted person. For example, if mail is signed by a party you trust, that should be sent directly through.

Likewise, if the mail appears to be forged, you may want to avoid challenging it to avoid sending challenges to innocent parties who never sent you mail at all. However, this runs the risk of not offering a challenge on mail sent by a person from a system that is not their home, or from an application which sends mail on their behalf.

Most of the forging tests are used in other spam detection algorithms and so apply to the next rule.

Protect user privacy

A whitelist system builds a record of all correspondents. This is common with many email clients, which do so to build an address book. However, some C/R systems are operated on 3rd party servers out of the control of the user, putting this address book out in a more vulnerable place, whose security is not under the user's control.

In either case, consider privacy protection in a design. For example, a whitelist can be stored as hashes of addresses, making it harder (though not completely impossible in most cases) to extract the list of real email correspondents. However, if the hash is made so that it is small (and thus will on rare occasions let through a spammer) it can no longer be used to reliably build a list of correspondents, since any large dictionary attack will show several matches.
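A hashed whitelist along these lines might look like the following Python sketch. The class name HashedWhitelist, the use of salted SHA-256, and the digest_bytes parameter are assumptions chosen for illustration.

```python
import hashlib


class HashedWhitelist:
    """Stores only truncated, salted hashes of whitelisted addresses."""

    def __init__(self, salt, digest_bytes=32):
        self._salt = salt
        # Fewer bytes means more accidental matches, and so less certainty
        # for anyone running a dictionary of addresses against the store.
        self._digest_bytes = digest_bytes
        self._hashes = set()

    def _key(self, address):
        data = self._salt + address.strip().lower().encode("utf-8")
        return hashlib.sha256(data).digest()[: self._digest_bytes]

    def add(self, address):
        self._hashes.add(self._key(address))

    def __contains__(self, address):
        return self._key(address) in self._hashes
```

With digest_bytes set to 2, for instance, roughly one address in 65,536 matches any given stored entry by chance, which is the trade-off described above: an occasional spammer gets through, but a dictionary attack can no longer tell real correspondents from coincidences.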

Provide fast response

Challenges should come immediately. For example, if you wait for a client to download mail from a POP server before challenging, the sender may have gone away and not come back for a while to see the challenge, delaying mail.

This goal seems to demand supporting C/R on the server, which is hard to reconcile with the privacy problems fixed by supporting it on the client. An always-on client (such as a desktop) can solve these problems.

Think about anonymous E-mail

Anonymous E-mail is still a useful thing. In part, you allow it by providing the daily digest of mail whose challenge went unanswered, with low spam scores coming first. Of course two-way remailers let you send a challenge and get a response by E-mail. If you insist on response by web you make it a little harder. Offering both lets the anonymous mailer select the best way to protect her identity.

Other systems (e-stamps etc.) which may not work on their own can have application to allow anon mailers to get through C/R systems.

Spammers may try to fake the things you detect

Spammers will eventually try to fake out all things you look for in order to avoid challenging or filtering e-mail. However, they will not do this right away. Since all things you do that make it harder for mail to get in will increase your risk of blocking desired mail, don't apply any stricter test until it actually becomes necessary.

Among the tests I have listed here, risks exist in the following areas.

Spammers will eventually try to guess what mailing lists you are on, or what correspondents you have whitelisted, and they will forge mail to appear like that. This is especially true with any publicly archived mailing list you post to. Lists will eventually need digital signatures if this attack becomes common.
If you allow replies to your messages to come in based on subject, then spammers will form replies to your public messages. To avoid this, you may wish to allow unchallenged replies only for a limited time on public messages.
Try to be liberal at first, and only close down when spammers abuse the liberty. Don't try to prevent something that's not yet happening if it has a risk of blocking legitimate mail.

C/R may, over time, lose its utility if most spammers try to target it directly. However, it still has several years of life. It should also be combined with other techniques. For example, if you have a good spam filter, you might decide to challenge only messages with high spam scores or other reasons to suspect they are spam, and let through other mail.

What can't be done

Some worry that C/R systems can be used to attack innocent parties. The evil party sends out mail with a forged return address (as often happens already) and the C/R systems all challenge it, mailing some poor victim.

This is indeed possible, but this is a flaw with all mail autoresponders, from the highly popular "Vacation" programs that mail people when you are away, to mailing list subscription engines and many others. In fact, most SMTP servers out there can be made to respond with a "bounce" to a faked address. As such it is not a specific flaw in C/R.

C/R systems do put a burden on new correspondents. However, it's quite minor if done right (and can in fact become invisible in time if they get new software that has another way to show their mail is not spam or to auto-respond to cpu-based challenges.) It only happens when you first mail somebody at their public address (ie. one you get from a directory or web page, not from a business card.) That sort of mail is fairly rare.

Mailing lists must be whitelisted, which is an extra burden, though a lot of this can be automated if the list was subscribed to via outgoing mail, or if lists start signing their mail.

For the future

Most C/R systems try to send a challenge that is a "Turing test," which is a test able to tell the difference between a human and a computer. As such, by definition, you can't automate the response to a Turing test.

A common test used these days is to include text as a graphic that's hard to OCR, and require the other person to type in the text using their skilled letter-finding human eyes and brain. That's certainly overkill today, though it might not be in the future.

It is possible to do an automated test though. One way would be to standardize, in the challenge, the expression of a cryptographic problem which is known to take a lot of CPU to solve. For example, finding a number which hashes to a provided number can only be solved with brute force, and thus must take a lot of CPU to solve. With a secure hash function, the challenge might offer, as an option (instead of the Turing test), a request to provide the answer to such a problem.

A smart mail client, seeing this challenge, could solve the problem using background CPU. CPU on most client workstations is freely available and so if the problem takes 20 seconds of CPU, it could be solved in 2 minutes without placing much burden on the workstation. The answer could be sent automatically, with the user unaware this even took place. 2 minutes later, the mail is delivered.

Spammers, though, could not afford to spend 20 seconds of CPU time for each mail they send. They have better things to do with their CPU.
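To make the idea concrete, here is a toy version in Python: the challenger hashes a secret number from a known range, and the responder must search the range to find it. The puzzle format, the function names, and the use of SHA-256 are all assumptions, and the small default range is chosen so the example runs quickly rather than to cost 20 seconds.

```python
import hashlib
import secrets


def _digest(salt, n):
    return hashlib.sha256(f"{salt}:{n}".encode("ascii")).hexdigest()


def make_puzzle(bits=16):
    """Challenger side: hide a random number below 2**bits behind a hash."""
    salt = secrets.token_hex(8)
    answer = secrets.randbelow(2 ** bits)
    return {"salt": salt, "bits": bits, "target": _digest(salt, answer)}


def solve_puzzle(puzzle):
    """Responder side: brute-force search, the only available method."""
    for n in range(2 ** puzzle["bits"]):
        if _digest(puzzle["salt"], n) == puzzle["target"]:
            return n
    raise ValueError("no solution in range")


def check_answer(puzzle, n):
    """Challenger side: verifying costs a single hash."""
    return _digest(puzzle["salt"], n) == puzzle["target"]
```

Each extra bit doubles the expected work for the responder while verification stays at one hash, so the cost can be tuned to whatever CPU budget is considered fair.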

Even people without a special mailer could be offered a link to a Java applet which would solve the problem and report the result to free up the held mail. Thus with a single click, again the user could answer the challenge if they have a Java enabled web browser.

I have a longer essay on the CPU stamp concept.

The lynchpin

As noted above, C/R will form the lynchpin in any anti-spam system. Such systems will classify mail as certainly spam (for rejection), sufficiently certainly real mail (for immediate pass through) and of unknown state.

The mail of unknown state should get a challenge. If the rest of the anti-spam system is good, the number of challenges sent to real mailers will be few, and the burden on mailers very small. The amount of spam that gets through, however, would be very low, and the real mail that's blocked as close to nil as possible.

* Marshall Rose, of the IETF, reports writing a C/R system even earlier, though he only ran it for a limited period of time. As such, as far as I know, I've been running one of these for longer than anybody, and used those experiences to prepare this document.