Signed -- Digital Signature of Article

Brad Templeton's
USENET-Format Pages

Note: Work is currently being done on an alternative to this, based on SPKI. See my SPKI for USENET draft. Another option would be KeyNote (RFC 2704), which seems a bit more flexible.

This header follows the MIME parameterized header syntax, with the addition of flag keywords (prefixed with + or -). FWS (folding whitespace) may be inserted between tokens; it is to be ignored.

Signed-Header	=	System 1*[ ';' Parameter ] 
Parameter	=	Keyword '=' Value | 
			+Keyword | 
			-Keyword 
Value		=	1*[Any except whitespace, '(', ';' or '"'] | 
			quoted-string 
Keyword		=	1*Alphanum
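As an illustration of the syntax above, here is a minimal Python sketch of a parser for the value of a Signed header. The names SignedHeader, split_outside_quotes and parse_signed are hypothetical. It assumes quoted strings use backslash escapes as in MIME quoted-strings, keeps parameter keywords exactly as written, and treats every parameter beginning with + or - (including forms such as ++X-Foo, --Organization: and +#2 used by the "U" options) as a flag.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SignedHeader:
    system: str                                 # signing system, e.g. "U"
    params: dict = field(default_factory=dict)  # Keyword=Value parameters
    flags: list = field(default_factory=list)   # +/- flag options, in order


def split_outside_quotes(text: str, sep: str = ";") -> list:
    """Split text on sep, ignoring separators inside "quoted strings"."""
    parts, buf = [], []
    in_quote = escaped = False
    for ch in text:
        if escaped:
            escaped = False
        elif in_quote and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quote = not in_quote
        elif ch == sep and not in_quote:
            parts.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    parts.append("".join(buf))
    return parts


def parse_signed(value: str) -> SignedHeader:
    """Parse the value of a Signed header into system, parameters and flags."""
    # Unfold continuation lines; FWS between tokens is insignificant.
    value = re.sub(r"\r?\n[ \t]+", " ", value)
    parts = [p.strip() for p in split_outside_quotes(value)]
    header = SignedHeader(system=parts[0])
    for part in parts[1:]:
        if not part:
            continue
        if part[0] in "+-":
            # Flag options such as +HashAll, -Body, +#2, --Organization:
            header.flags.append(re.sub(r"\s+", "", part))
            continue
        keyword, eq, val = part.partition("=")
        if not eq:
            raise ValueError(f"parameter without '=': {part!r}")
        val = val.strip()
        if len(val) >= 2 and val[0] == val[-1] == '"':
            # quoted-string: drop the quotes and undo backslash escapes
            val = re.sub(r"\\(.)", r"\1", val[1:-1])
        else:
            # unquoted values may not contain whitespace, so any FWS is dropped
            val = re.sub(r"\s+", "", val)
        header.params[keyword.strip()] = val
    return header
```

Because the splitter honours quotes, a BoundaryString value may safely contain a semicolon.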

For standard system "U", the following parameters are defined.

Key: Takes a value, which is the name of the key used to sign this article. The key name may also be a macro (see below).
Sig: Takes a value, which is the signature, encoded as a series of base64 characters. FWS may be inserted within the value; it is ignored.

Signed

The signed header provides a digital signature for other parts of the article, notably the body and other header items. While there may be several signature algorithms, this specification describes a standard method, known by the name "U". Other systems may be created using a different signing "System."

Systems SHOULD be able to be configured to insist that an article be signed (covering at least the body and the minimal header set), and MAY reject articles that are not. This ability SHOULD be configurable on a newsgroup, hierarchy or subnet basis.

(It should be noted that a decision to accept only signed articles is not a decision to bar anonymity. Parties may sign messages on behalf of anonymous posters and hide their identity. Whether to permit, encourage or deny identity hiding in a newsgroup is a policy decision independent of the decision to require signing.)

Articles that are signed MUST have a valid signature. Otherwise they SHOULD be rejected and not forwarded.

Articles that perform special actions, such as most control messages and Supersedes/Replaces messages, SHOULD have a signature. Systems that want the functions performed by those messages to be secure SHOULD reject such articles when they are unsigned.

Note: The digital signature should not be confused with the personal signature, a short piece of text appended to the end of articles containing human-readable identifying remarks and information.

Overview

The calculation of a signature involves first generating a "hash" based on the octets being signed. How this hashing is performed and what octets are fed into the system is affected by the various options named on the header.

By default, hashing is done using the Secure Hash Algorithm, "SHA-1", described in FIPS-180-1. This generates a 160-bit number.

This 160-bit number is then signed using a digital signature algorithm. The default is the Digital Signature Algorithm (DSA), described in FIPS-186. (Note that, pending a free licence from RSA Data Security, the default algorithm may be switched to RSA, which is faster to verify.)

DSA uses a key, typically between 512 and 1024 bits in length. Users SHOULD use a key long enough to adequately protect their security, but not significantly longer. For example, a national intelligence agency with a dedicated massively parallel computer might be able to crack a 512-bit key with a few weeks of CPU time and thus forge a USENET message, but users may judge this an unlikely risk and use a key with this level of security. Highly privileged operations, like newsgroup removal, should use longer keys.

Under DSA, the resulting signature is two 160-bit numbers. Each is encoded in MIME base64 to generate a 27-character string (the trailing '=' padding is omitted). These two strings, with a comma between them, represent the digital signature. Each number should be represented little-endian, with the least significant bits first and the most significant bits last.
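A sketch of this encoding in Python. It assumes the DSA values r and s have already been produced by some DSA implementation (not shown), and reads "little-endian" as byte order: each value is written as 20 bytes, least significant byte first. Base64 of 20 bytes is 28 characters ending in a single '=' pad, which is dropped to give 27. The function names are illustrative only.

```python
import base64


def encode_component(n: int) -> str:
    """Encode one 160-bit DSA value as 27 base64 characters, least
    significant byte first (the trailing '=' padding is dropped)."""
    if not 0 <= n < 2 ** 160:
        raise ValueError("DSA signature values must fit in 160 bits")
    raw = n.to_bytes(20, "little")
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def decode_component(text: str) -> int:
    """Inverse of encode_component; whitespace (FWS) is ignored."""
    text = "".join(text.split())
    raw = base64.b64decode(text + "=" * (-len(text) % 4))  # restore padding
    return int.from_bytes(raw, "little")


def encode_dsa_signature(r: int, s: int) -> str:
    """The Sig value: two encoded numbers separated by a comma."""
    return encode_component(r) + "," + encode_component(s)


def decode_dsa_signature(sig: str) -> tuple:
    r_text, s_text = sig.split(",")
    return decode_component(r_text), decode_component(s_text)
```

Since every value is padded to exactly 20 bytes, both halves of the Sig value are always 27 characters, whatever the magnitude of r and s.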

Hashing

Key to the calculation of the signature is the calculation of the hash. The SHA applies to a stream of octets. Only some of the headers are hashed. This set is known as the "hashing set."

Multipart/Signed

The hashing stream defined below is intended to be translatable, without invalidating the signature, between this format and a variant using the MIME "multipart/signed" Content-type. Ideally, this will involve a new variant of the multipart/signed type which can take more than 2 components. This will allow gatewaying into and out of E-mail systems that only understand multipart/signed.

Hash Stream

By default, for an article with a text content-type, the stream consists of the following, in order:

The MIME headers in the hashing set that pertain to the article body, in general all but the MIME-Version, converted to header-canonical-form, and sorted in byte order as described below. If there are no MIME headers, and the article is of type text/plain, this section is blank. All these headers MUST be in the hashing set. A MIME header that applies to the body which is not in the hashing set is an error. Note that if the Content-type is "text/plain" it SHOULD NOT be expressed, as this is the default under MIME.
A blank line
The exact body of the article
A trailing newline, followed by a "boundary" string (see below) with its own trailing newline.
All the header lines in the "hashing set", except those entered in the MIME portion above, with the header items converted to header-canonical-form, and then sorted in byte order from lowest to highest and concatenated.
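The following Python sketch assembles the stream in the order listed above and hashes it with SHA-1. It assumes the header lines have already been reduced to header-canonical-form (described below) and encoded as bytes, that the boundary string <b> has already been chosen, and that the -Sort option, when given, suppresses sorting of both header groups. build_hash_stream and stream_digest are hypothetical names.

```python
import hashlib


def build_hash_stream(mime_headers, body, boundary, other_headers, sort=True):
    """Assemble the default "U" hash stream for a text article.

    mime_headers, other_headers: lists of header lines already in
        header-canonical-form, as bytes, each ending in a newline.
    body: the exact body octets.  boundary: the string <b>, as bytes.
    sort: False corresponds to the -Sort option.
    """
    order = sorted if sort else list           # bytes sort in octet order
    stream = b"".join(order(mime_headers))     # MIME body headers (may be empty)
    stream += b"\n"                            # the blank line
    stream += body                             # exact body octets
    stream += b"\n==" + boundary + b"\n"       # trailing newline, then "==<b>" line
    stream += b"".join(order(other_headers))   # remaining hashed headers
    return stream


def stream_digest(stream: bytes) -> bytes:
    """Default hash: SHA-1, a 160-bit (20-byte) value."""
    return hashlib.sha1(stream).digest()
```

Note that the trailing newline after the body is added unconditionally, so a body that already ends in a newline is followed by an empty line before the boundary.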

The hashing set by default is empty. Various options can add headers to the hashing set, including commonly all the headers in the article.

Options may add a set of predefined headers to the hashing set even though some of those headers are not present in the article. Headers that are absent MUST NOT be hashed (i.e., do not invent a virtual header with no field).

A variety of options can alter the hashing set.

Header-canonical-form

Before hashing, headers are canonicalized as follows:

All folding, consisting of trailing whitespace, a newline and leading whitespace on the continuation line, is mapped to a single space.
Any trailing whitespace prior to the final newline is removed, though the final newline is preserved as the last character. Any carriage-return (CR) characters are stripped.
Exactly one space is placed between the colon after the header name, and the first non-whitespace character of the header.
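A minimal Python sketch of these three rules applied to a single header. It works on str rather than raw octets for readability, does not normalize the case of header names (the text does not ask for that), and, as an assumption, emits "Name:" with no trailing space when the header body is empty. canonicalize_header is a hypothetical name.

```python
import re


def canonicalize_header(raw: str) -> str:
    """Convert one raw header (possibly folded) to header-canonical-form."""
    text = raw.replace("\r", "")                  # strip every CR
    text = re.sub(r"[ \t]*\n[ \t]+", " ", text)   # unfold: ws, LF, ws -> one space
    text = text.rstrip(" \t\n")                   # drop trailing ws and final LF
    name, colon, value = text.partition(":")
    if not colon:
        raise ValueError(f"not a header line: {raw!r}")
    value = value.lstrip(" \t")
    # exactly one space after the colon, and a single final newline
    return f"{name}: {value}\n" if value else f"{name}:\n"
```

Whitespace inside the header body that is not part of a fold is left alone, so "a  b" keeps both spaces.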

Boundary String

The magic "boundary" string included in the hash allows this signature format to be mapped into the MIME "multipart/signed" format described in RFCxxxx. The boundary string consists of the characters "==" and a specially chosen string <b> such that neither the string "\n==<b>\n" nor "\n--<b>\n" (where \n is a newline) occurs within the body of the article.

This boundary will thus be suitable for use as a MIME multipart/signed boundary, and as a boundary between the body and headers in the signed part of the multipart/signed. The string "==<b>", where <b> is the boundary, is the string added to the hashing stream after the body and its trailing newline; it is followed by a newline and then the signed headers.
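One way to choose <b>, sketched in Python: draw random hexadecimal strings until one passes the test. Randomness is a choice made for this example, not something the text requires; any method that avoids both forbidden sequences would do. As a conservative assumption, the sketch treats the start of the body as following a newline, since the body is preceded by a blank line in the hash stream. boundary_ok and choose_boundary are hypothetical names.

```python
import secrets


def boundary_ok(body: bytes, boundary: bytes) -> bool:
    """True if neither newline-"=="-<b>-newline nor newline-"--"-<b>-newline
    occurs in the body."""
    haystack = b"\n" + body   # treat the body's first line as following a newline
    return (b"\n==" + boundary + b"\n" not in haystack
            and b"\n--" + boundary + b"\n" not in haystack)


def choose_boundary(body: bytes, length: int = 16) -> bytes:
    """Pick a random hexadecimal boundary string that is safe for this body."""
    while True:
        candidate = secrets.token_hex(length // 2).encode("ascii")
        if boundary_ok(body, candidate):
            return candidate
```

With 16 hexadecimal characters a collision with existing body text is vanishingly unlikely, so the loop almost always finishes on its first pass.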

Modifications to Multipart/signed

Draft modifications are being considered for the Multipart/Signed MIME Content-Type to allow support for more than the 2 parts defined in the original specification. In particular, there would be support for M+N parts, where there are M parts to sign and N signature parts, though N would commonly remain equal to 1.

In a USENET article, 3 parts would be used. The first part would be the body of the article, with its MIME body headers. The second part would be a duplication of the USENET headers that are in the hashing set, excepting the MIME body headers. The third part would be the signature section, with Signed and Cert headers.

Currently, an article signed by two or more parties, where the later parties added other headers, is problematic to convert to multipart/signed. The conformant method, nesting a multipart/signed within a multipart/signed, is probably unacceptable for USENET. Using more than one part in the signed blocks for headers is one alternative, but it also adds complexity.

Non-Text articles

The 3-part multipart/signed supports articles that are not of a text type. If this variant is not available, such articles must be built by creating an additional 2-part multipart/related within the signed component: the first part contains the original article (which may itself have MIME components), and the second part is a text/plain component containing the headers to be signed. More on this later.

Body

Normally the body is hashed as a raw stream of octets. This is necessary to avoid complex processing requirements on bodies just to check signatures.

While it would be ideal to hash in canonical form, this places a large burden on signature-checking software. The +WhiteCollapse option allows the elimination of excess whitespace.
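Using the definition of +WhiteCollapse given in the option list below, the transformation can be sketched in Python as a single regular-expression substitution over the octets being hashed. white_collapse is a hypothetical name; as in that definition, only space, tab, CR and LF count as whitespace.

```python
import re

# One run of the whitespace characters named in the +WhiteCollapse definition.
_WS_RUN = re.compile(rb"[ \t\r\n]+")


def white_collapse(data: bytes) -> bytes:
    """Each whitespace run becomes one LF if it contains an LF, else one space."""
    return _WS_RUN.sub(lambda m: b"\n" if b"\n" in m.group(0) else b" ", data)
```

A lone CR with no LF in its run therefore collapses to a space, not a newline.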

Multiple Signatures

There can be multiple signatures on an article. If a party wishes to sign an article that has already been signed, and also adds headers to the article, it MUST provide a "level" number using the "+#" option described below that is higher than the level of any other signature currently on the article. In addition, it must modify the Signed headers of all other signatures on the article, altering their options by adding a "--" option with the name, or unique prefix, of each header that was added. This prevents other systems from including these headers in their hash when testing that particular signature.
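The bookkeeping a later signer performs can be sketched in Python. Two assumptions are made: the Sig parameter is the last parameter of each Signed header (which the description of +HashSigned implies), and the added header is named with a trailing colon so that only exact matches are excluded. The function names are hypothetical; the new signer would put +#<level>, using the value from next_level, in its own Signed header.

```python
import re


def signature_level(signed_value: str) -> int:
    """Return the +#<integer> level of a Signed header value (default 0)."""
    m = re.search(r";\s*\+#\s*(\d+)", signed_value)
    return int(m.group(1)) if m else 0


def next_level(existing_values: list) -> int:
    """A level higher than that of every existing signature."""
    return max((signature_level(v) for v in existing_values), default=-1) + 1


def exclude_added_header(signed_value: str, header_name: str) -> str:
    """Insert a '--<header>:' option into an earlier signer's Signed header.

    Assumes the Sig parameter is last, so the new option goes just before
    the final ';'.  The trailing colon requests an exact name match.
    """
    head, sep, last = signed_value.rpartition(";")
    if not sep:
        raise ValueError("Signed header has no parameters")
    return f"{head}; --{header_name}:;{last}"
```

Because the earlier Signed headers are not themselves hashed (unless +HashSigned was used), editing them this way does not disturb their signatures.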

Because the Signature headers are themselves not signed, it is possible to modify their options. At most, improper modification of options results in the signature no longer matching the article it signs, so this is not a security hole.

If a header is already present and signed, a later signer MUST NOT add another header of the same type.

It should be noted that because other signature headers are not signed by a later signer, that signer or any other party can remove one of the signatures and still have a signed article. However, the article will then no longer be associated with the trust assigned in certificates associated with the removed signer.

Options

The following options apply in the "U" system. Typically "+" options turn a feature on, "-" options turn it off. Options are typically specified in their non-default mode. Options are named below in mixed case. They can be specified in either case by providing as a minimum the letters listed in upper case.

-Body: Do not hash the body, only headers (+b is the default)
+Variant: Do hash variant headers, such as XRef and V-*. Likely to be used only in internal systems where these headers are not actually variant.
+Auth: Do hash signing headers (Signed and Cert). Only lower-level Signed headers can be hashed; obviously the new signature itself can't be included in the hash before it is calculated.
-Sort: Don't sort headers (by their full text, from lowest to highest octet order) before hashing. To be used if transports will never alter the order of headers. (+S is the default.)
++<header-code>: Hash any header that begins with the specified header code. If the code ends in a colon, hash headers that exactly match the code.
--<header-code>: Exclude from the hash any header that begins with the specified header code. If the code ends in a colon, exclude only headers that exactly match the code.
+HashAll: Add all headers in this article, except those on the standard exception list named below. Later options may remove headers from the hash set. Note that software written to earlier standards may add headers to an article without making the necessary changes to the hashing set. Such software, if used with +HashAll, will invalidate the signature and thus probably prevent the article from propagating.
+HashBasic: Start with the following hashing set: From, Subject, Newsgroups, Distribution, Date, Message-ID, Reply-to, Control, Supersedes, Replaces, Lines, References, Content-type, Mime-version, Followup-to. (In general every poster-generated header that is not usually added by older injection or moderation software.)
+HashSigned: Do hash the "Signed" header which contains this option. Before hashing it, however, the signature component (all characters after the final semicolon up to the ending newline) MUST be removed from the hash stream (for obvious reasons!). This makes the hash options inviolate, and means other systems will be unable to alter them. As such, it may prevent the article from passing through certain systems, and is risky.
+HashOrg: Add "Organization" to the hashing set.
-HashMsgid: Remove "message-id" from the hashing set
+#<integer>: If there are multiple signatures, and any headers are added with such new signatures, they must be ordered, so it can be known in which order they were added. Any system which adds new headers that would be signed by an earlier signature must provide this argument with a greater number than that in any existing signature. Default level number is zero.
+WhiteCollapse: When hashing both header and body, collapse strings of whitespace (space, tab, newline, CR) to a single space if they do not contain a newline, or to a single newline if they contain a newline. (Newline is the LF character, ASCII value 10 decimal)
BoundaryString="<bstring>": The specified string is chosen such that neither the sequence "<newline> '-' '-' <bstring> <newline>" nor, unless 3-part multipart/signed is used, the sequence "<newline> '=' '=' <bstring> <newline>" appears in the body of the article, in the transfer-encoding used when it is signed. It will become a suitable MIME boundary string if the article is to become a MIME multipart component.
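Two pieces of option handling, sketched in Python: matching an option token against its mixed-case name, and applying ++ and -- header codes to a hashing set in the order they appear. Header names are compared case-insensitively here, and the "minimum letters" rule is read as accepting either the full name or exactly its upper-case letters; both readings are assumptions. The ++ code only adds headers actually present in the article, so absent headers are never hashed. All names are hypothetical.

```python
def option_matches(token: str, full_name: str) -> bool:
    """Case-insensitive match of an option token against its mixed-case name.

    Assumption: the token must be either the full name or exactly the
    upper-case letters of the name (HashAll -> "HA", Body -> "B").
    """
    abbrev = "".join(c for c in full_name if c.isupper())
    return token.lower() in (full_name.lower(), abbrev.lower())


def header_matches(header_name: str, code: str) -> bool:
    """Apply a ++/-- header code: prefix match, or exact match if it ends in ':'."""
    name, code = header_name.lower(), code.lower()
    if code.endswith(":"):
        return name == code[:-1]
    return name.startswith(code)


def apply_header_codes(hash_set: set, article_headers: list, options: list) -> set:
    """Apply ++/-- options to a hashing set, in the order they appear."""
    result = set(hash_set)
    for opt in options:
        if opt.startswith("++"):
            result |= {h for h in article_headers if header_matches(h, opt[2:])}
        elif opt.startswith("--"):
            result -= {h for h in result if header_matches(h, opt[2:])}
    return result
```

Processing options in order is what lets a later "--" undo part of an earlier "++" or +HashAll.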

Standard Exception List

With the +HashAll option or any other options which add headers implicitly to the Hash set, the following headers are not to be included.

Xref: Excluded, though it should generally not be present except in local databases.
Path: As it varies from site to site.
Signed: Other signature headers are not included. It is not necessary to sign other signatures; their validity can be tested in other ways.
Cert: Certificate headers are not included. It is not necessary to sign or test certificates; they contain an encoding of their own validity.
V-*: Any header beginning with the prefix "V-" is considered an unsigned header. Headers can have multiple prefixes, so "X-V-Foo" is considered an unsigned header.
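A Python sketch of the exception list as a filter, used here to form the starting set for +HashAll. Names are compared case-insensitively, and the V- rule is read as matching "V" in any prefix position (V-Foo, X-V-Foo); both are assumptions. is_excepted and hash_all are hypothetical names.

```python
def is_excepted(header_name: str) -> bool:
    """True if a header is on the standard exception list."""
    name = header_name.lower()
    if name in ("xref", "path", "signed", "cert"):
        return True
    # "V-" may appear after other prefixes, e.g. "V-Foo" or "X-V-Foo"
    return "v" in name.split("-")[:-1]


def hash_all(article_headers: list) -> list:
    """Starting hashing set for +HashAll: every header not excepted."""
    return [h for h in article_headers if not is_excepted(h)]
```

A header that merely ends in "-V", such as a hypothetical "Received-V", is not treated as unsigned, since V is not acting as a prefix there.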

Hashing Set behaviour

Ideally, an author will generate an article with a full set of author's headers, sign these, and indicate that all headers not on the exclusion list are signed. If the article is fed to modern conformant injection or moderation software, this is all the author need do. The conformant software, if it adds headers, will add them to the exclusion list of the original signer. The author's system MAY also list explicitly what is signed, but this is considerably bulkier.

However, if sending an article to an injection system or moderator of unknown state, it is necessary to not use the +HashAll option and instead to add headers to the hashing set explicitly with ++ options and probably the +HashBasic option.

Key Names

Each signature involves a public key, commonly a number between 512 and 1024 bits. Each key used in the system has a unique name. The name MUST be provided with the signature. Uniqueness of names is to be assured by certifying authorities that issue names.

Names may follow the same syntax as an E-mail address, in which case they are simply identifiers to be looked up in a database. The system checking the signature is presumed able to retrieve the key by name.

Names may also be specified via a macro, consisting of the character '%' followed by an alphanumeric string. Macros may extract text from elsewhere in the article. Most notably, the macro "%f" returns the E-mail address from the "From:" header of the article. It is expected that most keys tied to an E-mail address will be named after the E-mail address of the owner.
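A sketch of key-name resolution for the one macro the text defines, %f, using the standard Python function email.utils.parseaddr to pull the address out of the From: header. It assumes the macro forms the whole key name; how macros combine with other text in a key name is not specified here, so other %-forms are rejected. expand_key_name is a hypothetical name.

```python
from email.utils import parseaddr


def expand_key_name(key_name: str, headers: dict) -> str:
    """Resolve a Key= value that may be a macro.

    Only the whole-name macro "%f" is handled in this sketch; other
    %-macros are not defined in the text and are rejected.
    """
    if not key_name.startswith("%"):
        return key_name                      # ordinary key name, used as-is
    if key_name == "%f":
        _, address = parseaddr(headers.get("From", ""))
        if not address:
            raise ValueError("no usable address in From: header")
        return address
    raise ValueError(f"unsupported key-name macro {key_name!r}")
```

The resulting address is then looked up like any other key name.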

The returned text from a macro is to be used as a key name or a component of a key name.

The key name may also be a '+' and an integer. In this case the key is to be found in a certificate elsewhere in the article. That certificate will contain the matching integer, along with the key name, and the actual key itself. Of course the certificate will also contain a certification of the key. There may be multiple certificates in an article. Each must use a different integer to identify itself for this purpose.

Finally a key name may consist of a URL in quotes. While this can be treated as just another key name that can be looked up in a local database, it is expected that the key will be available at the specified URL. (See elsewhere for a description of how keys may be fetched from such URLs.)

Key Name Rules

Signatures using a key that lacks the "Keep" attribute in its certificate, or any key not broadcast at least a week prior to use, MUST include a certificate for that key. Certificates SHOULD also be included during the early life of any key, or when a key is used in a way that will reach systems that have never encountered it or have not encountered it in the previous 2 months.
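These rules can be summarized as a small decision function. This Python sketch is illustrative: it reads "2 months" as 60 days, does not model the loosely defined "early life" of a key, and assumes the caller knows when the key was first broadcast and when the intended audience last saw it. certificate_requirement is a hypothetical name, and it returns "optional" where the text imposes no requirement.

```python
from datetime import datetime, timedelta
from typing import Optional

WEEK = timedelta(days=7)
TWO_MONTHS = timedelta(days=60)   # assumption: "2 months" read as 60 days


def certificate_requirement(has_keep: bool,
                            first_broadcast: Optional[datetime],
                            now: datetime,
                            audience_last_saw: Optional[datetime]) -> str:
    """Return "MUST", "SHOULD" or "optional" for including a certificate.

    has_keep: the key's certificate carries the "Keep" attribute.
    first_broadcast: when the key was first broadcast (None if never).
    audience_last_saw: when the systems this article will reach last
        encountered the key (None if never).
    """
    if not has_keep or first_broadcast is None or now - first_broadcast < WEEK:
        return "MUST"
    if audience_last_saw is None or now - audience_last_saw > TWO_MONTHS:
        return "SHOULD"
    return "optional"
```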

In order to look up "Keep" keys, sites MAY maintain a database of certified "Keep" keys they have seen, or arrange with a lookup service to allow quick lookup of keys.

Bulk-Signing

Some sites MAY elect to use a bulk-signing system if the cost of signature verification is too high. In such a system, a trusted site examines a batch of signed articles and confirms their signatures as correct. It then creates a digest. The digest lists, for each message in the batch:

The message-id
The hash, and any flags affecting the computation of the hash
Whether the signature was valid
Any optional information, such as the Newsgroups and Distribution headers

It then puts this digest into a special control message and signs the entire message.

Sites receiving digests check their validity using the signature on the entire message, confirming they come from a trusted entity. They can then store locally, indexed by message-id, the expected valid hashes for articles.

When those articles arrive (or after they arrive, if they have been placed in a queue pending the arrival of a digest verifying them), they can be hashed and the hash looked up via the message-id. If the article matches, it can be accepted. If it does not, or if it is tagged as having failed its signature check at the trusted site, it can be rejected.

If digests contain 50 articles, for example, a site needs to verify only one signature to verify all 50 articles. Thus the per-article cost of signature verification can be made arbitrarily small, leaving essentially just the cost of hashing.
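The receiving side of bulk-signing can be sketched in Python as a table of digest entries keyed by Message-ID. It assumes each digest's own signature has already been verified, that the caller builds the article's hash stream with the same flags the digest records, and that SHA-1 is the hash. DigestEntry and DigestStore are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DigestEntry:
    expected_hash: bytes    # hash the trusted site computed for the article
    signature_valid: bool   # whether the trusted site found the signature valid


class DigestStore:
    """Local index of verification digests, keyed by Message-ID.

    Assumes the digest's own signature has already been verified,
    so every entry passed to add_digest comes from a trusted site.
    """

    def __init__(self):
        self.entries = {}

    def add_digest(self, entries: dict) -> None:
        self.entries.update(entries)

    def check(self, message_id: str, hash_stream: bytes) -> str:
        """Return "accept", "reject", or "queue" (no digest seen yet)."""
        entry = self.entries.get(message_id)
        if entry is None:
            return "queue"
        if entry.signature_valid and hashlib.sha1(hash_stream).digest() == entry.expected_hash:
            return "accept"
        return "reject"
```

With 50 entries per digest, this costs one public-key verification per 50 articles, plus one SHA-1 computation per article.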

It should be noted that if the digests also contain other information allowing the site to decide if it would have been fed the article, such as the Newsgroups and Distribution lines, the site MAY elect to generate, after some time, a list of valid articles which failed to arrive on site, and attempt to fetch them via some method such as NNTP, thus generating a 100% reliable feed.

See (other documents) for the full syntax and semantics of a verification digest.

Issues

This system is designed to be as compact as possible for reasonable use in USENET. As such it differs from existing E-mail certification systems. Notably, the leading alternate systems require all messages to be MIME multipart messages, with one part containing the text and signed headers of a message, and the other part the signature information. Such a requirement would make signed articles particularly unpalatable to older software, and require duplication of most headers.

Rather than allow choice, which complicates matters for implementers, it is possible that the standard should dictate just one hashing and signing algorithm, leaving open the possibility of others only for the event that the designated algorithm becomes obsolete.

If an article is to be signed by two parties (e.g. author and moderator), with the second party adding new headers, then a system like the one described above, which designates which headers are signed, is needed. This adds considerable complexity.

It is anticipated that most articles will have just one signature, generated by a certificate-collapse server whose own key is known to and designated as all-powerful by all sites in a subnet. As such, there may be no need for complex rules about what headers to hash or the ability to sign an article twice.

There have been recent reports of potential compromise of the MD5 algorithm, so SHA-1 may be set as the only choice.

The DSA is regarded with suspicion by some people because of the NSA's involvement in its creation. However, there is no evidence that it is insecure. It is also somewhat inferior to RSA in terms of the computation time needed to verify. However, with modern processors the time is very short, and with the batch digest system, it can be made irrelevant.

RSA is subject to patent licence restrictions. DSA has uncertain patent questions over it, though no formal claim has been made. However, in the USA at least, the federal government has promised to indemnify all users against any patent claims that may come up. RSA coded for signature use only is not subject to export controls, though one must take care to ensure the code will not perform encryption, or an export licence may be required.

As the order of headers can't be assured (though it would be nice), there is no easy way for a later signer to add a header that is already on the article. That is part of the point, however. But this does prohibit headers like "Received" from mail, or the simple concept of providing a replacement header at a higher level. Hard fact of life -- you can't secure headers and then have them be rewritable, except by parties you trust to override original headers.

There are several unresolved issues here needing trial implementations.