Path -- Article Path

Brad Templeton Internal Page

The Path header shows the route a message took from its entry into the USENET system to the current system. It is a list of site identifiers with the origin on the right.

Form of Path:

PATH-ENTRY	=	old_path / new_path 

old_id		=	1*( ALPHA / digit / '-' / '.' / '_')
old_path	=	old_id *(punctuation old_id) 
punctuation	=	LWSP / %x21-2f / %x3a-40 / %x5b-60 / %x7b-7f 
new_delims	=	[FWS] ('@' / '/' / ',' ) [FWS] 
new_path	=	post_injection '%' pre_injection 
delim_plus_id	=	'!' [FWS] old_id 
			/ new_delims site_id 

post_injection	=	*(site_id 1*punctuation) site_id 
pre_injection	=	site_id *delim_plus_id 
site_id		=	ALPHA word 		; UUCP name 
			/ ALPHA			; for "x" tail entry 
			/ '.' word		; other registered name 
			/ <FQDN>		; as per RFC 1034 
			/ <dotted-quad>		; numeric IP address rep 
						; specified in rfc820 etc. 
			/ '[' <dotted-quad> ']'
			/ '[' <ipv6-numeric> ']' ; per RFC1884 
word		=	1*(ALPHA / digit / '-' / '_')

When a system receives a message from another system, it MUST add its own unique name (path-identity) and a delimiter to the beginning of the Path string. In addition, if needed, folding-whitespace may be added.

The path-identity added MUST be unique over all of USENET. To this end it should be one of:

A name registered previously in the UUCP maps database (found in the newsgroup comp.mail.maps), containing no dot character. The UUCP name was often used as the path-identity in the early days of USENET. Use of this registry is less common today. Entries should be at least two characters long.
A name registered in some other database committed to keep its names unique from all other databases and all other encodings in this list. Any such database must define a form that can be listed in a web-based appendix to this document, so that the database type can be discerned from the entry. It is recommended that such entries begin with a dot, and then a unique alphabetic string allocated by a central registry, so that they can be told from FQDNs immediately. The string ".A" is reserved.
The fully qualified domain name (FQDN) of the host, retrievable in the Internet DNS service. For sites not on the Internet this can be the FQDN of an MX record for their mail server, provided that sites receiving articles from this site have some way to verify that MX-only FQDN.
An encoding of an IP address -- dotted quad or for IPv6 as per RFC1884. Encodings using SHOULD NOT be used prior to Verified-Path-date.

Whichever form is chosen, a site SHOULD use a form which can be verified, using one of the schemes described below, by all sites to which it will forward news articles. If all forwarding is by NNTP or other Internet-based protocols, then the FQDN or IP address encodings are advised. For the purposes of comparison, FQDN entries should be put in an all-lower-case canonical form.

Because RFC1036 specified that any punctuation or whitespace could act as a delimiter, programs SHOULD accept this, with the exception that IPv6 addresses containing colons MUST be treated as a single unit. Modern programs MUST generate only delimiters from the set "!,%@", plus optional additional whitespace.
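The tolerant parsing described above can be sketched in Python as follows. This is only an illustration, not part of the specification: the function name split_path and the regular expression are assumptions, and the sketch treats underscore as part of an identifier and keeps anything inside square brackets together so that IPv6 colons are not split.

```python
import re

# An entry is either a bracketed address (kept whole, so colons inside an
# IPv6 address are not taken as delimiters) or a run of letters, digits,
# '-', '.', and '_'. Everything else is treated as delimiter text.
_ENTRY = re.compile(r"\[[^\]]*\]|[A-Za-z0-9._-]+")


def split_path(path):
    """Return (delimiter_before, entry) pairs, leftmost entry first.

    The leftmost entry gets an empty delimiter. Folding whitespace around
    a delimiter is dropped; whitespace with no other delimiter is reported
    as a single space.
    """
    pairs = []
    pos = 0
    for match in _ENTRY.finditer(path):
        raw = path[pos:match.start()]
        delim = "".join(raw.split())  # strip folding whitespace
        if not delim and raw and pairs:
            delim = " "  # whitespace was the only delimiter
        pairs.append((delim, match.group()))
        pos = match.end()
    return pairs
```

For instance, split_path("a!b_c!user") yields the three entries "a", "b_c" and "user", each of the last two preceded by "!".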

When a site receives an article from another site, it SHOULD (MUST after Verified-Path-date) verify the identity of the source site. When processing an article from a source, the leftmost entry of the Path line should be extracted, converted to a canonical form, and tested to see if it matches the canonical form of the verified identity of the source. If it does, a "," should be used as the delimiter: the comma and then the receiving site's path-identity MUST be prepended to the Path line. The method of verification is up to the site. Any method of suitable authenticity may be chosen, with the consideration that in the event of problems at the source site, the relaying site may be called upon to reliably identify it. Verification schemes for the most common forms of article transmission are described below.

If the leftmost entry does not match the verified identity of the source, then the receiving site should prepend an "@" delimiter, then a simple form of the verified identity of the source, then a "," delimiter and then the receiving site's own path-identity. This adding of two identities to the line should not be done if the provided and verified identities match. For articles received from an Internet source, the 32-bit IPv4 address or properly verified FQDN, whichever is shorter, is encouraged for the generated ID.
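The two prepending cases can be sketched in Python as follows. The function name add_relay_entry is hypothetical, and the canonical form is assumed here to be simple lower-casing, which suits FQDN entries; other entry forms may need their own rule. How verified_id is obtained is outside this sketch.

```python
import re

# Leftmost entry: a bracketed address or a run of identifier characters,
# optionally preceded by folding whitespace.
_LEFTMOST = re.compile(r"\s*(\[[^\]]*\]|[A-Za-z0-9._-]+)")


def add_relay_entry(path, own_id, verified_id):
    """Return the Path value after the receiving site adds its entry.

    path        -- Path value as received from the source
    own_id      -- the receiving site's path-identity
    verified_id -- identity of the source as verified by the receiver
    """
    match = _LEFTMOST.match(path)
    claimed = match.group(1) if match else ""
    # Assumption: lower-casing stands in for "canonical form".
    if claimed.lower() == verified_id.lower():
        # Identities match: one entry, comma delimiter.
        return own_id + "," + path
    # Mismatch: insert the verified identity, marked with "@".
    return own_id + "," + verified_id + "@" + path
```

With a matching source, only own_id and a comma are added; with a mismatch, the result has the shape own_id,verified_id@original-path.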

Tail Entry

For historical reasons, the rightmost entry in the Path string generated by most systems is not a site name, but a "user name." However, the Path string is not an E-mail address and MUST NOT be used to contact the user. Injecting agents MAY place any string here that is not a path-identity. If no meaning is anticipated, the string "x" SHOULD be used.

RFC1036 suggested that the last entry could be a site name, requiring software to check it when feeding, but said it also should have a userid for very old systems. As of this specification, systems SHOULD NOT treat the tail entry as a path-identity.

Typically this field will be the only entry on the Path string generated by a poster, or if not generated by the poster, by the injecting agent, which will prepend a "%" and then its own verifiable path-identity. The percent divides the verified part of the Path line from any entries provided prior to injection into the news network. There may be more than one entry after the percent, and all but the last are to be treated as sites.
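As a rough Python sketch of the injection step just described, assuming a hypothetical helper named inject_path: the injecting agent keeps whatever the poster supplied as the pre-injection part, falls back to "x" when nothing was supplied, and prepends its own identity and a "%". Refusing input that already contains "%" is a choice made in this sketch to avoid producing a second injection marker.

```python
def inject_path(injector_id, supplied_path=""):
    """Build the Path value produced by an injecting agent.

    injector_id   -- the injecting agent's path-identity (an FQDN)
    supplied_path -- Path text supplied by the poster, possibly empty
    """
    tail = supplied_path.strip() or "x"  # "x" when no meaning is intended
    if "%" in tail:
        # A second "%" would make the article look injected twice.
        raise ValueError("Path already contains a '%' injection delimiter")
    # FQDN entries are put in lower-case canonical form.
    return injector_id.lower() + "%" + tail
```

For example, inject_path("News.Example.com") gives "news.example.com%x".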

Injectors SHOULD use the tail entry for local authentication information on the source of an article. For example, if they wish to store an encoding of the IP address of a source machine connecting to do the injection, and/or the UID of an invoking user or any other such information, they may encode it in the tail entry, provided they do so in a manner that will not match any site identifier. (e.g. ending with a dot.)

Injection Site Entry

The injecting agent's path-identity is a special case. This identity should be an FQDN which can be used as a domain for E-mail connections (i.e. it should have either an A or MX record). Special E-mail addresses at that domain SHOULD exist for mail to the administrators of the injecting system. These E-mail addresses may be handled by a computer rather than a human being.

Any program which mails to these addresses MUST assure that no other system will send the same or a highly similar mail message. That is, all steps MUST be taken to avoid program-detected errors causing mail messages to be sent from multiple sites.

Any mail to these addresses MUST have an "In-Reply-To" MAIL header with the Message-ID of the USENET message which inspired the mail.

injector-trouble: This address is for mail concerning software problems with the news injection software. No system that reports software problems may do so automatically unless it is a unique system or has taken other steps to assure that other copies of the problem report mailing software that may be running will not mail duplicate reports. If it bounces, "usenet" MAY be used.
abuse: A mailbox for complaints about the actions of posters under the administration of the injector site. For use only after attempts to mail the actual poster have failed. As noted, all messages MUST have an In-Reply-To header, so that software handling this address can calculate statistics on the number of complaints over specific messages and users.
usenet: This address typically is a mailbox for a human being administering the system. It MUST NOT be used for complaints about improper or undesired postings by users of the injection server, unless mail to "abuse" bounces. It is for human-to-human contact about problems with the posting software, after other methods have failed.
postmaster: This address must exist for any Internet E-mail domain, as per the E-mail and SMTP standards. However, it MUST NOT be used for USENET-inspired E-mail unless the above addresses return a bounce indicating they do not exist.

See RFC 2142 for other addresses. It defines all these excepting "injector-trouble."

A note of Commentary

USENET runs primarily on a philosophy that posters are responsible for their actions, not site administrators. Nonetheless, at large sites, the volume of mail to site administrators over problems people have with their users is often so high that they must hire full time staff (yes, plural) just to read and deal with that mail. This can only discourage participation in the network.

Those who see a problem on the net may take steps to inform those at the source of the problem, but they must think globally, and consider whether many others will be reporting the problem as well. It serves no purpose to inundate personal mailboxes with largely duplicate problem messages. In fact, it's ruder than many of the abuses being complained about.

Don't answer abuse with abuse of the trouble-reporting systems. Abuse of those systems just makes them less useful for dealing with future problems.

Purpose of Path

Aside from tracing the route articles take in moving over the network, Path is used primarily to allow relaying systems to not send articles to sites known to already have them, in particular the site they came from. This improves the efficiency of links, even ihave/sendme links.

When feeding an article, a relaying agent SHOULD check to see if the path-identity of the recipient site is present in the Path line. If so, it SHOULD NOT feed the article to that site. When testing for a match, case should be ignored on entries in the FQDN form, but for legacy reasons, should be considered relevant on UUCP map entry forms.
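The feeding check might look like the following Python sketch. The name should_feed is hypothetical. Two assumptions are made: an identity containing a dot (and not starting with one) is taken to be in FQDN or IP address form and compared without regard to case, while anything else is compared exactly; and the rightmost entry is skipped because the tail entry is not treated as a path-identity.

```python
import re

# Bracketed addresses are kept whole; other entries are runs of
# letters, digits, '-', '.', and '_'.
_ENTRY = re.compile(r"\[[^\]]*\]|[A-Za-z0-9._-]+")


def should_feed(path, recipient_id):
    """Return False if recipient_id already appears in the Path value."""
    entries = _ENTRY.findall(path)[:-1]  # drop the tail entry
    dotted = "." in recipient_id and not recipient_id.startswith(".")
    for entry in entries:
        if dotted:
            # FQDN form: ignore case.
            if entry.lower() == recipient_id.lower():
                return False
        elif entry == recipient_id:
            # UUCP-style name: case is significant.
            return False
    return True
```

So a recipient "news.example.com" is not fed an article whose Path contains "News.Example.COM", while a UUCP-style recipient "foo" is still fed an article whose Path contains only "Foo".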

Path is also used for USENET statistics gathering and flow tracking.

Finally the presence of a "%" delimiter in the Path header can be used to identify an article injected in conformance with this standard.

Truncating Path

The Path header MUST NOT be truncated.

Whitespace may be present in the Path to make it easier to represent. However, there is no requirement to do so. Whitespace MUST NOT be used as a delimiter without another non-white delimiter also present, although older software may generate it. Any use of a delimiter other than comma (to the left of the percent sign) should be considered an unverified path entry.

Delimiter Summary

A summary of delimiters and the meaning they imply for the name on the right or, in addition, the name on the left:

,: Verified or generated identity.
@: Name failed verification test. Name on left is identity generated by site further to the left.
%: Optional pre-injection entries followed by tail entry. Commonly just the tail entry, either "x" or an encoding of login identity. Name on left is FQDN of site that handles mail for injector. The presence of two "%" in a path indicates a double-injection error.
!: Entry is unverified. Identity on left is an old-style system not conformant with this specification.
Folding Whitespace: As "!" if no other delimiter is present, otherwise ignored.
Other: Treat as "!" as per RFC1036
"/": Reserved for future use, treat as ","
;: Semicolon is reserved for the generation of extensible headers.
:: The colon is a valid delimiter for legacy systems; however, inside an IPv6 numeric address surrounded by square brackets, it is a part of the path-identifier.
_: This should not be treated as punctuation (a delimiter), contrary to RFC1036. Treat as part of identifiers.
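As an illustration of how these delimiter meanings combine, the following Python sketch sorts a Path value into four rough categories. The function name classify_path and the category labels are assumptions of this sketch, not terms defined by this document.

```python
import re

# Bracketed addresses are kept whole so IPv6 colons are not delimiters.
_ENTRY = re.compile(r"\[[^\]]*\]|[A-Za-z0-9._-]+")


def classify_path(path):
    """Classify a Path value by its delimiters.

    Returns one of:
      "legacy"           -- no "%" present
      "double-injection" -- more than one "%" present
      "verified"         -- only "," (or "/") left of the "%"
      "unverified"       -- some other delimiter left of the "%"
    """
    injections = path.count("%")
    if injections == 0:
        return "legacy"
    if injections > 1:
        return "double-injection"
    post_injection = path.split("%", 1)[0]
    # Text found between consecutive entries in the verified part.
    gaps = _ENTRY.split(post_injection)[1:-1]
    for gap in gaps:
        delim = "".join(gap.split())  # ignore folding whitespace
        if delim not in (",", "/"):
            # "@", "!", bare whitespace, or any other punctuation.
            return "unverified"
    return "verified"
```

A path such as "a.example.com,192.0.2.7@fake.example.com%x" is reported as "unverified" because of the "@", while "a!b!user" is reported as "legacy".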

Verified-Path-date

After the Verified Path deadline date, articles which contain unverified path entries MAY be rejected by other systems, and MAY cause an error message to go back to the poster or the USENET manager at the verified system prior to the unverified entry. (As with all error generating systems, steps MUST be taken to assure only one error message is received by the receiving party per error.)

Use of "!"

Old USENET relaying and injecting programs almost all delimit Path: entries with the "!" delimiter, and these entries are not verified. As such, the presence of "%" as a delimiter will indicate the article was injected by software conforming to this standard, and the presence of "!" as a delimiter will indicate the message passed through systems developed prior to this standard. Prior to the Verified-Path-date, messages with mixed sets of delimiters will be common. After that date, all messages should have no "!" delimiters prior to the "%" delimiter.

Suggested Verification Methods

Sites attempting to verify an incoming entry should take the following approaches for common transports. They are not required, but not following them may lead to wasteful double-entry Path additions.

If the incoming article arrives through some protocol local to the site, such as UUCP, that protocol MUST include a means of verifying the article source site, and this should match. In UUCP implementations, commonly each incoming connection has a unique login name and password; that login name could be used to build a suitable verified identifier.

Here is an example of a suitable verification method for an article arriving via a TCP/IP protocol such as NNTP:

If it is an encoding of an IP address, it should be decoded into a canonical form. If that address does not match the source's IP, a reverse-DNS (in-addr.arpa PTR record) lookup should be done on the provided address, followed by a regular DNS "A" record lookup on the returned name. That A record may contain several IP addresses. So long as one matches the IP address from the path, and another matches the source IP address, this is considered a match.
If it is an Internet DNS style FQDN, then the name should be looked up with DNS. The A records MUST contain an IP address that is the verified address of the source.
(It should be noted that when generating a name after a non-match, if an FQDN is desired, simply doing a reverse DNS (PTR) lookup on the IP address is not sufficient to generate the FQDN. The returned name must be mapped back to A records to assure it matches the source's IP address.)
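The TCP/IP verification steps above could be sketched in Python roughly as follows. The function names are hypothetical. The default lookups use the standard socket module and cover IPv4 A records only; the lookup functions are passed in as parameters so that a site can substitute its own resolver.

```python
import ipaddress
import socket


def dns_a(name):
    """Return the IPv4 addresses in the A records for name, or []."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:
        return []


def dns_ptr(address):
    """Return the name from a reverse (PTR) lookup, or None."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:
        return None


def entry_matches_source(entry, source_ip, lookup_a=dns_a, lookup_ptr=dns_ptr):
    """Test the leftmost Path entry against the connection's source IP."""
    source_ip = str(ipaddress.ip_address(source_ip))  # canonical form
    try:
        claimed_ip = str(ipaddress.ip_address(entry.strip("[]")))
    except ValueError:
        claimed_ip = None
    if claimed_ip is None:
        # FQDN form: its A records must include the source address.
        return source_ip in lookup_a(entry.lower())
    if claimed_ip == source_ip:
        return True
    # Address differs from the source: PTR lookup, then A lookup on the
    # returned name; both addresses must appear among the A records.
    name = lookup_ptr(claimed_ip)
    if name is None:
        return False
    addresses = lookup_a(name)
    return claimed_ip in addresses and source_ip in addresses
```

Passing the lookups as arguments also makes the logic easy to exercise against a fixed table of names and addresses without touching the network.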

Issues

There is no firm way to tell a path entry generated by new software from one generated by old software that assumes any delimiter is valid. However, use of "!" by old software has become effectively universal.

Sites are not strictly required to use a standard form for their path entry, but if they don't, path lines out of that site get longer due to the adding of the identity. However, groups of associated sites wanting a common identity may decide to use that and let the receiver add the specific site.