rfc822

RFC 2822 message manipulation.

Note: This is only a very rough sketch of a full RFC-822 parser; in particular the tokenizing of addresses does not adhere to all the quoting rules.

Note: RFC 2822 is a long-awaited update to RFC 822. This module should conform to RFC 2822, and is thus misnamed (it’s not worth renaming it). Some effort at RFC 2822 updates has been made, but a thorough audit has not been performed. Consider any RFC 2822 non-conformance to be a bug.

Directions for use:

To create a Message object: first open a file, e.g.:

fp = open(file, 'r')

You can use any other legal way of getting an open file object, e.g. use sys.stdin or call os.popen(). Then pass the open file object to the Message() constructor:

m = Message(fp)

This class can work with any input object that supports a readline method. If the input object has seek and tell capability, the rewindbody method will work, and illegal lines will be pushed back onto the input stream. If the input object lacks seek but has an 'unread' method that can push back a line of input, Message will use that to push back illegal lines. Thus this class can be used to parse messages coming from a buffered stream.

The optional 'seekable' argument is provided as a workaround for certain stdio libraries in which tell() discards buffered data before discovering that the lseek() system call doesn’t work. For maximum portability, you should set the seekable argument to zero to prevent that initial tell() when passing in an unseekable object such as a file object created from a socket object. If it is 1 on entry (which it is by default), the tell() method of the open file object is called once; if this raises an exception, seekable is reset to 0. For other nonzero values of seekable, this test is not made.
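The seekable test described above can be sketched roughly as follows. This is a minimal illustration, not the module's own code: probe_seekable and LineOnlySource are hypothetical names introduced here, and the exception types caught are an assumption about which failures matter.

```python
import io


def probe_seekable(fp, seekable=1):
    """Return the effective seekable flag for fp (hypothetical helper)."""
    if seekable == 1:
        try:
            # Called once; only the default value 1 triggers the probe.
            fp.tell()
        except (AttributeError, OSError):
            # No tell() at all, or tell() failed: treat input as unseekable.
            seekable = 0
    return seekable


class LineOnlySource:
    """Hypothetical input object that supports readline() and nothing else."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""


seekable_flag = probe_seekable(io.StringIO("Subject: hi\n\nbody\n"))
unseekable_flag = probe_seekable(LineOnlySource(["Subject: hi\n", "\n"]))
# Passing 0 skips the tell() call entirely; other nonzero values are trusted.
skipped_flag = probe_seekable(LineOnlySource([]), seekable=0)
trusted_flag = probe_seekable(LineOnlySource([]), seekable=2)
```

Passing seekable=0 up front is the portable choice for socket-backed file objects, since the probe itself is what some stdio libraries handle badly.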

To get the text of a particular header there are several methods:

str = m.getheader(name)
str = m.getrawheader(name)

where name is the name of the header, e.g. 'Subject'. The difference is that getheader() strips the leading and trailing whitespace, while getrawheader() doesn’t. Both methods retain embedded whitespace (including newlines) exactly as they are specified in the header, and leave the case of the text unchanged.

For addresses and address lists there are functions

realname, mailaddress = m.getaddr(name)
list = m.getaddrlist(name)

where the latter returns a list of (realname, mailaddr) tuples.
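To show the shape of these results without a Message object, here is a small sketch. It assumes Python 3, where the rfc822 module is no longer available, and uses email.utils.parseaddr and email.utils.getaddresses as stand-ins that return the same (realname, mailaddr) pairs; it does not call m.getaddr() or m.getaddrlist() themselves. The addresses are made up.

```python
from email.utils import getaddresses, parseaddr

# A single address yields one (realname, mailaddr) pair.
realname, mailaddr = parseaddr("Jane Doe <jane@example.com>")

# A comma-separated header value yields a list of such pairs.
# An address without a display name gets an empty realname.
pairs = getaddresses(["Jane Doe <jane@example.com>, bob@example.com"])

for name, addr in pairs:
    print(repr(name), repr(addr))
```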

There is also a method

time = m.getdate(name)

which parses a Date-like field and returns a time-compatible tuple, i.e. a tuple such as returned by time.localtime() or accepted by time.mktime().
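As an illustration of what a time-compatible tuple looks like, the sketch below assumes Python 3 and uses email.utils.parsedate on a made-up Date value instead of calling m.getdate() on a Message. The returned tuple holds the date fields as written in the string and can be handed to time.mktime(), which interprets it as local time.

```python
import time
from email.utils import parsedate

# Made-up Date header value for illustration.
date_tuple = parsedate("Mon, 20 Nov 1995 19:12:08 -0500")

# The first six fields are year, month, day, hour, minute, second.
year, month, day, hour, minute, second = date_tuple[:6]

# A 9-field tuple like this is accepted by time.mktime().
local_seconds = time.mktime(date_tuple)
```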

See the class definition for lower level access methods.

There are also some utility functions here.

Functions

dump_address_pair(pair)                     Dump a (name, address) pair in a canonicalized form.
formatdate([timeval])                       Returns time format preferred for Internet standards.
mktime_tz(data)                             Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp.
parseaddr(address)                          Parse an address into a (realname, mailaddr) tuple.
parsedate(data)                             Convert a time string to a time tuple.
parsedate_tz(data)                          Convert a date string to a time tuple.
quote(s)                                    Add quotes around a string.
unquote(s)                                  Remove quotes from a string.
warnpy3k(message[, category, stacklevel])   Issue a deprecation warning for Python 3.x related changes.
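Several of these utilities have same-named counterparts in email.utils on Python 3, where rfc822 itself is not available. The sketch below uses those counterparts, on the assumption that they behave like the functions listed here for simple inputs; the date string and the quoted text are made up.

```python
import calendar
from email.utils import formatdate, mktime_tz, parsedate_tz, quote, unquote

# parsedate_tz() returns a 10-tuple; the last field is the UTC offset in seconds.
parsed = parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
utc_offset = parsed[9]

# mktime_tz() folds that offset in and yields a UTC timestamp.
timestamp = mktime_tz(parsed)
# Cross-check: 19:12:08 at -0500 is 00:12:08 UTC on the next day.
utc_check = calendar.timegm((1995, 11, 21, 0, 12, 8))

# formatdate() turns the timestamp back into an Internet-style date string.
formatted = formatdate(timestamp)

# quote() escapes backslashes and double quotes; unquote() strips the
# surrounding quotes and undoes the escaping.
escaped = quote('say "hi"')
restored = unquote('"' + escaped + '"')
```

Note that quote() in email.utils only escapes; the surrounding double quotes are added by hand in this sketch.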

Classes

AddressList(field)        An AddressList encapsulates a list of parsed RFC 2822 addresses.
AddrlistClass(field)      Address parser class by Ben Escoto.
Message(fp[, seekable])   Represents a single RFC 2822-compliant message.