
Message container file format (mcff) specification

Version 0.2 / 2005-08-05 / draft

This document describes the exact structure of the message container file format, a binary file format that stores e-mail and NetNews messages together with their metadata in a single file. The proposed file name extension for the format is .mcff, and files in the format are referred to as mcff files in this document.

Author's note: I'm in the process of designing this format for my own needs. Don't expect files of this format to exist anywhere.

TODO: Add some example messages and corresponding mcff files. Come up with some actual magic byte sequences.

Messages

The file format is supposed to store e-mail and NetNews messages.

The format of Internet electronic mail messages is specified in RFC 2822 (April 2001).

The format of NetNews messages is described in RFC 1036 (December 1987).

Note that the NetNews message format is built on top of the e-mail message format.

Software that wishes to add messages to mcff files must be able to understand those message formats to a certain degree. For a given message, it has to be able

Overall file structure

Each mcff file consists of one or more message records which are stored one after another, without any additional data between them. A message record combines the raw data for a single message and information on that message. This additional information is partially redundant in order to avoid having to parse the message to determine that information.

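A message record consists of three parts, in this order: the message record header, the message data, and the message record footer. Each part is described in its own section below.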

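The size of a message record in bytes equals 44 plus the length of the message ID plus the length of the actual message. The 44 bytes are the 28 fixed bytes of the message record header and the 16 bytes of the message record footer. Text messages in Usenet are typically around 2 KB in size, so the overhead is rather insignificant.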
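The size formula above can be sketched as a small Python function. The names HEADER_FIXED_SIZE, FOOTER_SIZE and record_size are made up for this illustration and are not part of the specification.

```python
# Sizes taken from the header and footer tables of this document.
HEADER_FIXED_SIZE = 28  # all header fields before the message ID bytes
FOOTER_SIZE = 16        # magic (8) + combined size (4) + CRC-32 (4)


def record_size(message_id: bytes, message: bytes) -> int:
    """Return the total size in bytes of one message record."""
    return HEADER_FIXED_SIZE + len(message_id) + len(message) + FOOTER_SIZE


if __name__ == "__main__":
    # Hypothetical 15-byte message ID and a 2048-byte message.
    size = record_size(b"abc@example.com", b"x" * 2048)
    overhead = size - 2048
    print(size, overhead)
```

For a 2048-byte message with a 15-byte message ID, the record is 2107 bytes long, so the overhead of 59 bytes is just under 3 percent.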

Message record header

Offset Length Type Description
0 8 byte[] The magic byte sequence identifying the beginning of a message record header: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 
8 4 uint32 Message record header size in bytes. Excludes the magic bytes and this field. Should equal 16 plus the length of the message ID.
12 4 uint32 Full message size in bytes.
16 4 uint32 Message header section size in bytes. Must include the trailing CR LF bytes which are used to separate header and body section of a message. Must not be larger than the full message size specified in the previous field. This field is useful if software is supposed to load only the headers of a message because it keeps software from having to determine the header length itself. One application would be an NNTP server which is responding to a HEAD request.
20 1 uint8 Year the message was sent, minus 1970. Example: the value 35 stands for the year 2005, because 2005 = 1970 + 35. 0 means unknown year.
21 1 uint8 Month the message was sent. Valid values are 0 to 12. 0 means unknown month.
22 1 uint8 Day of month the message was sent. Valid values are 0 to 31. 0 means unknown day.
23 4 uint32 Flags. Meaning of the least significant bit: if it is set, this message is marked for deletion from the message file. The meaning of the other flags is still unspecified, but their usage is reserved, so do not use those bits yourself.
27 1 uint8 Message ID length.
28 var byte[] US-ASCII bytes forming the message ID string, without angle brackets. The length of this field is variable and is given by the value of the previous field (message ID length). There is no leading length byte as in Pascal strings, nor is there a terminating zero byte as in C-style strings. All bytes of this field are part of the message ID, and each byte stores one character.
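The header layout in the table above could be written and read with Python's struct module, as in the following minimal sketch. Note one important assumption: this draft does not state the byte order of the uint32 fields, so little-endian is chosen here purely for illustration. The function names pack_header and parse_header and the constant names are hypothetical, and the all-zero magic is the placeholder from the table.

```python
import struct
from typing import Optional

MAGIC = bytes(8)              # placeholder magic sequence from the table
# ASSUMPTION: little-endian ("<"); the draft does not specify a byte order.
HEADER_FMT = "<8sIIIBBBIB"    # magic, header size, full size, header section
                              # size, year, month, day, flags, ID length
FIXED_SIZE = struct.calcsize(HEADER_FMT)  # 28 bytes
FLAG_DELETED = 0x00000001     # least significant bit: marked for deletion


def pack_header(message_id: bytes, full_size: int, header_section_size: int,
                year: Optional[int] = None, month: int = 0, day: int = 0,
                flags: int = 0) -> bytes:
    """Build a message record header; year None means unknown (stored as 0)."""
    if len(message_id) > 255:
        raise ValueError("message ID length must fit into one byte")
    if header_section_size > full_size:
        raise ValueError("header section larger than full message")
    year_field = 0 if year is None else year - 1970
    fixed = struct.pack(HEADER_FMT, MAGIC, 16 + len(message_id), full_size,
                        header_section_size, year_field, month, day, flags,
                        len(message_id))
    return fixed + message_id


def parse_header(buf: bytes) -> dict:
    """Parse a message record header found at the start of buf."""
    (magic, header_size, full_size, header_section_size,
     year_field, month, day, flags, id_len) = struct.unpack_from(HEADER_FMT, buf, 0)
    if magic != MAGIC:
        raise ValueError("bad magic byte sequence")
    message_id = buf[FIXED_SIZE:FIXED_SIZE + id_len]
    if len(message_id) != id_len:
        raise ValueError("truncated message ID")
    return {
        "header_size": header_size,
        "full_size": full_size,
        "header_section_size": header_section_size,
        "year": None if year_field == 0 else 1970 + year_field,
        "month": month,
        "day": day,
        "deleted": bool(flags & FLAG_DELETED),
        "message_id": message_id,
    }
```

A header for a message sent on 2005-08-05 would be created with pack_header(b"abc@example.com", 2048, 300, 2005, 8, 5), which stores the year as 35.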

Message data

This part contains the actual message data, as a sequence of bytes. Its exact size is given in the full message size field (the third field) of the message record header. Message data has to be preserved in its original state, so no character conversion or line ending changes (messages always have CR LF "DOS-style" line endings) can take place.
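Because the message data is stored unchanged, the header section size recorded in the message record header can be derived from the raw bytes. The following Python sketch shows one way to do that. It rests on an interpretation of this draft: the header section is taken to end after the empty line (CR LF CR LF) that separates headers from body, and a message without a body is treated as consisting of headers only. The function name header_section_size is hypothetical.

```python
def header_section_size(message: bytes) -> int:
    """Return the size of the header section, separator bytes included.

    ASSUMPTION: the header section ends with the empty line (CR LF CR LF)
    that separates headers and body. If no such line exists, the whole
    message is treated as header section.
    """
    separator = message.find(b"\r\n\r\n")
    if separator == -1:
        return len(message)
    return separator + 4


if __name__ == "__main__":
    raw = b"Subject: hi\r\nFrom: a@b\r\n\r\nbody\r\n"
    n = header_section_size(raw)
    print(raw[:n])   # header section, e.g. for answering an NNTP HEAD request
    print(raw[n:])   # body
```

Software reading an mcff file does not need this function, since it can read the stored value directly. Only software adding messages has to compute it.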

Message record footer

Offset Length Type Description
0 8 byte[] The magic byte sequence identifying the beginning of a message record footer: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
8 4 uint32 Combined size of message record header and message data in bytes. This value X helps to locate the beginning of the message record, which starts X bytes before the first byte of the message record footer.
12 4 uint32 CRC-32 checksum of all message record data before this checksum field.
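The footer fields can be illustrated with a short Python sketch. It makes several assumptions that this draft leaves open: little-endian byte order for uint32 fields, the common CRC-32 variant as computed by zlib.crc32, and a checksum that covers the header, the message data and the first 12 bytes of the footer. The names build_footer, record_start and verify_record are hypothetical.

```python
import struct
import zlib

FOOTER_MAGIC = bytes(8)  # placeholder magic sequence from the table


def build_footer(header: bytes, message: bytes) -> bytes:
    """Build the 16-byte footer for an already packed header and message."""
    # ASSUMPTION: little-endian uint32 fields.
    head = FOOTER_MAGIC + struct.pack("<I", len(header) + len(message))
    # ASSUMPTION: the checksum covers everything before the checksum field.
    crc = zlib.crc32(header + message + head) & 0xFFFFFFFF
    return head + struct.pack("<I", crc)


def record_start(footer_offset: int, footer: bytes) -> int:
    """Return the file offset of the record that this footer belongs to."""
    (combined_size,) = struct.unpack_from("<I", footer, 8)
    return footer_offset - combined_size


def verify_record(record: bytes) -> bool:
    """Check the stored CRC-32 of one complete message record."""
    (stored,) = struct.unpack_from("<I", record, len(record) - 4)
    return (zlib.crc32(record[:-4]) & 0xFFFFFFFF) == stored
```

With the combined size field, software can walk through a file backwards: read the last 16 bytes, call record_start to find the beginning of the last record, and repeat from there.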