Version 0.2 / 2005-08-05 / draft
This document describes the exact structure of the message container file format, a binary file format to store email and NetNews messages and corresponding metadata in a single file. The proposed file name extension for the format is .mcff, and files in the format are referred to as mcff files in this document.
Author's note: I'm in the process of designing this format for my own needs. Don't expect files of this format to exist anywhere.
TODO: Add some example messages and corresponding mcff files. Come up with some actual magic byte sequences.
The file format is supposed to store e-mail and NetNews messages.
The format of Internet electronic mail messages is specified in RFC 2822 (April 2001).
The format of NetNews messages is described in RFC 1036 (December 1987).
Note that the NetNews message format is built on top of the e-mail message format.
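Software that wishes to add messages to mcff files must be able to understand those message formats to a certain degree. For a given message, it has to be able to determine the information stored in the message record header, such as the header section size, the date the message was sent, and the message ID.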
Each mcff file consists of one or more message records which are stored one after another, without any additional data between them. A message record combines the raw data for a single message and information on that message. This additional information is partially redundant in order to avoid having to parse the message to determine that information.
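A message record consists of three parts, in this order: the message record header, the message data, and the message record footer.
The size of a message record in bytes equals 44 plus the length of the message ID plus the length of the actual message. The constant 44 is the sum of the 28 fixed bytes of the message record header and the 16 bytes of the message record footer. Text messages in Usenet are around 2 KB in size, so the overhead is rather insignificant.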
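The size formula can be checked with a small calculation. The following is only a sketch; the names `HEADER_FIXED_SIZE`, `FOOTER_SIZE` and `record_size` are made up for this illustration and are not part of the format.

```python
# Hypothetical constants derived from the field tables in this document.
HEADER_FIXED_SIZE = 28  # header bytes before the variable-length message ID
FOOTER_SIZE = 16        # magic (8) + combined size (4) + CRC-32 (4)


def record_size(message_id: bytes, message: bytes) -> int:
    """Return the total size in bytes of one message record."""
    return HEADER_FIXED_SIZE + FOOTER_SIZE + len(message_id) + len(message)


# A 2 KB message with a 15 byte message ID carries 59 bytes of overhead.
example_size = record_size(b"abc@example.com", b"x" * 2048)
print(example_size)
```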
| Offset | Length | Type | Description |
|---|---|---|---|
| 0 | 8 | byte[] | The magic byte sequence identifying the beginning of a message record header: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 |
| 8 | 4 | uint32 | Message record header size in bytes. Excludes the magic bytes and this field. Should equal 16 plus the length of the message ID. |
| 12 | 4 | uint32 | Full message size in bytes. |
| 16 | 4 | uint32 | Message header section size in bytes. Must include the trailing CR LF bytes which are used to separate header and body section of a message. Must not be larger than the full message size specified in the previous field. This field is useful if software is supposed to load only the headers of a message because it keeps software from having to determine the header length itself. One application would be an NNTP server which is responding to a HEAD request. |
| 20 | 1 | uint8 | Year the message was sent, minus 1970. Example: the value 35 stands for the year 2005, because 2005 = 1970 + 35. 0 means unknown year. |
| 21 | 1 | uint8 | Month the message was sent. Valid values are 0 to 12. 0 means unknown month. |
| 22 | 1 | uint8 | Day of month the message was sent. Valid values are 0 to 31. 0 means unknown day. |
| 23 | 4 | uint32 | Flags. Meaning of the least significant bit: if it is set, this message is marked for deletion from the message file. The meaning of the other flags is still unspecified, but their usage is reserved, so do not use those bits yourself. |
| 27 | 1 | uint8 | Message ID length. |
| 28 | var | byte[] | US-ASCII bytes forming the message ID string, without angle brackets. The length of this field is variable; it is given by the value of the previous field (message ID length). There is no leading length byte as in Pascal strings, nor is there a terminating zero byte as in C strings. All bytes in this field are part of the message ID. Each byte stores one character. |
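The header layout above could be written and read as sketched below. This is an illustration under assumptions, not a reference implementation: the draft does not state a byte order, so little-endian is assumed here, the magic bytes are the all-zero placeholder from the table, and the names `pack_header`, `parse_header` and `FLAG_DELETED` are invented for the example.

```python
import struct

HEADER_MAGIC = bytes(8)            # placeholder magic from the table above
HEADER_FORMAT = "<8sIIIBBBIB"      # ASSUMPTION: little-endian, no padding
HEADER_FIXED_SIZE = struct.calcsize(HEADER_FORMAT)  # 28 bytes
FLAG_DELETED = 0x00000001          # least significant bit of the flags field


def pack_header(message_id, message_size, header_section_size,
                year=None, month=0, day=0, flags=0):
    """Build a message record header; year=None means unknown (stored as 0)."""
    if len(message_id) > 255:
        raise ValueError("message ID does not fit the one-byte length field")
    if header_section_size > message_size:
        raise ValueError("header section is larger than the full message")
    year_field = 0 if year is None else year - 1970
    fixed = struct.pack(HEADER_FORMAT, HEADER_MAGIC, 16 + len(message_id),
                        message_size, header_section_size,
                        year_field, month, day, flags, len(message_id))
    return fixed + message_id


def parse_header(data):
    """Decode a message record header found at the start of data."""
    (magic, header_size, message_size, header_section_size,
     year_field, month, day, flags, id_length) = struct.unpack_from(
        HEADER_FORMAT, data, 0)
    if magic != HEADER_MAGIC:
        raise ValueError("not a message record header")
    message_id = data[HEADER_FIXED_SIZE:HEADER_FIXED_SIZE + id_length]
    return {
        "header_size": header_size,
        "message_size": message_size,
        "header_section_size": header_section_size,
        "year": 1970 + year_field if year_field else None,
        "month": month,
        "day": day,
        "deleted": bool(flags & FLAG_DELETED),
        "message_id": message_id,
    }
```

For example, `pack_header(b"abc@example.com", 2048, 512, 2005, 8, 5)` yields 43 bytes: 28 fixed bytes plus the 15 bytes of the message ID.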
This part contains the actual message data, as a sequence of bytes. Its exact size is given in the full message size field (the third field) of the message record header. Message data has to be preserved in its original state, so no character conversion or line ending changes (messages always have CR LF "DOS-style" line endings) can take place.
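Software that adds a message has to compute the header section size stored in the message record header. A minimal sketch follows, assuming the header section ends with the empty line (CR LF CR LF) that separates headers from body, and that a message without such a separator consists of headers only. The function name `header_section_size` is made up for this illustration.

```python
def header_section_size(message: bytes) -> int:
    """Return the header section size, including the separating CR LF."""
    separator = message.find(b"\r\n\r\n")
    if separator == -1:
        # ASSUMPTION: no empty line means the message has no body.
        return len(message)
    # Count the CR LF ending the last header line and the empty line itself.
    return separator + 4


raw = b"Subject: hi\r\n\r\nbody\r\n"
size = header_section_size(raw)
headers_only = raw[:size]  # what an NNTP server could send for HEAD
```

The message bytes are only sliced, never modified, so the original data stays intact.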
| Offset | Length | Type | Description |
|---|---|---|---|
| 0 | 8 | byte[] | The magic byte sequence identifying the beginning of a message record footer: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 |
| 8 | 4 | uint32 | Combined size of message record header and message data. This value X makes it possible to compute the beginning of the message record: the record starts X bytes before the first byte of the message record footer. |
| 12 | 4 | uint32 | CRC-32 checksum of all message record data before this checksum field. |
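The footer could be produced and checked roughly as follows. This is a sketch under assumptions: little-endian byte order (not stated in this draft), the all-zero placeholder magic, and the common CRC-32 variant implemented by Python's `zlib.crc32`, since the draft does not name a polynomial. The function names are invented for the example.

```python
import struct
import zlib

FOOTER_MAGIC = bytes(8)  # placeholder magic from the table above


def build_footer(header: bytes, message: bytes) -> bytes:
    """Build the 16 byte footer for an already encoded header and message."""
    combined_size = len(header) + len(message)
    before_crc = FOOTER_MAGIC + struct.pack("<I", combined_size)
    # The checksum covers everything in the record before the CRC field.
    crc = zlib.crc32(header + message + before_crc) & 0xFFFFFFFF
    return before_crc + struct.pack("<I", crc)


def verify_record(record: bytes) -> bool:
    """Check the CRC-32 stored in the last four bytes of a full record."""
    (stored_crc,) = struct.unpack_from("<I", record, len(record) - 4)
    return (zlib.crc32(record[:-4]) & 0xFFFFFFFF) == stored_crc


def record_start(footer_offset: int, footer: bytes) -> int:
    """Return the file offset of the record whose footer is at footer_offset."""
    (combined_size,) = struct.unpack_from("<I", footer, 8)
    return footer_offset - combined_size
```

Because the footer stores the combined size, a reader positioned at a footer can step back to the start of its record, which allows walking through a file from the end towards the beginning.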