
bzip2 archive file format

Typical file name extensions
.bz2
.tbz2 (for bzip2 archives that store a tar archive inside of them)
Magic bytes
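0x42 0x5a 0x68 ("BZh") at offset 0x00, followed by a block-size digit from 0x31 to 0x39 ("1" to "9"). Archives written with the default setting (-9) that contain data therefore begin with 0x42 0x5a 0x68 0x39 0x31, where the final 0x31 is the first byte of the first block header.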
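The signature check can be sketched in Python as follows. This is a minimal illustration, not part of any bzip2 tool: the helper name bzip2_block_size_digit is hypothetical, and it assumes that the fourth byte is the ASCII block-size digit (0x39 for the default -9 setting), so it accepts archives written at any of the nine levels.

```python
import bz2

BZIP2_SIGNATURE = b"BZh"  # 0x42 0x5a 0x68


def bzip2_block_size_digit(header: bytes):
    """Return the block-size digit (1-9) if header looks like bzip2, else None.

    Hypothetical helper for illustration; it only inspects the first 4 bytes.
    """
    if len(header) < 4 or header[:3] != BZIP2_SIGNATURE:
        return None
    digit = header[3:4]
    if not (b"1" <= digit <= b"9"):
        return None
    return int(digit)


# Python's bz2 module is used here only to produce sample input.
sample = bz2.compress(b"hello world", compresslevel=9)
print(sample[:5].hex(" "))             # first five bytes of the archive
print(bzip2_block_size_digit(sample))  # block-size digit from the header
```

Checking only the first three bytes plus the digit range avoids rejecting valid archives that were created with a block size other than -9.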
MIME types
application/x-bzip2
Compression types
A combination of the Burrows-Wheeler transform (BWT) and Huffman entropy coding.
Description
The bzip2 application was created by Julian Seward. On average it compresses better than gzip, but it needs more CPU time and memory. Both can be tuned with command line switches. bzip2 is used much like gzip, and most command line switches work with both programs.
Popularity
Medium on Unix systems, low elsewhere. Often used in combination with tar: tar concatenates the uncompressed files and their metadata into one file, which bzip2 then compresses.
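The tar-then-bzip2 combination can be sketched with Python's standard tarfile module, which performs both steps when opened in the "w:bz2" mode. The function names make_tbz2 and list_tbz2 and the sample member names are hypothetical and chosen only for this illustration.

```python
import io
import tarfile


def make_tbz2(files: dict) -> bytes:
    """Pack {member name: bytes} into a tar archive compressed with bzip2."""
    buf = io.BytesIO()
    # "w:bz2" makes tarfile write the tar stream through a bzip2 compressor.
    with tarfile.open(fileobj=buf, mode="w:bz2") as archive:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_tbz2(blob: bytes) -> list:
    """Return the member names stored in a bzip2-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:bz2") as archive:
        return archive.getnames()


blob = make_tbz2({"a.txt": b"alpha", "b.txt": b"beta"})
print(blob[:3])         # the bzip2 signature, not a tar header
print(list_tbz2(blob))  # member names come from the tar layer
```

The result is an ordinary bzip2 archive from the outside. The file names and other metadata live entirely in the tar layer inside it.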
Metadata
?
Limitations
Only one file can be stored in a bzip2 archive.
The bzip2 documentation includes a section on the limitations of the compressed file format.
Data recovery
bzip2 compresses blocks of data independently of each other and stores a checksum for each block. As a result, the undamaged blocks of a corrupted archive can be detected and recovered. The bzip2recover tool, available from the bzip2 homepage, does this.
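The first step of such a recovery, locating block boundaries, can be sketched in Python. This is a simplified illustration and not the bzip2recover implementation. It assumes two details taken from the bzip2 source code rather than from this page: every compressed block starts with the 48-bit marker 0x314159265359, and blocks are aligned to bits rather than bytes, so the search has to run bit by bit. The function name find_block_bit_offsets is hypothetical.

```python
import bz2

BLOCK_MAGIC = 0x314159265359  # assumed 48-bit marker at the start of each block
MASK_48 = (1 << 48) - 1


def find_block_bit_offsets(blob: bytes) -> list:
    """Return the bit offsets at which the block marker appears in blob."""
    offsets = []
    window = 0  # the most recent 48 bits read, most significant bit first
    for byte_index, byte in enumerate(blob):
        for bit in range(7, -1, -1):
            window = ((window << 1) | ((byte >> bit) & 1)) & MASK_48
            bits_seen = byte_index * 8 + (8 - bit)
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                offsets.append(bits_seen - 48)
    return offsets


# 256,000 bytes compressed with 100k blocks should span several blocks.
data = bytes(range(256)) * 1000
compressed = bz2.compress(data, compresslevel=1)
print(find_block_bit_offsets(compressed))
```

A real recovery tool would then copy each block into its own small archive and let the per-block checksum decide whether the block is intact. A 48-bit pattern can in principle also occur by chance inside compressed data, which is one more reason the checksum is needed.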
Encryption
Not supported.
Solid archives possible?
No. As bzip2 only works on a single file, it cannot create solid archives (which require several files in the archive to be treated as one).
Support for multiple volumes?
No.
Other features
The file format allows compressed data to be appended to the end of an existing bzip2 archive, for example:
bzip2 -c datafile >> archive.bz2
This is especially helpful when adding to very large archives or concatenating existing archives because the step of decompressing and recompressing can be skipped.
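The same behaviour can be observed from Python, whose standard bz2 module decompresses concatenated streams in one call (multi-stream support was added in Python 3.3). The helper name append_stream is hypothetical. The sketch only shows that two independently compressed streams, joined byte for byte, decompress to the joined input.

```python
import bz2


def append_stream(archive: bytes, new_data: bytes) -> bytes:
    """Append new_data as an extra bzip2 stream without touching archive."""
    return archive + bz2.compress(new_data)


archive = bz2.compress(b"first part\n")
archive = append_stream(archive, b"second part\n")

# Decompression walks through both streams and concatenates their output.
restored = bz2.decompress(archive)
print(restored)
```

Because the existing bytes are never rewritten, the cost of appending depends only on the size of the new data, not on the size of the archive.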
Libraries
libbzip2, written in C, is available from the bzip2 homepage. [BSD-style license]
Java port of bzip2 at Aftex Sw, classes CBZip2InputStream and CBZip2OutputStream.
The I/O bzip2 Java classes are also available as part of the Jakarta Avalon/Excalibur project.
Specification
None (check the C source code of the bzip2 utility).
Other links
The bzip2/libbzip2 homepage - includes C source code and binaries for the bzip2 program and libbzip2, a compression library.
PBZIP2 (Parallel bzip2) - uses more than one processor to speed up compression and decompression.
Bzip2 at Wikipedia - encyclopedic article.