gzip archive file format

Typical file name extensions
.gz
.tgz (for gzip archives that store a tar archive inside of them)
Magic bytes
0x1f 0x8b at offset 0x00
MIME types
application/gzip (registered in RFC 6713)
application/x-gzip (older, unregistered form)
Compression types
Deflate (specified in RFC 1951)
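The magic bytes and compression type listed above are enough for a quick plausibility check of a file. The following is a minimal sketch, not a full validator: the function name looks_like_gzip is made up for illustration, and the value 8 for the Deflate compression method comes from RFC 1952 rather than from this page.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # magic bytes at offset 0x00
CM_DEFLATE = 8            # compression method byte for Deflate (per RFC 1952)


def looks_like_gzip(data: bytes) -> bool:
    """Return True if data starts with the gzip magic bytes and uses Deflate."""
    return len(data) >= 3 and data[:2] == GZIP_MAGIC and data[2] == CM_DEFLATE


print(looks_like_gzip(gzip.compress(b"hello")))  # True
print(looks_like_gzip(b"PK\x03\x04"))            # False (ZIP signature)
```

A check like this only inspects the first three bytes; it says nothing about whether the rest of the archive is intact.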
History
Gzip (GNU zip) was created as a replacement for the compress utility, which uses the LZW algorithm, which was covered by patents at the time. Note that the LZW patents have since expired: the US patent ran out in June 2003, and the algorithm is no longer patented in most parts of the world. More on LZW on the GIF page.
Popularity
Very high on Unix systems. Often used in combination with tar: tar concatenates the uncompressed files and their metadata into one file, and gzip then compresses that tar archive. These days, bzip2 is increasingly used instead of gzip.
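The tar-then-gzip combination described above can be sketched with Python's standard tarfile module, whose "w:gz" mode writes a tar stream through a gzip compressor. The helper names make_tgz and list_tgz and the sample file names are assumptions made for this illustration.

```python
import io
import tarfile


def make_tgz(files: dict[str, bytes]) -> bytes:
    """Pack in-memory files into a tar archive and gzip the result."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)  # tar keeps the per-file metadata
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def list_tgz(blob: bytes) -> list[str]:
    """Return the member names stored in a gzip-compressed tar archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()


blob = make_tgz({"a.txt": b"A", "b.txt": b"BB"})
print(blob[:2] == b"\x1f\x8b")  # True: the outer container is gzip
print(list_tgz(blob))           # ['a.txt', 'b.txt']
```

The gzip layer sees only one stream of bytes; the individual files exist only at the tar level.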
Metadata
Optional archive comment.
System-specific data of any kind (such as file attributes) can be added per file using so-called extra fields.
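The metadata fields above live in the gzip header, so they can be read without decompressing anything. The sketch below follows the header layout of RFC 1952 (the flag bit values and the additional fields for modification time, operating system and original file name are taken from the RFC, not from this page). The function name read_gzip_metadata is hypothetical, and the sample archive has no comment because Python's gzip module does not write one.

```python
import gzip
import io
import struct

FEXTRA, FNAME, FCOMMENT = 0x04, 0x08, 0x10  # header flag bits (RFC 1952)


def read_gzip_metadata(data: bytes) -> dict:
    """Parse the header of the first gzip member and return its metadata."""
    magic, _method, flags, mtime, _xfl, os_id = struct.unpack("<2sBBIBB", data[:10])
    if magic != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    meta = {"mtime": mtime, "os": os_id, "extra": b"", "name": None, "comment": None}
    pos = 10
    if flags & FEXTRA:  # extra field: 2-byte length followed by the payload
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        meta["extra"] = data[pos + 2:pos + 2 + xlen]
        pos += 2 + xlen
    for flag, key in ((FNAME, "name"), (FCOMMENT, "comment")):
        if flags & flag:  # zero-terminated ISO 8859-1 string
            end = data.index(b"\x00", pos)
            meta[key] = data[pos:end].decode("latin-1")
            pos = end + 1
    return meta


buf = io.BytesIO()
with gzip.GzipFile(filename="sample.dat", mode="wb", fileobj=buf, mtime=1000000000) as f:
    f.write(b"payload")
meta = read_gzip_metadata(buf.getvalue())
print(meta["name"], meta["mtime"], meta["comment"])  # sample.dat 1000000000 None
```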
Limitations
Only one file can be stored in a gzip archive.
No support for extended character sets in file names: the stored file name is limited to ISO 8859-1 (Latin-1).
Older versions of gzip cannot create archives larger than 4 GB.
Data recovery
A CRC32 checksum of the uncompressed file is stored in the archive so that data corruption can be detected, but no error correction codes are used.
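The checksum mentioned above sits in the last eight bytes of a gzip member: a little-endian CRC32 followed by the uncompressed size modulo 2^32 (this trailer layout is taken from RFC 1952). The sketch below reads both values and shows that a decoder notices a mismatch but cannot repair it. The helper name is_intact is an assumption, and the damage is simulated by flipping one bit of the stored CRC32 so that the outcome is deterministic.

```python
import gzip
import struct
import zlib


def is_intact(archive: bytes) -> bool:
    """Return True if the archive decompresses and passes its CRC32 check."""
    try:
        gzip.decompress(archive)
        return True
    except (OSError, EOFError, zlib.error):  # BadGzipFile is an OSError
        return False


original = b"some payload " * 100
archive = gzip.compress(original)

# Trailer: CRC32 of the uncompressed data, then its length modulo 2**32.
stored_crc, stored_size = struct.unpack("<II", archive[-8:])
print(stored_crc == zlib.crc32(original))    # True
print(stored_size == len(original) % 2**32)  # True

damaged = bytearray(archive)
damaged[-8] ^= 0x01                          # flip one bit in the stored CRC32
print(is_intact(archive), is_intact(bytes(damaged)))  # True False
```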
Encryption
Not supported.
Solid archives possible?
No. As gzip only works on a single file, it cannot create solid archives (which require several files in the archive to be treated as one).
Support for multiple volumes?
No.
Other features
The file format allows compressed data to be appended to the end of an existing gzip archive, for example:
gzip -c datafile >> archive.gz
This is especially helpful when adding to very large archives or when concatenating existing archives, because the decompress-and-recompress step can be skipped. However, not all archivers can handle gzip archives created this way; notably, WinRAR and PKUNZIP have problems with them.
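The effect of appending can be reproduced in a few lines of Python. In this sketch, two independently compressed members are simply concatenated, which mirrors what the shell redirection above does; the helper split_members is a made-up name and relies on zlib's decompressobj reporting leftover input in unused_data.

```python
import gzip
import zlib

archive = gzip.compress(b"first part\n")
archive += gzip.compress(b"second part\n")  # appended member, no recompression

# A reader that supports multi-member archives returns the concatenated data.
print(gzip.decompress(archive))  # b'first part\nsecond part\n'


def split_members(data: bytes) -> list[bytes]:
    """Decompress each gzip member of a concatenated archive separately."""
    members = []
    while data:
        d = zlib.decompressobj(wbits=31)  # 31 selects the gzip container
        members.append(d.decompress(data) + d.flush())
        data = d.unused_data              # bytes following this member
    return members


print(split_members(archive))  # [b'first part\n', b'second part\n']
```

Tools that stop after the first member would only return the first part, which is one plausible reason why some archivers mishandle such files.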
As it is streamable, the gzip format is often used by web servers to deliver data over HTTP. Browsers and other user agents send an Accept-Encoding header that includes the value gzip to signal to servers that they can decode gzip.
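The negotiation described above can be sketched from the server's point of view. This is a simplified illustration, not a complete HTTP implementation: the function names accepts_gzip and encode_body are hypothetical, wildcards such as * are ignored, and the Content-Encoding response header is standard HTTP but is not mentioned on this page.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with q > 0."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0  # q=0 means "not acceptable"
            except ValueError:
                return False
        return True
    return False


def encode_body(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Compress the response body only if the client announced gzip support."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


payload, headers = encode_body(b"<html>hi</html>", "gzip, deflate")
print(headers)                   # {'Content-Encoding': 'gzip'}
print(gzip.decompress(payload))  # b'<html>hi</html>'
```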
Sample calls for the gzip command line utility
Create a gzip archive sample.dat.gz from a file sample.dat, using best compression (-9); the original file is replaced by the archive:
gzip -9 sample.dat
Decompress a gzip archive sample.dat.gz to a file sample.dat:
gzip -d sample.dat.gz
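For programs that cannot shell out to the gzip utility, the two calls above can be approximated with Python's standard gzip module. This is a rough sketch under stated assumptions: the function names gzip_file and gunzip_file are invented here, and the original file is deleted after a successful run to imitate the default behaviour of the command line tool.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(path: str, level: int = 9) -> str:
    """Rough equivalent of `gzip -9 path`: writes path.gz, removes path."""
    target = path + ".gz"
    with open(path, "rb") as src, gzip.open(target, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return target


def gunzip_file(path: str) -> str:
    """Rough equivalent of `gzip -d path`: restores the file, removes the archive."""
    if not path.endswith(".gz"):
        raise ValueError("expected a .gz file name")
    target = path[:-3]
    with gzip.open(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return target


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.dat")
    with open(sample, "wb") as f:
        f.write(b"example data\n" * 1000)
    packed = gzip_file(sample)      # creates sample.dat.gz
    restored = gunzip_file(packed)  # recreates sample.dat
    with open(restored, "rb") as f:
        roundtrip = f.read()
```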
Libraries
The Java standard runtime library has code to read and write GZIP files in the package java.util.zip.
I've created a GZIPOutputStream replacement that allows for better configuration (specify file name, comment, compression level, modification time). TODO: add to site again.
Specification
The GZIP file format is described in RFC 1952.
Other links
The gzip homepage - includes source code for the standard version of the gzip utility, binaries and a list of frequently asked questions.
pigz - a gzip compressor that can use more than one thread in parallel, compressing 1 MB blocks. Speeds up compression on multi-processor systems at the expense of a small (about 0.1 %) compression ratio penalty.
mod_gzip - an extension to the Apache web server that allows it to serve compressed HTML pages and other files.