ZIP archive file format

Typical file name extensions
.zip
Magic bytes
0x50 0x4b at offset 0x00
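As an illustration, the magic bytes can be tested with a few lines of Python. This is only a sketch: the helper name looks_like_zip is hypothetical, and looking at two bytes is a quick heuristic, not a validation of the archive. Python's standard zipfile.is_zipfile performs a more thorough check.

```python
import io
import zipfile

ZIP_MAGIC = b"\x50\x4b"  # the ASCII letters "PK"


def looks_like_zip(data: bytes) -> bool:
    """Return True if data starts with 0x50 0x4b at offset 0x00."""
    return data[:2] == ZIP_MAGIC


# Build a small archive in memory so there is something to test against.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")

sample = buffer.getvalue()
print(looks_like_zip(sample))       # archive written by zipfile
print(looks_like_zip(b"GIF89a"))    # some other file type
```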
MIME types
application/zip
Description
A file format created by PKWare for its pkzip / pkunzip shareware tools. Phil Katz contributed the format to the public domain. Unfortunately, the two most popular ZIP archiver implementations, WinZip and PKZip, have each added undocumented compression and encryption methods. Archives that use those methods are not interchangeable between programs, which makes the format less useful: it is no longer certain that any ZIP application can read any ZIP file. PKWare is seeking a patent on combining ZIP with strong encryption.
A discussion in the newsgroup comp.compression in July 2003 (message ID dlovato-991D82.10005725072003@newssvr21-ext.news.prodigy.com) clarifies the problem. Note that the discussion was started by an engineer at Aladdin Systems, a direct competitor of both PKWare and WinZip Computing. Even so, it sheds light on most of the reasons for, and implications of, changing a popular format.
Update January 2004: At least in the field of encryption, the two main ZIP product vendors have agreed to cooperate.
Compression types
Deflate (specified in RFC 1951)
bzip2, which is the same type as used in the bz2 archive file format
WinZip has introduced a compression type called PPMd.
Several older types (such as Shrink, Reduce and Implode) that are rarely used today because Deflate and the newer types compress better.
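The compression type is recorded per entry, so one archive can mix several types. The following Python sketch uses the standard zipfile module to write the same data with three types and read back the type number stored for each entry. The file names and payload are made up for the example. PPMd is not shown because the standard zipfile module does not offer it.

```python
import io
import zipfile

payload = b"ZIP compression example. " * 200

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    # Each entry carries its own compression type.
    archive.writestr("stored.txt", payload, compress_type=zipfile.ZIP_STORED)
    archive.writestr("deflate.txt", payload, compress_type=zipfile.ZIP_DEFLATED)
    archive.writestr("bzip2.txt", payload, compress_type=zipfile.ZIP_BZIP2)

with zipfile.ZipFile(buffer) as archive:
    entries = archive.infolist()
    methods = {info.filename: info.compress_type for info in entries}
    sizes = {info.filename: info.compress_size for info in entries}

for name in methods:
    print(name, "type", methods[name], "compressed bytes", sizes[name])
```

A reader that does not implement one of these types can still list the archive, but it cannot extract the entries that use that type.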
Popularity
Very high; a de facto standard on Windows.
Metadata
Optional archive comment.
Optional comment per entry.
All kinds of system-specific data (such as file attributes) can be added per file using so-called extra fields.
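A minimal Python sketch of these three kinds of metadata, using the standard zipfile module. An extra field is a 2-byte header ID and a 2-byte data length (both little-endian) followed by the data. The header ID 0xCAFE and its 4-byte content are invented for this example and are not a registered field type.

```python
import io
import struct
import zipfile

# Hypothetical extra field: header ID 0xCAFE, 4 bytes of data.
extra_data = b"demo"
extra_field = struct.pack("<HH", 0xCAFE, len(extra_data)) + extra_data

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.comment = b"Archive-level comment"
    entry = zipfile.ZipInfo("readme.txt")
    entry.comment = b"Comment for this entry"
    entry.extra = extra_field
    archive.writestr(entry, "hello")

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("readme.txt")
    archive_comment = archive.comment
    entry_comment = info.comment
    entry_extra = info.extra

print(archive_comment)
print(entry_comment)
print(entry_extra)
```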
Limitations
Internal size and offset values are 32 bits wide, so only files up to 4 GB can be stored (the format was extended to use 64-bit integers, but not all programs support that).
No support for extended character sets in file names.
Patents
PKWare patented a string searching algorithm in 1990; see "String searcher, and compressor using same" at the European Patent Office.
Data recovery
PKZIPFIX is a tool that creates a new ZIP archive from any damaged archive, putting all non-damaged file entries into that new archive.
A checksum (CRC32) is stored for each included file so that data corruption can be detected, but no error correction codes are used, so damaged data cannot be repaired from the archive itself.
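The following Python sketch shows detection without correction, using the standard zipfile and zlib modules. It stores one entry uncompressed, checks that the recorded CRC32 matches the data, then flips a single byte in a copy of the archive and lets testzip() find the damaged entry. The file name and contents are made up for the example.

```python
import io
import zipfile
import zlib

data = b"some file contents"

# Store the entry uncompressed so its bytes appear verbatim in the archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
    archive.writestr("data.bin", data)

with zipfile.ZipFile(buffer) as archive:
    stored_crc = archive.getinfo("data.bin").CRC
    computed_crc = zlib.crc32(archive.read("data.bin"))
    first_bad_intact = archive.testzip()  # None if every entry passes

# Damage one byte of the file data in a copy of the archive.
raw = bytearray(buffer.getvalue())
raw[raw.find(data)] ^= 0xFF

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as damaged:
    first_bad_damaged = damaged.testzip()  # name of the first failing entry

print(stored_crc == computed_crc, first_bad_intact, first_bad_damaged)
```

The check reports which entry is damaged, but the original byte cannot be reconstructed from the checksum.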
Encryption
Several encryption types are supported at this time, but that is a recent development. An encryption method called "traditional" encryption has been in use for quite some time, but it is not recommended: there is a known-plaintext attack against it (Biham and Kocher, 1994), which has since been improved further (Stay, 2001).
The Open Directory Project (ODP) lists quite a few password recovery tools.
The more modern encryption types are safer, but not supported in all ZIP programs.
Solid archives possible?
No.
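A work-around to improve the compression ratio is to create a ZIP archive with all of its entries stored uncompressed, and then to put that uncompressed archive into another ZIP archive using regular compression. The outer compression pass then sees all files as one continuous stream and can exploit redundancy between them.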
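A rough Python sketch of this work-around, using the standard zipfile module. The helper name build and the generated log files are invented for the example. The effect is largest with many small, similar files, as generated here. With few or large files the gain may be negligible.

```python
import io
import zipfile

# Many small, similar files: the case where per-entry compression does poorly.
files = {
    f"log_{i:03d}.txt": (f"entry {i}: status=OK user=alice\n" * 3).encode()
    for i in range(200)
}


def build(entries, method):
    """Write the given name -> bytes mapping into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", method) as archive:
        for name, content in entries.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Regular archive: every entry is compressed on its own.
regular = build(files, zipfile.ZIP_DEFLATED)

# Work-around: store everything uncompressed, then compress the whole archive.
inner = build(files, zipfile.ZIP_STORED)
nested = build({"inner.zip": inner}, zipfile.ZIP_DEFLATED)

print("regular:", len(regular), "bytes")
print("nested: ", len(nested), "bytes")
```

The price is convenience: to extract a single file, the inner archive has to be unpacked from the outer one first.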
Support for multiple volumes?
Yes, the format provides signatures to identify archive files that are not the first volume. However, all parts of a multi-volume archive have the same name, so they cannot be put into one directory.
Libraries
The Java standard runtime library has code to access and create ZIP files in the package java.util.zip.
Specification
PKWare has designed the file format and offers an application note text file that describes it.
Other links
PKWare - inventors of the ZIP file format.
Compression links ZIP section.
ODP list of Windows archivers; most of them support ZIP.
Info-ZIP. Free port of ZIP utilities to almost any platform. Written in C.
WinZip. Arguably the most popular ZIP application (commercial).