File formats
This is the entry page for all information that I offer on file formats.
I have written descriptions of a few file formats.
These texts are not detailed, and they are no replacement for specifications,
which explain the exact structure of a file format.
The texts are meant as high-level overviews for developers who want to get
an impression of what a particular file format is capable of storing.
The format descriptions are sorted by file type.
Each file type (a group of formats storing the same kind of data,
like archives or images)
has an introductory text of its own that summarizes file-type-specific properties.
Archive file formats
- bzip2 - compression format for single files, popular under Unix; often used in combination with tar.
- gzip - compression format for single files, popular under Unix; often used in combination with tar.
- rar - popular archive file format under Windows.
- tar - popular archive file format under Unix, without compression of its own; often used in combination with gzip.
- zip - very popular archive format; software is available for many platforms, the best-known programs being WinZip and PKZIP.
Image file formats
- GIF - popular image file format for graphics.
- JPEG / JFIF - popular image file format for photos.
- PNG - image file format for graphics and photos.
- TIFF - image file format for image editing and document imaging.
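The combination of tar and gzip mentioned in the archive list can be sketched in a few lines. This is only an illustration using Python's standard tarfile and gzip modules; the helper names make_tar_gz and read_tar_gz are made up for this example. tar bundles several files into one uncompressed stream, and gzip then compresses that single stream.

```python
import gzip
import io
import tarfile


def make_tar_gz(files):
    """Bundle a {name: bytes} mapping into a tar archive, then gzip the result."""
    tar_buffer = io.BytesIO()
    # Step 1: tar collects several files into one uncompressed stream.
    with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    # Step 2: gzip compresses that single stream.
    return gzip.compress(tar_buffer.getvalue())


def read_tar_gz(data):
    """Reverse both steps and return a {name: bytes} mapping."""
    tar_bytes = gzip.decompress(data)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return {
            member.name: tar.extractfile(member).read()
            for member in tar.getmembers()
            if member.isfile()
        }
```

Because gzip is the outer layer, the result is an ordinary gzip stream: its first two bytes are 1f 8b, and the tar structure only becomes visible after decompression.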
Software
Some of the pages linked above, whether they describe a file type (like archives) or
a single file format (like ZIP),
list libraries and additional tools that are specific to that type or format.
This section links to programs and libraries that deal with several types and formats at once.
- ffident file format identification library:
determines the file format, format group and MIME type of a given file.
Available under the LGPL. Requires Java 1.4 or higher.
Disclosure: developed by the author of this page.
- DROID:
software developed by The National Archives to identify file formats.
It comes with file format signature information encoded in a special XML schema.
Requires Java 1.4.2 or higher.
- file(1):
a command-line tool that reports format details for a large number of file formats.
It identifies formats by matching patterns against the file content and can extract additional information in the process.
The patterns are stored in a so-called magic file.
The tool is available on most Unix systems
and has been ported to Windows as well.
Call it with file names as parameters:
$ file *
- filetype:
a tool similar to file that can also be used as a library. It detects only file types
(images, documents, archives, executables).
It uses a hierarchy of file types (e.g. images, pixel images, and so on).
A Windows version is available.
- FileAlyzer:
freeware Windows GUI program that shows metadata for a number of file formats.
- getID3() - The PHP media file parser:
PHP library (available under the GPL) that supports many formats in great detail.
- Java Activation Framework (JAF)
- Java Mime Magic:
available under the LGPL.
- Getting a MIME type with Java
- libExtractor:
a C library (available under the GPL) that extracts metadata from various file formats.
- TrID:
identifies file formats by scanning the file content for signatures.
It can learn new formats by examining a set of sample files in a new format
and determining unique signatures in the process.
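Several of the tools above identify a format by looking for a signature (magic bytes) in the file content. The following is a minimal sketch of that idea, not the actual implementation of file(1), TrID, ffident or any other tool listed here. The names Signature, SIGNATURES, identify and common_prefix are invented for this example, and the table covers only the formats described on this page. The group and MIME type columns are illustrative. common_prefix is a deliberately crude stand-in for learning a signature from sample files.

```python
from typing import List, NamedTuple, Optional


class Signature(NamedTuple):
    offset: int   # position of the magic bytes within the file
    magic: bytes  # byte sequence expected at that position
    name: str     # format name
    group: str    # file type (group of formats)
    mime: str     # commonly used MIME type


# Hypothetical table for this example; real tools know far more formats.
SIGNATURES = [
    Signature(0, b"\x1f\x8b", "gzip", "archive", "application/gzip"),
    Signature(0, b"BZh", "bzip2", "archive", "application/x-bzip2"),
    Signature(0, b"Rar!\x1a\x07", "rar", "archive", "application/vnd.rar"),
    Signature(0, b"PK\x03\x04", "zip", "archive", "application/zip"),
    # Only ustar-style tar files carry this marker; very old tar files have none.
    Signature(257, b"ustar", "tar", "archive", "application/x-tar"),
    Signature(0, b"GIF87a", "GIF", "image", "image/gif"),
    Signature(0, b"GIF89a", "GIF", "image", "image/gif"),
    Signature(0, b"\xff\xd8\xff", "JPEG", "image", "image/jpeg"),
    Signature(0, b"\x89PNG\r\n\x1a\n", "PNG", "image", "image/png"),
    Signature(0, b"II*\x00", "TIFF", "image", "image/tiff"),
    Signature(0, b"MM\x00*", "TIFF", "image", "image/tiff"),
]


def identify(data: bytes) -> Optional[Signature]:
    """Return the first signature whose magic bytes occur at its offset."""
    for sig in SIGNATURES:
        if data[sig.offset:sig.offset + len(sig.magic)] == sig.magic:
            return sig
    return None


def common_prefix(samples: List[bytes]) -> bytes:
    """Longest byte prefix shared by all samples: a naive signature guess."""
    if not samples:
        return b""
    shortest = min(samples, key=len)
    for i, byte in enumerate(shortest):
        if any(sample[i] != byte for sample in samples):
            return shortest[:i]
    return shortest
```

To try it on a real file, read the first 512 bytes in binary mode and pass them to identify; that is enough to reach the tar marker at offset 257. A shared prefix found by common_prefix is only a candidate signature, since unrelated formats may start with the same bytes.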
Related sites
- Wotsit.org:
offers specifications for numerous file formats.
If you are writing software to access a particular file format and don't know
its structure, you may get lucky on this site.
- Open Directory category on data formats:
DMOZ directory on file formats and other data formats.