ffident — Java metadata extraction / file format identification library

This is the first version of a Java library that extracts information from files and identifies their formats. Most operating systems encode the format of a file in the file name extension. However, this approach has its problems.

When running on an operating system that knows about file types regardless of file extensions (like Mac OS), querying that type information may be an option. However, that approach is too platform-dependent. The only way to be sure about what a file contains is to look at its content. Unfortunately, this requires knowledge of the internal structure of all file formats.

A working solution is to collect information on the most interesting and most common file formats. The Unix command line utility file(1) has done this for a long time. It checks each file to be examined against a list of known signatures (the magic(5) file).
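The signature check described above can be sketched in a few lines of Java. This is only an illustration of the general technique, not the code this library uses: the class name SignatureSketch, its methods and the four-entry table are hypothetical, and a real signature list is far longer and also matches at offsets other than zero. The sketch uses generics and so assumes a newer Java version than the library itself requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: identify a format by comparing the first bytes
// of the data against a small table of well-known signatures.
final class SignatureSketch {
    // Format name -> leading "magic" bytes; insertion order is the check order.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<String, byte[]>();
    static {
        SIGNATURES.put("JPEG", new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF});
        SIGNATURES.put("PNG", new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("GIF", new byte[] {'G', 'I', 'F', '8'});
        SIGNATURES.put("PDF", new byte[] {'%', 'P', 'D', 'F', '-'});
    }

    /** Returns the name of the first format whose signature starts the data, or null. */
    static String identify(byte[] data) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            if (startsWith(data, entry.getValue())) {
                return entry.getKey();
            }
        }
        return null; // no known signature matched
    }

    /** True if data is at least as long as magic and begins with the same bytes. */
    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In practice only the first few bytes of each file need to be read before calling such a method, which keeps the check cheap even for large files.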

This first version of the library uses the same approach. In the future, the library will also be able to extract format-group-specific metadata from files, just as ImageInfo does for images.

Usage

Demo program idtree

The library comes with a command line utility named metadata.idtree. It checks files and directories and prints either the format name or "unknown data" to standard output. Run it like this:

java metadata.idtree d:\test.jpg c:\files\

This will check the file d:\test.jpg and the complete directory tree under c:\files\.
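A directory tree check of this kind could look roughly like the sketch below. It is not the source of idtree: the class TreeWalkSketch, the Identifier callback and the "path: format" line layout are assumptions made for illustration, and the callback merely stands in for whatever identifies a single file. It assumes a newer Java version than the library itself requires.

```java
import java.io.File;
import java.util.List;

// Hypothetical sketch of a recursive check over files and directories.
final class TreeWalkSketch {
    /** Hypothetical callback standing in for the identification of one file. */
    interface Identifier {
        String identify(File file); // returns a format name, or null if unknown
    }

    /** Recurses into directories and adds one report line per regular file. */
    static void walk(File path, Identifier identifier, List<String> out) {
        if (path.isDirectory()) {
            File[] children = path.listFiles(); // null if the directory cannot be read
            if (children == null) {
                return;
            }
            for (File child : children) {
                walk(child, identifier, out);
            }
        } else if (path.isFile()) {
            String name = identifier.identify(path);
            // Assumed line layout; the real tool's output format may differ.
            out.add(path.getPath() + ": " + (name == null ? "unknown data" : name));
        }
    }
}
```

Calling walk once per command line argument covers both cases from the example above, since a plain file is reported directly and a directory is descended into.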

From inside a program

If you need information from within a Java program, use the library like this:

import java.io.File;

File file = new File("filename");
// identify returns null if the format could not be determined
FormatDescription desc = FormatIdentification.identify(file);
if (desc == null) {
  System.out.println("Unknown format.");
} else {
  System.out.println("Format=" + desc.getShortName() + ", MIME type=" + desc.getMimeType());
}

See FormatDescription for the kind of information that can be queried once a file has been identified successfully, that is, when the result of identify is not null.

Links on Java, metadata extraction and file format identification

License

This library is distributed under the LGPL.

Requirements

This library requires Java 1.4 or higher.

ChangeLog

Download

ffident-0.2.zip (~ 7 KB).