This is the first version of a Java library to extract information from files and identify their formats. Most operating systems encode the format of a file in the file name extension. However, there are problems with this approach:
A format is sometimes stored under an unusual extension, e.g. .rle instead of .bmp.
Few people know this.
On the other hand, some file name extensions are used by more than one format (e.g. .bmp or .img).
When running on an operating system that knows about file types regardless of file extensions (like Mac OS), querying that type of information may be an option. However, that approach is too platform dependent. The only way to be sure about what a file contains is to look at its content. Unfortunately, this requires knowledge of the internal structure of all file formats.
A working solution is to collect information on the most interesting and most common file formats.
This has been done in the Unix command line utility file(1) for a long time.
It checks each file to be examined against a list of known signatures (the magic(5) file).
This first version of a yet-to-be-named file format identification library uses the same approach. In the future, the library will also be able to extract format-group-specific metadata from files, just as ImageInfo does for images.
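The signature approach described above can be sketched as follows. This is a minimal illustration, not the library's actual code: the class name SignatureSketch, its identify method and the four-entry table are hypothetical, and only the leading bytes of a file are compared.

```java
// Hypothetical sketch, not part of the library: identify a format by
// comparing the first bytes of a file against a table of known signatures.
final class SignatureSketch {

    // Format names, in the same order as the signatures in MAGIC.
    private static final String[] NAMES = { "PNG", "GIF", "JPEG", "PDF" };

    // Leading bytes ("magic numbers") of each format, as unsigned values.
    private static final int[][] MAGIC = {
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }, // PNG signature
        { 0x47, 0x49, 0x46, 0x38 },                         // "GIF8"
        { 0xFF, 0xD8, 0xFF },                               // JPEG start of image
        { 0x25, 0x50, 0x44, 0x46 }                          // "%PDF"
    };

    /** Returns the name of the first matching format, or null if none matches. */
    static String identify(byte[] header) {
        for (int i = 0; i < MAGIC.length; i++) {
            if (startsWith(header, MAGIC[i])) {
                return NAMES[i];
            }
        }
        return null;
    }

    // True if data begins with the given signature bytes.
    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would read the first few bytes of a file into an array and pass it to identify. A real signature list, like the magic(5) file, also covers signatures at other offsets and many more formats.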
The library comes with a command line utility named metadata.idtree.
It checks files and directories and prints, for each file, either the format name or "unknown data" to standard output.
Run it like this:
java metadata.idtree d:\test.jpg c:\files\
This will check the file d:\test.jpg and the complete directory tree under c:\files\.
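Handling a mix of file and directory arguments amounts to a recursive walk. The sketch below shows one way to do that with java.io.File. It is an assumption about the general technique, not the source of metadata.idtree, and the names TreeWalkSketch and collectFiles are hypothetical. It uses generics and the enhanced for loop, so it needs Java 5 or higher.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the utility's actual code: collect every regular
// file named directly or found anywhere below a named directory.
final class TreeWalkSketch {

    /** Returns all regular files reachable from the given paths. */
    static List<File> collectFiles(String[] paths) {
        List<File> result = new ArrayList<File>();
        for (String path : paths) {
            collect(new File(path), result);
        }
        return result;
    }

    // Adds entry itself if it is a file, or descends into it if it is a directory.
    private static void collect(File entry, List<File> result) {
        if (entry.isFile()) {
            result.add(entry);
        } else if (entry.isDirectory()) {
            File[] children = entry.listFiles(); // null if the directory cannot be read
            if (children != null) {
                for (File child : children) {
                    collect(child, result);
                }
            }
        }
        // Paths that do not exist are skipped silently.
    }
}
```

Each file in the returned list could then be passed to an identification routine, with the result printed one line per file.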
If you need information from within a Java program, use the library like this:
// File is java.io.File
File file = new File("filename");
FormatDescription desc = FormatIdentification.identify(file);
if (desc == null) {
    System.out.println("Unknown format.");
} else {
    System.out.println("Format=" + desc.getShortName() + ", MIME type=" + desc.getMimeType());
}
See FormatDescription for the kind of information that can be queried
after a file has been identified successfully (that is, when identify returns a non-null result).
This library is distributed under the LGPL.
This library requires Java 1.4 or higher.
FormatDescription#addMimeTypes:
changed if (mimeTypes == null) to if (mimeType == null).
Thanks to everyone who reported that.
ffident-0.2.zip (~ 7 KB).