Creating thumbnails with Java

This article describes problems with batch thumbnail (small preview image) generation using Java. If you're dealing with image file batch processing under Java, you might be interested because the errors I describe occur rarely: code which seemed to work with the 100 files you tested it with may fail when running on 100,000 files coming from different sources (not just your own digital camera, for example). If you have little time, just look out for the recommendation paragraphs (#1 to #6). This article is a work in progress, to some extent a field report. If you feel that I misrepresented something (or plain didn't get it), please send me a mail!

Introduction

One of the most popular pages of this site is the one about loading an image, creating a thumbnail from it and saving that thumbnail as a JPEG file. These three steps are pretty straightforward to implement in a Java application. In fact, they are one-liners, and mostly self-explanatory; if you care about the details of the scaling part in the second line, check out the thumbnail demo program:

BufferedImage originalImage = ImageIO.read(new File(originalImageFileName));
...
graphics2D.drawImage(originalImage, 0, 0, thumbWidth, thumbHeight, null);
...
ImageIO.write(thumbnailImage, "jpg", new File(thumbnailFileName));

Not that image loading, scaling or saving are simple operations; in fact they can become quite complicated. But the Java runtime library does a nice job of offering the functionality while keeping the nasty details away.

It works quite well most of the time. This article is about the tiny fraction of files which caused problems. I'll try to make some recommendations where I've found a satisfactory work-around.

Loading images from image files

As I mentioned above, I used the line

BufferedImage originalImage = ImageIO.read(new File(originalImageFileName));

to load the original image. The API documentation of ImageIO.read(File input) describes two exceptions which can occur:

    IllegalArgumentException - if input is null. 
    IOException - if an error occurs during reading.

I took care that the File object input would always be non-null, and I caught the IOException. Unfortunately, that's not enough.

Image files come in various formats, each one usually having several subtypes, supporting different compression algorithms, color types, and so on. Getting it all right is hard as it is. In addition, some software packages write files which do not conform to standard specifications. Sometimes, there aren't (good, open, free) specifications at all. Sometimes, files get corrupted when they are being transferred, most of the time truncated because a network connection fails and isn't resumed.

To an application programmer it doesn't really matter who is to blame when a particular image file fails to load. What matters is that the program does not crash in the process, like this:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.get(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.checkTablesOnly(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.gotoImage(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.readHeader(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(Unknown Source)
        at javax.imageio.ImageIO.read(Unknown Source)
        at javax.imageio.ImageIO.read(Unknown Source)
        [Rest snipped]

Or this:

javax.imageio.IIOException: JFIF APP0 must be first marker after SOI
        at com.sun.imageio.plugins.jpeg.JPEGMetadata.<init>(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.getImageMetadata(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.getNumThumbnails(Unknown Source)
        [Rest snipped]

Or this:

java.lang.IllegalArgumentException: bandOffsets.length is wrong!
        at javax.imageio.ImageTypeSpecifier$Interleaved.<init>(Unknown Source)
        at javax.imageio.ImageTypeSpecifier.createInterleaved(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.getImageTypes(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(Unknown Source)
        at javax.imageio.ImageIO.read(Unknown Source)
        at javax.imageio.ImageIO.read(Unknown Source)
        [Rest snipped]

Some formats like JPEG allow embedded color profiles, but interpreting them adds another source of concern and possibly, exceptions:

java.awt.color.CMMException: General CMM error517
        at sun.awt.color.CMM.checkStatus(Unknown Source)
        at sun.awt.color.ICC_Transform.<init>(Unknown Source)
        at java.awt.image.ColorConvertOp.filter(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.acceptPixels(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.readImage(Native Method)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(Unknown Source)
        at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(Unknown Source)
        at javax.imageio.ImageIO.read(Unknown Source)
        at javax.imageio.ImageIO.read(Unknown Source)
        [Rest snipped]

Update 2007-07-24. I missed the most obvious problem because I was so generous in giving memory to the virtual machine: an OutOfMemoryError can occur if the image does not fit into the VM's memory (example: run java -mx512m -jar SomeApp.jar to give 512 MB to the VM). I never ran into that one with the 100,000 or so files I had, some of which were large, but I finally fed a 16,000 x 16,000 pixel truecolor image to the loader (uncompressed size: 768 million bytes), and voilà, it blew up because OutOfMemoryError is not a descendant of java.lang.Exception. I therefore changed the recommendation below from catching Exception to catching Throwable.

Recommendation #1: When calling ImageIO.read, catch Throwable, not IOException.
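As a minimal sketch of this recommendation, the loader below wraps ImageIO.read so that any failure yields null. The class name SafeImageLoader and the method loadOrNull are made up for this illustration, and printing to System.err is only a placeholder for whatever logging an application uses.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of any library.
final class SafeImageLoader {
    /** Loads an image and returns null instead of throwing, whatever goes wrong. */
    static BufferedImage loadOrNull(File file) {
        try {
            // ImageIO.read itself returns null when no registered reader
            // recognizes the file.
            return ImageIO.read(file);
        } catch (Throwable t) {
            // Covers IOException, the unchecked exceptions shown above
            // and Errors such as OutOfMemoryError.
            System.err.println("Could not load " + file + ": " + t);
            return null;
        }
    }
}
```

A batch job can then skip every file for which loadOrNull returns null and carry on with the next one.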

Note that this recommendation can be generalized to other image codec libraries. I've tried some from my list of pixel image I/O libraries and ran into exceptions similar to the ones above with ImageIO. Just put the entire code for opening, reading and closing files into a try-catch statement that handles the more general Exception or Throwable.

If you are using Imagero (see below) there is a workaround for creating thumbnails from huge JPEG files (and, at the moment, only JPEG). Imagero has a setSubsampling method in its JPEGReader class. Calling setSubsampling(8) on a reader will load only one pixel from each 8 x 8 pixel block, resulting in an image one 64th the size of the original. In the case of my 16,000 x 16,000 pixel image this reduces memory to about 12 MB. From this smaller (although certainly not small) image the final thumbnail is then created. Note that throwing away lots of pixels is a rather poor scaling method, but if you're aiming at a 160 x 160 pixel thumbnail like me, giving up about 98.4 % of the original image data shouldn't matter because the remaining 1.6 % is still a lot to work with.
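The standard ImageIO API has a comparable mechanism in ImageReadParam.setSourceSubsampling. The sketch below uses that call, not Imagero's API, and the class and method names are invented for the example. Whether a given reader plugin actually saves memory while decoding this way is an assumption that would need checking for each format.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReadParam;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper built on the standard ImageIO API.
final class SubsampledLoader {
    /** Uncompressed size in bytes, e.g. 16000 x 16000 x 3 = 768,000,000. */
    static long uncompressedBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    /** Reads only every n-th pixel in both directions. */
    static BufferedImage readSubsampled(File file, int n) throws IOException {
        try (ImageInputStream input = ImageIO.createImageInputStream(file)) {
            if (input == null) {
                throw new IOException("Cannot open " + file);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
            if (!readers.hasNext()) {
                throw new IOException("No reader found for " + file);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(input);
                ImageReadParam param = reader.getDefaultReadParam();
                // Keep one pixel out of each n x n block, starting at (0, 0).
                param.setSourceSubsampling(n, n, 0, 0);
                return reader.read(0, param);
            } finally {
                reader.dispose();
            }
        }
    }
}
```

With n = 8, a 16,000 x 16,000 pixel source comes out as 2,000 x 2,000 pixels, which at three bytes per pixel is the 12 MB mentioned above.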

In order to support more files, you can use several libraries. If the first fails, try the second; if that fails, the third; and so on.
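One way to organize such a chain is sketched below. The Codec interface is a hypothetical adapter that would have to be implemented once per library; none of the libraries mentioned here ships it.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

// Hypothetical fallback chain over several codec libraries.
final class FallbackLoader {
    /** Hypothetical adapter: one implementation per codec library. */
    interface Codec {
        BufferedImage load(File file) throws Exception;
    }

    /** Returns the first successfully decoded image, or null if every codec fails. */
    static BufferedImage loadWithFallback(File file, List<Codec> codecs) {
        for (Codec codec : codecs) {
            try {
                BufferedImage image = codec.load(file);
                if (image != null) {
                    return image;
                }
            } catch (Throwable t) {
                // This library could not handle the file; try the next one.
            }
        }
        return null;
    }
}
```

The order of the list matters: putting the fastest or most trusted library first keeps the common case cheap.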

A problem I expected to run into but never did was that of too many open files. With most operating systems an application can only have a fixed number of files open at the same time. A "too many open files" error (when trying to open another one) usually occurs as the result of forgetting to close open files that aren't needed anymore. Codecs are usually given file names or File objects, and with all those exceptions I assumed that at some point there would be open files that I would not have the opportunity to reach. However, I didn't try each and every library, so this might still happen to someone. Look out for it.

Recommendation #2: When possible, open an InputStream or RandomAccessFile yourself and give it to the loading library instead of giving it just the file name. Afterwards, close that I/O object yourself in the finally clause of a try statement.
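A short sketch of this recommendation with ImageIO follows; the class name StreamLoader is invented. ImageIO.read(InputStream) is documented not to close the stream it is given, so the caller has to. The try-with-resources statement used here closes the stream in the same way an explicit finally clause would.

```java
import java.awt.image.BufferedImage;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical helper that owns the stream it hands to the codec.
final class StreamLoader {
    static BufferedImage load(String fileName) throws IOException {
        // The stream is closed when the block is left, whether the codec
        // returns normally or throws.
        try (InputStream in = new BufferedInputStream(new FileInputStream(fileName))) {
            return ImageIO.read(in);
        }
    }
}
```

Unchecked exceptions and errors from the codec still propagate to the caller, so this combines naturally with recommendation #1.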

Loading embedded thumbnails

Instead of loading and scaling down potentially large images, simply reusing existing thumbnails can speed up the process significantly. Unfortunately, finding and loading those thumbnails with Java isn't easy.

The most obvious approach, as usual, is to look for the support that comes built into the Java runtime library. Indeed, the ImageIO API has methods to retrieve thumbnails. Less convenient than that ImageIO.read(File) one-liner, thumbnail reading still is rather easy to use:

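ImageInputStream input = ImageIO.createImageInputStream(new File("image.jpg"));
Iterator<ImageReader> iter = ImageIO.getImageReaders(input);
ImageReader reader = iter.next();
// The reader needs its input before it can be queried.
reader.setInput(input);
BufferedImage thumbnail = null;
// Both calls take the index of the main image (0 = first image in the file).
if (reader.getNumThumbnails(0) > 0) {
  thumbnail = reader.readThumbnail(0, 0);
}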

This compiles and runs, but I've yet to see a single thumbnail being retrieved that way. I've used both the regular JPEG reader and the one called CLibJPEGImageReader which comes with the Java Advanced Imaging Image I/O Tools. Maybe my file collection was too JPEG-heavy and the approach works with other formats already. Maybe this has yet to be implemented. Running this code before trying to load the main image may work under some circumstances.

I've found very good support for thumbnail loading in Imagero, a commercial image codec library with a free trial version which supports a larger number of file formats.

Update 2007-07-26. I have now written my own code to extract some of the more interesting information embedded in JPEG files, including thumbnails and GPS information. Most of it is part of an EXIF section of the JFIF stream, which uses TIFF header structures. I refactored some of the TIFF-reading code of JIU and browsed the EXIF specification (PDF). It sure helps if the TIFF part of the code is already working as in my case. Even if it does, you also have to know your TIFF terminology (IFD, tags, and so on) to read things in the right order and from the right places.

Update 2007-08-06. Using embedded thumbnails is nice because it speeds up the process of scanning a large directory tree. However, I've noticed numerous JPEG files where the embedded thumbnail has the wrong orientation. Apparently, some tools which perform lossless rotation of JPEG files do not rotate the thumbnail along with the main image. I therefore propose a post-processing step in your thumbnail generation code. Obviously, computers can't recognize an image's content to find out whether a thumbnail should be rotated. However, rotation angles are often 90 and 270 degrees. If an image has differing width and height values (almost always the case with digital cameras), check whether the relationship of those two values (width smaller than height or vice versa) differs between original image and thumbnail image. If there is a difference, throw away the embedded thumbnail and follow the standard procedure for thumbnail generation: load the original image, then scale and encode it yourself.
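The comparison described above fits into a few lines. In this sketch the class and method names are invented, and treating square images as "no evidence of rotation" is an assumption of the example.

```java
// Hypothetical check for a thumbnail that was not rotated with its image.
final class ThumbnailOrientation {
    /**
     * Returns false if the original is landscape and the thumbnail portrait,
     * or vice versa. Square images give no hint, so they are accepted.
     */
    static boolean sameOrientation(int origWidth, int origHeight,
                                   int thumbWidth, int thumbHeight) {
        if (origWidth == origHeight || thumbWidth == thumbHeight) {
            return true;
        }
        return (origWidth > origHeight) == (thumbWidth > thumbHeight);
    }
}
```

A false result means the embedded thumbnail should be discarded and regenerated from the original. Note that a rotation by 180 degrees goes unnoticed by this test.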

Update 2007-11-02. There seems to be some EXIF field that stores the proper orientation of a picture. I don't quite get why the software in the camera would store an image with a wrong orientation and the hint "by the way, this is false". After all, rotation in steps of 90 degrees is not at all computationally expensive. In-memory rotation wouldn't even be necessary, the file encoder would just have to have a slightly modified pixel grabbing routine. Anyway, if this orientation flag is present, thumbnail-creating software should try to read and honor it. TODO: find out where exactly that information is stored.

In addition there are platform-dependent thumbnail repositories:

All of these could be used, but dedicated code would have to be written to support each. It's unfortunate that there doesn't seem to be a unified API under Windows to store and load thumbnails which all applications could use. Or does such a thing exist? Do other operating systems handle this better?

Recommendation #3: Try to reuse existing thumbnails, but be prepared to spend some time writing or integrating additional code.

Every additional feature, such as loading embedded thumbnails in this case, requires more code. What I mean in this context is that thumbnails get stored in a number of places, and getting them from those places is a rather tedious business.

Image scaling

As was mentioned in the introduction, scaling can be done in Java by creating a new Image object and drawing the original image into that new image; the scaling is done by the drawing method. There are several things which can go wrong.

Speed

The first problem I encountered with image scaling was with using interpolation on some images loaded from JPEGs with color profiles. In the loading section I mentioned an exception which was raised with a certain JPEG with an embedded color profile. ImageIO usually loads those successfully, but with some of them a relatively modern system required two full minutes (!) to scale one down to 128 x 128 pixels. The problem seems to come from the fact that I used interpolation. The drawing context object Graphics2D offers interpolation, which leads to nicer-looking scaled images at the cost of additional processing time. However, in some cases the additional time gets out of hand. Maybe pixel values have to be recomputed very often with some profiles while scaling. The solution is not to use interpolation, that is, to comment out

graphics2D.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
  RenderingHints.VALUE_INTERPOLATION_BILINEAR);

However, while interpolation doesn't make much of a difference with most photos, it can be essential with other types of image, like documents. Scaling an image with black text on a white background down to a thumbnail results in a thumbnail which doesn't look like the original—it's mostly white with a few black pixels. Interpolation leads to black and white pixels being "merged" to gray pixels giving an appropriate representation of the original. If you know that your program works on a certain class of images like photos or documents, you can base your decision whether to use interpolation on that fact.

Recommendation #4: Turn off interpolation unless you know you need it.
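Instead of commenting the hint in and out, the choice can be made a parameter. The following is a sketch only: the class name ThumbnailScaler and the interpolate flag are made up, and the rest follows the drawImage call from the introduction.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical scaler with interpolation as an explicit choice.
final class ThumbnailScaler {
    static BufferedImage scale(BufferedImage original, int thumbWidth,
                               int thumbHeight, boolean interpolate) {
        BufferedImage thumbnail =
            new BufferedImage(thumbWidth, thumbHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D graphics2D = thumbnail.createGraphics();
        try {
            if (interpolate) {
                // Nicer results for documents and line art, at the cost of time.
                graphics2D.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            }
            // drawImage scales the original into the thumbnail's bounds.
            graphics2D.drawImage(original, 0, 0, thumbWidth, thumbHeight, null);
        } finally {
            graphics2D.dispose();
        }
        return thumbnail;
    }
}
```

A program that knows it is processing scanned documents would pass true; a photo importer that has to be fast would pass false.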

A more sophisticated approach would try to identify problematic image file types like JPEGs with embedded color profiles and turn off interpolation only with those. However, I don't know a simple way of doing that.

If processing time is not a concern at all, you can leave interpolation on.

On the other hand, should speed be very important, you should not use interpolation anyway because it costs additional time, even in non-pathological cases.

Update 2008-03-18. In reply to a 2002 bug submission on slow drawing with scaled image instances at the Sun Developer Network, a user (mrsteve) provided a code snippet with a workaround in 2004. It tries to make the JPEG decoder use the sRGB color space if available and speeds up the loading process. However, at least one other user had the workaround code throw an exception.

Drawing exceptions

The next error caused an exception with an image loaded successfully from a GIF file. GIF supports only paletted images: each pixel is a value used as an index into a list of colors, the palette. Normally, drawImage handles those quite well. However, in one case the following exception was thrown:

Exception in thread "main" java.awt.image.ImagingOpException: Unable to transform src image
        at java.awt.image.AffineTransformOp.filter(Unknown Source)
        at sun.java2d.pipe.DrawImage.renderImageXform(Unknown Source)
        at sun.java2d.pipe.DrawImage.transformImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.scaleImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.scaleImage(Unknown Source)
        at sun.java2d.pipe.ValidatePipe.scaleImage(Unknown Source)
        at sun.java2d.SunGraphics2D.drawImage(Unknown Source)
        at sun.java2d.SunGraphics2D.drawImage(Unknown Source)
        [Rest snipped]

So converting colors failed in some way. I had the idea not to use a fixed BufferedImage color type (TYPE_INT_RGB) when creating the thumbnail image but to use the value from the loaded image. Scaling from one color type to the same color type should not cause problems; at least that was the idea. However, changing the thumbnail image object constructor to

BufferedImage thumbnail = new BufferedImage(newWidth, newHeight, image.getType());

resulted in this exception:

Exception in thread "main" java.lang.IllegalArgumentException: Unknown image type 0
        at java.awt.image.BufferedImage.<init>(Unknown Source)
        [Rest snipped]

Image type 0 seems to be represented by BufferedImage.TYPE_CUSTOM. In the documentation of that value it says:

Image type is not recognized so it must be a customized image. This type is only used as a return value for the getType() method.

So I switched back to TYPE_INT_RGB.
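For anyone who wants to keep experimenting with the source image's own type, a guard along the following lines at least avoids the IllegalArgumentException. It is only a sketch with invented names, and it does not address the original drawImage failure. Also note that for indexed types the constructor used here creates a default palette, not the palette of the source image.

```java
import java.awt.image.BufferedImage;

// Hypothetical guard against TYPE_CUSTOM when reusing the source type.
final class ThumbnailType {
    /** Falls back to TYPE_INT_RGB for types the constructor rejects. */
    static int safeType(int sourceType) {
        return sourceType == BufferedImage.TYPE_CUSTOM
            ? BufferedImage.TYPE_INT_RGB
            : sourceType;
    }

    static BufferedImage createTarget(BufferedImage source, int width, int height) {
        return new BufferedImage(width, height, safeType(source.getType()));
    }
}
```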

Then there's the out-of-memory error.

Exception in thread "Thread-5" java.lang.OutOfMemoryError: Java heap space
        at java.awt.image.DataBufferInt.<init>(Unknown Source)
        at java.awt.image.Raster.createPackedRaster(Unknown Source)
        at java.awt.image.DirectColorModel.createCompatibleWritableRaster(Unknown Source)
        at java.awt.image.BufferedImage.<init>(Unknown Source)
        at sun.java2d.loops.GraphicsPrimitive.convertFrom(Unknown Source)
        at sun.java2d.loops.GraphicsPrimitive.convertFrom(Unknown Source)
        at sun.java2d.loops.MaskBlit$General.MaskBlit(Unknown Source)
        at sun.java2d.loops.Blit$GeneralMaskBlit.Blit(Unknown Source)
        at sun.java2d.pipe.DrawImage.blitSurfaceData(Unknown Source)
        at sun.java2d.pipe.DrawImage.renderImageCopy(Unknown Source)
        at sun.java2d.pipe.DrawImage.copyImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.copyImage(Unknown Source)
        at sun.java2d.pipe.ValidatePipe.copyImage(Unknown Source)
        at sun.java2d.SunGraphics2D.drawImage(Unknown Source)
        at sun.java2d.SunGraphics2D.drawImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.makeBufferedImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.renderImageXform(Unknown Source)
        at sun.java2d.pipe.DrawImage.transformImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.scaleImage(Unknown Source)
        at sun.java2d.pipe.DrawImage.scaleImage(Unknown Source)
        at sun.java2d.pipe.ValidatePipe.scaleImage(Unknown Source)
        at sun.java2d.SunGraphics2D.drawImage(Unknown Source)
        at sun.java2d.SunGraphics2D.drawImage(Unknown Source)

Recommendation #5: Catch Throwable when calling drawImage.
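A guarded version of the scaling step might look like the sketch below. GuardedScaler and scaleOrNull are invented names, and writing to System.err stands in for real error handling.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical wrapper that never lets a scaling failure escape.
final class GuardedScaler {
    /** Returns the scaled image, or null if anything at all goes wrong. */
    static BufferedImage scaleOrNull(BufferedImage original, int width, int height) {
        Graphics2D graphics2D = null;
        try {
            BufferedImage thumbnail =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
            graphics2D = thumbnail.createGraphics();
            graphics2D.drawImage(original, 0, 0, width, height, null);
            return thumbnail;
        } catch (Throwable t) {
            // Covers ImagingOpException, IllegalArgumentException
            // and OutOfMemoryError alike.
            System.err.println("Scaling failed: " + t);
            return null;
        } finally {
            if (graphics2D != null) {
                graphics2D.dispose();
            }
        }
    }
}
```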

This is recommendation #1 used in another context. I don't know how to avoid the color transformation problem, which would obviously be preferable to just handling the failure gracefully.

Saving thumbnails

With some applications, thumbnails are just created and used as Image objects for display. If that applies to you, you don't need this section. Whenever thumbnails are to be stored in some way and they didn't come encoded as in the case of the loading of embedded thumbnails, an encoder is needed, usually with compression. Even if thumbnails are relatively small, a 128 x 128 pixel RGB thumbnail requires 128 x 128 x 3 bytes = 48 KB. A JPEG version of that is only three to four KB.

This is the part where nothing has really gone wrong for me so far. That is probably due to the fact that, once created, the thumbnail was in my case a regular BufferedImage of type TYPE_INT_RGB. Those apparently are common and expected by the encoder and don't cause trouble when used with

ImageIO.write(thumbnailImage, FORMAT, new File(thumbnailFileName));

where FORMAT is "jpg" or some other supported file format.

If I had played around more with creating compatible image objects for scaling, I would expect errors to happen here as well: scaling would have failed less often, but the codec might have had trouble with pixels in a color model it did not expect or understand. But that's just speculation.

When writing code to save a thumbnail, I recommend creating the encoded thumbnail in memory by having the codec write its data to a ByteArrayOutputStream. That way it is easy to adapt a program from just writing thumbnail files to storing them in a BLOB column of a database or serving the thumbnail as part of a Web project.
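With ImageIO this takes only a few lines. The sketch assumes a hypothetical helper class ThumbnailEncoder; ImageIO.write returns false when no writer is registered for the requested format, which the helper turns into an exception.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper that encodes a thumbnail into a byte array.
final class ThumbnailEncoder {
    static byte[] encode(BufferedImage thumbnail, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(thumbnail, format, out)) {
            throw new IOException("No writer available for format " + format);
        }
        return out.toByteArray();
    }
}
```

The resulting array can be written to a file, bound to a BLOB parameter or sent as an HTTP response body without touching the encoding code again.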

A matter which can be overlooked easily is the choice of the thumbnail file format. While JPEG is the format of choice for photos, other image types, e.g. graphics with large areas of the same color, are better stored in another format. Using the input format as the output format sounds good at first. However, that doesn't really work out some of the time:

A careful approach to the problem would only store images known to come from JPEGs as JPEG and save everything else as PNG or in a similar format using a general-purpose lossless compression algorithm. PNG usually creates much larger byte streams than JPEG, so this comes at a price.
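The careful approach can be sketched with ImageIO as follows. The class ThumbnailFormat and its methods are invented for this example. The format name reported by a reader differs between plugins and Java versions, so the comparison accepts both "jpeg" and "jpg" in any case.

```java
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper that picks the thumbnail format from the source format.
final class ThumbnailFormat {
    /** Returns the format name of the first matching reader, or null if unknown. */
    static String detectFormat(File file) throws IOException {
        try (ImageInputStream input = ImageIO.createImageInputStream(file)) {
            if (input == null) {
                return null;
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(input);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName();
            } finally {
                reader.dispose();
            }
        }
    }

    /** JPEG sources stay JPEG; everything else is stored losslessly as PNG. */
    static String chooseOutputFormat(String sourceFormat) {
        if (sourceFormat != null
                && (sourceFormat.equalsIgnoreCase("jpeg")
                    || sourceFormat.equalsIgnoreCase("jpg"))) {
            return "jpg";
        }
        return "png";
    }
}
```

The source format has to be determined before the image is decoded, because a BufferedImage no longer knows which file format it came from.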

Recommendation #6: Try to keep track of where an image came from and how it was processed until it became a thumbnail.