FileDownload.java—Download files from HTTP sources and save them locally

The FileDownload program is a small command line program that demonstrates how to download files offered via the HyperText Transfer Protocol (HTTP). This protocol is the basis for much of what is transferred on the World Wide Web. While the "text" part in its name suggests a restriction with regard to what can be accessed, any binary file can be transferred using that protocol. In fact, I've mostly tested it with image files.

The program is called with one or more URLs (addresses like this page's address, http://schmidt.devlib.org/java/file-download.html) as arguments and downloads each of them to the current directory. This is the core functionality of wget, curl and similar download tools. If you want to learn how to use that feature in one of your Java programs, read on.

Programming project tip: Downloading files from the Web is one of the tasks a search engine has to perform. Crawlers (also known as robots or bots) continuously download files and index them. Writing a minimal search engine is an interesting task, and this page delivers part of the solution. As for the text search functionality, you might borrow some code from my Lucene Wikipedia indexer, which demonstrates how to use the free Lucene text search library.

Compiling and running the program

These instructions are hopefully beginner-friendly. That's why they are a bit verbose.

  1. Save the source code in a file named FileDownload.java (the name is case-sensitive).
  2. Open a prompt (shell), change to the directory where you have saved the file and compile it:
    javac FileDownload.java
    Now you should have a new file FileDownload.class in the same directory.
  3. Run the program with some URL as parameter:
    java FileDownload http://schmidt.devlib.org/java/file-download.html

Explanation

This section explains what the program does, in order of its execution by the JVM (Java Virtual Machine), starting at the public static main(String[]) method.

In the main method, each command line argument is passed in turn to the download method that takes one String argument.

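That download method tries to come up with a sensible file name from the address String. It does so by copying everything after the last slash into a new String. If the address contains no slash or ends with one, it prints an error message to standard error and skips that address. Otherwise it calls the remaining method, download with two String arguments.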
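The file name derivation can be tried in isolation. The following is a minimal sketch; the class name FileNameSketch and the helper localFileName are made up for this illustration and are not part of FileDownload.java, but the index logic is the same as in the one-argument download method.

```java
// Hypothetical helper class, for illustration only.
class FileNameSketch {
	// Returns everything after the last slash of the address,
	// or null if there is no slash or nothing follows it.
	static String localFileName(String address) {
		int lastSlashIndex = address.lastIndexOf('/');
		if (lastSlashIndex >= 0 && lastSlashIndex < address.length() - 1) {
			return address.substring(lastSlashIndex + 1);
		}
		return null;
	}

	public static void main(String[] args) {
		// prints "file-download.html"
		System.out.println(localFileName("http://schmidt.devlib.org/java/file-download.html"));
		// prints "null" because the address ends with a slash
		System.out.println(localFileName("http://schmidt.devlib.org/java/"));
	}
}
```

Note that this simple rule has some surprises: an address without any path, such as http://schmidt.devlib.org, yields the host name as file name (the last slash is the one after "http:/"), and a query string after the file name becomes part of the local name.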

That method tries to copy data from the address given by the first argument to a file named like the second argument. Note that existing files will be overwritten without confirmation, so make sure the URLs you use cannot create havoc in your local file system.
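If overwriting worries you, one possible safeguard is to refuse to open a file that already exists. This is only a sketch of that idea, assuming Java 7 or later for the java.nio.file classes; the class NoOverwriteSketch and the method openNew are hypothetical names, and FileDownload.java itself does not contain such a check.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, for illustration only.
class NoOverwriteSketch {
	// Creates the file and returns a stream to write to it.
	// Returns null if a file of that name already exists.
	static OutputStream openNew(String localFileName) throws IOException {
		try {
			// CREATE_NEW makes the call fail instead of overwriting.
			return Files.newOutputStream(Paths.get(localFileName),
				StandardOpenOption.CREATE_NEW);
		} catch (FileAlreadyExistsException e) {
			return null;
		}
	}
}
```

A caller could print a message and skip the address whenever openNew returns null.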

Now for the interesting part. First, a URL object is created for the source address. A URLConnection is then created from the URL, and finally an InputStream is obtained via the connection's getInputStream method. Each of these three calls can fail for various reasons: the address may be invalid, may not exist or may just not be accessible at the moment. In that case an exception is thrown and the download of that address ends.
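To see the different failure cases separately, the three calls can be wrapped like this. It is a small sketch, not the way FileDownload.java handles errors (it catches Exception in one place); the class OpenStreamSketch, the method probe and its return strings are invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class, for illustration only.
class OpenStreamSketch {
	// Tries to open the address and reports what happened.
	static String probe(String address) {
		try {
			URL url = new URL(address);                // may throw MalformedURLException
			URLConnection conn = url.openConnection(); // may throw IOException
			try (InputStream in = conn.getInputStream()) { // may throw IOException
				return "ok";
			}
		} catch (MalformedURLException e) {
			// Must be caught first, it is a subclass of IOException.
			return "invalid address";
		} catch (IOException e) {
			return "not accessible";
		}
	}

	public static void main(String[] args) {
		for (String address : args) {
			System.out.println(address + "\t" + probe(address));
		}
	}
}
```

An address without a protocol part, for example just schmidt.devlib.org, ends up in the first catch block, while a well-formed address that cannot be read ends up in the second.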

If the input stream is successfully opened, the program creates a small data buffer (1024 bytes) and opens a local file for writing. It then reads from the input in a loop until read returns -1, which indicates the end of the stream. With each loop iteration, the bytes just read are written to the output file and the counter numWritten is increased by their number.
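The copy loop does not depend on HTTP at all, so it can be tried with in-memory streams. Below is a minimal sketch of the same loop; the class CopyLoopSketch and the method copy are names chosen for this example only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper class, for illustration only.
class CopyLoopSketch {
	// Copies everything from in to out and returns the number of bytes copied.
	static long copy(InputStream in, OutputStream out) throws IOException {
		byte[] buffer = new byte[1024];
		int numRead;
		long numWritten = 0;
		// read returns the number of bytes placed in the buffer,
		// or -1 when the end of the stream has been reached.
		while ((numRead = in.read(buffer)) != -1) {
			// Write only the part of the buffer that was actually filled.
			out.write(buffer, 0, numRead);
			numWritten += numRead;
		}
		return numWritten;
	}

	public static void main(String[] args) throws IOException {
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		long n = copy(new ByteArrayInputStream(new byte[2500]), out);
		System.out.println(n + " bytes copied"); // prints "2500 bytes copied"
	}
}
```

The third argument of write matters: a read call may fill only part of the buffer, so writing the whole buffer each time would append stale bytes to the output.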

On successful termination of that copying process, the program prints the local file name and the file size (the number of bytes written) to standard output. The finally clause of the try-catch-finally statement then closes both streams, whether or not the copy succeeded.
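With Java 7 or later, the closing of the streams can be left to a try-with-resources statement instead of a finally clause. The following is a sketch of that alternative, not the code of FileDownload.java; the class name TryWithResourcesSketch is made up, and unlike the original this variant passes exceptions on to the caller instead of printing them.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Hypothetical alternative, for illustration only.
class TryWithResourcesSketch {
	// Copies the data at address to a local file and returns the byte count.
	static long download(String address, String localFileName) throws IOException {
		URL url = new URL(address);
		// Both streams are closed automatically when the block is left,
		// in reverse order of their declaration, even if copying fails.
		try (InputStream in = url.openConnection().getInputStream();
		     OutputStream out = new BufferedOutputStream(
		         new FileOutputStream(localFileName))) {
			byte[] buffer = new byte[1024];
			int numRead;
			long numWritten = 0;
			while ((numRead = in.read(buffer)) != -1) {
				out.write(buffer, 0, numRead);
				numWritten += numRead;
			}
			return numWritten;
		}
	}
}
```

Because the input stream is declared first, the local file is not created if the connection cannot be opened.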

Source code of FileDownload.java

import java.io.*;
import java.net.*;

/**
 * Command line program to download data from URLs and save
 * it to local files. Run like this:
 * java FileDownload http://schmidt.devlib.org/java/file-download.html
 * @author Marco Schmidt
 */
public class FileDownload {
	public static void download(String address, String localFileName) {
		OutputStream out = null;
		URLConnection conn = null;
		InputStream  in = null;
		try {
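			URL url = new URL(address);
			conn = url.openConnection();
			in = conn.getInputStream();
			// Open the local file only after the input stream was opened,
			// so a failed download does not create or truncate the file.
			out = new BufferedOutputStream(
				new FileOutputStream(localFileName));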
			byte[] buffer = new byte[1024];
			int numRead;
			long numWritten = 0;
			while ((numRead = in.read(buffer)) != -1) {
				out.write(buffer, 0, numRead);
				numWritten += numRead;
			}
			System.out.println(localFileName + "\t" + numWritten);
		} catch (Exception exception) {
			exception.printStackTrace();
		} finally {
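			// Close each stream in its own try block so that a failure
			// to close the input cannot prevent closing the output.
			try {
				if (in != null) {
					in.close();
				}
			} catch (IOException ioe) {
				ioe.printStackTrace();
			}
			try {
				if (out != null) {
					out.close();
				}
			} catch (IOException ioe) {
				ioe.printStackTrace();
			}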
		}
	}

	public static void download(String address) {
		int lastSlashIndex = address.lastIndexOf('/');
		if (lastSlashIndex >= 0 &&
		    lastSlashIndex < address.length() - 1) {
			download(address, address.substring(lastSlashIndex + 1));
		} else {
			System.err.println("Could not figure out local file name for " +
				address);
		}
	}

	public static void main(String[] args) {
		for (int i = 0; i < args.length; i++) {
			download(args[i]);
		}
	}
}