Tuesday, June 11, 2013

How to Generate MD5 checksum for Files in Java

MD5 checksums are a good way to verify the integrity of files, and it's easy to generate an MD5 checksum in Java. Java provides a couple of ways to do this: you can use java.security.MessageDigest or an open source library such as Apache Commons Codec or Spring. All three ways we saw in our earlier article about generating an MD5 hash for a String also apply to generating an MD5 checksum for a file. Since most md5() and md5Hex() methods take a byte[], you can simply read bytes from an InputStream and pass them to these methods. Apache Commons Codec, from version 1.4, also provides overloaded methods that accept an InputStream, which makes generating a checksum very easy in Java.

For those who are not familiar with checksums, a checksum is a fixed-size datum computed from a block of data to detect accidental changes in that data. Once you create a checksum for a file, which is based on the contents of the file, any change to the file, e.g. adding white space or deleting a character, will result in a different checksum. By comparing the stored checksum with the current checksum, you can detect any change to the file. It's good practice to provide checksums of WAR or JAR files to support teams for a production release. In this Java tutorial we will learn how to create an MD5 checksum for any file in Java.
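The comparison of a stored checksum with the current one, described above, can be sketched with nothing but the JDK. This is only a minimal illustration: the class name ChecksumVerifier and its methods are made up for this example and are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not from any library): checks content against a stored MD5 checksum.
final class ChecksumVerifier {

    // Reads the stream in 8 KB chunks and returns the MD5 digest as 32 lowercase hex characters.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder(32);
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b)); // two digits per byte keeps leading zeros
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JRE", e);
        }
    }

    // True when the file's current checksum equals the stored one (hex compared case-insensitively).
    static boolean matches(Path file, String storedChecksum) {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in).equalsIgnoreCase(storedChecksum.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A release script could, for instance, call ChecksumVerifier.matches() with the path of the delivered WAR file and the checksum that was published with it, and stop the deployment when the result is false.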


Java program to generate MD5 checksum for Files

Once we create an MD5 checksum for a file, any further change to the file produces a different checksum. In this Java program we will see two ways to create an MD5 checksum for a file. The first method uses the standard Java library, namely MessageDigest from the java.security package. Notice that we use the update() method of MessageDigest instead of calling digest() with a single byte[]. This is the right way to generate an MD5 checksum of a file, because the file could be very large and you might not have enough memory to read it entirely into a byte array, which would result in java.lang.OutOfMemoryError: Java heap space. It's better to read the data in parts and update the MessageDigest with each part. The second method uses Apache Commons Codec. From version 1.4, DigestUtils provides an overloaded md5Hex() method that accepts an InputStream, which means you don't need to convert the InputStream to a String or byte array. Let's see a complete Java example that creates an MD5 checksum for any file.

import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.codec.digest.DigestUtils;

/**
 * Java program to generate MD5 checksum for files in Java. This Java example
 * uses core Java security package and Apache commons codec to generate MD5
 * checksum for a File.
 *
 * @author Javin Paul
 */
public class MD5Checksum {
    private static final Logger logger = Logger.getLogger(MD5Checksum.class.getName());
   
    public static void main(String args[]) {
        String file = "C:/temp/abc.txt";
      
        System.out.println("MD5 checksum for file using Java :                          "
                            + checkSum(file));
        System.out.println("MD5 checksum of file in Java using Apache commons codec:    "
                            + checkSumApacheCommons(file));

    }
  
    /*
     * Calculate checksum of a File using MD5 algorithm
     */
    public static String checkSum(String path){
        String checksum = null;
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(path)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
          
            //Using MessageDigest update() method to provide input in chunks
            byte[] buffer = new byte[8192];
            int numOfBytesRead;
            // read() returns -1 at end of stream
            while( (numOfBytesRead = fis.read(buffer)) != -1){
                md.update(buffer, 0, numOfBytesRead);
            }
            byte[] hash = md.digest();
            //Caution: toString(16) drops leading zeros, as the second run below demonstrates
            checksum = new BigInteger(1, hash).toString(16);
        } catch (IOException ex) {
            logger.log(Level.SEVERE, null, ex);
        } catch (NoSuchAlgorithmException ex) {
            logger.log(Level.SEVERE, null, ex);
        }
          
       return checksum;
    }
  
    /*
     * From Apache Commons Codec 1.4, the md5() and md5Hex() methods accept an InputStream as well.
     * If you are using a lower version of Apache Commons Codec, then you need to convert the
     * InputStream to a byte array before passing it to md5() or md5Hex().
     */
    public static String checkSumApacheCommons(String file){
        String checksum = null;
        // Close the stream ourselves once the checksum has been computed
        try (FileInputStream fis = new FileInputStream(file)) {
            checksum = DigestUtils.md5Hex(fis);
        } catch (IOException ex) {
            logger.log(Level.SEVERE, null, ex);
        }
        return checksum;
    }

}

Output:
MD5 checksum for file using Java :                          cf4ab086129e7b3fe98881df2b526db4
MD5 checksum of file in Java using Apache commons codec:    cf4ab086129e7b3fe98881df2b526db4

Some programmers use BigInteger to convert a byte array to a hex String, as shown above, maybe because it looks like a beautiful one-liner. But it drops leading zeros, which can cause problems. Let's run this program again after changing the file's content to 27, which produces an MD5 checksum with a leading zero.

MD5 checksum for file using Java :                          2e74f10e0327ad868d138f2b4fdd6f0
MD5 checksum of file in Java using Apache commons codec:    02e74f10e0327ad868d138f2b4fdd6f0

Now you can see that the output from the first method contains only 31 characters; the leading zero is missing. It's better to use the conventional way to convert a byte array to a hex String rather than this shortcut. If you really like using BigInteger, then you can make up for the missing leading zeros by using the format() method of String. You can take advantage of the fact that BigInteger only drops leading zeros and that an MD5 hex String always contains 32 characters. Here is a way to use String.format() to produce a 32-character, lowercase, hexadecimal String which is left-padded with 0:

String.format("%032x",new BigInteger(1, hash));

If you replace the toString() method of BigInteger with the format() method of String, you will receive the same output from both methods.
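The conventional conversion mentioned above is usually a small loop that emits two hex digits for every byte. The sketch below is one possible version, placed next to both BigInteger variants for comparison. HexDemo and toHex() are names made up for this example, and the byte array is simply the digest from the second run typed out by hand.

```java
import java.math.BigInteger;

// Hypothetical demo class: three ways to render the same digest bytes as hex.
final class HexDemo {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Conventional conversion: always two hex digits per byte, so leading zeros survive.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;         // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];       // high nibble
            out[2 * i + 1] = HEX[v & 0x0F];  // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        // The 16 digest bytes behind the checksum 02e74f10e0327ad868d138f2b4fdd6f0
        byte[] hash = {
            0x02, (byte) 0xe7, 0x4f, 0x10, (byte) 0xe0, 0x32, 0x7a, (byte) 0xd8,
            0x68, (byte) 0xd1, 0x38, (byte) 0xf2, (byte) 0xb4, (byte) 0xfd, (byte) 0xd6, (byte) 0xf0
        };

        // prints 02e74f10e0327ad868d138f2b4fdd6f0 (32 characters)
        System.out.println(toHex(hash));
        // prints 2e74f10e0327ad868d138f2b4fdd6f0 (31 characters, leading zero lost)
        System.out.println(new BigInteger(1, hash).toString(16));
        // prints 02e74f10e0327ad868d138f2b4fdd6f0 (padded back to 32 characters)
        System.out.println(String.format("%032x", new BigInteger(1, hash)));
    }
}
```

The loop does not depend on the digest length, so the same toHex() would also work for other digests, whereas the "%032x" width is specific to the 16-byte MD5 result.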

That's all on how to generate an MD5 checksum for a file in Java. As I said, it's good to verify the checksums of files before releasing them to a production environment, and it's pretty easy to generate an MD5 checksum in Java using Apache Commons Codec or even Spring.

1 comment :

Anonymous said...

Nice utility class, MD5 encryption algorithm is best for checksum but not for anything else.
