Friday, July 30, 2021

How to Generate MD5 checksum for Files in Java? Example

MD5 checksums are a good way to verify the integrity of files, and it's easy to generate an MD5 checksum in Java. Java provides a couple of ways to generate the MD5 checksum of a file: you can either use java.security.MessageDigest or an open source library such as Apache commons-codec or Spring. All three approaches we saw in our earlier article on generating the MD5 hash of a String also work for files. Since most md5() and md5Hex() methods take a byte[], you can simply read the bytes from an InputStream and pass them to these methods.
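For a small file, the byte[] route can be as short as reading the whole file and passing it to MessageDigest.digest(). Below is a minimal sketch, assuming Java 7+ for java.nio.file.Files; the class name SmallFileMd5 and the md5Hex() helper are hypothetical names used only for illustration. Because it loads the entire file into memory, it is only suitable for small files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SmallFileMd5 {

    // Reads the entire file into memory, so only use this for small files
    public static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] content = Files.readAllBytes(file);
        byte[] hash = MessageDigest.getInstance("MD5").digest(content);
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            // & 0xff turns the signed byte into 0..255; %02x keeps a leading zero
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Write a small sample file so the example runs anywhere
        Path file = Files.createTempFile("md5-demo", ".txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(md5Hex(file)); // prints 5d41402abc4b2a76b9719d911017c592
        Files.delete(file);
    }
}
```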

From version 1.4, Apache commons-codec also provides overloaded methods that accept an InputStream, which makes generating a checksum very easy in Java. For those who are not familiar with checksums: a checksum is a fixed-size datum computed from a block of data and used to detect accidental changes in that data.

This means that once you create a checksum for a file, based on its contents, any change to the file, e.g. adding whitespace or deleting a character, will for all practical purposes produce a different checksum.
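A quick way to see this is to hash two inputs that differ by a single trailing space. The sketch below, with the hypothetical class name ChangeDetectionDemo, hashes in-memory byte arrays rather than files to keep it short; it also shows that the digest stays 16 bytes however large the input is, which is the "fixed-size" part of the definition.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChangeDetectionDemo {

    // One-shot MD5 of an in-memory byte array
    static byte[] md5(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    // Lowercase hex, two characters per byte
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] original = md5("hello".getBytes(StandardCharsets.UTF_8));
        byte[] edited = md5("hello ".getBytes(StandardCharsets.UTF_8)); // one space added
        byte[] large = md5(new byte[1_000_000]);                         // 1 MB of zeros

        // The digest is 16 bytes (32 hex characters) no matter how big the input is
        System.out.println(original.length + " " + edited.length + " " + large.length); // prints 16 16 16
        System.out.println(hex(original));
        System.out.println(hex(edited));
        // MessageDigest.isEqual() compares two digests byte by byte
        System.out.println("changed: " + !MessageDigest.isEqual(original, edited)); // prints changed: true
    }
}
```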

By comparing a stored checksum with the current one, you can detect whether a file has changed. It's good practice to give support teams the checksums of WAR or JAR files for production releases.
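Once both checksums are hex strings, the comparison itself is a single string check. A minimal sketch follows; ChecksumVerifier, md5Hex() and matches() are hypothetical names, the temporary file merely stands in for a real WAR or JAR, and the stored checksum is assumed to be a 32-character hex string such as one published alongside a release. Comparing case-insensitively tolerates checksums recorded in upper case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumVerifier {

    // Streams the file through MD5 in 8 KB chunks; returns 32 lowercase hex chars
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // True if the file still matches the checksum recorded earlier (case-insensitive)
    static boolean matches(Path file, String storedChecksum)
            throws IOException, NoSuchAlgorithmException {
        return md5Hex(file).equalsIgnoreCase(storedChecksum.trim());
    }

    public static void main(String[] args) throws Exception {
        Path jar = Files.createTempFile("release", ".jar"); // stand-in for a real artifact
        Files.write(jar, "hello".getBytes(StandardCharsets.UTF_8));
        String stored = md5Hex(jar);               // recorded at release time
        System.out.println(matches(jar, stored));  // prints true

        Files.write(jar, "hello!".getBytes(StandardCharsets.UTF_8)); // simulate a change
        System.out.println(matches(jar, stored));  // prints false
        Files.delete(jar);
    }
}
```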

In this Java tutorial, we will learn how to create the MD5 checksum for any file in Java.




Java program to generate MD5 checksum for Files

When we create an MD5 checksum for a file, any further change produces a different checksum. In this Java program we will see two ways to create an MD5 checksum for a file. The first method uses the standard Java library, namely MessageDigest from the java.security package. Notice that we call the update() method of MessageDigest repeatedly instead of calling digest() once with the whole file as a byte[].

This is the right way to generate the MD5 checksum of a file, because the file could be very large and you might not have enough memory to read it into a single byte array, which would result in java.lang.OutOfMemoryError: Java heap space. It's better to read the data in chunks and update the MessageDigest with each chunk.
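As an aside, the same chunked approach can be written with java.security.DigestInputStream from the JDK, which calls update() on the digest for every byte read through it. This is a different technique from the explicit update() loop used in the program here, shown as a sketch; the class name StreamingMd5 is hypothetical. The main() method checks that hashing in 8 KB chunks gives exactly the same digest as hashing all the bytes at once.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class StreamingMd5 {

    // DigestInputStream updates md with everything read through it,
    // so memory use stays at one 8 KB buffer regardless of file size
    static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), md)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // nothing to do here: reading is enough to feed the digest
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("big", ".bin");
        byte[] data = new byte[1_000_000];
        Arrays.fill(data, (byte) 'x');
        Files.write(file, data);

        byte[] streamed = md5(file);
        byte[] oneShot = MessageDigest.getInstance("MD5").digest(data);
        // Feeding the data in chunks gives the same digest as hashing it all at once
        System.out.println(MessageDigest.isEqual(streamed, oneShot)); // prints true
        Files.delete(file);
    }
}
```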

The second method uses Apache Commons Codec to generate the MD5 checksum of a file. From version 1.4, DigestUtils provides an overloaded md5Hex() method that accepts an InputStream, so you don't need to convert the InputStream to a String or byte array. Let's see a complete Java example that creates an MD5 checksum for any file.

import java.io.FileInputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.commons.codec.digest.DigestUtils;

/**
 * Java program to generate MD5 checksum for files in Java. This Java example
 * uses core Java security package and Apache commons codec to generate MD5
 * checksum for a File.
 *
 * @author Javin Paul
 */
public class MD5Checksum {
    private static final Logger logger = Logger.getLogger(MD5Checksum.class.getName());
   
    public static void main(String[] args) {
        // Use the first command-line argument as the file path, else a default
        String file = args.length > 0 ? args[0] : "C:/temp/abc.txt";

        System.out.println("MD5 checksum for file using Java :                          "
                            + checkSum(file));
        System.out.println("MD5 checksum of file in Java using Apache commons codec:    "
                            + checkSumApacheCommons(file));
    }
  
    /*
     * Calculate the checksum of a file using the MD5 algorithm
     */
    public static String checkSum(String path) {
        String checksum = null;
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fis = new FileInputStream(path)) {
            MessageDigest md = MessageDigest.getInstance("MD5");

            // Feed the file to MessageDigest in 8 KB chunks using update(),
            // so the whole file never has to fit in memory
            byte[] buffer = new byte[8192];
            int numOfBytesRead;
            while ((numOfBytesRead = fis.read(buffer)) > 0) {
                md.update(buffer, 0, numOfBytesRead);
            }
            byte[] hash = md.digest();
            checksum = new BigInteger(1, hash).toString(16); // don't use this, truncates leading zeros
        } catch (IOException | NoSuchAlgorithmException ex) {
            logger.log(Level.SEVERE, "Could not compute MD5 checksum of " + path, ex);
        }
        return checksum;
    }
  
    /*
     * From Apache commons codec 1.4, the md5() and md5Hex() methods also accept an InputStream.
     * If you are using an older version of Apache commons codec, you need to convert the
     * InputStream to a byte array before passing it to md5() or md5Hex().
     */
    public static String checkSumApacheCommons(String file) {
        String checksum = null;
        // try-with-resources makes sure the stream is closed when we are done
        try (FileInputStream fis = new FileInputStream(file)) {
            checksum = DigestUtils.md5Hex(fis);
        } catch (IOException ex) {
            logger.log(Level.SEVERE, "Could not compute MD5 checksum of " + file, ex);
        }
        return checksum;
    }

}

Output:
MD5 checksum for file using Java :                          cf4ab086129e7b3fe98881df2b526db4
MD5 checksum of file in Java using Apache commons codec:    cf4ab086129e7b3fe98881df2b526db4

Some programmers use BigInteger to convert a byte array to a hex String, as shown above, perhaps because it makes a neat one-liner. But BigInteger.toString(16) drops leading zeros, which can cause problems. Let's run the program again after changing the file's content to 27, which produces an MD5 checksum with a leading zero.

MD5 checksum for file using Java :                          2e74f10e0327ad868d138f2b4fdd6f0
MD5 checksum of file in Java using Apache commons codec:    02e74f10e0327ad868d138f2b4fdd6f0

Now you can see that the output of the first method contains only 31 characters: the leading zero is missing. It's better to convert the byte array to a hex String the conventional way rather than use this shortcut. If you really like using BigInteger, you can make up for the missing leading zeros with the format() method of String.
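A conventional byte-to-hex conversion simply writes two hex digits for every byte, so a 16-byte MD5 digest always becomes 32 characters. The sketch below uses a lookup table; the class name HexUtil is hypothetical. On Java 17 and later, java.util.HexFormat.of().formatHex(bytes) from the JDK does the same job.

```java
public class HexUtil {

    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Two hex characters per byte, so leading zeros are never lost
    public static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xff;                // treat the byte as unsigned 0..255
            out[2 * i] = HEX_DIGITS[v >>> 4];       // high nibble
            out[2 * i + 1] = HEX_DIGITS[v & 0x0f];  // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[] {0x02, (byte) 0xe7, 0x4f})); // prints 02e74f
    }
}
```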

You can take advantage of the fact that BigInteger only drops leading zeros, while an MD5 hex string always has exactly 32 characters (16 bytes, two hex digits each). Here is how to use the format method of String to produce a 32-character, lowercase, hexadecimal String left-padded with 0:

String.format("%032x",new BigInteger(1, hash));

If you replace the call to the toString() method of BigInteger with the format method of String, both methods produce the same output.
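To see the two conversions side by side, the sketch below wraps them in hypothetical helpers, naiveHex() and paddedHex(), and applies them to the MD5 of the string "27", the content the run above uses to get a checksum with a leading zero. It hashes an in-memory string rather than a file, so it matches the file-based run only if the file holds exactly those two bytes with no trailing newline.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LeadingZeroDemo {

    // The shortcut from the first method: drops any leading zeros
    static String naiveHex(byte[] hash) {
        return new BigInteger(1, hash).toString(16);
    }

    // Same BigInteger, but %032x left-pads with 0 to the full 32 characters
    static String paddedHex(byte[] hash) {
        return String.format("%032x", new BigInteger(1, hash));
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] hash = MessageDigest.getInstance("MD5")
                .digest("27".getBytes(StandardCharsets.UTF_8));
        String naive = naiveHex(hash);
        String padded = paddedHex(hash);
        System.out.println(naive + " (" + naive.length() + " chars)");
        System.out.println(padded + " (" + padded.length() + " chars)");
    }
}
```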

That's all on how to generate an MD5 checksum for a file in Java. As I said, it's good practice to verify the checksum of files before releasing them to the production environment, and it's pretty easy to generate an MD5 checksum in Java using Apache Commons Codec or even Spring.

5 comments :

Anonymous said...

Nice utility class, MD5 encryption algorithm is best for checksum but not for anything else.

Anonymous said...

Is there any way I can get the hash of various files without having to specifically state the string file?

Unknown said...

while( (numOfBytesRead = fis.read(buffer)) > 0)

what is &gt here.

Thanks in advance

javin paul said...

Hello @havi, it means: keep looping while bytes are being read from the file. It checks whether you have reached the end of the file. When there are no more bytes to read, the read() method returns -1 and the loop stops.

Unknown said...

@havish &gt; is the HTML entity for the greater-than symbol (>). The loop runs while the value is greater than 0 and stops at -1 (i.e. end of file).
