Parsing Large JSON Files using Jackson Streaming API Example

In the last couple of JSON tutorials for Java programmers, we learned how to parse JSON using the JSON-Simple library and how to parse a JSON array into a Java array using Gson. In this tutorial, we will learn how to parse a large JSON file in Java using Jackson's Streaming API. Jackson is one of the most popular JSON processing frameworks and provides three main models for parsing and processing JSON data: the Streaming API, data binding, and the tree model. Of these three, streaming works at the lowest level and can be used to parse huge JSON responses, even ones that are gigabytes in size.

If you are familiar with XML parsing, you know how difficult it is to parse huge XML files with a DOM parser, because it loads the whole file into memory before you can process it. If memory is limited, e.g. on Android devices, you can't use DOM to parse large XML files. Thankfully, XML also has SAX and StAX parsers, which are streaming based and can process huge files without loading them completely into memory. Of these two, StAX is often more convenient because it allows pull-based processing, where the client pulls data from the parser instead of the parser pushing data to the client, which is the case with SAX. Jackson's Streaming API is similar to a StAX parser: you pull the data you want and ignore what you don't want. That performance doesn't come for free, though. The Streaming API is a little more difficult to use than the other Jackson models, which provide direct mapping between JSON and Java objects. With the Streaming API, you have to handle all the JSON data yourself.
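To make the pull model concrete before moving to JSON, here is a minimal sketch using the StAX reader that ships with the JDK. The class name StaxPullSketch, the titles() helper and the sample XML are made up for illustration; only the javax.xml.stream calls come from the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {

    /**
     * Collects the text of every <title> element and ignores everything else.
     * The caller drives the parser by asking for the next event (pull model).
     */
    public static List<String> titles(String xml) {
        List<String> result = new ArrayList<String>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next(); // pull the next event
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // reads the element's text and stops at its end tag
                        result.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // wrapped only to keep the sketch's signature simple
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books>"
                + "<book><title>Effective Java</title><price>40</price></book>"
                + "<book><title>Java Concurrency in Practice</title><price>45</price></book>"
                + "</books>";
        System.out.println(titles(xml));
    }
}
```

Only one event is held at a time, and the price elements are skipped without ever being stored. Jackson's JsonParser follows the same idea with JSON tokens instead of XML events.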




Benefits of using Jackson Streaming API

There are several advantages of using Jackson's Streaming API to parse a JSON String or to generate JSON, but the most important one is that it's very efficient. It has the least memory and processing overhead, which makes it extremely useful for parsing large JSON responses, for example a response containing thousands of orders, or a list of books or electronic items downloaded from e-commerce sites like eBay or Amazon. As for the other two Jackson models, data binding converts JSON to and from Java objects based on either annotations or Java bean conventions, while the Tree Model provides a mutable in-memory tree representation of a JSON document, similar to a DOM parser. In short, the Streaming API is the most efficient, with the least memory and CPU overhead, but it is tricky to use; data binding is often the most convenient; and the Tree Model is the most flexible. By the way, both of these models internally use the Streaming API to parse JSON before converting it into their respective representations.
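For comparison, here is a rough sketch of what the other two models look like with the same Jackson 1.x library used in this tutorial. The class BindingAndTreeSketch and the Person bean are hypothetical names invented for this example, and the sketch assumes jackson-mapper-asl is on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.map.ObjectMapper;

public class BindingAndTreeSketch {

    // Hypothetical bean for illustration; public fields are detected by default
    public static class Person {
        public String firstname;
        public String lastname;
    }

    /** Data binding: the whole document is mapped onto a Java object. */
    public static String fullNameViaBinding(String json) {
        try {
            Person p = new ObjectMapper().readValue(json, Person.class);
            return p.firstname + " " + p.lastname;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tree model: the whole document is loaded as a tree of JsonNode objects. */
    public static String fullNameViaTree(String json) {
        try {
            JsonNode root = new ObjectMapper().readTree(json);
            return root.get("firstname").getTextValue() + " "
                    + root.get("lastname").getTextValue();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // the JSON contains only fields the bean declares,
        // because unknown properties make data binding fail by default
        String json = "{\"firstname\":\"Garrison\",\"lastname\":\"Paul\"}";
        System.out.println(fullNameViaBinding(json));
        System.out.println(fullNameViaTree(json));
    }
}
```

Both approaches are shorter than a streaming loop, but both hold the complete document in memory, which is the trade-off described above.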


Library JARs and Dependency

In order to try the following example, you need to add the Jackson JARs to your program's classpath. If you are using Maven, you can add the following dependency to your pom.xml file:

<dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-xc</artifactId>
   <version>1.9.12</version>
</dependency>

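or just download and add the following JARs to the CLASSPATH of your Java application. The streaming classes themselves live in jackson-core-asl, which jackson-xc pulls in together with jackson-mapper-asl: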

C:\.m2\repository\org\codehaus\jackson\jackson-xc\1.9.12\jackson-xc-1.9.12.jar
C:\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.12\jackson-core-asl-1.9.12.jar
C:\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.12\jackson-mapper-asl-1.9.12.jar

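It's often easier to manage dependencies with Maven, which is why I strongly suggest switching to Maven if you are not using it yet. You can later upgrade to a newer 1.x version of the Jackson library by changing just one line in your pom.xml file. Note that Jackson 2.x is published under the com.fasterxml.jackson.core group with different package names, so moving to it also means updating the imports.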


Parsing JSON in Java using Jackson Streaming API

This API has two main parts, one for reading JSON and the other for writing JSON, and in this tutorial we will learn both of them. JsonGenerator is used to write JSON, while JsonParser is used to parse a JSON file. To demonstrate both reading and writing JSON data in one program, I have created two static methods, createJSON() and parseJSON(). As the names suggest, the first method creates a JSON file, which is then read by the parseJSON() method. You can see in the code that we are working at quite a low level: we have not created any Java object to represent the content of the JSON; instead we are writing and reading Strings, numbers, and arrays.

You can get an instance of JsonGenerator from the JsonFactory class by calling the createJsonGenerator() method. You can also provide the encoding you intend to use; in our case I have used UTF-8 (JsonEncoding.UTF8), which is a sensible default in most cases. You can use the various write methods, such as writeStringField() and writeNumberField(), to write content. Similarly, for parsing JSON, we need an instance of JsonParser, which can also be obtained from JsonFactory. We parse JSON by calling the nextToken() method of JsonParser in a while loop until we reach JsonToken.END_OBJECT. Jackson provides methods to get the name and value of the current token, which you can use to identify the data. Similarly, while parsing a JSON array, you loop until you get JsonToken.END_ARRAY. Since we never load the whole file into memory, this approach can be used to read large JSON files, from megabytes to gigabytes in size, even in memory-constrained environments such as Android smartphones or Java ME devices.

Here is a sample program that reads and writes JSON using the Jackson Streaming API:

import java.io.File;
import java.io.IOException;

import org.codehaus.jackson.JsonEncoding;
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerationException;
import org.codehaus.jackson.JsonGenerator;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;
import org.codehaus.jackson.map.JsonMappingException;

/**
 * Java program to demonstrate how to use Jackson Streaming API to read and
 * write JSON Strings efficiently and fast.
 *
 * @author Javin Paul
 */
public class JsonJacksonStreamingAPIDemo {

    public static void main(String args[]) {

        System.out.println("Creating JSON file by using Jackson Streaming API in Java");
        createJSON("jacksondemo.json");
        System.out.println("done");

        System.out.println("Parsing JSON file by using Jackson Streaming API");
        parseJSON("jacksondemo.json");
        System.out.println("done");
    }

    /*
     * This method creates a JSON file by using the Jackson Streaming API.
     */
    public static void createJSON(String path) {
        try {
            JsonFactory jsonfactory = new JsonFactory();
            File jsonDoc = new File(path);
            JsonGenerator generator = jsonfactory.createJsonGenerator(jsonDoc, JsonEncoding.UTF8);

            generator.writeStartObject();
            generator.writeStringField("firstname", "Garrison");
            generator.writeStringField("lastname", "Paul");
            generator.writeNumberField("phone", 847332223);

            generator.writeFieldName("address");

            generator.writeStartArray();
            generator.writeString("Unit - 232");
            generator.writeString("Sofia Streat");
            generator.writeString("Mumbai");
            generator.writeEndArray();

            generator.writeEndObject();

            generator.close();

            System.out.println("JSON file created successfully");

        } catch (JsonGenerationException jge) {
            jge.printStackTrace();
        } catch (JsonMappingException jme) {
            jme.printStackTrace();
        } catch (IOException ioex) {
            ioex.printStackTrace();
        }
    }

    /*
     * This method parses a JSON file by using the Jackson Streaming API.
     */
    public static void parseJSON(String filename) {
        try {
            JsonFactory jsonfactory = new JsonFactory();
            File source = new File(filename);

            JsonParser parser = jsonfactory.createJsonParser(source);

            // pull tokens until the root object ends; null means the input ended early
            JsonToken current;
            while ((current = parser.nextToken()) != null && current != JsonToken.END_OBJECT) {
                String token = parser.getCurrentName();

                if ("firstname".equals(token)) {
                    parser.nextToken();  //next token contains value
                    String fname = parser.getText();  //getting text field
                    System.out.println("firstname : " + fname);

                }

                if ("lastname".equals(token)) {
                    parser.nextToken();
                    String lname = parser.getText();
                    System.out.println("lastname : " + lname);

                }

                if ("phone".equals(token)) {
                    parser.nextToken();
                    int phone = parser.getIntValue();  // getting numeric field
                    System.out.println("phone : " + phone);

                }

                if ("address".equals(token)) {
                    System.out.println("address :");
                    parser.nextToken(); // next token will be '[' which means JSON array

                    // parse tokens until you find ']'
                    while (parser.nextToken() != JsonToken.END_ARRAY) {
                        System.out.println(parser.getText());
                    }
                }
            }
            parser.close();

        } catch (IOException ioex) {
            // parse errors are reported as a subclass of IOException, so they land here too
            ioex.printStackTrace();
        }
    }
}


Here is the output of the program when you run it from Eclipse or directly from the command line:

Creating JSON file by using Jackson Streaming API in Java
JSON file created successfully
done
Parsing JSON file by using Jackson Streaming API
firstname : Garrison
lastname : Paul
phone : 847332223
address :
Unit - 232
Sofia Streat
Mumbai
done


You will also see the file jacksondemo.json in your project directory with the following JSON. The generator writes it on a single line by default; it is formatted here for readability:

{
  "firstname":"Garrison",
  "lastname":"Paul",
  "phone":847332223,
  "address":["Unit - 232","Sofia Streat","Mumbai"]
}
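
A really large file is more often a top-level array of many records than one small object. The sketch below shows how the same token loop might be applied to that shape, keeping only one field per record and skipping the rest. The class OrderTotalSketch, the sumAmounts() method and the "amount" field are assumptions made up for this illustration; the parser calls are the same Jackson 1.x API used in the program above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

public class OrderTotalSketch {

    /** Sums the "amount" field of every object in a top-level JSON array. */
    public static long sumAmounts(Reader source) {
        try {
            JsonParser parser = new JsonFactory().createJsonParser(source);
            try {
                if (parser.nextToken() != JsonToken.START_ARRAY) {
                    throw new IllegalStateException("expected a top-level JSON array");
                }
                long total = 0;
                // each pass of the outer loop consumes one record of the array
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    // walk the fields of the current record
                    while (parser.nextToken() == JsonToken.FIELD_NAME) {
                        String field = parser.getCurrentName();
                        parser.nextToken(); // move to the field's value
                        if ("amount".equals(field)) {
                            total += parser.getLongValue();
                        } else {
                            // does nothing for scalars, skips nested arrays and objects
                            parser.skipChildren();
                        }
                    }
                }
                return total;
            } finally {
                parser.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = "[{\"id\":1,\"amount\":250,\"items\":[\"a\",\"b\"]},"
                + "{\"id\":2,\"amount\":100,\"note\":{\"gift\":true}}]";
        System.out.println(sumAmounts(new StringReader(json))); // prints 350
    }
}
```

In a real program you would pass a FileReader or use createJsonParser(File) instead of a StringReader. Memory use stays flat no matter how many records the array contains, because only the running total is kept.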


That's all about how to use the Jackson Streaming API to parse a JSON String and to create JSON in Java. Jackson is a powerful library with lots of features, and the Streaming API is the best choice for large inputs. I know it's a little bit difficult and you need to write a lot of code with hard-coded field names, but it is the fastest way to read a large JSON file in Java with low memory overhead. If you are dealing with normal-sized JSON and you don't have memory constraints, you can always use Jackson's data binding model to parse JSON into Java objects.

2 comments :

Ankush said...

how to validate Json against Json schema, do we have any validator to validate Json along with parsing ??

Int64 said...

How Would you do it for multiple Json in a text file ?
