Wednesday, September 11, 2024

Parsing Large JSON Files using Jackson Streaming API Example

In the last couple of JSON tutorials for Java programmers, we learned how to parse JSON using the JSON-Simple library and how to convert a JSON array to a Java array using Gson. In this tutorial, we will learn how to parse a large JSON file in Java using Jackson's Streaming API. Jackson is one of the most popular JSON processing frameworks and provides three main models to parse and process JSON data: the Streaming API, data binding, and the tree model. Of these three, streaming works at the lowest level and can be used to parse huge JSON responses, even gigabytes in size. If you are familiar with XML parsing, then you know how difficult it is to parse huge XML files with a DOM parser, because it loads the whole file into memory before you can process it.

If you have little memory available, e.g. on Android devices, you can't use DOM to parse large XML files. Thankfully, Java provides the SAX and StAX parsers, which are streaming-based and can process huge files without loading them completely into memory.

Of these two, StAX is often more convenient because it allows pull-based processing, where the client pulls data from the parser instead of the parser pushing data to the client, which is the case with SAX. Jackson's Streaming API is similar to the StAX parser. You can pull the data you want and ignore what you don't want.
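To make the pull model concrete, here is a minimal sketch that uses the StAX API bundled with the JDK (javax.xml.stream). The class name, the method name, and the sample XML are made up for illustration; the point is only that the client decides when to ask for the next event and which events to ignore.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullSketch {

    // Collects the text of every <title> element and ignores everything else.
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // the client asks for the next event; nothing is pushed to it
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        result.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books>"
                + "<book><title>Effective Java</title><price>40</price></book>"
                + "<book><title>Clean Code</title><price>35</price></book>"
                + "</books>";
        System.out.println(titles(xml)); // prints [Effective Java, Clean Code]
    }
}
```

Jackson's JsonParser follows the same pattern, with nextToken() playing the role of next().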

This performance doesn't come without a cost, though: the Streaming API is a little more difficult to use than the other Jackson models, which provide a direct mapping between JSON and Java objects. You have to handle all JSON data yourself while using the Streaming API.

And if you are new to JSON parsing in Java, I highly recommend you check out the Jackson library. It's a very useful, versatile, and high-performance library for parsing JSON and CSV files, and I think every Java developer should know about it.

If you need a resource, check out Jackson Quick Start: JSON Serialization With Java Made Easy - a free course on Udemy to learn Jackson API basics.






Benefits of using Jackson Streaming API

There are several advantages of using Jackson's Streaming API to parse a JSON String or convert a Java object to JSON, but the most important one is that it's very efficient.

It has the least memory and processing overhead and is extremely useful for parsing large JSON responses, for example a response containing thousands of orders, or a list of books or electronic items downloaded from e-commerce sites like eBay or Amazon.

As for the other two models of the Jackson API, the data binding model converts JSON to and from Java objects based on either annotations or Java bean conventions, while the Tree Model provides a mutable in-memory tree representation of a JSON document, similar to a DOM parser.

In short, the Streaming API is the most powerful and has the lowest memory and CPU overhead but is tricky to use, data binding is often the most convenient, and the Tree Model is the most flexible.

BTW, both of these models internally use the Streaming API to parse JSON strings before converting them into their respective representations.
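For comparison, here is a rough sketch of what the other two models look like with the same Jackson 1.x library (ObjectMapper comes from jackson-mapper-asl). The class and method names are hypothetical, and both calls read the whole document into memory, which is exactly what the Streaming API avoids.

```java
import java.io.IOException;
import java.util.Map;

import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.map.ObjectMapper;

public class OtherModelsSketch {

    // Tree Model: the whole document becomes a tree of JsonNode objects.
    static String firstNameViaTree(String json) {
        try {
            JsonNode root = new ObjectMapper().readTree(json);
            return root.get("firstname").getTextValue();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Data binding: the whole document is mapped to a Java object, here a plain Map.
    static Object phoneViaBinding(String json) {
        try {
            Map<?, ?> person = new ObjectMapper().readValue(json, Map.class);
            return person.get("phone");
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"firstname\":\"Garrison\",\"phone\":847332223}";
        System.out.println(firstNameViaTree(json));
        System.out.println(phoneViaBinding(json));
    }
}
```

Both are far shorter than hand-written token handling, which is the convenience you trade away for the lower overhead of streaming.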


Library JARs and Dependency

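In order to try the following example, you need to download Jackson and add it to your program's classpath. The Streaming API itself lives in jackson-core-asl; the jackson-xc artifact shown here pulls it in as a transitive dependency, along with jackson-mapper-asl. If you are using Maven, you can add the following dependency to your pom.xml file: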

<dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-xc</artifactId>
   <version>1.9.12</version>
</dependency>

or just download and add the following JARs to the CLASSPATH of your Java application.

C:\.m2\repository\org\codehaus\jackson\jackson-xc\1.9.12\jackson-xc-1.9.12.jar
C:\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.12\jackson-core-asl-1.9.12.jar
C:\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.12\jackson-mapper-asl-1.9.12.jar

It's often easier to manage dependencies using Maven, and that's why I strongly suggest switching to Maven if you are not using it yet. You can later upgrade to a newer version of the Jackson library by changing just one line in the Maven pom.xml file.


Parsing JSON in Java using Jackson Streaming API

This API has two main parts, one for reading JSON and the other for writing JSON, and in this tutorial we will learn both of them. JsonGenerator is used to write JSON, while JsonParser is used to parse a JSON file.

To demonstrate both reading and writing of JSON data in one program, I have created two static methods, createJSON() and parseJSON().



As the names suggest, the first method creates a JSON file, which is then read by the parseJSON() method.

You can see in the code that we are working at quite a low level: we have not created any Java object to represent the content of the JSON; instead we write and read strings, numbers, and arrays directly.

You can get an instance of JsonGenerator from the JsonFactory class by calling the createJsonGenerator() method. You can also provide the encoding you intend to use; in our case I have used JsonEncoding.UTF8, which is a convenient default in most cases.

You can use the various write methods, such as writeStringField() and writeNumberField(), to write contents. Similarly, for parsing JSON, we need to create an instance of JsonParser, which can also be obtained from JsonFactory.
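On the writing side, the generator is just as useful for large output, because each record is written as soon as it is produced. The following is a minimal sketch under some assumptions: the class name, the method name, and the "id" and "amount" fields are invented for illustration, and it writes to a StringWriter only to keep the example self-contained. For a real file you would pass a File and a JsonEncoding instead.

```java
import java.io.IOException;
import java.io.StringWriter;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerator;

public class OrderWriterSketch {

    // Writes `count` small order objects as one JSON array, one object at a time.
    static String writeOrders(int count) {
        StringWriter out = new StringWriter();
        try {
            JsonGenerator generator = new JsonFactory().createJsonGenerator(out);
            generator.writeStartArray();
            for (int id = 1; id <= count; id++) {
                generator.writeStartObject();
                generator.writeNumberField("id", id);
                generator.writeNumberField("amount", id * 10);
                generator.writeEndObject();
            }
            generator.writeEndArray();
            generator.close(); // flushes buffered output to the writer
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints [{"id":1,"amount":10},{"id":2,"amount":20}]
        System.out.println(writeOrders(2));
    }
}
```

Nothing in the loop keeps a reference to earlier records, so memory use does not grow with the number of objects written to a file or stream.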

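We parse JSON by calling the nextToken() method of JsonParser in a while loop until we reach JsonToken.END_OBJECT. The Jackson API provides methods to get the name and value of the current token, which you can use to identify data. Note that this simple loop works because our sample object is flat; a nested object would produce its own END_OBJECT token and end the loop early.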

Similarly, while parsing a JSON array, you read tokens until you get JsonToken.END_ARRAY.

Since we never load the whole file into memory, this approach can be used to read large JSON files, from megabytes to gigabytes in size, even in memory-constrained environments, e.g. on Android smartphones or Java ME enabled devices.
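Large files are usually a long array of records rather than one small object, so here is a sketch of that case using the same Jackson 1.x calls. The class name, the method name, and the "amount" field are assumptions made for illustration. It sums one field across all records and uses skipChildren() to step over nested objects and arrays it does not care about.

```java
import java.io.IOException;

import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;

public class OrderTotalSketch {

    // Sums the "amount" field of every object in a top-level JSON array
    // without building any of the objects in memory.
    static long totalAmount(String json) {
        try {
            JsonParser parser = new JsonFactory().createJsonParser(json);
            try {
                if (parser.nextToken() != JsonToken.START_ARRAY) {
                    throw new IllegalArgumentException("expected a JSON array");
                }
                long total = 0;
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    while (parser.nextToken() != JsonToken.END_OBJECT) {
                        String field = parser.getCurrentName(); // at FIELD_NAME
                        parser.nextToken();                     // move to the value
                        if ("amount".equals(field)) {
                            total += parser.getLongValue();
                        } else {
                            // no-op for scalars, skips a whole nested object or array
                            parser.skipChildren();
                        }
                    }
                }
                return total;
            } finally {
                parser.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String json = "[{\"id\":1,\"amount\":250,\"items\":[\"a\",\"b\"]},"
                + "{\"id\":2,\"amount\":100,\"customer\":{\"name\":\"x\"}}]";
        System.out.println(totalAmount(json)); // prints 350
    }
}
```

For a file on disk, createJsonParser(new File(path)) can be used in place of the String overload; the loop stays the same.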

Here is the complete sample program to read and write JSON using the Jackson Streaming API:

import java.io.File;
import java.io.IOException;

import org.codehaus.jackson.JsonEncoding;
import org.codehaus.jackson.JsonFactory;
import org.codehaus.jackson.JsonGenerationException;
import org.codehaus.jackson.JsonGenerator;
import org.codehaus.jackson.JsonParser;
import org.codehaus.jackson.JsonToken;
import org.codehaus.jackson.map.JsonMappingException;

/**
* Java program to demonstrate how to use Jackson Streaming API to read and
* write JSON Strings efficiently and fast.
*
* @author Javin Paul
*/
public class JsonJacksonStreamingAPIDemo{

    public static void main(String args[]) {

        System.out.println("Creating JSON file by using Jackson Streaming API in Java");
        createJSON("jacksondemo.json");
        System.out.println("done");

        System.out.println("Parsing JSON file by using Jackson Streaming API");
        parseJSON("jacksondemo.json");
        System.out.println("done");
    }

    /*
     * This method creates a JSON file by using the Jackson Streaming API.
     */
    public static void createJSON(String path) {
        try {
            JsonFactory jsonfactory = new JsonFactory();
            File jsonDoc = new File(path);
            JsonGenerator generator = jsonfactory.createJsonGenerator(jsonDoc,
                                             JsonEncoding.UTF8);

            generator.writeStartObject();
            generator.writeStringField("firstname", "Garrison");
            generator.writeStringField("lastname", "Paul");
            generator.writeNumberField("phone", 847332223);

            generator.writeFieldName("address");

            generator.writeStartArray();
            generator.writeString("Unit - 232");
            generator.writeString("Sofia Streat");
            generator.writeString("Mumbai");
            generator.writeEndArray();

            generator.writeEndObject();

            generator.close();

            System.out.println("JSON file created successfully");

        } catch (JsonGenerationException jge) {
            jge.printStackTrace();
        } catch (JsonMappingException jme) {
            jme.printStackTrace();
        } catch (IOException ioex) {
            ioex.printStackTrace();
        }
    }

    /*
     * This method parses a JSON file by using the Jackson Streaming API.
     */
    public static void parseJSON(String filename) {
        try {
            JsonFactory jsonfactory = new JsonFactory();
            File source = new File(filename);

            JsonParser parser = jsonfactory.createJsonParser(source);

            // starting parsing of JSON String
            while (parser.nextToken() != JsonToken.END_OBJECT) {
                String token = parser.getCurrentName();

                if ("firstname".equals(token)) {
                    parser.nextToken();  //next token contains value
                    String fname = parser.getText();  //getting text field
                    System.out.println("firstname : " + fname);

                }

                if ("lastname".equals(token)) {
                    parser.nextToken();
                    String lname = parser.getText();
                    System.out.println("lastname : " + lname);

                }

                if ("phone".equals(token)) {
                    parser.nextToken();
                    int phone = parser.getIntValue();  // getting numeric field
                    System.out.println("phone : " + phone);

                }

                if ("address".equals(token)) {
                    System.out.println("address :");
                    parser.nextToken(); 
                    // next token will be '[' which means JSON array

                    // parse tokens until you find ']'
                    while (parser.nextToken() != JsonToken.END_ARRAY) {
                        System.out.println(parser.getText());
                    }
                }
            }
            parser.close();

        } catch (IOException ioex) {
            // JsonParseException, thrown on malformed JSON, is a subclass of IOException
            ioex.printStackTrace();
        }
    }
}


And here is the output of our program when you run it from Eclipse or directly from the command line:

Creating JSON file by using Jackson Streaming API in Java
JSON file created successfully
done
Parsing JSON file by using Jackson Streaming API
firstname : Garrison
lastname : Paul
phone : 847332223
address :
Unit - 232
Sofia Streat
Mumbai
done


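You will also see the file jacksondemo.json in your project directory with the following JSON content (formatted here for readability; the generator writes it on a single line unless pretty printing is enabled):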

{
  "firstname":"Garrison",
  "lastname":"Paul",
  "phone":847332223,
  "address":["Unit - 232","Sofia Streat","Mumbai"]
}

And here is a nice Jackson cheat sheet for remembering the Jackson annotations, which are also heavily used while parsing JSON:

[Image: Jackson annotation cheat sheet]


That's all about how to use the Jackson Streaming API to parse a JSON file and to create JSON from Java code. Jackson is a powerful library with lots of features, and streaming is its most efficient model. It is a little difficult to use, and you need to write a lot of code with hard-coded field names, but it is the fastest way to read a large JSON file in Java with low memory overhead.

If you are dealing with normal-size JSON output and you don't have memory constraints, then you can always use the Jackson data binding model to parse JSON into Java objects.


Other JSON tutorials you may like to explore
  • How to convert a JSON  String to POJO in Java? (tutorial)
  • 3 Ways to parse JSON String in Java? (tutorial)
  • How to convert JSON array to String array in Java? (example)
  • How to convert a Map to JSON in Java? (tutorial)
  • How to use Google Protocol Buffer in Java? (tutorial)
  • How to use Gson to convert JSON to Java Object? (example)
  • 5 Books to Learn REST and RESTful Web Services (books)

P.S. - If you are looking for online training to learn how to develop RESTful Web Services in Java using the Spring framework, I suggest you join Eugen Paraschiv's REST with Spring course. The course has various options depending upon your experience level and how much you want to learn, e.g. a beginner's class, an intermediate class, and a master class. You can join the one which suits you best, though I suggest joining the master class if you are serious about becoming an expert Java REST developer.

2 comments:

  1. how to validate Json against Json schema, do we have any validator to validate Json along with parsing ??

  2. How Would you do it for multiple Json in a text file ?
