If you have done some serialization work in Java, then you know that it's not that easy. The default serialization mechanism is inefficient and has a host of problems (see Effective Java, Items 74 to 78), so it's really not a good choice for persisting Java objects in production. Though many of the efficiency shortcomings of default serialization can be mitigated by using a custom serialized format, custom formats come with their own encoding and parsing overhead. Google Protocol Buffers, popularly known as protobuf, is an alternative and faster way to serialize Java objects. It's probably the best alternative to Java serialization, and it is useful both for data storage and for data transfer over the network.
It's open-source, well-tested, and, most importantly, widely used within Google itself, which everyone knows puts a lot of emphasis on performance. It's also feature-rich and defines serialization formats for all common data types, which means you don't need to reinvent the wheel.
It's also very productive: as a developer, you just need to define message formats in a .proto file, and Google Protocol Buffers takes care of the rest of the work. Google also provides a protocol buffer compiler to generate Java source code from the .proto file, along with an API to read and write messages on the protobuf object.
You don't need to bother with any encoding or decoding details; all you have to specify is your data structure, in a Java-like format. There are more reasons to use Google Protocol Buffers, which we will see in the next section.
Google Protocol Buffer vs Java Serialization vs XML vs JSON
You can't ignore protobuf if you care about performance. I agree that there are a lot of ways to serialize data, including JSON, XML, and your own ad-hoc format, but they all have some kind of serious limitation when it comes to storing non-trivial objects. Both XML and JSON are feature-rich and language-independent, and there are lots of open-source Java libraries to take care of encoding and decoding; for example, you can use Jackson to parse JSON objects in Java, or XML parsers like SAX or DOM to serialize data in XML format.
That's good if you are sharing data with other applications, as XML is one of the most used data transfer formats and JSON is a close second, but they have their own problems: XML is verbose, it takes a lot of space to represent a small amount of data, and XML parsing can impose a huge performance penalty on applications.
Also, traversing an XML DOM is not as easy as calling getters and setters on a Java class, as you do with a Google protocol buffer. JSON is less verbose and takes less space compared to XML, but you still incur a performance penalty on encoding and decoding.
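To see what DOM traversal looks like in practice, here is a minimal sketch using only the JDK's built-in javax.xml parser. The XML payload and field values are made up for illustration; note how much markup wraps four small values, and how every field read is a tag lookup plus a string-to-number conversion rather than a typed getter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XmlOrderDemo {
    public static void main(String[] args) throws Exception {
        // A hypothetical XML representation of an Order.
        String xml = "<order>"
                + "<orderId>1001</orderId>"
                + "<symbol>GOOG</symbol>"
                + "<quantity>100</quantity>"
                + "<price>2845.50</price>"
                + "</order>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Traversing the DOM: a tag lookup and a manual string-to-double
        // conversion for every field we want to read.
        String symbol = doc.getElementsByTagName("symbol").item(0).getTextContent();
        double price = Double.parseDouble(
                doc.getElementsByTagName("price").item(0).getTextContent());

        System.out.println(symbol + " @ " + price);
    }
}
```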
Another benefit of Google protocol buffers over JSON is that protobuf has a strict message format, defined using .proto files. Let's see an example protobuf message representing an Order in a .proto file.
message Order {
  required int64 order_id = 1;
  required string symbol = 2;
  required double quantity = 3;
  required double price = 4;
  optional string text = 5;
}
By using a grammar-defined, strict schema, we realize several benefits over something like JSON; e.g., just by looking at the .proto file, we know the field names, which fields are required and which are optional, and, more importantly, the data types of the different fields. Google Protocol Buffers also allows you to compile .proto files into multiple target languages like Java, C++, or Python.
One of the rawest approaches for performance-sensitive applications is to invent your own ad-hoc way to encode data structures. This is simple and flexible, but not good from a maintenance point of view, as you need to write your own encoding and decoding code, which is sort of reinventing the wheel.
To make it as feature-rich as Google protobuf, you would need to spend a considerable amount of time, so this approach only works for the simplest of data structures and is not productive for complex objects.
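A hand-rolled binary codec for the Order above might look like the following sketch (field values and layout are made up for illustration). The layout is compact, but it is implicit: fields are identified only by position, so any change breaks every reader and writer at once, and versioning, optional fields, and cross-language support would all have to be built by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class AdHocOrderCodec {
    // Ad-hoc binary layout: fields written in a fixed order with no tags
    // and no schema. Compact, but fragile and Java-centric.
    static byte[] encode(long orderId, String symbol, double quantity, double price)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeLong(orderId);
        out.writeUTF(symbol);
        out.writeDouble(quantity);
        out.writeDouble(price);
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(1001L, "GOOG", 100, 2845.50);
        // Decoding must read the fields back in exactly the same order.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        System.out.println(in.readLong() + " " + in.readUTF());
        System.out.println(bytes.length + " bytes");
    }
}
```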
If performance is not your concern, then you can still use the default serialization protocol built into Java itself, but as mentioned in Effective Java, it has got a lot of problems. It's also not good if you are sharing data between two applications where one is not written in Java, like a native application written in C++.
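For comparison, here is what the default mechanism looks like: a minimal round-trip through ObjectOutputStream and ObjectInputStream (the Order class and its values are made up for illustration). Note that the stream embeds class metadata, so even a tiny object produces a surprisingly large payload, and only a JVM can read it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class DefaultSerializationDemo {
    // Plain Serializable class; the wire format is implicit and Java-only.
    static class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        long orderId;
        String symbol;
        Order(long orderId, String symbol) { this.orderId = orderId; this.symbol = symbol; }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Order(1001L, "GOOG"));
        }
        byte[] bytes = bos.toByteArray();
        // The stream header and class descriptor dominate the payload size.
        System.out.println("serialized size: " + bytes.length + " bytes");

        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Order copy = (Order) ois.readObject();
            System.out.println(copy.orderId + " " + copy.symbol);
        }
    }
}
```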
Google protocol buffers provide a midway solution: they are not as space-intensive as XML and are much better than Java serialization; in fact, they are much more flexible and efficient. With Google protocol buffers, all you need to do is write a .proto description of the object you wish to store.
From that, the protocol buffer compiler creates a Java class that implements automatic encoding and parsing of the buffer data in an efficient binary format. This generated class, known as a protobuf object, provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit.
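The following is a hand-written stand-in that only sketches the shape of the builder-style API protoc generates for the Order message above; it is not the real generated code, which additionally implements the binary wire format via methods like toByteArray() and parseFrom(). The point is the programming model: you populate an immutable message through a fluent builder and read it back through typed getters.

```java
// Hand-written stand-in mimicking the shape of a protoc-generated class.
final class Order {
    private final long orderId;
    private final String symbol;
    private final double quantity;
    private final double price;

    private Order(Builder b) {
        this.orderId = b.orderId;
        this.symbol = b.symbol;
        this.quantity = b.quantity;
        this.price = b.price;
    }

    // Typed getters: no tag lookups, no string-to-number conversion.
    long getOrderId() { return orderId; }
    String getSymbol() { return symbol; }
    double getQuantity() { return quantity; }
    double getPrice() { return price; }

    static Builder newBuilder() { return new Builder(); }

    // Fluent builder, as in the real generated API.
    static final class Builder {
        private long orderId;
        private String symbol = "";
        private double quantity;
        private double price;

        Builder setOrderId(long v) { orderId = v; return this; }
        Builder setSymbol(String v) { symbol = v; return this; }
        Builder setQuantity(double v) { quantity = v; return this; }
        Builder setPrice(double v) { price = v; return this; }
        Order build() { return new Order(this); }
    }

    public static void main(String[] args) {
        Order order = Order.newBuilder()
                .setOrderId(1001L)
                .setSymbol("GOOG")
                .setQuantity(100)
                .setPrice(2845.50)
                .build();
        System.out.println(order.getOrderId() + " " + order.getSymbol());
    }
}
```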
Another big plus is that the Google protocol buffer format supports extending the format over time in such a way that code can still read data encoded with the old format, though you need to follow certain rules to maintain backward and forward compatibility.
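For example, a hypothetical later version of the Order message could add a field like this, following the compatibility rules: new fields get a fresh tag number and are optional, and existing tag numbers are never reused or renumbered. Old readers simply skip the unknown field, and new readers fall back to the default when it is absent.

```proto
message Order {
  required int64 order_id = 1;
  required string symbol = 2;
  required double quantity = 3;
  required double price = 4;
  optional string text = 5;
  optional string currency = 6;  // added later; never reuse or renumber tags
}
```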
You can see that protobuf has some serious things to offer, and it has certainly found its place in financial data processing and FinTech. XML and JSON still have their wide uses, though, and I still recommend them depending upon your scenario; for example, JSON is more suitable for web development, where one end is Java and the other is a browser running JavaScript.
Protocol buffers also have more limited language support than XML or JSON: Google officially provides compilers for C++, Java, and Python, though there are third-party add-ons for Ruby and other programming languages. JSON, on the other hand, has almost ubiquitous language support.
In short, XML is good for interacting with legacy systems and using web services, but for high-performance applications that would otherwise use their own ad-hoc way of persisting data, Google protocol buffers are a good choice.
Google protocol buffers, or protobuf, also come with miscellaneous utilities that can be useful for you as a protobuf developer. There are plugins for the Eclipse, NetBeans, and IntelliJ IDEA IDEs to work with protocol buffers, which provide syntax highlighting, content assist, and automatic generation of numeric tags as you type. There is also a Wireshark/Ethereal packet sniffer plugin to monitor protobuf traffic.
That's all in this introduction to Google Protocol Buffers. In the next article, we will see how to use Google protocol buffers to encode Java objects.
Further Reading
If you like this article and are interested in knowing more about serialization in Java, I recommend you check out some of my earlier posts on the same topic:
- The Complete Java Developer RoadMap (map)
- Top 10 Java Serialization Interview Questions and Answers (list)
- Difference between Serializable and Externalizable in Java? (answer)
- Why use SerialVersionUID in Java? (answer)
- How to work with a transient variable in Java? (answer)
- What is the difference between the transient and volatile variables in Java? (answer)
- How to serialize an object in Java? (answer)
- Google's Official Guide to Protocol Buffer 3 (guide)
- 10 Things Java developer should learn this year (article)
- Top 10 Frameworks Full stack Java Developer can learn (framework)
- 20 Java Libraries and APIs every Java Programmer should learn (libraries)
- My favorite free courses to learn Java in-depth (courses)
- 10 Free Courses to learn Spring Boot for beginners (courses)
- 10 Tools Every Java Developer should learn (tools)
A very similar framework to protobuf is Apache Thrift. The downside: it is slightly less efficient.
On the upside, it has much broader language support, service definitions with different transport layers, and offers different protocols (Binary, JSON, and XML). See http://thrift.apache.org
I also have a similar question: why not use (or what are the drawbacks of using) Apache's Avro, which is very similar to Thrift (http://avro.apache.org/)?
You can also define a grammar-defined, strict schema (a similar .avpr file for your 'protocol'), it is cross-platform, and you can easily (de)serialize objects based on this schema.
What are the downsides regarding performance?
This could also be a very interesting post I think :)