Tuesday, May 16, 2023

How to replace escape XML special characters in Java String - Example

How to replace XML special Characters in Java String
There are two approaches to replacing XML or HTML special characters in a Java String: write your own function to replace them, or use an open-source library that has already implemented it. Luckily, there is a very common open-source library for this: Apache Commons Lang’s StringEscapeUtils class, which provides escaping for several languages such as XML, SQL, and HTML. You can use StringEscapeUtils to convert XML special characters in a String to their escaped equivalents. I personally prefer open-source code over reinventing the wheel, because it saves me the effort of testing my own implementation.


Even Joshua Bloch advocates using open-source libraries to leverage the experience and work of other programmers. If you read from an XML file and, after some transformation, write to another XML file, you need to take care of the XML special characters present in the source file.

If you don’t escape XML special characters while creating an XML document, XML parsers such as DOM and SAX will treat them as XML metacharacters; for example, < and > will be interpreted as the start and end of a tag.

Even if you try to transform XML containing an unescaped special character using XSLT, the transformation will complain and fail. So while generating XML documents, it's very important to escape XML special characters to avoid any parsing or transformation issues. In this Java XML tutorial, we will see what the special characters in XML are and how to escape them in a Java String.




What are XML and HTML special characters



There are five special characters in XML that require escaping. If you have been working with XML and Java, you might be familiar with them. Here is the list of XML and HTML special characters with their escaped forms:
  1.  & - &amp;
  2.  < - &lt;
  3.  > - &gt;
  4.  " - &quot;
  5.  ' - &apos;

Sometimes these special characters are also referred to as XML metacharacters. For programmers who are not familiar with escaping, it is the process of using an alternative String in order to produce the literal special character. For example, the following XML String is invalid:

<languages>Java & HTML</languages>

because the & character is used to start an entity reference. In order to use & as a literal character in XML, we need to write &amp;, as shown in the example below:

<languages>Java &amp; HTML</languages>
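
To see the difference in practice, here is a minimal sketch that feeds both strings to the JDK's built-in DOM parser. The class name AmpersandParseDemo and the helper rootTextOrNull are hypothetical names chosen for illustration; only standard javax.xml and org.xml.sax classes are used.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original program.
class AmpersandParseDemo {

    // Parses the string and returns the root element's text, or null if
    // the parser rejects the input as not well-formed.
    static String rootTextOrNull(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without logging them to stderr
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (SAXException e) {
            return null; // input was not well-formed XML
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // unescaped ampersand: the parser rejects the document
        System.out.println(rootTextOrNull("<languages>Java & HTML</languages>"));
        // escaped ampersand: the parser returns the literal text
        System.out.println(rootTextOrNull("<languages>Java &amp; HTML</languages>"));
    }
}
```

With the unescaped input the helper returns null, while the escaped input is parsed and the text comes back with a literal & in it.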

Similarly, if you want to use any of the above five special XML characters as a String literal, you need to escape it. Even while writing these posts, if I don’t escape these HTML special characters, they will be considered HTML tags by the HTML parser. In order to show them as they are, I need to escape them.
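
If you do want to write the replacement yourself, the first approach mentioned earlier, a minimal sketch could look like the one below. ManualXmlEscaper and its escapeXml method are hypothetical names for illustration; this is not how StringEscapeUtils is implemented, just a character-by-character replacement of the five characters listed above.

```java
// Hypothetical hand-written escaper covering only the five XML special characters.
class ManualXmlEscaper {

    static String escapeXml(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else is copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Java & HTML"));
        System.out.println(escapeXml("<b>\"Java\" isn't HTML</b>"));
    }
}
```

Because every & is replaced in a single pass, calling this method on text that is already escaped will escape it a second time (&amp; becomes &amp;amp;), so it should be applied exactly once to raw text.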


Code example to replace XML Special characters in String.

Here is a complete code example to replace special characters in an XML string. This example uses StringEscapeUtils from Apache Commons Lang to perform the escaping:

import org.apache.commons.lang.StringEscapeUtils;

/**
 * Simple Java program to escape XML or HTML special characters in String.
 * There are five XML special characters which need to be escaped:
 *     & - &amp;
 *     < - &lt;
 *     > - &gt;
 *     " - &quot;
 *     ' - &apos;
 * @author http://javarevisited.blogspot.com
 */

public class XMLUtils {
 
 
    public static void main(String[] args) {

        // handling XML special character & in Java String
        String xmlWithSpecial = "Java & HTML"; // String with & as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String in Java: "
                            +  StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling XML special character > in Java String
        xmlWithSpecial = "Java > HTML"; // String with > as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String : " + StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling XML and HTML special character < in Java String
        xmlWithSpecial = "Java < HTML"; // String with < as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String: " + StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling HTML and XML special character " in Java String
        xmlWithSpecial = "Java \" HTML"; // String with " as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String: " + StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling XML special character ' in Java String
        xmlWithSpecial = "Java ' HTML"; // String with ' as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String: " + StringEscapeUtils.escapeXml(xmlWithSpecial));

    }
 
}

Output
Original unescaped XML String: Java & HTML
Escaped XML String in Java: Java &amp; HTML
Original unescaped XML String: Java > HTML
Escaped XML String : Java &gt; HTML
Original unescaped XML String: Java < HTML
Escaped XML String: Java &lt; HTML
Original unescaped XML String: Java " HTML
Escaped XML String: Java &quot; HTML
Original unescaped XML String: Java ' HTML
Escaped XML String: Java &apos; HTML
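
The program above escapes plain text values. When the XML document itself is built in code, an alternative is to let the DOM API and the JDK's serializer escape the text content, so that the tags stay intact and only the values are escaped. Here is a rough sketch under that assumption; DomEscapingSketch and buildElement are hypothetical names, and this swaps StringEscapeUtils for the standard javax.xml.transform classes.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: the serializer, not our code, performs the escaping.
class DomEscapingSketch {

    // Builds a single element with the given raw text and serializes it to a String.
    static String buildElement(String tagName, String rawText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element element = doc.createElement(tagName);
            element.setTextContent(rawText); // raw, unescaped text goes in here
            doc.appendChild(element);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        // the tag is written as markup, the ampersand in the text is escaped
        System.out.println(buildElement("languages", "Java & HTML"));
    }
}
```

The design choice here is to keep raw text in memory and escape only at the point of serialization, which avoids escaping markup by accident or escaping the same value twice.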

That’s all on how to escape XML special characters in a Java program. Unescaped special characters are one of the main causes of bugs when working with XML parsing and transformation in Java, so proper handling of XML and HTML special characters is required. If you are using a database to store your XML, consider storing escaped XML instead of raw XML; this will ensure that every client that reads XML from the database gets properly escaped XML or HTML.



3 comments :

faysal51 said...

Hey man, can you help me with Java?
I want to create program that take some values say String and integers, and generates a document placing those values in a certain way which can be printed through a printer. Make this tutorial it will help a lot of learner.

Anonymous said...

This does not work as desired if your input is a valid XML file, because StringEscapeUtils escapes the <> tags as well. For example,
14/02/2017 becomes
<MvtDate type="D">14/02/2017 </MvtDate>

Gokul said...

Hello, actually I want this to happen the other way round: I want the output to be just &. I am using the DOM parse method to create an XML file.
