Saturday, September 1, 2012

How to escape XML special characters in Java String

How to replace XML special Characters in Java String
There are two approaches to replacing XML or HTML special characters in a Java String: write your own function to replace them, or use an open source library that has already implemented it. Luckily, there is a very common open source library for this: the StringEscapeUtils class from Apache Commons Lang, which provides escaping for several languages, including XML, SQL and HTML. You can use StringEscapeUtils to convert the XML special characters in a String to their escaped equivalents. I personally prefer open source code to reinventing the wheel, because it saves testing effort. Even Joshua Bloch has advocated using open source libraries to leverage the experience and work of other programmers.

If you read from an XML file and, after some transformation, write to another XML file, you need to take care of the XML special characters present in the source file. If you don’t escape XML special characters while creating an XML document, XML parsers such as DOM and SAX will treat them as XML meta characters, for example as the start or end of a tag in the case of < or >. Even an XSLT transformation of XML containing unescaped special characters will complain and fail. So while generating XML documents it’s very important to escape XML special characters to avoid parsing or transformation issues. In this Java XML tutorial we will see what the special characters in XML are and how to escape them in a Java String.



What are XML and HTML special characters
There are five special characters in an XML String which require escaping. If you have been working with XML and Java, you might be familiar with these five characters. Here is a list of XML and HTML special characters:


 & - &amp;
 < - &lt;
 > - &gt;
 " - &quot;
 ' - &apos;

Sometimes these special characters are also referred to as XML meta characters. For programmers who are not familiar with escaping: escaping is the process of using an alternative String to produce the literal form of a special character. For example, the following XML String is invalid:

<languages>Java & HTML</languages>

because the & character marks the start of an entity reference. In order to use the & character as a literal in XML, we need to write &amp;, as in this example:

<languages>Java &amp; HTML</languages>
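To see the difference between the two Strings above from a parser's point of view, here is a minimal sketch that feeds both to the JDK's built-in DOM parser. The class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for this illustration; only standard javax.xml and org.xml.sax classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration, not part of the original example.
class WellFormedCheck {

    // Returns true if the given text parses as a well-formed XML document.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        // unescaped & : rejected by the parser
        System.out.println(isWellFormed("<languages>Java & HTML</languages>"));
        // escaped &amp; : accepted by the parser
        System.out.println(isWellFormed("<languages>Java &amp; HTML</languages>"));
    }
}
```

The first call should print false and the second true, which is the parsing failure this tutorial is about.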

Similarly, if you want to use any of the above five XML special characters as a String literal, you need to escape it. Even while writing this post, if I don’t escape these HTML special characters, the HTML parser will treat them as HTML tags. In order to show them as they are, I need to escape them.
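For readers curious about the "write your own function" approach, here is a minimal sketch of what such a function could look like. The class name XmlEscapeSketch and its escapeXml method are hypothetical and written for this illustration; this is not how StringEscapeUtils is implemented. It handles only the five characters listed above and walks the String once, so an & that it has just written as part of an entity is never escaped a second time.

```java
// Hypothetical hand-written escaper for illustration only.
class XmlEscapeSketch {

    // Replaces the five XML special characters with their predefined entities.
    static String escapeXml(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("Java & HTML"));     // Java &amp; HTML
        System.out.println(escapeXml("<a href=\"x\">"));  // &lt;a href=&quot;x&quot;&gt;
    }
}
```

A function like this still needs its own tests, which is the main argument for preferring a well-used library.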

Code example to replace XML special characters in String

Here is a complete code example to replace special characters in an XML String. This example uses StringEscapeUtils from Apache Commons Lang 2.x (the org.apache.commons.lang package) to perform escaping:

import org.apache.commons.lang.StringEscapeUtils;

/**
 * Simple Java program to escape XML or HTML special characters in String.
 * There are five XML special characters which need to be escaped:
 *     & - &amp;
 *     < - &lt;
 *     > - &gt;
 *     " - &quot;
 *     ' - &apos;
 * @author http://javarevisited.blogspot.com
 */

public class XMLUtils {

    public static void main(String[] args) {

        // handling XML special character & in a Java String
        String xmlWithSpecial = "Java & HTML"; // String with & as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String in Java: "
                            +  StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling XML special character > in a Java String
        xmlWithSpecial = "Java > HTML"; // String with > as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String : " + StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling XML and HTML special character < in a Java String
        xmlWithSpecial = "Java < HTML"; // String with < as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String: " + StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling HTML and XML special character " in a Java String
        xmlWithSpecial = "Java \" HTML"; // String with " as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String: " + StringEscapeUtils.escapeXml(xmlWithSpecial));

        // handling XML special character ' in a Java String
        xmlWithSpecial = "Java ' HTML"; // String with ' as special character
        System.out.println("Original unescaped XML String: " + xmlWithSpecial);
        System.out.println("Escaped XML String: " + StringEscapeUtils.escapeXml(xmlWithSpecial));

    }

}

Output
Original unescaped XML String: Java & HTML
Escaped XML String in Java: Java &amp; HTML
Original unescaped XML String: Java > HTML
Escaped XML String : Java &gt; HTML
Original unescaped XML String: Java < HTML
Escaped XML String: Java &lt; HTML
Original unescaped XML String: Java " HTML
Escaped XML String: Java &quot; HTML
Original unescaped XML String: Java ' HTML
Escaped XML String: Java &apos; HTML
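When the goal is to generate a whole XML document rather than to escape a single String, the JDK's own XML writers can do the escaping of text content for you. The following is a minimal sketch using the standard StAX XMLStreamWriter; it is an alternative that needs no extra library, not part of the StringEscapeUtils example above, and the class name StaxEscapeSketch and method languagesElement are hypothetical names for this illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class for illustration only.
class StaxEscapeSketch {

    // Builds <languages>text</languages>, letting the writer escape the text content.
    static String languagesElement(String text) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        writer.writeStartElement("languages");
        writer.writeCharacters(text); // special characters in text are written as entity references
        writer.writeEndElement();
        writer.close();
        return buffer.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(languagesElement("Java & HTML"));
    }
}
```

With this approach you pass the raw text to the writer; escaping it yourself first would result in double escaping.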

That’s all on how to escape XML special characters in a Java program. Unescaped special characters are one of the main causes of bugs when working with XML parsing and transformation in Java, so proper handling of XML and HTML special characters is required. If you are using a database to store your XML, then consider storing escaped XML instead of raw XML; this ensures that every client that reads XML from the database gets properly escaped XML or HTML.


1 comment :

faysal51 said...

Hey man, can you help me with Java?
I want to create a program that takes some values, say Strings and integers, and generates a document that places those values in a certain layout and can be printed through a printer. Please make a tutorial on this; it will help a lot of learners.
