Java StAX

Last modified on December 4th, 2014 by Joe.

XML processing is an important weapon in a programmer’s armoury. It is not new, yet it grows more popular by the day. Android embraces it, and most emerging technologies incorporate XML in some form. Java supports XML processing through JAXP, which covers SAX, DOM, TrAX and StAX, and through JAXB.

In this article let us learn what the StAX API is, which implementations are available and how we can use it. StAX is a streaming, pull-parsing Java API for reading and writing XML, and it is an alternative to SAX, DOM and TrAX. The StAX API is a set of interfaces providing a framework for a bi-directional parser, meaning it can both read and write XML. It provides two sets of APIs: cursor based and iterator based.

StAX was carved out of JSR 173.

Important Note: The StAX API itself is only a set of interfaces, and there are different implementations available, like stax.codehaus, SJSXP and Woodstox. Java SE 6 and later bundle a default implementation (SJSXP) with the JDK, so the examples in this article run without any extra jar. To use a different implementation such as Woodstox, download its jar and add it to the classpath. I am going to use Sun/Oracle’s StAX implementation; on older setups it can also be obtained along with the Java EE 6 API Library (javaee-api-6.0.jar), which is available as part of the GlassFish server or as a separate download for use with Eclipse.
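
If you are not sure which StAX implementation your program actually picks up, a small check like the one below can help. This is only a sketch: the class name StaxImplementationCheck is made up for this illustration, and the class names it prints depend on your JDK and on which jars are on the classpath.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

// Hypothetical helper class, named for this illustration only.
class StaxImplementationCheck {

    // Class name of the XMLInputFactory implementation that newInstance() resolves to.
    static String inputFactoryName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    // Class name of the XMLOutputFactory implementation that newInstance() resolves to.
    static String outputFactoryName() {
        return XMLOutputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("Input factory:  " + inputFactoryName());
        System.out.println("Output factory: " + outputFactoryName());
    }
}
```

If another implementation's jar is added to the classpath, the printed names will usually change to classes from that jar.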

SAX, DOM and StAX

SAX is a uni-directional, read-only API and follows the push model for reading (push and pull parsing are explained in a later section). SAX and StAX are close relatives, whereas DOM uses a completely different approach to XML processing.

A DOM parser is built on the concept of trees. The whole XML document is parsed into a document object model, a tree that is held in memory, and the application then works with the document by traversing that tree. This requires a lot of memory and processing power. Small documents are fine with this kind of DOM processing, but with a large document we will run into memory and performance issues.

StAX follows a streaming model and can both read and write. Imagine feeding a whole XML document through a tube. At any moment one XML event is in focus, and then we move on to the next one; the stream only moves forward. This kind of processing is not new to us, as we have seen it in the ResultSet of the JDBC API. Streaming has its advantage when we want to process large documents sequentially: because only the current event is held in memory, memory use stays low irrespective of the size of the document. With mobile phones and apps getting popular, we also need to think of processing on a smaller scale than desktops and servers.

Example scenarios that make the best use of StAX are parsing WSDL in web services, viewing relational database data stored as XML documents and, in general, parsing documents whose structure is predictable.

Pull and Push XML Parsing

When processing an XML document there are three parties involved. The first is the XML document, the second is the API doing the processing, and the third is the client code, which uses the API and gets data from the document.

In pull parsing, the client code calls the parsing API’s methods to get data, and the parser then reads the XML and returns the data required. This is on demand: the parser reads data only when the client asks for it.

In push parsing, the parser reads the XML document and, whenever an event occurs, pushes the respective data to the client and continues. It is like setting a birthday alarm: we register for an alarm on a particular date, the alerter keeps running, and it alerts us when that date arrives.
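
To make the difference concrete, here is a small sketch that collects the element names of the same document twice: once with SAX, where the parser calls our handler (push), and once with StAX, where our own loop asks for the next event (pull). The class name PushVersusPull, its method names and the sample XML string are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, named for this illustration only.
class PushVersusPull {

    static final String XML = "<Zoo><Lion><Type>Wild</Type></Lion></Zoo>";

    // Push: the SAX parser drives the processing and calls startElement() on our handler.
    static List<String> namesWithSax(String xml) throws Exception {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // Pull: our loop drives the processing and asks the reader for the next event.
    static List<String> namesWithStax(String xml) throws Exception {
        List<String> names = new ArrayList<String>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX  (push): " + namesWithSax(XML));
        System.out.println("StAX (pull): " + namesWithStax(XML));
    }
}
```

Because the pull loop is ordinary client code, it can stop reading as soon as it has found what it needs, for example with a break. With push parsing the parser normally runs through the whole document.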

StAX Cursor API

The cursor API is very similar to a JDBC ResultSet in the way it goes through the XML. It always moves forward, and once it goes past an element it cannot go back. The main interfaces of the cursor API are XMLStreamReader and XMLStreamWriter, and we will see both of them below. The cursor API resembles SAX in that it is unidirectional, reads the document in a single pass and has a lightweight memory footprint.

Read XML using XMLStreamReader Cursor

The following code block initializes the factory:

        XMLInputFactory xmlFactory = XMLInputFactory.newInstance();

        // replace internal entity references with their replacement text
        xmlFactory.setProperty(
                XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES,
                Boolean.TRUE);
        // do not resolve external entities
        xmlFactory.setProperty(
                XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES,
                Boolean.FALSE);
        // setting IS_COALESCING to TRUE would deliver adjacent text
        // as one event; here it is set to FALSE
        xmlFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
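
To see what the IS_COALESCING property changes, the sketch below reads an element whose text is mixed with a CDATA section and collects every text event it gets. The class name CoalescingDemo and the sample XML are made up for this illustration. With coalescing switched on, the text should arrive as one event. With it switched off, the text may arrive in several pieces, and how it is split depends on the implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class, named for this illustration only.
class CoalescingDemo {

    // Text split by a CDATA section; made-up sample input.
    static final String XML = "<Sound>Ro<![CDATA[a]]>r</Sound>";

    // Returns the text of every CHARACTERS or CDATA event, in document order.
    static List<String> textEvents(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.valueOf(coalescing));
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> chunks = new ArrayList<String>();
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.CDATA) {
                    chunks.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println("coalescing on:  " + textEvents(XML, true));
        System.out.println("coalescing off: " + textEvents(XML, false));
    }
}
```

Client code that needs the complete text of an element is simpler to write with coalescing on. With it off, the pieces have to be joined by hand.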

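The following lines create a stream reader:

  // the first argument is the system ID of the document, the second is its content
  String filename = "zoo.xml";
  XMLStreamReader xmlReader = xmlFactory.createXMLStreamReader(
                    filename,
                    new FileInputStream(filename));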

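The following loop walks over all the events available:

  while (xmlReader.hasNext()) {

      // next() moves the cursor forward and returns the event type;
      // getEventType() returns the same value again if it is needed later
      int eventType = xmlReader.next();

      // call only the methods that are valid for the current event
      switch (eventType) {
          case XMLStreamConstants.START_ELEMENT:
              System.out.println("Start: " + xmlReader.getLocalName());
              break;
          case XMLStreamConstants.CHARACTERS:
              // getText() throws IllegalStateException on non-text events
              System.out.println("Text: " + xmlReader.getText());
              break;
          case XMLStreamConstants.END_ELEMENT:
              System.out.println("End: " + xmlReader.getLocalName());
              break;
          default:
              // use the right methods for attributes, comments, etc
              break;
      }
  }
  xmlReader.close();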
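
Putting the pieces together, a complete cursor-based read could look like the sketch below. It parses a small document from a String instead of zoo.xml so that it can be run as is. The class name CursorReadExample, the describe() method and the name attribute on Lion are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class, named for this illustration only.
class CursorReadExample {

    // Sample input; the "name" attribute is made up for this illustration.
    static final String XML =
            "<Zoo><Lion name=\"Simba\"><Type>Wild</Type><Sound>Roar</Sound></Lion></Zoo>";

    // Walks the document once and returns one line per element, attribute and text event.
    static List<String> describe(String xml) throws XMLStreamException {
        List<String> lines = new ArrayList<String>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        lines.add("START " + reader.getLocalName());
                        // attributes can only be read while the cursor is on the start element
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            lines.add("  ATTR " + reader.getAttributeLocalName(i)
                                    + "=" + reader.getAttributeValue(i));
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            lines.add("TEXT " + reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        lines.add("END " + reader.getLocalName());
                        break;
                    default:
                        break; // comments, processing instructions and so on are skipped
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : describe(XML)) {
            System.out.println(line);
        }
    }
}
```

Note that the reader never goes back: the attributes of Lion have to be read while the cursor is on its start element, because they are no longer reachable once next() has been called.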

Write XML using XMLStreamWriter Cursor

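XMLOutputFactory writerFactory = XMLOutputFactory.newInstance();

// give the encoding to the factory so the bytes written match the XML declaration
XMLStreamWriter xmlWriter = writerFactory.createXMLStreamWriter(
        new FileOutputStream("zoo.xml"), "utf-8");

// some example write statements below
// the XML declaration has to be written first, before any comment
xmlWriter.writeStartDocument("utf-8", "1.0");
xmlWriter.writeComment("comments comments comments");
xmlWriter.setPrefix("html", "http://www.w3.org/TR/REC-html40");

// start element followed by text (the element name "Animal" is a placeholder)
xmlWriter.writeStartElement("Animal");
xmlWriter.writeCharacters("Lion");
xmlWriter.writeEndElement();

// finish the document and flush the output
xmlWriter.writeEndDocument();
xmlWriter.close();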
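
For a complete picture, here is a small sketch that builds a whole zoo document with XMLStreamWriter. It writes into a StringWriter instead of a file so that the result can be printed directly. The class name CursorWriteExample, the buildZooXml() method, the name attribute and the text values are assumptions made for this illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical class, named for this illustration only.
class CursorWriteExample {

    // Builds a small zoo document in memory and returns it as a String.
    static String buildZooXml() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        writer.writeStartDocument();            // XML declaration comes first
        writer.writeStartElement("Zoo");
        writer.writeStartElement("Lion");
        writer.writeAttribute("name", "Simba"); // attributes follow their start element directly

        writer.writeStartElement("Type");
        writer.writeCharacters("Wild");
        writer.writeEndElement();               // closes Type

        writer.writeStartElement("Sound");
        writer.writeCharacters("Roar & growl"); // the writer escapes '&' for us
        writer.writeEndElement();               // closes Sound

        writer.writeComment(" it will eat you man ");
        writer.writeEndElement();               // closes Lion
        writer.writeEndElement();               // closes Zoo
        writer.writeEndDocument();
        writer.close();                         // does not close the underlying StringWriter
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildZooXml());
    }
}
```

Every writeStartElement() needs a matching writeEndElement(). The writer keeps track of which element is open, so writeEndElement() takes no name.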

StAX Iterator API

The iterator API parses the XML document and returns event objects, such as events for elements, text and comments. This is similar to the Java iterator in collections. The main interfaces in the iterator API are XMLEventReader and XMLEventWriter, and the base interface for the events themselves is XMLEvent. nextEvent() is the key method; it returns the next event in the XML stream, much like next() on a collection iterator. When I say events, what exactly are they? The event types are StartDocument, EndDocument, StartElement, EndElement, Characters, EntityReference, ProcessingInstruction, Comment, DTD, Attribute and Namespace.

Example Events

<?xml version="1.0"?>
<Zoo xmlns="http://www.animaldept.gov">
    <Lion>
        <Type>Wild</Type>
        <Sound>Roar</Sound>
<!-- it will eat you man -->
    </Lion>
</Zoo>

StartDocument – version 1.0
StartElement – qname = Zoo:http://www.animaldept.gov
Characters – data = Wild

The three above are just example events; every element produces a StartElement and an EndElement event.
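
To see the whole sequence of events for the zoo document, the sketch below runs it through an XMLEventReader and turns every event into a short description. The class name EventListing and its methods are made up for this illustration, and the document is written on one line so that no whitespace-only Characters events show up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Comment;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class, named for this illustration only.
class EventListing {

    static final String XML = "<?xml version=\"1.0\"?>"
            + "<Zoo xmlns=\"http://www.animaldept.gov\"><Lion>"
            + "<Type>Wild</Type><Sound>Roar</Sound>"
            + "<!-- it will eat you man -->"
            + "</Lion></Zoo>";

    // Short text form of one event: its kind plus name or data where there is one.
    static String describe(XMLEvent e) {
        if (e.isStartDocument()) return "StartDocument";
        if (e.isStartElement()) return "StartElement " + e.asStartElement().getName();
        if (e.isCharacters()) return "Characters " + e.asCharacters().getData();
        if (e.isEndElement()) return "EndElement " + e.asEndElement().getName();
        if (e.isEndDocument()) return "EndDocument";
        if (e.getEventType() == XMLStreamConstants.COMMENT) {
            return "Comment" + ((Comment) e).getText();
        }
        return "Other (event type " + e.getEventType() + ")";
    }

    // Reads the document and returns the description of every event, in order.
    static List<String> listEvents(String xml) throws XMLStreamException {
        List<String> result = new ArrayList<String>();
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                result.add(describe(reader.nextEvent()));
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : listEvents(XML)) {
            System.out.println(line);
        }
    }
}
```

Element names are printed as QName values, so an element in the zoo namespace appears as {http://www.animaldept.gov}Zoo.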

Read XML using XMLEventReader Iterator

// create a factory
XMLInputFactory factory = XMLInputFactory.newInstance();

// create xml event reader
String filename = "zoo.xml";
XMLEventReader r = factory.createXMLEventReader(filename,
     new FileInputStream(filename));

// iterate over the events
while (r.hasNext()) {
    XMLEvent e = r.nextEvent();
    System.out.println(e.toString());
}
r.close();
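
Printing events is good for exploring, but usually we want the data. The sketch below picks the text of the Type and Sound elements out of the zoo document with XMLEventReader. The class name EventReaderExtract and the readLionDetails() method are assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical class, named for this illustration only.
class EventReaderExtract {

    static final String XML = "<?xml version=\"1.0\"?>"
            + "<Zoo xmlns=\"http://www.animaldept.gov\">"
            + "<Lion><Type>Wild</Type><Sound>Roar</Sound></Lion>"
            + "</Zoo>";

    // Returns the text of the Type and Sound elements, keyed by element name.
    static Map<String, String> readLionDetails(String xml) throws XMLStreamException {
        Map<String, String> details = new LinkedHashMap<String, String>();
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    String name = event.asStartElement().getName().getLocalPart();
                    if (name.equals("Type") || name.equals("Sound")) {
                        // getElementText() reads up to the matching end element
                        details.put(name, reader.getElementText());
                    }
                }
            }
        } finally {
            reader.close();
        }
        return details;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(readLionDetails(XML));
    }
}
```

getElementText() is only meant for elements that contain text and no child elements, which is why it is called for Type and Sound but not for Lion.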

Write XML using XMLEventWriter Iterator


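// create xml event writer
XMLEventWriter writer =
        XMLOutputFactory.newInstance().createXMLEventWriter(System.out);

// create an event with the event factory and add it to the writer
XMLEvent event = XMLEventFactory.newInstance().createComment("zoo");
writer.add(event);
writer.flush();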
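
The events given to writer.add() are normally created with XMLEventFactory. As a minimal sketch, the code below writes a one-element document into a StringWriter. The class name EventWriterExample and the buildLionXml() method are made up for this illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical class, named for this illustration only.
class EventWriterExample {

    // Writes <Lion>Roar</Lion> as a sequence of events and returns the resulting XML.
    static String buildLionXml() throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);

        writer.add(events.createStartDocument());
        // arguments are prefix, namespace URI and local name; no namespace is used here
        writer.add(events.createStartElement("", "", "Lion"));
        writer.add(events.createCharacters("Roar"));
        writer.add(events.createEndElement("", "", "Lion"));
        writer.add(events.createEndDocument());
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildLionXml());
    }
}
```

XMLEventWriter also has an add() method that takes an XMLEventReader, which copies all events from the reader to the writer.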

Comments on "Java StAX"

  1. rakesh says:

    sir
    how to create mlm(multi level markrting)
    application using servlet jsp and mysql.

  2. Sayandip says:

    Thank you Joe. Awesome yet easy to comprehend; like always.

  3. Anonymous says:

    Thanks Joe. Its excellent

  4. Ashok says:

    Thanks Joe.Its very much useful

  5. Sandeep Singh says:

    Thanks Joe,
    Good read

  6. anupriya says:

    Thanks Joe, it’s a well written good piece of info..

  7. GH says:

    Hi Joe,

    Its Like Times of india i read your blog every morning. I read the older posts if ther is nothing new. Started thinkin java in the way javapapers think. Thnks.

    Expecting JAXB … !

  8. vignesh says:

    expecting JAXB !!!!

  9. Raj says:

    Many Many thank you,

    You have cleared all the basic funda of Java in very straight forward language. I am also expecting JAXB and some more design patten. Once again many thanks for providing such a nice n clear material.

  10. Joe says:

    Friends, I will soon write on JAXB.

  11. Mayur says:

    Hi Joe,

    it was nice article. i really gave me idea of what exactly stax is. many time we used to get the stax related error in web service parsing. this article is like started for me.

    it will be nice if you explain some thing about the JAXB.
    keep the good work!

    – Mayur

  12. P.G.Dinesh Karthik says:

    Hi Joe, I’m really amazed with your blog.
    I need a good training material for java completely and i have to know about the latest technologies in java, could you pl help me out in this.

  13. Ashok says:

    Hi Joe,

    it would be nice if you explain JAXB as well

    keep up the good work!

  14. Alok says:

    hi joe,
    how i read a file inside a servlet upto 1 hour
    while server is running forever.
    but file reading only one hour. please send me sone logic. sir please in every blog you give the task related some R &D purpose like above Question because we learn every day something new in core java and servlet,jsp,struts,hibernate,spring and all JEE technology…………
    this is very helpful for every jobseeker and developer…………i always read your blog .it is really very helpful

  15. konrad says:

    > Since there is no implementation with JDK, we need to choose a StAX implementation

    Java SE (tested with Java 7) comes with StAX implementation out of the box.
    However, the default impl is almost unusable with XMLInputFactory.IS_COALESCING property

    Thanks for your articles :)

  16. P S Reddy says:

    Hi Joe, Nice Article.

  17. Sandy says:

    java beginners can too understand with ease, thank you, nice article.

  18. Pradeepan says:

    Hi Joe…Really its Useful for me..I want JAXB Example please Upload

  19. Anonymous says:

    Its really awesome….Could u pls upload on JAF.

  20. […] article is an introductory tutorial for JAXB. Some time back I wrote a tutorial introducing Java StAX and after reading that lot of friends asked to write on JAXB and so this […]

  21. Sivaprakash says:

    i cannot attach Xml here ?

  22. Tharani says:

    Hi,

    How to generate XML using StAX?

  23. Sharath says:

    Thank u boss, it helps a lot…

  24. Venkat Hari says:

    Hi Joe, I always enjoy reading your articles. One of the best places to go to.

    Thanks and Regards,
    Venkat

  25. Kiran says:

    Hello Joe,

    I’m a Tibco developer and currently facing a problem with Large XML file. We tried to use the default parser(DOM) in Tibco but it is taking lot of time and sometimes it is throwing OOM exception. So after googling we found STax is a better API to solve this. Our requirement is to chunck the large file based on the threshold value provide(say for example if we have large xml with 20,000 elements,the code should split into multiple xml files each containing 1000 elements). So 1000 is the threshold value. Can you please provide a sample code using STax or guide me please.

    Thank You,
    Kiran

  26. Prasun says:

    Hi Joe,

    With respect to a JAXB based impl, how do you rate the performance of Stax. I believe, JAXB does provide an internal DOM based approach while building the binding classes. Please suggest.

    Thanks,
    Prasun

  27. amvam69 says:

    Hello Joe, I have a huge XML file want to parse it using STAX and then commit the data from my POJO files to database. Now, do we have an approach to parse files in chunks and commit. Can you please let me know on this or share a sample snippet? Thank you.

  28. mohit says:

    hi, I am parsing a rss feed of a website using STAX parser. If the title tag or description tag contains “‘” (apastrophe) it will not parse whole sentence. Can u tell me how to resolve this?

  29. laxmikanth says:

    Joe.. Article was good but you haven’t covered much.
    One complete example for each API (Iterator/Cursor)would have been better. You haven’t covered Filtering in StAX, hoping you will include this part in near future.
