SOA Tips: Consider using a StAX-based parser to process huge XML datasets
04 May 2007
Choose the right XML parser for your implementation. You can choose among DOM-based, SAX-based, and StAX-based XML parsers.
Choose a DOM-based parser if you need to modify the XML document structure at runtime or traverse the hierarchical XML tree multiple times. DOM is a tree-based parser, so it gives you access to the complete XML tree. The downside is that the entire XML document has to be loaded into memory as a hierarchical object graph, which may not work if your data sets are large. DOM is therefore best suited to small or medium-sized XML datasets.
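As a rough illustration of the DOM approach, the sketch below uses the JAXP DOM API that ships with the JDK. The class name DomSketch, the method addItemAndCount, and the <order>/<item> document shape are made up for this example; they are assumptions, not part of any real schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the <item> element name is an assumption.
class DomSketch {
    // Loads the whole document into memory, appends one <item> to the root,
    // and returns the number of <item> elements after the change.
    static int addItemAndCount(String xml, String newItemText) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() builds the complete object graph before returning.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // The full tree is available, so it can be modified in place.
            Element item = doc.createElement("item");
            item.setTextContent(newItemText);
            doc.getDocumentElement().appendChild(item);

            // A second traversal of the same tree needs no re-parsing.
            NodeList items = doc.getElementsByTagName("item");
            return items.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }
}
```

Calling DomSketch.addItemAndCount with a two-item order document and a new item text should report three items. The convenience comes at the cost described above: memory use grows with the size of the whole document.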
When you have to deal with large documents with a limited number of nested elements, and you only need a subset of the document at any time, consider using a SAX-based parser. In SAX-based parsing, the parser reads the XML data and pushes fragments of the document to the application's handler code as events. SAX is therefore an event-based (push) parser: the parser, not the application, controls the flow of processing.
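The push model described above might look like the following minimal sketch, which uses the JDK's SAX API. The class SaxSketch, the method itemTexts, and the <item> element name are hypothetical, and the handler assumes <item> elements are not nested inside each other.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the <item> element name is an assumption.
class SaxSketch {
    // Collects the text of every <item>; all other elements are ignored.
    static List<String> itemTexts(String xml) {
        List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <item>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("item".equals(qName)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks.
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName) && current != null) {
                    texts.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // The parser drives the loop and calls back into the handler.
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return texts;
    }
}
```

Note that the application never asks for the next event. It only reacts to callbacks, so any state it needs (here, the current field) has to be kept in the handler between calls.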
If you need to process huge XML datasets, consider using a StAX-based parser. In StAX-based parsing, the application pulls data from the stream at its own pace, so it can filter content, skip tags, or stop parsing at any point. The application, not the parser, is in control: it drives parsing directly by iterating over the document with a stream reader. Like SAX, and unlike DOM, a StAX parser holds only a small part of the XML document in memory at any point in time. Combining the low memory footprint of SAX with application-driven access to the document, which is otherwise a strength of DOM, often makes StAX-based parsers the best choice for handling huge documents quickly and efficiently.
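To make the pull model concrete, here is a minimal sketch using the JDK's javax.xml.stream cursor API. The class StaxSketch, the method findItem, and the <item id="..."> document shape are assumptions chosen for illustration, and <item> is assumed to contain text only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; <item id="..."> is an assumed document shape.
class StaxSketch {
    // Returns the text of the first <item> with the given id, or null if
    // no such element exists.
    static String findItem(String xml, String wantedId) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                // The application owns the loop and pulls one event at a time.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "item".equals(reader.getLocalName())
                            && wantedId.equals(reader.getAttributeValue(null, "id"))) {
                        // Returning here stops parsing early; the application
                        // pulls no further events from the stream.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
    }
}
```

For a document with items whose ids are 1, 2 and 3, a call such as StaxSketch.findItem(xml, "2") returns as soon as the second item is read. Compared with the SAX style, the filtering and the early exit are ordinary control flow in the application's own loop rather than state spread across callbacks.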