Gaston Dombiak

Unexpected XML parsing learnings

Blog Post created by Gaston Dombiak Champion on Feb 27, 2007

Smack uses an XML Pull Parser (XPP) to parse XMPP stanzas and build custom Packet objects. Wildfire represents XMPP stanzas as DOM objects wrapped by Packet objects. Each approach has its own pros and cons. Moreover, in Wildfire we can use different parsers to generate DOM objects.


For Wildfire 3.2 we needed to change the way we were parsing XML to work with the new, more scalable, networking layer built using MINA. We wanted to keep building DOM objects wrapped by Packet objects so our parsers were still useful. However, we needed a way to parse stanzas received in an asynchronous way. We ended up reusing a contribution made by a community member. We know that it is a custom solution that we might need to replace at some point since the parsing is not 100% efficient and some cases may potentially break the parser as was seen in Wildfire 3.2.1.


Anyway, while doing some heavy load testing on Wildfire and measuring performance we noticed that the parser we were using to generate DOM objects was becoming a serious bottleneck. At that point we were using a SAX parser (SAXReader) so we decided to try with other parsers and compare results. The other parser that we had in Wildfire was XMPPPacketReader that uses an XML Pull Parser (XPP).  To my surprise the performance improvement was substantial with the new parser: around 30% faster with the XPP parser compared to the SAX parser.


Architecturally it is obvious that an XPP parser will be much faster than a SAX parser, but since both parsers were being used to create DOM objects I initially thought  it wouldn't make much of a difference which one was used since building DOM objects was the expensive operation. Well, I was simply flat wrong and I learned it the hard way. I'm happy, though, that we found this bottleneck during our performance tests prior to release. It is yet another proof of how important it is to include QA as part of your development cycle.