A story about XML parsing on Android

Software Craftsman at Novoda passionate about Android Development, Open Source and Generative Arts.

I remember one of the first tasks I paired on when I joined Novoda. We were rewriting a legacy Android application, but the rewrite kept some of the old services. In this post I will describe the journey we took to dig into, evaluate and craft this code, and how we learned and developed as a team along the way.

One of these services was an XML parser for Atom feeds written using Simple-Framework, an XML library that provides a powerful and easy-to-use annotation-based mechanism which makes for fast development. With that (legacy) parser already written we had working XML parsing for complex feeds, life was easy and beautiful... and we completely rewrote it. True, we did.

Yes I know, your reaction at this point is probably the same one I had when I was assigned to this task. I remember my first thought being:

These guys are crazy, this stuff works and they want to rewrite it completely.

The following sentence by Donald Knuth [1] came to my mind:

We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil.

And I couldn't stop thinking:

Why? Why are they doing such a thing? Such overkill! There must be a reason for it.

Getting to the heart

I am curious by nature and I won't do something without knowing the reasoning behind it. I had to know why we had decided to do it and get to the heart of the matter, so I asked one of the experienced craftsmen in the team. By that time he had already worked on the application for a while, and his pragmatism and good practices were well known within our team.

Well Juan,

he said,

Of course this is not a premature decision. Obviously there is a reason for this change, otherwise we would be wasting precious time.

Our application is not as fast as we want; we are trying to achieve the best performance possible and decided to run benchmarking to analyse what the reason was. We discovered that XML is a bottleneck in our app.

That was true. I had noticed that the app was taking some time to display the content but, to be honest, maybe out of ignorance, maybe because I was new in the team, I thought that "it was ok".

Numbers don't lie

He offered to sit down with me and my pair for this task and try to replicate the original benchmarking. We did, and by doing so we had proof.

The numbers don't lie. Do you see it? If the XML feed takes this long to be parsed there is no data, and during that time any of the other processes that rely on it cannot do their job. It is a bottleneck.

The reason the feeds took so long to parse became clear as soon as we took a deeper look at how the third-party Simple-Framework library works. Firstly, Simple-Framework is a DOM [2] parser, with the performance penalties that this implies.

For us it was doing a lot of unnecessary work and therefore slowing down the app. It was not only the heavy use of reflection, but also the fact that it did some processing for every single node in the XML, even if the element was not relevant to us. And we had many of these.

It was decided. My pair and I had to find a replacement! And in order to do that we had to be craftsmen.

Getting the job done

To select a replacement for Simple-Framework we did a bit of research on the available options. Out of that research we picked two potential candidates: another DOM parser and a SAX [3] parser. As the DOM parser we picked Jackson Dataformat XML and as the SAX parser we picked Simple Easy Xml Parser (SEXP).
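To make the difference between the two parsing styles concrete, here is a minimal sketch that uses the parsers bundled with the JDK rather than any of the libraries named above. The feed, the class name and the method names are all hypothetical and only serve the illustration. The DOM variant materialises the whole document as a tree before we can ask for the titles, while the SAX variant reacts only to the events for the element we care about and lets everything else stream past.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVsSax {

    // Hypothetical, heavily simplified feed: we only care about <title>.
    static final String FEED =
        "<feed>"
        + "<entry><title>First</title><summary>ignored</summary></entry>"
        + "<entry><title>Second</title><summary>ignored</summary></entry>"
        + "</feed>";

    static byte[] feedBytes() {
        return FEED.getBytes(StandardCharsets.UTF_8);
    }

    // DOM style: the full tree is built in memory first, then queried.
    static List<String> titlesWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(feedBytes()));
        NodeList nodes = doc.getElementsByTagName("title");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    // SAX style: no tree is built; we handle events only for <title>.
    static List<String> titlesWithSax() throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance()
            .newSAXParser()
            .parse(new ByteArrayInputStream(feedBytes()), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesWithDom()); // prints [First, Second]
        System.out.println(titlesWithSax()); // prints [First, Second]
    }
}
```

Both methods return the same titles. The difference lies in the work done along the way: the SAX handler never allocates anything for the `summary` elements, which is the kind of saving that matters when most of a large feed is irrelevant to the app.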

We had our candidates: Simple-Framework, SEXP and Jackson. The next natural step was to evaluate them. We wanted to be fair and give each an equal opportunity, so we established two necessary conditions for the tests to be performed.

  1. Each parser will be tested against the same data set.
  2. The parsing operations must be isolated from any other processes. Our test had to just parse; no other operations were allowed.

This seemed to be a perfect case for micro-benchmarking [4]. Caliper, Google's micro-benchmarking tool, was perfect for the job. It is easy to use and has the capabilities we needed. For the comparison we made the three candidates parse an XML structure composed of ~100 entries that were neither simple in form nor excessively complex.
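The shape of such a test can be sketched in plain Java. This is not Caliper's API and not the benchmark we ran; it is a hand-rolled illustration of the two conditions above, with the JDK's own DOM and SAX parsers standing in for the real candidates. The `Candidate` interface, `buildFeed` and `averageNanos` are hypothetical names invented for the sketch. A real harness such as Caliper does a far more careful job with warm-up, garbage collection and statistical noise.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ParserBenchmarkSketch {

    /** Hypothetical abstraction over one parser under test. */
    interface Candidate {
        void parse(byte[] xml) throws Exception;
    }

    // Condition 1: every candidate receives exactly the same bytes.
    // The feed is a made-up stand-in for the ~100-entry test document.
    static byte[] buildFeed(int entries) {
        StringBuilder sb = new StringBuilder("<feed>");
        for (int i = 0; i < entries; i++) {
            sb.append("<entry><id>").append(i).append("</id>")
              .append("<title>Entry ").append(i).append("</title></entry>");
        }
        return sb.append("</feed>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // Condition 2: the timed loop contains nothing but the parse call.
    // Input preparation and warm-up runs happen outside the measurement.
    static long averageNanos(Candidate candidate, byte[] xml,
                             int warmups, int reps) throws Exception {
        for (int i = 0; i < warmups; i++) {
            candidate.parse(xml);
        }
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            candidate.parse(xml);
        }
        return (System.nanoTime() - start) / reps;
    }

    public static void main(String[] args) throws Exception {
        byte[] feed = buildFeed(100);

        // Stand-ins for the real candidates: JDK DOM and JDK SAX.
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        SAXParserFactory saxFactory = SAXParserFactory.newInstance();
        Candidate dom = xml ->
            domFactory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        Candidate sax = xml ->
            saxFactory.newSAXParser()
                      .parse(new ByteArrayInputStream(xml), new DefaultHandler());

        System.out.println("DOM avg ns/parse: " + averageNanos(dom, feed, 200, 1000));
        System.out.println("SAX avg ns/parse: " + averageNanos(sax, feed, 200, 1000));
    }
}
```

The numbers this prints depend entirely on the machine and the JVM, so they say nothing about the libraries we compared. The point is only the structure: one shared input, and a timed region that does nothing but parse.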

The tests are completed! Let's analyse the results.

The results [5] were revealing. SEXP performed on average 2.5x faster than Jackson and 1.7x faster than Simple-Framework. It also had a much lower memory footprint, with ~80% fewer object allocations than Simple-Framework and ~30% fewer than Jackson, and it needed ~50% less memory than either of its opponents.

So we had a candidate. SEXP turned out to be a very Android-friendly XML parser which not only performs fast but, given its low memory footprint, is also respectful of the ecosystem of services it will be living with.

Next steps

The next step was to carry out the replacement. It is beyond the scope of this tale to detail how it was done, but in short it meant improving the code coverage by revisiting our integration tests, identifying any refactoring that could make the replacement easier, and then attacking the code to make it happen.

Once this was done we could appreciate the real performance gain in our app. It was amazingly better.

The content in all of our activities was displayed earlier: on average, SEXP proved to be 2.08x faster than Simple-Framework in the target app.

Moral

I learned a few lessons from this pairing exercise. The most valuable is about not doing work that is not required to achieve the goal. We could have replaced the parser with the first one we found, but that would have been silly, as we didn't know beforehand which parser would give us better results. Instead, we selected our candidates, benchmarked them on static test data, and picked a winner.

The second lesson is about pair programming. It was very valuable to find the time to sit down with more experienced members of the team. During the conversations all of us gained a higher-level view of the problem we were handling, identified the underlying problems and worked out potential solutions to them.

Last but not least is knowing your app and knowing your tools. In our case we were processing massive feeds from which we didn't need all the information they contained. A parser that gave us finer control over the parsing flow, the parsing events to handle and how to make sense of the data they contained was essential to achieve the performance required.

And remember, even if optimisation is premature 97% of the time, Donald Knuth also pointed out that:

Yet we should not pass up our opportunities in that critical 3%.

  1. https://en.wikipedia.org/wiki/Donald_Knuth

  2. A DOM parser reads the full XML into memory as a tree structure and then, for every node, tries to find (if using annotations via reflection) the best matching class for it.

  3. In contrast to a DOM parser, SAX parsers scan the document and emit events one by one as the data tree is examined. The developer has to handle those events to make sense of the data they contain.

  4. Also known as component-benchmarking, this is a type of benchmark whose core routine consists of a relatively small and specific piece of code.

  5. More information about this particular use case for Caliper can be found here.

