Garbage Collection: Theory and Practice

Android’s VM relies on automatic memory management, a cornerstone of the programming paradigm on which Java is based, and it has been part of the Android ecosystem since its inception. For anybody unfamiliar with the concept, the short explanation is that the programmer is not responsible for manually reserving (allocating) memory, nor for freeing that memory once it is no longer needed. This is an important distinction from lower-level programming languages, where manual memory management is the norm. The advantage of automatic management is that it saves the developer from having to worry about memory; the disadvantage is that the developer also gives up control and is at the mercy of the system to manage that memory correctly and optimally.
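
To make the distinction concrete, here is a minimal Java illustration (the class and method names are ours, purely for demonstration): an object is allocated and used but never explicitly freed; reclaiming its memory is entirely the runtime’s job.

// There is no free/delete call in Java: the garbage collector reclaims the
// StringBuilder once no reference to it remains.
class AllocationExample {
    static String buildReport() {
        StringBuilder report = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            report.append("line ").append(i).append('\n');
        }
        return report.toString();
        // Once the caller discards the returned String and this frame is gone,
        // both objects become unreachable and are eligible for collection.
    }
}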

Android has traditionally suffered from Dalvik’s garbage collection (GC) mechanism. Each time an application needed to allocate memory and the heap (a region of memory dedicated to that app) could not accommodate the allocation, the GC would fire up.

A GC’s job is to traverse the heap, enumerate all objects allocated by the application, mark all reachable objects (that is, all objects still in use), and free whatever remains.
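
As a rough conceptual sketch (and not Dalvik’s actual implementation), a mark-and-sweep collector can be pictured in a few lines of Java; the Node type and root set here are invented for illustration.

import java.util.*;
// Conceptual mark-and-sweep, not Dalvik's code: reachable objects get marked,
// everything left unmarked is swept away.
class MarkSweepSketch {
    static final class Node {
        boolean marked;
        final List<Node> references = new ArrayList<>();
    }
    // Mark phase: walk everything reachable from the roots (thread stacks, statics, ...).
    static void mark(Collection<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;
            n.marked = true;
            pending.addAll(n.references);
        }
    }
    // Sweep phase: anything the mark phase never reached is garbage and is freed.
    static void sweep(List<Node> allObjects) {
        allObjects.removeIf(n -> !n.marked);
        allObjects.forEach(n -> n.marked = false); // reset marks for the next cycle
    }
}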

In Dalvik’s case, this process resulted in two pauses: one during the enumeration phase and one during the marking phase. A pause in this sense means that all code execution is halted for the entire application, across all of its threads. If the pauses are too long, frames are dropped during rendering, which in turn manifests as jank and stuttering in the user experience.

Google claims the average duration of these pauses reached 54ms on a Nexus 5 device; with a new frame due roughly every 16.7ms at 60Hz, that corresponds to a minimum of 4 dropped frames each time the GC activated.

In my own experience and investigation, this number can blow up dramatically depending on the application. For example, the official FIFA app is a poster child for garbage collection run wild:

07-01 15:56:14.275: D/dalvikvm(30615): GC_FOR_ALLOC freed 4442K, 25% free 20183K/26856K, paused 24ms, total 24ms

07-01 15:56:16.785: I/dalvikvm-heap(30615): Grow heap (frag case) to 38.179MB for 8294416-byte allocation

07-01 15:56:17.225: I/dalvikvm-heap(30615): Grow heap (frag case) to 48.279MB for 7361296-byte allocation

07-01 15:56:17.625: I/Choreographer(30615): Skipped 35 frames! The application may be doing too much work on its main thread.

07-01 15:56:19.035: D/dalvikvm(30615): GC_CONCURRENT freed 35838K, 43% free 51351K/89052K, paused 3ms+5ms, total 106ms

07-01 15:56:19.035: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 96ms

07-01 15:56:19.815: D/dalvikvm(30615): GC_CONCURRENT freed 7078K, 42% free 52464K/89052K, paused 14ms+4ms, total 96ms

07-01 15:56:19.815: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 74ms

07-01 15:56:20.035: I/Choreographer(30615): Skipped 141 frames! The application may be doing too much work on its main thread.

07-01 15:56:20.275: D/dalvikvm(30615): GC_FOR_ALLOC freed 4774K, 45% free 49801K/89052K, paused 168ms, total 168ms

07-01 15:56:20.295: I/dalvikvm-heap(30615): Grow heap (frag case) to 56.900MB for 4665616-byte allocation

07-01 15:56:21.315: D/dalvikvm(30615): GC_FOR_ALLOC freed 1359K, 42% free 55045K/93612K, paused 95ms, total 95ms

07-01 15:56:21.965: D/dalvikvm(30615): GC_CONCURRENT freed 6376K, 40% free 56861K/93612K, paused 16ms+8ms, total 126ms

07-01 15:56:21.965: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 111ms

07-01 15:56:21.965: D/dalvikvm(30615): WAIT_FOR_CONCURRENT_GC blocked 97ms

07-01 15:56:22.085: I/Choreographer(30615): Skipped 38 frames! The application may be doing too much work on its main thread.

07-01 15:56:22.195: D/dalvikvm(30615): GC_FOR_ALLOC freed 1539K, 40% free 56833K/93612K, paused 87ms, total 87ms

07-01 15:56:22.195: I/dalvikvm-heap(30615): Grow heap (frag case) to 60.588MB for 1331732-byte allocation

07-01 15:56:22.475: D/dalvikvm(30615): GC_FOR_ALLOC freed 308K, 39% free 59497K/96216K, paused 84ms, total 84ms

07-01 15:56:22.815: D/dalvikvm(30615): GC_FOR_ALLOC freed 287K, 38% free 60878K/97516K, paused 95ms, total 95ms

Above is a log extract of the first few seconds after launching the application. The garbage collector was invoked 9 times, freezing the application for a total of 603ms and dropping 214 frames. Most of the pauses are caused by requests for memory allocation, as indicated by the “GC_FOR_ALLOC” tag in the log entries.
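
Those totals can be reproduced mechanically from a saved logcat dump. Below is a small Java sketch we put together for that purpose (the file path argument and the regular expressions are our own assumptions, simply targeting the ‘paused’ and ‘Skipped ... frames’ fragments visible in the extract); it is a helper for this article’s arithmetic, not an official tool.

import java.io.IOException;
import java.nio.file.*;
import java.util.regex.*;
// Tallies Dalvik GC pause times and Choreographer frame drops from a logcat dump.
public class GcPauseTally {
    private static final Pattern PAUSE = Pattern.compile("paused (\\d+)ms(?:\\+(\\d+)ms)?");
    private static final Pattern FRAMES = Pattern.compile("Skipped (\\d+) frames");
    public static void main(String[] args) throws IOException {
        long totalPauseMs = 0;
        int gcCount = 0;
        int droppedFrames = 0;
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            Matcher p = PAUSE.matcher(line);
            if (p.find()) {
                gcCount++;
                totalPauseMs += Long.parseLong(p.group(1));
                if (p.group(2) != null) totalPauseMs += Long.parseLong(p.group(2)); // concurrent GCs pause twice
            }
            Matcher f = FRAMES.matcher(line);
            if (f.find()) droppedFrames += Integer.parseInt(f.group(1));
        }
        System.out.printf("GC invocations: %d, total pause: %dms, dropped frames: %d%n",
                gcCount, totalPauseMs, droppedFrames);
    }
}

Run against the extract above, it reports 9 invocations, 603ms of pauses, and 214 skipped frames.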

What ART promises and provides is a huge revamp of the whole garbage collection system, intended to virtually eliminate cases like the one above. Here’s a comparison of the same situation under ART:

07-01 16:00:44.531: I/art(198): Explicit concurrent mark sweep GC freed 700(30KB) AllocSpace objects, 0(0B) LOS objects, 792% free, 18MB/21MB, paused 186us total 12.763ms

07-01 16:00:44.545: I/art(198): Explicit concurrent mark sweep GC freed 7(240B) AllocSpace objects, 0(0B) LOS objects, 792% free, 18MB/21MB, paused 198us total 9.465ms

07-01 16:00:44.554: I/art(198): Explicit concurrent mark sweep GC freed 5(160B) AllocSpace objects, 0(0B) LOS objects, 792% free, 18MB/21MB, paused 224us total 9.045ms

07-01 16:00:44.690: I/art(801): Explicit concurrent mark sweep GC freed 65595(3MB) AllocSpace objects, 9(4MB) LOS objects, 810% free, 38MB/58MB, paused 1.195ms total 87.219ms

07-01 16:00:46.517: I/art(29197): Background partial concurrent mark sweep GC freed 74626(3MB) AllocSpace objects, 39(4MB) LOS objects, 1496% free, 25MB/32MB, paused 4.422ms total 1.371747s

07-01 16:00:48.534: I/Choreographer(29197): Skipped 30 frames! The application may be doing too much work on its main thread.

07-01 16:00:48.566: I/art(29197): Background sticky concurrent mark sweep GC freed 70319(3MB) AllocSpace objects, 59(5MB) LOS objects, 825% free, 49MB/56MB, paused 6.139ms total 52.868ms

07-01 16:00:49.282: I/Choreographer(29197): Skipped 33 frames! The application may be doing too much work on its main thread.

07-01 16:00:49.652: I/art(1287): Heap transition to ProcessStateJankImperceptible took 45.636146ms saved at least 723KB

07-01 16:00:49.660: I/art(1256): Heap transition to ProcessStateJankImperceptible took 52.650677ms saved at least 966KB

The difference between ART and Dalvik couldn’t be bigger: the new runtime pauses for a total of only 12.364ms over four foreground GC calls and two background GC calls. The heap did not need to grow at all during the application launch, compared to Dalvik’s four increases. The number of dropped frames fell to 63.

Obviously this is a worst-case scenario of a badly developed application (even under ART it still dropped a little over a second’s worth of frames), but bad programming practices such as overloading the UI thread are something Android has to deal with on a regular basis.

ART offloads some of the work usually done by the garbage collector to the application itself, which eliminates the need for the execution pause during the enumeration phase. The second pause is vastly reduced by doing the work ahead of the pause (a technique called “packard pre-cleaning”), so the pause itself is reserved for a simple check and validation of the work already done. Google promises that these pauses have been reduced to an average of 3ms, a good order of magnitude lower than what the Dalvik GC achieves.
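
One way to picture the “offloading to the application” part (an illustrative sketch with invented names, not ART’s actual mechanism) is that each application thread records its own allocations locally, so the collector can gather candidates concurrently instead of stopping the world to enumerate the heap.

import java.util.*;
import java.util.concurrent.ConcurrentLinkedQueue;
// Each mutator thread fills a private buffer with the objects it allocates; full
// buffers are handed to the collector, which drains them without pausing anyone.
class AllocationTrackingHeap {
    private final ConcurrentLinkedQueue<List<Object>> fullBuffers = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<List<Object>> localBuffer = ThreadLocal.withInitial(ArrayList::new);
    Object allocate(int size) {
        Object obj = new byte[size];       // stand-in for a real heap allocation
        List<Object> buf = localBuffer.get();
        buf.add(obj);                      // thread-local: no global lock, no GC pause
        if (buf.size() >= 256) {           // hand a full buffer over to the collector
            fullBuffers.add(buf);
            localBuffer.set(new ArrayList<>());
        }
        return obj;
    }
    // Called from the GC thread: enumerate recently allocated objects concurrently.
    List<Object> drainForCollection() {
        List<Object> snapshot = new ArrayList<>();
        for (List<Object> buf; (buf = fullBuffers.poll()) != null; ) {
            snapshot.addAll(buf);
        }
        return snapshot;
    }
}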

The introduction of a special “Large Object Space” (LOS), separate from the main heap but still residing in the application’s memory, is meant to facilitate better handling of big objects such as bitmaps. These large primitive allocations caused great problems in terms of heap fragmentation: a GC had to be invoked whenever a new object needed to be allocated and no contiguous memory region big enough for it was available. With less fragmentation of the heap and more intelligent allocation and reclamation by the runtime, the frequency of GC invocations is greatly reduced.
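
The idea can be sketched as a simple routing decision at allocation time (the threshold and class below are assumptions for illustration, not ART’s actual values or code): buffers above a certain size are kept in their own space so they never punch large holes in the main heap.

import java.util.*;
// Illustrative only: large buffers (e.g. bitmap pixel data) bypass the main heap.
class LargeObjectSpaceSketch {
    static final int LARGE_THRESHOLD = 12 * 1024;   // assumed cut-off for "large"
    private final List<byte[]> mainHeap = new ArrayList<>();
    private final List<byte[]> largeObjectSpace = new ArrayList<>();
    byte[] allocate(int size) {
        byte[] obj = new byte[size];
        if (size >= LARGE_THRESHOLD) {
            largeObjectSpace.add(obj);  // tracked separately, main heap stays compact
        } else {
            mainHeap.add(obj);
        }
        return obj;
    }
}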

Another good example is the launch of the Hangouts application. In Dalvik we see several GC pauses due to allocation calls:

07-01 06:37:13.481: D/dalvikvm(7403): GC_EXPLICIT freed 2315K, 46% free 18483K/34016K, paused 3ms+4ms, total 40ms

07-01 06:37:13.901: D/dalvikvm(9871): GC_CONCURRENT freed 3779K, 22% free 21193K/26856K, paused 3ms+3ms, total 36ms

07-01 06:37:14.041: D/dalvikvm(9871): GC_FOR_ALLOC freed 368K, 21% free 21451K/26856K, paused 25ms, total 25ms

07-01 06:37:14.041: I/dalvikvm-heap(9871): Grow heap (frag case) to 24.907MB for 147472-byte allocation

07-01 06:37:14.071: D/dalvikvm(9871): GC_FOR_ALLOC freed 4K, 20% free 22167K/27596K, paused 25ms, total 25ms

07-01 06:37:14.111: D/dalvikvm(9871): GC_FOR_ALLOC freed 9K, 19% free 23892K/29372K, paused 27ms, total 28ms

The above log gives a good sample of the various GC cases. The explicit and concurrent GC calls are the collector’s general cleanup and maintenance passes. The GC_FOR_ALLOC calls occur when the memory allocator tries to allocate something that doesn’t fit in the heap, so a GC is launched to try to make space. In the middle we see that the heap was expanded because fragmentation left it unable to hold a larger object. The total “dead time” adds up to 90ms. In comparison, here’s the corresponding ART extract in the L preview:

07-01 06:35:19.718: I/art(10844): Heap transition to ProcessStateJankPerceptible took 17.989063ms saved at least -138KB

07-01 06:35:24.171: I/art(1256): Heap transition to ProcessStateJankImperceptible took 42.936250ms saved at least 258KB

07-01 06:35:24.806: I/art(801): Explicit concurrent mark sweep GC freed 85790(3MB) AllocSpace objects, 4(10MB) LOS objects, 850% free, 35MB/56MB, paused 961us total 83.110ms

We’re not quite sure what the heap transition lines represent, but we’ll assume they’re part of the heap resizing mechanism. The only GC call comes after the app has already launched, and its pause is a minuscule 961µs. We don’t see any of the previous kinds of calls to the GC. What is interesting is the LOS statistic: there are 4 large objects totalling 10MB in the LOS. Allocations that would otherwise have resided in the main heap now bypass it entirely, and with that avoid the repeated GC_FOR_ALLOC calls and heap fragmentation that plagued Dalvik.

The memory allocation system itself has also been revamped. While ART alone provided roughly a 25% boost over Dalvik, Google was not satisfied with this and introduced a new memory allocator to replace the previously used “malloc” allocator.

This new allocator, “rosalloc” (Runs-of-Slots-Allocator), was designed with the usage patterns of multithreaded Java applications in mind. It uses a more fine-grained locking mechanism that can lock individual bins of objects instead of the whole allocator, and small objects in thread-local regions are able to forego locking altogether.
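
The locking idea can be illustrated with a deliberately simplified sketch (names, sizes, and structure are ours; this is not rosalloc’s actual design): fixed-size slots live in “runs”, a shared run is guarded by its own lock rather than one global heap lock, and a thread-local run serves small allocations with no locking at all.

import java.util.concurrent.locks.ReentrantLock;
// One size class only, for brevity; real allocators keep many, each with its own runs.
class RunsOfSlotsSketch {
    static final int SLOT_SIZE = 64;
    static final int SLOTS_PER_RUN = 128;
    static final class Run {
        final byte[] backing = new byte[SLOT_SIZE * SLOTS_PER_RUN];
        int nextFree = 0;
        int allocateSlot() { return nextFree < SLOTS_PER_RUN ? nextFree++ : -1; } // -1 = run full
    }
    private final Run sharedRun = new Run();
    private final ReentrantLock sharedRunLock = new ReentrantLock();  // per-run lock, not a global one
    private final ThreadLocal<Run> localRun = ThreadLocal.withInitial(Run::new);
    int allocateSmall() {
        int slot = localRun.get().allocateSlot();
        if (slot >= 0) return slot;          // fast path: thread-local, no synchronization
        localRun.set(new Run());             // local run exhausted, start a fresh one
        return localRun.get().allocateSlot();
    }
    int allocateShared() {
        sharedRunLock.lock();                // contend only on this run, not the whole heap
        try {
            return sharedRun.allocateSlot(); // -1 means the caller should fetch a new run
        } finally {
            sharedRunLock.unlock();
        }
    }
}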

The result is a huge improvement in allocation speed, with gains of up to 10x.

The garbage collection algorithms themselves have been reworked to improve the user experience and avoid interrupting the application. These algorithms are still a work in progress, and Google has only recently introduced a new dedicated algorithm, the “moving garbage collector”, whose sole purpose is to defragment the application heap while applications are in the background.
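
In principle the moving collector does something like the toy sketch below (hugely simplified, with invented types; a real compacting collector must also update every reference to each object it relocates, which this sketch sidesteps): live objects are copied together so that free memory ends up as one contiguous region instead of scattered fragments.

import java.util.*;
// Toy compaction: null entries stand for freed slots, i.e. fragmentation holes.
class CompactionSketch {
    static List<Object> compact(List<Object> heap) {
        List<Object> compacted = new ArrayList<>(heap.size());
        for (Object obj : heap) {
            if (obj != null) compacted.add(obj);  // slide live objects together
        }
        return compacted;  // all free space is now one contiguous tail
    }
}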

Comments

  • Krysto - Thursday, July 3, 2014 - link

    I thought it was clear that the ART in L is NOT the one in KitKat, and has been revamped quite a bit. The final one, 5 months from now, will probably have big changes, too.
  • Notmyusualid - Thursday, July 3, 2014 - link

    Will keep an eye out for it, but I'm expecting this to be no big deal now.
  • ergo98 - Wednesday, July 2, 2014 - link

    Too much has been made regarding AOT and JIT. Note that Dalvik generally only JITs the DEX once, storing the result in /data/dalvik-cache.

    The big difference between Dalvik and ART is simply that ART was rewritten from the ground up based upon everything they learned from the Dalvik experience.
  • errorr - Thursday, July 3, 2014 - link

    That and because of the Oracle lawsuit over Dalvik which is nicely mooted by ART.
  • doubledeej - Wednesday, July 2, 2014 - link

    It never ceases to amaze me how many problems that were solved decades ago in computing are problems on modern computing platforms.

    Real compilation of code has been around forever -- the norm, in fact, for desktop and server computing with a few notable exceptions. Yet somehow taking what effectively amounts to interpreting code (just-in-time compilation is very similar to interpretation) and switching to compiling it ahead of execution is being touted as a new idea.

    The fact that Android has pretty much been completely reliant upon JIT running in a VM has always made me scratch my head. As clearly spelled out in the article, it causes huge performance issues, along with significant hits to battery life. And we're talking about mobile devices where we've got relatively low-power CPUs and GPUs, little memory, and finite battery capacity. But it has been the way that Android has worked from the beginning. Crazy that it hasn't really been addressed until now.

    And the idea that operating systems and development languages be in charge of garbage collection, and people being surprised that it causes performance hits, seems odd to me too. Managing your own memory isn't that hard to do. And it is a hell of a lot more efficient doing it yourself than making the language or OS figure out how to do it. It's a "clean up your own mess and put things back where you want them" vs. "make someone else do it and let them try to figure out where things go" situation. It might make development easier for entry-level developers, but it certainly isn't an efficient way to do things when performance and user experience are important.

    Because the developers that I work with aren't accustomed to managing memory, we're constantly running into issues. We've got scripts that allocate dozens or hundreds of megabytes of RAM and don't free it when they're done. They'll go through 3, 4, or 5 more of these processes within a single script, not freeing memory they're done with along the way, so by the time the script is done running hundreds of megabytes that aren't needed are still tied up. Because the language can't be sure if data is going to be used again it hangs around until the script has finished running.

    Create dozens or hundreds of instances of one of those scripts and you've got a performance nightmare. Relying on a language or OS to do garbage collection will have the same net result.
  • metayoshi - Wednesday, July 2, 2014 - link

    Nothing you say is wrong, but I think you hit the nail on the head with this sentence when it comes to Android: "It might make development easier for entry-level developers, but it certainly isn't an efficient way to do things when performance and user experience are important."

    I personally think Android didn't care that performance was so bad in the early days. The point of Android, from what I can tell, was to make things Open Source and make it easy for developers. As you said, having the OS manage memory itself, it's meant to make programming easy. I think that's what made it attractive to the likes of Motorola, Samsung, and HTC in the beginning. I think that's what made it popular with the OEMs, and eventually, that's what users were getting used to.

    Yes, precompiled code in interpreters are nothing new. But ART is changing what Android can do. It's not a new concept, I agree with you. But again, Android has had different priorities from the beginning than, say, writing purely in C and/or assembly for mission critical or safety critical systems where real time better be real time or else that car/plane/space shuttle will crash, or even in other not as critical embedded systems like HDDs and SSDs where performance and power matters more than anything. I think Android has always been about the easiness in its development environment, just like Java, and that's just where they put their priorities first. Now that their development environment has been pretty well founded, I think they're making the right steps with improving performance, first with the JIT compiler in 2.2, "Project Butter" in Jelly Bean, and now making the default environment ART instead of Dalvik in Android "L". They just had different priorities, and well... look at where Android is now.
  • Hyper72 - Friday, July 4, 2014 - link

    I think you're completely right about ease of development being the priority for Android early on, after all they had to establish a market and needed apps quickly and easily. After Google bought the OS it suddenly got lots of developer attention and they just ran with the setup as it was. If Google had made lots of changes at that time they might as well have rolled their own.
  • errorr - Thursday, July 3, 2014 - link

    The answer is in the article: it was about memory management, really, and once it was baked in, all the development went into improving what already existed.

    After Oracle sued them (pending) over Dalvik and creating their own VM it became abundantly clear that they needed to tear down the whole thing and start over.
  • tacitust - Thursday, July 3, 2014 - link

    Google adopted Java for Android because it was a mature programming language, popular with developers, that they didn't have to create from scratch and had features (i.e. running in a VM) that made it easy to create secure apps that would run on a multitude of different hardware platforms. Java also had an affordable (i.e. free) development environment (Eclipse) that Google could build their development tools around.

    Clearly, with the incredible growth Android has enjoyed over the last six years, the decision to go with Java was anything but a mistake.

    As for compiler technology, the necessity to run the same apps on multiple hardware architectures precluded the use of traditional desktop and server based compilers, and the technology behind JIT compilers certainly hasn't been standing still over the last decade. The performance and battery deficits caused by the current VM environment are certainly not as bad as you think they are, given that modern Android tablets come pretty close to matching iOS which only has one hardware platform and architecture to worry about and where the software can be tightly integrated with that sole platform. It's not as good, no, but it's good enough for Samsung to sell millions of phones in direct competition with the iPhone.

    Yes, the time has come for Google to move on, but there should be nothing amazing about their use of a Java-based platform that has served them very well over the past six years. It was the right decision at the time.
  • grahaman27 - Saturday, July 5, 2014 - link

    Well said.
