Wednesday, January 26, 2011

Java memory leaks

One of my colleagues asked how I diagnosed a memory leak that was affecting his application, so this is what I told him. Maybe I'll write it up in tutorial form some time.


The version of Java we are running doesn't make this easy, so this is what I did on dev (it's hard to turn this into a coherent narrative).

Almost any memory leak investigation takes the following form (I'm assuming the OutOfMemoryError says the heap is out of memory):
a. Run the application steps that cause the memory leak. (Identifying those steps is actually the biggest problem; if you don't know which step causes the leak, you basically have to exercise all the functionality of your application.) This ensures that all statics, caches, and classes are loaded. Then force a garbage collection, check the memory used by the Java VM, and take a heap dump (a binary snapshot of all the objects on the heap, including the reference graph between them).
b. Now repeat the step causing the problem, and make sure that the step ends cleanly (e.g. if the step is a user saving a quote, ensure that you log out and invalidate the session; otherwise you get false readings). Run a full GC again and take another heap dump.
c. Compare the two heap dumps. If there is a difference, you have a leak; ideally there shouldn't be one. In practice, on a server like Tomcat a lot happens in the background even when no user is accessing the site, so even after a full GC you might still see some Tomcat classes or objects that appear to be new. The way around this is to loop step b many times, so that the differences in your own classes become significant compared with those of the app server.
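The a/b/c procedure can be approximated inside a single JVM when you just want a quick yes/no before reaching for heap dumps. This is a minimal sketch, assuming a Java 8+ JVM (newer than the 1.5 on dev); `LeakCheck` and its `leakyStep` are hypothetical stand-ins for a real application step, and `System.gc()` is only a request to the JVM, so the numbers are approximate.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakCheck {
    // Hypothetical stand-in for the application step under test: it "leaks" by
    // keeping every buffer it allocates reachable from a static list.
    static final List<byte[]> RETAINED = new ArrayList<>();

    static void leakyStep() {
        RETAINED.add(new byte[8 * 1024]);
    }

    // Used heap after requesting a full GC. System.gc() is only a hint,
    // so treat the result as an approximation.
    static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 3; i++) {
            System.gc();
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    // Step a: run once so statics/caches/classes are loaded, then take a baseline.
    // Step b: repeat the step many times.
    // Step c: compare; looping lifts a real leak above background noise.
    static long heapGrowth(Runnable step, int iterations) {
        step.run();
        long before = usedHeapAfterGc();
        for (int i = 0; i < iterations; i++) {
            step.run();
        }
        long after = usedHeapAfterGc();
        return after - before;
    }

    public static void main(String[] args) {
        long growth = heapGrowth(LeakCheck::leakyStep, 2000);
        System.out.println("Heap growth after 2000 iterations: " + growth / 1024 + " KB");
    }
}
```

Looping 2000 times mirrors the advice in step c: a single iteration's growth can hide in background noise, while a real leak grows roughly in proportion to the loop count.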

In our environment, however, taking a heap dump was problematic because the JDK is really old (1.5_07), and I wasn't able to load the resulting heap dump into any tool.

Heap Dump
To take a heap dump you have the following options:
a. Some JVMs support -XX:+HeapDumpOnCtrlBreak (Ctrl-Break, or kill -3 pid on Linux). However, our version doesn't.
b. Some profilers (e.g. JProbe) let you take a heap dump. You have to pay for some of them, but there are free ones such as the NetBeans profiler. However, all of them require some installation steps, so I didn't use this option.
c. Use jmap, a utility that ships with Java. Later versions let you run JAVA_BIN/jmap -dump:format=b,file=heap.bin processid. For our version we have to run jmap -heap:format=b processid; however, this isn't working correctly, since I couldn't load the dump into any tool. What I finally used was jmap -histo processid. This is a poor man's heap dump: it lists each class with its number of instances and the memory they occupy, but it doesn't show who allocated them (for example, I knew it was the concurrency-related classes that were leaking, but not who was creating them; I found that out thanks to Google).

d. -XX:+HeapDumpOnOutOfMemoryError: this automatically takes a heap dump when you run out of memory, but debugging with it is slightly harder, because the dump is full of live objects and there is no earlier dump to compare it against.
e. Use HPROF (an older profiling agent bundled with the JDK) by enabling it when you start Java, e.g. -agentlib:hprof=heap=dump,format=b; note that it slows the process down.
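On a newer HotSpot JVM (Java 7+), a heap dump can also be triggered from inside the application through the HotSpotDiagnosticMXBean, which writes the HPROF binary format that jhat and VisualVM load. This would not work on the 1.5 JVM on dev; the `HeapDumper` class and the file names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumper {
    // Writes an HPROF-format heap dump of the current JVM to the given file.
    // live = true dumps only objects that are still reachable, which is what
    // you want when comparing a "before" dump with an "after" dump.
    static void dumpHeap(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        // Recent JDKs expect a .hprof suffix and refuse to overwrite an existing
        // file, so write into a fresh temporary directory.
        Path dir = Files.createTempDirectory("heapdumps");
        Path file = dir.resolve("after-loop.hprof");
        dumpHeap(file, true);
        System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
    }
}
```

Calling `dumpHeap` once after the warm-up and once after the loop gives the two dumps that step c compares.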

Monitoring
To look at how much memory is being used, and to force a GC, you have the following options:
a. Look at the probe we installed, Lambda Probe / PSI Probe (Tomcat-specific). Click System Information; under Overview you can see the memory being used, and you can force a GC as well. You can also click Memory Utilization to see the breakdown. I had to add some JMX parameters to the startup options.
It also lets you see sessions and their footprints. Sometimes the leak isn't actually a leak: code puts too much data in the session, the user doesn't log out, and the session stays active for, say, 30 minutes, so that memory stays in use for those 30 minutes. Technically this isn't a leak, since the objects are collected when the session times out, but it looks like one because the memory keeps getting used and isn't reclaimed right away.
b. Use JConsole, which is in the bin directory of your JDK. When you start it on your machine, you can add a remote connection to the server.
These are the settings I added on dev:
export JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/usr/local/apache-tomcat-5.5.30/logs -XX:PermSize=256m -XX:MaxPermSize=256m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=12345 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dfile.encoding=UTF8"
JConsole also shows some of the same information as the probe.
Using these tools I was able to establish:
a. That it wasn't the sessions causing the problem (the probe showed that only 1 MB was being used by sessions).
b. That threads weren't causing the problem either (I checked this as well).
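JConsole reads its memory and thread figures from the platform MXBeans over the JMX port opened by the JAVA_OPTS settings, so the same checks can be scripted. A minimal sketch, assuming Java 7+ on the client side and the unauthenticated, non-SSL dev settings shown; `JmxMonitor` is a hypothetical name, and `memory.gc()` is the equivalent of JConsole's "Perform GC" button.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxMonitor {
    // Used heap in bytes, read through any MBeanServerConnection,
    // optionally after requesting a GC.
    static long usedHeap(MBeanServerConnection conn, boolean forceGc) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
            conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        if (forceGc) {
            memory.gc();
        }
        return memory.getHeapMemoryUsage().getUsed();
    }

    // Live thread count, useful for ruling threads in or out as the cause.
    static int threadCount(MBeanServerConnection conn) throws IOException {
        ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
            conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
        return threads.getThreadCount();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JmxMonitor <host> <port>; 12345 is the port from the dev JAVA_OPTS.
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "12345";
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            System.out.println("Used heap after GC: " + usedHeap(conn, true) / (1024 * 1024) + " MB");
            System.out.println("Live threads: " + threadCount(conn));
        }
    }
}
```

Because the helpers take an `MBeanServerConnection`, they also work against the local `ManagementFactory.getPlatformMBeanServer()`, which is handy for trying them out without a remote Tomcat.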

Testing
I used JMeter to simulate a web user approving a quote and simply looped that 4000 times. I couldn't use, say, SoapUI, because we needed to simulate a stub being created, whereas SoapUI acts as the client, so it wouldn't have shown the problem. Ordinarily you would not know that the stub causes the problem, but the histogram had shown such a large number of concurrency objects on the heap that a Google search turned up other people who had faced this problem with Axis 1.4.1. Without that, I would have had to write a script that placed a quote, approved it, saved it, modified it, etc.
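Looping 4000 times exposes this kind of problem because anything left behind per request accumulates linearly. The sketch below is a hypothetical model of that general pattern, not Axis code: the `Stub` class and its static `REGISTRY` are invented for illustration, standing in for whatever shared state a client library might register each time a stub is constructed.

```java
import java.util.ArrayList;
import java.util.List;

public class StubReuse {
    // Hypothetical shared state that every new stub registers and never releases.
    static final List<Object> REGISTRY = new ArrayList<>();

    static final class Stub {
        Stub() {
            REGISTRY.add(new byte[1024]);
        }

        String approveQuote(int id) {
            return "approved-" + id;
        }
    }

    // Leaky pattern: a new stub per request, so REGISTRY grows with every request.
    static void approvePerRequest(int requests) {
        for (int i = 0; i < requests; i++) {
            new Stub().approveQuote(i);
        }
    }

    // Contrast: one stub reused for every request (only appropriate if the
    // library allows sharing it), so REGISTRY grows by one entry in total.
    static void approveWithSharedStub(int requests) {
        Stub stub = new Stub();
        for (int i = 0; i < requests; i++) {
            stub.approveQuote(i);
        }
    }

    public static void main(String[] args) {
        approvePerRequest(4000);
        System.out.println("Registry entries, new stub per request: " + REGISTRY.size());
        REGISTRY.clear();
        approveWithSharedStub(4000);
        System.out.println("Registry entries, shared stub: " + REGISTRY.size());
    }
}
```

In a `jmap -histo` taken after the loop, the per-request version shows a class count that tracks the number of requests, which is the kind of signal the histogram gave here.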

The histograms are in /root/histbeftest and /root/hist92per, and they are easy to compare to verify the leak.
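Comparing two histograms by eye works, but a small program can rank classes by how much their instance counts grew. A sketch assuming the JDK 6+ `jmap -histo` layout (rank, instance count, bytes, class name); the 1.5 jmap may print a different layout, so the `ROW` pattern would likely need adjusting for the files in /root. `HistoDiff` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HistoDiff {
    // Matches a data row such as "   1:         12345         678901  java.lang.String".
    // Header and "Total" lines do not match and are skipped.
    private static final Pattern ROW =
        Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    // Class name -> instance count.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = ROW.matcher(line);
            if (m.find()) {
                counts.merge(m.group(3), Long.parseLong(m.group(1)), Long::sum);
            }
        }
        return counts;
    }

    // Classes whose instance count grew, largest growth first.
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before, Map<String, Long> after) {
        List<Map.Entry<String, Long>> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (delta > 0) {
                result.add(new AbstractMap.SimpleEntry<String, Long>(e.getKey(), delta));
            }
        }
        result.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java HistoDiff <before-histogram> <after-histogram>
        Map<String, Long> before = parse(Files.readAllLines(Paths.get(args[0])));
        Map<String, Long> after = parse(Files.readAllLines(Paths.get(args[1])));
        List<Map.Entry<String, Long>> top = growth(before, after);
        for (Map.Entry<String, Long> e : top.subList(0, Math.min(20, top.size()))) {
            System.out.printf("%10d  %s%n", e.getValue(), e.getKey());
        }
    }
}
```

A leaking class, such as the concurrency classes in this case, should appear near the top of the list, above the background churn from the app server.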

Analysis
The histogram files are plain text and can be compared in a text editor. If I had a heap dump, we could use:
a. jhat: this comes in the bin directory of the JDK (1.6 onwards, and some later updates of 1.5), and it can read 1.5 dumps. It lets you browse the heap and even compare two dumps.
b. Profilers like JProbe
c. Other tools such as VisualVM, HPjmeter, NetBeans, and Eclipse extensions all let you load heap dumps.
One issue you might run into is that the heap dump file is quite large and your client tool may not have enough memory to load it. When verifying a memory leak, you usually reduce the amount of memory you give the JVM so that you get smaller heap dump files. For example, the iquote app, once it is running and all its classes are loaded, takes about 20 to 30% of 512 MB, i.e. roughly 100 to 150 MB. So I could reduce the heap to 256 MB instead of 512 MB and run the tests to create a much smaller heap dump.

Deliberately writing a memory leak in sample code and then trying to detect it with these tools is usually the best way to learn.
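A minimal practice leak, assuming Java 8+: a cache keyed by a class without equals()/hashCode(), so every lookup of the same quote adds a new entry. `LeakyCache`, `BrokenKey`, and `FixedKey` are hypothetical names. Run it with a small heap (e.g. -Xmx32m) and a large count, take `jmap -histo` snapshots while it runs, and the growing `LeakyCache$BrokenKey` row should stand out.

```java
import java.util.HashMap;
import java.util.Map;

public class LeakyCache {
    // Deliberately broken key: no equals()/hashCode(), so two keys built from
    // the same quote id are different map keys (identity comparison).
    static final class BrokenKey {
        final int quoteId;
        BrokenKey(int quoteId) { this.quoteId = quoteId; }
    }

    // Fixed key: equality and hash code are based on quoteId.
    static final class FixedKey {
        final int quoteId;
        FixedKey(int quoteId) { this.quoteId = quoteId; }

        @Override
        public boolean equals(Object o) {
            return o instanceof FixedKey && ((FixedKey) o).quoteId == quoteId;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(quoteId);
        }
    }

    static final Map<Object, byte[]> CACHE = new HashMap<>();

    // Simulates looking up a quote; a cache miss stores a new 1 KB entry.
    static void lookup(Object key) {
        CACHE.computeIfAbsent(key, k -> new byte[1024]);
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        for (int i = 0; i < n; i++) {
            lookup(new BrokenKey(42));
        }
        System.out.println("BrokenKey entries after " + n + " lookups: " + CACHE.size());
        CACHE.clear();
        for (int i = 0; i < n; i++) {
            lookup(new FixedKey(42));
        }
        System.out.println("FixedKey entries after " + n + " lookups: " + CACHE.size());
    }
}
```

Because the leaked objects are reachable from a static map, a forced GC does not reclaim them, which is exactly what the before/after comparison is designed to catch.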