Friday, September 23, 2016

Using Java Thread Dumps to Diagnose Application Performance

On a holiday weekend last year, I got an emergency call from a client. One of their Java EE applications would freeze and stop servicing users within an hour after container start-up. I was called in to help investigate. I started off by requesting a thread dump and memory dump of the container once it had stopped accepting requests. It's the thread dump that I'm focusing on today.  That Java thread-dump is here (package scrubbed to protect the client).

During that exercise, I noticed that when I analyze thread dumps, I look for the same things. Whether it's a performance issue or some sort of freezing issue, I manually scanned the thread dump for the same types of conditions. This year, working for another client, I'm faced with a performance tuning exercise that will likely require analysis of numerous thread dumps and wasn't looking forward to the busy work. That prospect got me to do some introspection and figure out exactly what I look for and find or build a product that does this.

Threads that block other threads

Most developers know the syntax behind using Java's synchronized keyword and that it's used to ensure that one and only one thread executes a section of code or uses a given Java resource at one time. I'm not going to digress into a discussion of lock monitors and coding issues; if you need a refresher, please see this article. Most experienced developers use synchronization with extreme care as bugs with synchronization are intermittent and extremely hard to diagnose and fix.

Frequent symptoms of synchronization issues are performance problems or cases where applications freeze and no longer accept client requests. Essentially, synchronization causes other threads servicing other client requests to wait until the needed Java resource is available for use. Those waiting client threads are typically in a BLOCKED state. I immediately suspected this type of issue in diagnosing the issue I was investigating over the holiday last weekend.  Here's a sample of a BLOCKED thread entry in a thread dump:

"http-bio-" daemon prio=6 tid=0x000000001cf24000 nid=0x2054 waiting for monitor entry [0x0000000022f9c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at java.beans.Introspector.getBeanInfo(
 - waiting to lock <0x0000000680440048> (a com.sun.beans.WeakCache)
 at org.apache.axis.utils.BeanUtils$
 at Method)

Note that the dump explicitly lists that it's waiting on resource 0x0000000680440048 which is owned by this thread:

"http-bio-" daemon prio=6 tid=0x000000001bf06000 nid=0x21b0 runnable [0x000000002e0dd000]
   java.lang.Thread.State: RUNNABLE
 at java.lang.Class.getDeclaredMethods0(Native Method)
 at java.lang.Class.privateGetDeclaredMethods(
 at java.lang.Class.privateGetPublicMethods(
 at java.lang.Class.getMethods(
 at java.beans.Introspector.getPublicDeclaredMethods(
 - locked <0x0000000680440048> (a com.sun.beans.WeakCache)
 at java.beans.Introspector.internalFindMethod(

It turns out that more than one thread was waiting on this resource.

IO bound threads

One frequent source of application performance issues are threads that are waiting on Input/Output to occur. This often takes the form of a database read or write or a service call of some type.  Many developers assume that most performance issues are caused by slow database access and start looking at SQL queries. I do not make this assumption. Furthermore, if it is a database tuning issue, you need to identify the specific SQL that needs to be tuned. At any rate, if the source of your performance issue is IO, thread dumps can help you identify where in your code the issue is taking place. 

Here is an example thread that's IO-bound:

"QuartzScheduler_Worker-2" prio=6 tid=0x000000001abc7000 nid=0x2208 runnable [0x000000001df3e000]
   java.lang.Thread.State: RUNNABLE
 at Method)
 at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(
 at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(
... (many thread stack entries omitted for brevity)
        at com.jmu.scholar.dao.FormRuleDAO.findByFormId(

Note that within the thread stack, there's an explicit reference to application code that's initiating the IO. In this case, IO is being initiate by a database query in a specific application method. 

Note that just because the dump caught this one occurrence of this query, doesn't mean it's a performance issue. If, however, a large percentage of running threads are IO-bound in the same method, then this database access would become a tuning target. To tune this specific database access, a developer can focus on this one query instead of looking at all queries within the application. How to tun the database access is out of scope for this blog entry.

Performance Hot Spots

Most developers upon being asked how to tune an application in an interview will tell you to use a Java profiler. That answer misses the point. A profiler helps you tune a specific section of code after you've identified the section of code that needs to be tuned. Often performance issues show up in production and it's not possible to run a profiler on your container in production. 

A thread dump taken in production on an active application can help you identify which section of code needs to be tuned, perhaps with a profiler. Furthermore, thread dumps are unintrusive enough that you can take them in production without material impact to users.  

To see how dumps help, let's review how a profiler works. A profiler works by taking a thread dump periodically, perhaps every 5 milliseconds. That thread dump specifies where in your code you're spending time. For example, if your test causes the profiler to take 100 samples and method appears in 33 of them, then your spending 33% of your time in that method. If that's the method with the highest percentage, that is where you'll often start tuning.

In fact, thread dump data is better than profiler data in several ways:
  • It measures what's actually happening in production vs. a profile of a specific business process or unit test case.
  • It includes any synchronization issues between threads that won't show up in a profile of one thread in a unit test case (there are no other threads to contend with).

The problem is collecting the data. Yes, thread dumps are easy to collect. Counting occurrences of method references in running threads is laborious, tedious, and annoyingly time consuming.

Introducing StackWise

The first thing I did was scan for products that analyze thread dumps for these types of issues. There are many products that analyze thread dumps, but they tend to be interactive tools that summarize threads and allow you to selectively expand and contract them. A couple categorized threads by state (e.g. RUNNING, BLOCKED, etc.). However, none of these really looked for the three items I look for. Hence, I created the product StackWise, which is open source and freely available for you to use.

StackWise will analyze a thread dump and report useful information on all three of these conditions.  A sample of StackWise output can be found here.  Note that you get the following items of information:
  • The percentage of threads that are IO bound and a summary of those IO-bound threads.
  • Threads that are locking resources needed by other threads.
  • A list of application method reference counts listed in descending order. 
In interpreting performance hot spots, StackWise will report application methods in which you're spending the most time. Methods belonging to ServletFilter classes can be ignored as they are often listed in all running threads.  Other method mentions, however, are possible tuning targets.

If you analyze thread dumps in ways other than what StackWise already covers, I'd like to hear your ideas.  Thanks for reading this entry.

No comments:

Post a Comment