Friday, September 23, 2016

Using Java Thread Dumps to Diagnose Application Performance

On a holiday weekend last year, I got an emergency call from a client. One of their Java EE applications would freeze and stop servicing users within an hour of container start-up. I was called in to help investigate. I started by requesting a thread dump and a memory dump of the container once it had stopped accepting requests. The thread dump is what I'm focusing on today. That Java thread dump is here (package names scrubbed to protect the client).

During that exercise, I noticed that I look for the same things whenever I analyze thread dumps. Whether it's a performance issue or some sort of freezing issue, I manually scan the thread dump for the same types of conditions. This year, working for another client, I'm faced with a performance tuning exercise that will likely require analysis of numerous thread dumps, and I wasn't looking forward to the busy work. That prospect prompted some introspection: I set out to figure out exactly what I look for, and then to find or build a product that does it for me.

Threads that block other threads

Most developers know the syntax of Java's synchronized keyword and know that it ensures only one thread at a time executes a section of code or uses a given Java resource. I'm not going to digress into a discussion of lock monitors and coding issues; if you need a refresher, please see this article. Most experienced developers use synchronization with extreme care, because synchronization bugs are intermittent and extremely hard to diagnose and fix.
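To see what a BLOCKED thread looks like from inside the JVM, here is a minimal sketch (Java 8+, standard `java.lang.management` API only). It deliberately makes a thread named `waiter` block on a monitor held by a thread named `holder`. It then uses `ThreadMXBean.dumpAllThreads` to report, much as a thread dump does, which monitor each BLOCKED thread wants and which thread holds it. The class, method, and thread names are illustrative only and do not come from the client application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class BlockedDemo {

    /** Lists each BLOCKED thread in this JVM with the monitor it wants and the thread holding it. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " waiting to lock " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Makes "waiter" block on a monitor that "holder" keeps for about a second, then reports. */
    static List<String> runDemo() {
        Object sharedResource = new Object();
        CountDownLatch holderHasLock = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (sharedResource) {
                holderHasLock.countDown();
                pause(1000);                          // hold the monitor so the waiter piles up behind it
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (sharedResource) {
                // reached only after holder releases the monitor
            }
        }, "waiter");
        try {
            holder.start();
            holderHasLock.await();
            waiter.start();
            while (waiter.isAlive() && waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(5);                      // wait until the waiter is actually BLOCKED
            }
            List<String> report = blockedThreads();
            holder.join();
            waiter.join();
            return report;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        runDemo().forEach(System.out::println);
    }
}
```

Running `main` prints a line naming `waiter`, the `java.lang.Object@...` monitor it wants, and `holder` as the owner. This is the same relationship a thread dump expresses with "waiting to lock" and "locked" entries.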

Frequent symptoms of synchronization issues are performance problems or applications that freeze and no longer accept client requests. Essentially, synchronization forces threads servicing other client requests to wait until the needed Java resource is available. Those waiting threads are typically in a BLOCKED state. I immediately suspected this type of issue when diagnosing the freeze I was investigating over that holiday weekend. Here's a sample of a BLOCKED thread entry in a thread dump:

"http-bio-" daemon prio=6 tid=0x000000001cf24000 nid=0x2054 waiting for monitor entry [0x0000000022f9c000]
   java.lang.Thread.State: BLOCKED (on object monitor)
 at java.beans.Introspector.getBeanInfo(
 - waiting to lock <0x0000000680440048> (a com.sun.beans.WeakCache)
 at org.apache.axis.utils.BeanUtils$
 at Method)

Note that the dump explicitly shows that this thread is waiting to lock the monitor at 0x0000000680440048, which is held by this thread:

"http-bio-" daemon prio=6 tid=0x000000001bf06000 nid=0x21b0 runnable [0x000000002e0dd000]
   java.lang.Thread.State: RUNNABLE
 at java.lang.Class.getDeclaredMethods0(Native Method)
 at java.lang.Class.privateGetDeclaredMethods(
 at java.lang.Class.privateGetPublicMethods(
 at java.lang.Class.getMethods(
 at java.beans.Introspector.getPublicDeclaredMethods(
 - locked <0x0000000680440048> (a com.sun.beans.WeakCache)
 at java.beans.Introspector.internalFindMethod(

It turns out that more than one thread was waiting on this resource.
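Matching each "waiting to lock" address to the thread that reports "locked" for the same address is mechanical, so it can be automated. Below is a minimal sketch (Java 8+) that assumes the HotSpot thread-dump text format shown above. The class and method names are hypothetical. The thread names in the sample are truncated and identical, so the sketch labels each thread with its `tid` as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LockContention {

    private static final Pattern TID = Pattern.compile("tid=(0x[0-9a-f]+)");
    private static final Pattern LOCKED = Pattern.compile("^- locked <(0x[0-9a-f]+)>");
    private static final Pattern WAITING = Pattern.compile("^- waiting to lock <(0x[0-9a-f]+)>");

    /** Returns "waiter -> lock address -> owner" for each thread waiting to enter a monitor. */
    static List<String> findContention(String dump) {
        Map<String, String> ownerByLock = new HashMap<>();
        List<String[]> waits = new ArrayList<>();     // {waiting thread, lock address}
        String thread = null;
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {               // header line of a new thread entry
                String name = line.substring(1, line.indexOf('"', 1));
                Matcher tid = TID.matcher(line);
                thread = tid.find() ? name + " (tid=" + tid.group(1) + ")" : name;
                continue;
            }
            if (thread == null) continue;              // skip any preamble before the first thread
            Matcher locked = LOCKED.matcher(line);
            if (locked.find()) {
                // Caveat: a thread inside Object.wait() may also print "- locked" for a monitor it
                // has temporarily released, so treat the owner as a lead to verify, not proof.
                ownerByLock.put(locked.group(1), thread);
            }
            Matcher waiting = WAITING.matcher(line);
            if (waiting.find()) {
                waits.add(new String[] {thread, waiting.group(1)});
            }
        }
        List<String> report = new ArrayList<>();
        for (String[] w : waits) {
            report.add(w[0] + " -> " + w[1] + " -> " + ownerByLock.getOrDefault(w[1], "(owner not in dump)"));
        }
        return report;
    }
}
```

Grouping the report by lock address shows immediately when several threads are stacked up behind the same owner, as they were here.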

IO bound threads

A frequent source of application performance issues is threads waiting on input/output (IO). This often takes the form of a database read or write or a service call of some type. Many developers assume that most performance issues are caused by slow database access and start by looking at SQL queries. I do not make this assumption. Furthermore, even if it is a database tuning issue, you need to identify the specific SQL that needs to be tuned. Either way, if the source of your performance issue is IO, thread dumps can help you identify where in your code the issue is taking place.

Here is an example thread that's IO-bound:

"QuartzScheduler_Worker-2" prio=6 tid=0x000000001abc7000 nid=0x2208 runnable [0x000000001df3e000]
   java.lang.Thread.State: RUNNABLE
 at Method)
 at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(
 at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(
... (many thread stack entries omitted for brevity)
        at com.jmu.scholar.dao.FormRuleDAO.findByFormId(

Note that the thread stack explicitly references the application code that initiates the IO. In this case, the IO is initiated by a database query in a specific application method.

Just because the dump caught one occurrence of this query doesn't mean it's a performance issue. If, however, a large percentage of running threads are IO-bound in the same method, that database access becomes a tuning target. To tune it, a developer can focus on this one query instead of looking at every query in the application. How to tune the database access is out of scope for this blog entry.
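Measuring that percentage by hand is tedious. Here is a minimal sketch (Java 8+) of how it could be automated from dump text. It splits the dump into per-thread entries, treats an entry as IO-bound if it contains one of a few socket-read frames, and reports the first application frame for each IO-bound thread. The marker list and the application package prefix are assumptions you would adjust for your own JDK, JDBC driver, and code base. This is not how any particular tool does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IoBoundSummary {

    // Hypothetical markers for "waiting on network IO"; adjust for your JDK and JDBC driver.
    static final List<String> IO_MARKERS = Arrays.asList(
            "java.net.SocketInputStream.socketRead0",
            "net.sourceforge.jtds.jdbc.SharedSocket.readPacket");

    /** Splits dump text into per-thread entries; each entry starts with a line beginning with '"'. */
    static List<String> splitThreads(String dump) {
        List<String> threads = new ArrayList<>();
        StringBuilder current = null;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                if (current != null) threads.add(current.toString());
                current = new StringBuilder();
            }
            if (current != null) current.append(line).append('\n'); // ignores any header before the first thread
        }
        if (current != null) threads.add(current.toString());
        return threads;
    }

    static boolean isIoBound(String entry) {
        return IO_MARKERS.stream().anyMatch(entry::contains);
    }

    /** Percentage of thread entries that are IO-bound. */
    static double percentIoBound(String dump) {
        List<String> threads = splitThreads(dump);
        if (threads.isEmpty()) return 0.0;
        long ioBound = threads.stream().filter(IoBoundSummary::isIoBound).count();
        return 100.0 * ioBound / threads.size();
    }

    /** "thread name -> first application frame" for each IO-bound thread. */
    static List<String> ioBoundThreads(String dump, String appPackage) {
        List<String> result = new ArrayList<>();
        String prefix = "at " + appPackage + ".";
        for (String entry : splitThreads(dump)) {
            if (!isIoBound(entry)) continue;
            String name = entry.substring(1, entry.indexOf('"', 1));
            String appFrame = Arrays.stream(entry.split("\n"))
                    .map(String::trim)
                    .filter(l -> l.startsWith(prefix))
                    .findFirst()
                    .orElse("(no application frame)");
            result.add(name + " -> " + appFrame);
        }
        return result;
    }
}
```

If many IO-bound threads report the same application frame, that frame is the query worth tuning.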

Performance Hot Spots

When asked in an interview how to tune an application, most developers will tell you to use a Java profiler. That answer misses the point. A profiler helps you tune a specific section of code after you've identified which section needs tuning. Performance issues often show up in production, where it's usually not possible to run a profiler against your container.

A thread dump taken in production on an active application can help you identify which section of code needs to be tuned, perhaps with a profiler. Furthermore, thread dumps are lightweight enough that you can take them in production without material impact to users.

To see how dumps help, let's review how a sampling profiler works. It takes a thread dump (a stack sample) periodically, perhaps every 5 milliseconds. Each sample shows where in your code time is being spent. For example, if your test causes the profiler to take 100 samples and a method appears in 33 of them, then you're spending roughly 33% of your time in that method. If that's the method with the highest percentage, that's often where you'll start tuning.
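The sampling idea is simple enough to sketch in plain Java (8+). This minimal, hypothetical example samples one thread's stack with `Thread.getStackTrace()` at a fixed interval. It counts how many samples each method appears in and converts a count to a percentage using the same samples-containing-the-method arithmetic described above. Real profilers are far more sophisticated; this only illustrates the counting.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SamplingProfiler {

    private static volatile double sink;      // keeps the JIT from discarding hotLoop's work

    /**
     * Samples target's stack `samples` times, `intervalMillis` apart, and counts in how many
     * samples each method appears (at most once per sample, so recursion isn't double-counted).
     */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            Set<String> seen = new HashSet<>();
            for (StackTraceElement frame : target.getStackTrace()) {
                seen.add(frame.getClassName() + "." + frame.getMethodName());
            }
            seen.forEach(method -> counts.merge(method, 1, Integer::sum));
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }

    /** Percentage of samples in which a method appeared, e.g. 33 of 100 samples gives 33.0. */
    static double percent(Map<String, Integer> counts, String method, int samples) {
        return 100.0 * counts.getOrDefault(method, 0) / samples;
    }

    /** A deliberately CPU-bound method for the profiler to find. */
    static void hotLoop() {
        double x = 0;
        while (!Thread.currentThread().isInterrupted()) {
            x += Math.sqrt(x + 1);
        }
        sink = x;
    }

    public static void main(String[] args) {
        Thread busy = new Thread(SamplingProfiler::hotLoop, "busy");
        busy.start();
        int samples = 100;
        Map<String, Integer> counts = sample(busy, samples, 5);
        busy.interrupt();
        System.out.printf("hotLoop appeared in %.0f%% of samples%n",
                percent(counts, "SamplingProfiler.hotLoop", samples));
    }
}
```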

In fact, thread dump data is better than profiler data in two ways:
  • It measures what's actually happening in production vs. a profile of a specific business process or unit test case.
  • It includes any synchronization issues between threads that won't show up in a profile of one thread in a unit test case (there are no other threads to contend with).

The problem is analyzing the data. Yes, thread dumps are easy to collect, but counting occurrences of method references across running threads is laborious, tedious, and annoyingly time-consuming.
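The tallying, at least, is easy to script. Here is a minimal sketch (Java 8+), assuming HotSpot dump text and a hypothetical application package prefix. For each application method, it counts how many thread entries reference it (once per thread) and sorts the results in descending order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MethodTally {

    /**
     * Counts how many thread entries reference each application method (once per thread),
     * sorted by count descending, then by name. appPackage (e.g. "com.example") is an assumption.
     */
    static List<Map.Entry<String, Integer>> tally(String dump, String appPackage) {
        Map<String, Integer> counts = new HashMap<>();
        Set<String> seenInThread = new HashSet<>();
        String prefix = "at " + appPackage + ".";
        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {               // a new thread entry begins
                seenInThread.clear();
                continue;
            }
            if (!line.startsWith(prefix)) continue;
            int paren = line.indexOf('(');
            // drop the leading "at " and the trailing "(File.java:N)"
            String method = line.substring(3, paren < 0 ? line.length() : paren);
            if (seenInThread.add(method)) {
                counts.merge(method, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        String dump = String.join("\n",
                "\"worker-1\" prio=6 runnable",
                "\tat com.example.dao.FormRuleDAO.findByFormId(FormRuleDAO.java:42)",
                "\tat com.example.web.FormController.show(FormController.java:17)",
                "\"worker-2\" prio=6 runnable",
                "\tat com.example.dao.FormRuleDAO.findByFormId(FormRuleDAO.java:42)");
        tally(dump, "com.example").forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Counting once per thread, rather than once per frame, keeps recursive methods from dominating the list.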

Introducing StackWise

The first thing I did was look for products that analyze thread dumps for these types of issues. There are many products that analyze thread dumps, but they tend to be interactive tools that summarize threads and let you selectively expand and collapse them. A couple of them categorized threads by state (e.g., RUNNABLE, BLOCKED, etc.). However, none of them looked for the three conditions I look for. Hence, I created StackWise, which is open source and freely available for you to use.

StackWise analyzes a thread dump and reports useful information on all three of these conditions. A sample of StackWise output can be found here. It reports the following:
  • The percentage of threads that are IO bound and a summary of those IO-bound threads.
  • Threads that are locking resources needed by other threads.
  • A list of application method reference counts listed in descending order. 
When interpreting performance hot spots, note that StackWise reports the application methods in which you're spending the most time. Methods belonging to ServletFilter classes can usually be ignored, because they appear in nearly every running thread. Other method mentions, however, are possible tuning targets.

If you analyze thread dumps in ways other than what StackWise already covers, I'd like to hear your ideas.  Thanks for reading this entry.

Sunday, September 4, 2016

All Monoliths Are Not Created Equal

All monoliths are not created equal. That is, not all monoliths became monoliths the same way, nor do they have the same root causes. In this post, I'll identify the different types of monoliths I've seen. This is useful because the tactics I use to break a monolith into smaller, more manageable pieces differ depending on which category of monolith I'm dealing with. It's worth noting that elements of these categories sometimes appear in combination.

A monolith is an application that has grown too large to manage effectively. That is, it's expensive to enhance and fix. Changes to monoliths often come with unintended consequences; you fix something and other things break. Monoliths effectively marry you to a technical stack, making it difficult to adopt new technologies as they evolve and mature. Due to the size and complexity of the underlying code, it's often expensive and time-consuming to bring on new developers (there's so much to learn). The same labor characteristic limits the business in that outsourcing change often isn't an option.

Feature-Bloated Web Application
This type of monolith is specific to web applications. It becomes larger and more complex as new features are added, and some of those features weren't considered when the application was originally designed. Consequently, they are often inelegantly "tacked on" and don't conform to the application's initial design. Often, there's no time or budget to revamp the design so that new features are easy to maintain and support. The result is often technical debt in the form of unwanted coupling.

New features add complexity to all parts of the application. They impact not only the user interface but also the server-side code backing that interface and the underlying database. Often that server-side code and database are tightly coupled with the user interface and "assume" the business processes the interface currently implements. It is often assumed that these business processes are static and rarely change. Unfortunately, that is often not the case. Developers react to the additional complexity in predictable ways.

Beware of undocumented developer assumptions. These assumptions are often made for developer convenience; they make the new feature easier to code. I don't blame developers for this; an extremely large feature set is often too difficult to keep in one person's head. Because their assumptions initially produce more streamlined, easier-to-read code, developers believe they are making a positive contribution. Unfortunately, they are also inadvertently creating land mines for other developers who aren't aware of those assumptions and need to change that code for some other feature. As most of these assumptions go undocumented or even unidentified, other developers working on that section of code are often unaware of all the consequences of their changes.

Most monoliths don't measure feature usage. As a consequence, features, once implemented, never get removed. It would be logical to remove features that are rarely or never used, because keeping that code adds complexity that hampers your ability to add new features down the road. Sadly, getting budget to remove features is hard in most organizations, as the benefit is hard to measure and too intangible.

It's worth noting that the design of the underlying database often becomes bloated along with the web application. Due to the large scope of business it supports, the underlying database often becomes an Achilles' heel for the organization. Not only is database refactoring much more costly and difficult than refactoring code, it's not uncommon for that database to be used by multiple applications. The implication is that changes to the underlying database often have consequences beyond the monolithic application it was originally created for.

Data Store Strangle
Despite the length of time that databases (relational and non-relational) have existed, database design skills are woefully inadequate at most organizations. As a result, you often get a process-oriented database structure. That is, a database structure that is specific to processes currently implemented by the application that database supports. Usually, the mindset is that the database can be 'refactored' along with code as new features are implemented.

Database design impacts application code. At the risk of stating the obvious, applications have code dedicated to reading from and writing to the database. Ideally, that code is restricted to a data access layer within the application, but not all developers separate data access concerns. Either way, the readability of data access code is directly impacted by the quality of the underlying database design. If the database structure is so complex that it's not understood, chances are the data access code that uses it will be just as hard to understand. Furthermore, if the application uses a different domain model than the one implemented in the database, the data access code must include some type of data structure translation. The short story is that the database design can greatly increase the size and complexity of data access code.

Refactoring databases is harder and more time-consuming than refactoring code. The mindset that a database can be refactored along with code rests on a couple of false assumptions. First, it assumes that refactoring a database is as easy as refactoring code. Not true. Database refactoring often must take place while existing code continues to be supported. Also, any database refactoring must include some type of data conversion. Another way to look at it is that code only has transient state; that is, its state exists only at run time. Database state is persistent. Changes to the database must include converting the stored data into its new format.

Refactoring databases carries more risk than refactoring code. If a code refactoring turns out to have a defect, backing out the change is usually an option. Database refactoring usually doesn't have this safety net. Yes, you can restore a backup of the database taken before the refactoring was implemented, but users will lose any data entered while the refactoring was in place. Unless you code and test a reverse conversion, implementing a database refactoring commits you (pun definitely intended) past the point of no return.

The Over-Engineered Tarball Component

Some developers can't resist over-engineering internal components they write. Often that happens because they are bored and looking for something more interesting to do. I can't fault them for that motivation. Unfortunately, when those mental musings get implemented in applications, additional complexity often results. This contributes to an application becoming a monolith.

I've seen a case where such a component was written to validate cash transfers from account to account. While laymen might view such a transfer as simple, those of you deeply embedded in banking know differently. The tax ramifications vary enormously depending on how the transfer is recorded. In fact, we had over 100,000 permutations and combinations of transfers that needed to be verified and tested. The component in question used the same type of bit-switching scheme that UNIX uses for file permissions, but with many more dimensions than just user, group, and other. When its author left, nobody could understand the component or effectively change the rules it applied. Long story short: the application was held hostage by this embedded but critical component.

Overly complex components have a way of creating complexity in the surrounding parts of the application. In my example, we had code to correct the validation verdict in places where it came up with the incorrect answer. This was needed as nobody knew how to change the overly-complex component handling validation.

The business process implemented by the component is often not understood, which makes replacing the component difficult: it's hard to code a replacement without a specific target.

The Distributed Monolith
Separate services are supposed to increase developer velocity by reducing the size and complexity of the code that needs to change. We all know that this doesn't always work out. We've all seen feature additions that require coordinated changes across multiple components. We've all seen that even though these services are theoretically decoupled and should be changeable, enhanceable, and deployable independently, this often doesn't happen. The reason is that these services are not truly decoupled. Let me explain.

Developers have a desire to consolidate code. We've encouraged this for years by promoting DRY (Don't Repeat Yourself). Carrying this idea forward, developers consolidate repeated code whenever they find it. Invariably, business code gets consolidated into common libraries. It starts with implementing a service contract in code, packaging that as a library, and sharing it between the publishing service and all of its consumers. It often progresses from there to common business code that is listed as a dependency of multiple services. This common business code, usually used to manipulate common business data, also tends to be consolidated, perhaps by publishing that logic in common libraries used by several services.

Developers are often quite proud of the code consolidation. Unfortunately, whenever that common code changes, it forces a coordinated redeployment of all consumers along with the producer. In this world, there's no such thing as a non-breaking change. In essence, DRY between services increases coupling between services.

Distributed monoliths are caused by coupling between services that forces change across multiple services to be coordinated. In these cases, you're not getting the benefit of breaking the world up into smaller services. In fact, just the opposite. You get all the disadvantages of a highly coupled monolith along with the additional overhead of managing a larger number of services. In other words, you forgo all advantages of implementing microservices and pay more for the privilege.

Separate common business functionality into its own service instead. Deploy it once. Call it from whatever services need it. Maintain it only once.

Services should always be able to choose when they upgrade included libraries. Always, no exceptions. To get the benefits of breaking the monolith into several services, you cannot permit any common code that could force a deployment of a service.

In forthcoming blog entries, I'll suggest tactics for breaking each of these types of monoliths into more manageable applications. Thanks for reading.

Saturday, August 27, 2016

A Pleasurable Journey into Text Translation using ANTLR4

For one of my clients, I needed to spec out a REST service API for developers. Although they had standardized on API Blueprint, I feel compelled to look for alternatives for specking out future REST APIs. The input API Blueprint needs is just too verbose and takes far too much time to write. If I expected to spend less time specking out REST APIs, this might not be an issue, or at least not as much of one. I also checked out Swagger and RAML and found the same verbosity issue; I just don't have time for that on an ongoing basis. There's a good review of these products that I found particularly useful here.

My thoughts turned to a syntax for specking REST APIs that would be far more streamlined and concise. Within an hour, I had a draft of a syntax (example below) that would be far less verbose and would significantly increase my productivity per API. The problem would be writing something that could interpret specifications in this format and then generate the verbose XML, Markdown, or YAML input for one of the API design products mentioned above. It turns out that there are products that specialize in text translation. ANTLR is the most popular of these right now.

ANTLR works like this: you specify a grammar that describes the text you want to translate. ANTLR uses that grammar to generate Java code that can read text in that format and interpret it in the terms you specified in the grammar. For instance, I defined a REST Resource block in my grammar and told it about the syntax for different operations and the JSON arguments they accept as input or emit as output. The ANTLR-generated code analyzes the specifications I write and converts them into Java objects that I can easily read, interpret through code, and turn into useful translated output with the help of a templating technology such as FreeMarker. An example grammar for the Java language can be found here.

It turns out that specifying a grammar wasn't as easy as I thought it would be going in. What, in my mind, is a simple syntax is actually quite complex when you break it down into constructs that a product like ANTLR can understand. Essentially, you need to specify all whitespace (characters to ignore), and comments if the syntax is to support them. All special keywords, and the rules that govern when those keywords are expected, also need to be specified. At this point, I should have backed away from this idea and suffered through one of the more verbose solutions. However, by then I was far too interested in how structured text gets specified, and in what's possible by interpreting it through code, to stop.

I'm part way through this project and will open source it once complete.  For those taking similar text translating journeys with ANTLR, I have ferreted out some techniques that helped me immensely.

Write and test the Lexer portion of the grammar first.

ANTLR breaks grammars into two pieces: a "lexer" and a "parser." The lexer understands which characters and keywords are important for what you're doing and skips any unneeded whitespace. It also packages those characters and keywords as "tokens" so they can be used for more sophisticated translation later on. The parser applies rules to those tokens to interpret context. For example, a REST resource definition doesn't make sense in the data type structure section of my proposed REST API specification syntax.

Because the parser consumes lexer output, it's important to make sure the lexer portion of your grammar tests out first. Any testing of the parser before then is premature. Your lexer tests should check the following:
  • Make sure all characters and keywords are recognized.
  • Make sure that the lexer identifies characters and keywords correctly. For instance, I had a bug early on where the keyword 'Resource' was recognized as a string literal. In my syntax, 'Resource' has a special context and meaning.

You can test the lexer generated from your grammar by iterating through the tokens it produces. Any unrecognized token should cause a test failure. If the lexer doesn't recognize your special characters and keywords (e.g., it doesn't identify the correct number of 'Resource' keywords in your test sample), that should also cause a test failure.
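The shape of such a test is sketched below in plain Java (8+). To keep the example self-contained and runnable without ANTLR, the lexer here is a hypothetical regex-based stand-in, not ANTLR-generated code. With a real generated lexer, you would iterate over its tokens instead. What carries over is the pair of checks: fail on any unrecognized input, and fail if the 'Resource' keyword count is wrong.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerCheck {

    enum Type { RESOURCE, OPERATION, IDENT, PUNCT, UNKNOWN }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    // Stand-in for an ANTLR-generated lexer (NOT ANTLR): keyword rules come before IDENT so that
    // "Resource" becomes a keyword, much as ANTLR breaks a tie between equal-length matches by rule order.
    private static final Pattern RULES = Pattern.compile(
            "(?<ws>\\s+)|(?<resource>Resource\\b)|(?<operation>Operation\\b)"
            + "|(?<ident>[A-Za-z_][A-Za-z0-9_]*)|(?<punct>[:{}/,-])|(?<unknown>.)",
            Pattern.DOTALL);

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        while (m.find()) {
            if (m.group("ws") != null) continue;               // whitespace is skipped
            Type type = m.group("resource") != null ? Type.RESOURCE
                    : m.group("operation") != null ? Type.OPERATION
                    : m.group("ident") != null ? Type.IDENT
                    : m.group("punct") != null ? Type.PUNCT
                    : Type.UNKNOWN;
            tokens.add(new Token(type, m.group()));
        }
        return tokens;
    }

    /** The two lexer checks: nothing unrecognized, and the expected number of 'Resource' keywords. */
    static void checkLexer(String sample, int expectedResources) {
        List<Token> tokens = tokenize(sample);
        for (Token t : tokens) {
            if (t.type == Type.UNKNOWN) {
                throw new AssertionError("Unrecognized input: '" + t.text + "'");
            }
        }
        long resources = tokens.stream().filter(t -> t.type == Type.RESOURCE).count();
        if (resources != expectedResources) {
            throw new AssertionError("Expected " + expectedResources
                    + " 'Resource' keywords but found " + resources);
        }
    }
}
```

Moving the `resource` alternative after `ident` makes "Resource" lex as an ordinary word. That bug is similar to the one described above, and the keyword-count check catches it.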

Write Parser rules iteratively from general rules to more specific rules.

Parser rules apply context to the tokens identified by the lexer. I found it much easier to start with very general parser rules and get those working. For example, my syntax has two main sections: a bounded context section that describes resources and operations, and a Types section that describes all data types used by the API. My first iteration of the parser rules just identified the two sections. That isn't enough to do what I need, but I didn't leave it there. Over time, I specified the parts of both sections and progressively described them in more detail.

In other words, each parser rule describes a section of your input text. The first test for a parser rule can be simple: just check the start line/column and end line/column of the text it matches. If those are correct, you can write more specific rules that carve up the larger sections from the first iteration. ANTLR generates a context object (a subclass of ParserRuleContext) for each parser rule. That object holds the starting and ending tokens of the section it covers, and you can get the starting and ending positions from those tokens.

A few points about ANTLR aren't obvious and are worth remembering:
  • Lexer rule names start with an uppercase letter (conventionally written in UPPER_CASE); parser rule names start with a lowercase letter.
  • At least one parser rule should apply to the entire document (minus skipped whitespace).
I'll post additional reports and publish the resulting work on GitHub when it's complete. I'm still midway through this effort.

An Example REST API Specification Syntax

#student.spec - Student REST API
Bounded Context: Student Information {
  Resource: Student // Everything about current and past students
    Operation: /student - POST, json, student //Creates a student
      httpStatus: 201,400
      return: json, studentId
    Operation: /student/{studentId} - PATCH, json, student //Update student (only those attributes provided)
      httpStatus: 200,400,404,405
    Operation: /student/{studentId} - DELETE //Delete student
      httpStatus: 200,400,404,405
    Operation: /student - GET //Finds a list of students by status
      status - string[] // Status values to search by
      httpStatus: 200,400
      return: json, student
    Operation: /student/{studentId} - GET //Finds a student by their id
      httpStatus: 200,400,404
      return: json, student
}
Types: {
  student {
    studentId - required, string // student identifier that uniquely identifies a student
    firstName - required, string $$Bill
    middleName - string
    lastName - required, string $$Williamson
    title - enum{Mr, Ms, Mrs}
    birthDate - required, date
    primaryAddress - required, address
    schoolAddress - address
    primaryPhone - required, phone
    cellPhone - phone
    status - enum{Applied, Accepted, Active, NonActive}
  }
  address {
    streetAddress1 - string $$123 Testing Lane
    streetAddress2 - string
    City - string $$Somewhere
    StateCode - string(2) $$IL
    zipCode - int(5)
    zipCodeExt - int(4)
  }
  phone {
    countryCode - int(2)
    areaCode - int
    prefix - int
    line - int
    extension - int
  }
  classSection {
    title - string
    discipline - string
    courseNbr - int
    building - string
    room - string
    time - string
  }
}

Wednesday, July 20, 2016

A Cute Trick with Generics to Avoid Unwanted Casts

I ran into a cute trick while coding a unit test on a Spring application the other day that I'd like to share.

Consider these lines of Java code.
The first line requires a cast because the method returns an Object. A cast isn't a labor-intensive construct, but it makes code less clear and is inconvenient for developers.

Note that the second line does *not* require the cast, which makes it much more convenient for developers. The secret is that Spring's ReflectionTestUtils uses generics. Take a peek at how invokeMethod is defined.

In effect, Java infers the method's return type from the type of the variable the result is assigned to. In this case, it's a String.

This is convenient for developers: there's less to type, and the code is easier to read.

You should note that the developer is still required to know what type of value is returned. The following statement will generate a ClassCastException.

This might seem like a disadvantage to using generics in this way. I don't think so. In either case, the developer needs to understand the data type that will actually be returned.
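Here is a small, self-contained sketch of the same pattern in plain Java (8+). The `invokeMethod` helper below is hypothetical. It is written with `java.lang.reflect` to mimic the generic-return idea and is not Spring's implementation. It shows the cast-free assignment. It also shows that assigning the result to the wrong type compiles but fails at run time with a ClassCastException: the unchecked cast to `T` inside the helper checks nothing, and the real check happens at the caller's assignment.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class GenericInvoke {

    /**
     * Hypothetical helper: the caller's assignment target decides T, so no cast is written.
     * The (T) cast is unchecked; the real type check happens at the caller's assignment.
     */
    @SuppressWarnings("unchecked")
    static <T> T invokeMethod(Object target, String name, Object... args) {
        for (Method m : target.getClass().getDeclaredMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.length) {
                try {
                    m.setAccessible(true);              // reach private methods, as test helpers often do
                    return (T) m.invoke(target, args);
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("No method " + name + " taking " + args.length + " argument(s)");
    }

    /** The Object-returning style, for comparison. */
    static Object invokeMethodReturningObject(Object target, String name, Object... args) {
        return invokeMethod(target, name, args);
    }

    static class Greeter {
        private String greet(String who) {
            return "Hello, " + who;
        }
    }

    public static void main(String[] args) {
        Greeter greeter = new Greeter();
        String withCast = (String) invokeMethodReturningObject(greeter, "greet", "Bill"); // cast needed
        String inferred = invokeMethod(greeter, "greet", "Bill");                          // T = String
        System.out.println(withCast + " | " + inferred);
        try {
            Integer wrong = invokeMethod(greeter, "greet", "Bill"); // compiles: T = Integer
            System.out.println(wrong);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, because greet actually returns a String");
        }
    }
}
```

Either way, the caller must know the real return type. Generics only move where that knowledge is written.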

Just thought I'd pass this tidbit along. I'll certainly think more about using generics this way in APIs I write.