Wednesday, November 5, 2014

Is it a bad practice to catch ‘Exception’?

I recently encountered an organization that religiously enforces the Checkstyle IllegalCatch rule (reference here) and mandates following it for all production code.  This rule seeks to ensure that developers don’t catch Throwable, Exception, or RuntimeException directly.  The assertion is that it is always a better practice to catch more specific exceptions.  This rule seems a bit draconian to me, so I did a little surfing to see what others thought.  It seems the question has been debated at length, sometimes heatedly.  In fact, like discussions of religion or politics, the debate is often more heated than the strength of the supporting arguments justifies.

People who answer ‘yes’ to this question and assert that catching Throwable, Exception, or RuntimeException is universally a bad practice usually give the following reasons:

  • Developers can’t expect to handle all kinds of exceptions, which is what catching Exception implies (i.e. your code is not worthy).
  • It’s possible that classes up the stack will have a better ability to handle the unexpected exception thrown.
  • Detailed exceptions can often provide additional detail about the problem.
  • Developers routinely handle exceptions poorly, losing information needed to diagnose reported bugs or environmental issues.

It is interesting that a common “exception” (pun intended) to this rule given by the skeptical is that application entry points need to have a global catch.  Presumably the reasoning is that logging the exception to standard error (which is what the JVM does with uncaught exceptions by default) isn’t what’s wanted in most cases.

I am one of the ‘skeptics’ and disagree with the global catch prohibition, at least for Exception and RuntimeException, for several reasons.  Let's treat Throwable as a separate case that I'll address later in the entry.

There is no reliable way to identify the list of specific exceptions a developer can ‘expect’ and code in catch clauses.  Most of us leverage third-party products in our code very frequently.  Unless the products you use throw checked exceptions or take the trouble to document the exceptions they throw in the throws specification for all methods, there’s no way to get a specific list of the exceptions you can reasonably expect from product code.  If you don’t have a list of specific exceptions you can ‘expect’, there’s no reasonable way to code specific catches for those exceptions.

Specific catches violate DRY most of the time.  Most of the time, our response to all types of exceptions is identical:  either we log the exception or convert it to some type of RuntimeException and re-throw it.  If our response to each of the specific exceptions will be identical, coding multiple catches with identical logic just to adhere to the illegal catch rule creates code bloat.  Yes, as of JDK 1.7, there’s a syntax to help eliminate the code bloat (as illustrated below).  Many of us aren’t on 1.7 in production yet.
JDK 1.7 catch example:
catch (IOException|SQLException ex) {
    logger.log(ex);
    throw new RuntimeException(ex);
}
Pre-JDK 1.7 catch example with DRY violation:
catch (IOException ex) {
    handleException(ex);
}
catch (SQLException ex) {
    handleException(ex);
}

The JDK itself is replete with instances of catching Exception.  In looking for best practice guidance for general coding questions such as this, I consult the JDK source itself.  Presumably the people who write the JDK are an authority on how the language was intended to be used.  If general catches were a bad practice, why are JDK authors doing it routinely? Don’t take my word for it.  Look for references to Exception in Eclipse and start auditing the search results.  Instances of global catches are too numerous to list here.  To save you time, MessageFormat (and many other java.text classes) and CompletedFuture (and many other java.util.concurrent classes) have global catch code.  Instances of catching RuntimeException, Error, and Throwable exist but appear to occur less often.

The JDK itself provides a way to supply handling logic for uncaught exceptions (as of JDK 1.5).  Classes implementing Thread.UncaughtExceptionHandler can be associated with any Thread or ThreadGroup.  Should a thread experience an uncaught exception, the JVM will automatically invoke the handler for the thread (or its thread group) if one was defined.  This feature can be used to supply logic to log exceptions (to somewhere other than the default standard error) or notify administrators in some other way.  Apart from the syntax difference, this really is a type of global catch.  Why provide this feature if it's a 'bad practice' to use it?  Unfortunately, exceptions logged at this level will have little knowledge of the context surrounding why the exception was generated.
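To make the mechanism concrete, here is a minimal sketch of attaching a handler to a single thread.  The class and method names (UncaughtHandlerDemo, runAndCapture) are made up for this illustration, and the handler only records the exception; a real one would log it or notify someone.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; the names are illustrative only.
class UncaughtHandlerDemo {

    // Runs the task on a new thread and returns the exception that escaped it,
    // or null if the task completed normally.
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> captured = new AtomicReference<>();
        Thread worker = new Thread(task, "worker");
        // The JVM invokes this handler when an exception escapes run().
        worker.setUncaughtExceptionHandler((thread, error) -> captured.set(error));
        worker.start();
        try {
            worker.join(); // the handler has finished by the time join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for worker", e);
        }
        return captured.get();
    }

    public static void main(String[] args) {
        Throwable error = runAndCapture(() -> {
            throw new IllegalStateException("boom");
        });
        System.out.println("Handler received: " + error);
    }
}
```

Note that the handler sees only the thread and the Throwable, which is exactly why it has so little context to report.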

Incidentally, it's not uncommon to farm global catches out to products that execute that global catch for us.  I'm thinking of Spring interceptors and handler adapters that are often configured to effectively issue global catches on our behalf and escort users to a standard error page.  Isn't this type of practice effectively a global catch, just with a different syntax? Isn't it hypocritical to configure products to issue global catches, but deliberately prohibit global catches directly?

Global catches combined with runtime context information have saved me boat-loads of time.  In many cases, handling an exception either means logging that exception or recasting the exception in some way and including a meaningful context that might be useful to developers fixing the issue.  I routinely use ContextedRuntimeException from the Apache Commons Lang product for this purpose.  I’ve provided an illustration of this concept below (from the Apache Doc).

   try {
     ...
   } catch (Exception e) {
     throw new ContextedRuntimeException("Error posting account transaction", e)
          .addContextValue("Account Number", accountNumber)
          .addContextValue("Amount Posted", amountPosted)
          .addContextValue("Previous Balance", previousBalance);
   }

The stack trace output for the ContextedRuntimeException will report context information as well as the root cause.  For most exceptions reported using this technique, I have enough information to diagnose the bug or at least replicate it.  This technique *only* works when the exception and context are caught close to where the exception was generated.  Also, it's important to record the original exception as a cause as it may have additional information useful to solving the issue.  Letting the JVM log the exception to standard error by default will not capture the information about the problem being captured here.  Yes, it is possible that the additional information captured might not be valuable.  It is better to 'have' and not 'need' than the reverse.
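For readers who want to try the idea without adding Commons Lang, the following is a rough JDK-only sketch.  ContextException and AccountPoster are hypothetical classes written for this illustration; they are not part of Commons Lang and only imitate the chained context-value style.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical JDK-only stand-in for Commons Lang's ContextedRuntimeException.
class ContextException extends RuntimeException {
    private final Map<String, Object> context = new LinkedHashMap<>();

    ContextException(String message, Throwable cause) {
        super(message, cause); // keep the original exception as the cause
    }

    ContextException addContext(String label, Object value) {
        context.put(label, value);
        return this; // allows chained calls
    }

    @Override
    public String getMessage() {
        // Append the context pairs so they appear in logs and stack traces.
        return super.getMessage() + " " + context;
    }
}

// Hypothetical caller, used only to exercise the catch block.
class AccountPoster {
    static void post(String accountNumber, long amountPosted) {
        try {
            if (amountPosted < 0) {
                throw new IllegalArgumentException("negative amount");
            }
        } catch (Exception e) {
            // Caught close to the failure, while the context is still in scope.
            throw new ContextException("Error posting account transaction", e)
                    .addContext("Account Number", accountNumber)
                    .addContext("Amount Posted", amountPosted);
        }
    }
}
```

The catch sits in the same method as the failing code because that is the only place where the account number and amount are still available.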

The assertion that developers sometimes handle exception logic (code within a catch block) poorly and swallow information needed to diagnose problems is a fair point.  However, developers can insert poor exception handling logic for specific exceptions as well as general exceptions.  Prohibiting global catches in all cases does not solve poor exception handling coding issues.  Best practices for exception handling logic is a topic that deserves more in-depth treatment, perhaps in another blog entry.

There are reasons to catch specific exceptions.  Sometimes specific exceptions have additional information that should be captured.  Some exceptions represent business process errors or user errors and not unintended application exceptions.  I'm not trying to advise that developers should catch Exception exclusively.  I do believe it should be an option and not outright prohibited.
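The two styles can coexist in the same try block.  Here is a small sketch in which a specific catch pulls out the extra detail that only SQLException carries, and a general catch handles everything else.  The class and method names are invented for this example.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Hypothetical helper; names are illustrative only.
class SpecificThenGeneral {

    // Runs the work and returns a short description of what happened.
    static String describeFailure(Callable<?> work) {
        try {
            work.call();
            return "ok";
        } catch (SQLException e) {
            // Specific catch: capture details only SQLException provides.
            return "SQL failure, state=" + e.getSQLState() + ", code=" + e.getErrorCode();
        } catch (Exception e) {
            // General catch: same response for everything else.
            return "Unexpected failure: " + e;
        }
    }
}
```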

I would like to see two JDK improvements with exception handling.  Java needs a syntax to list “exceptions” for a global catch.  For instance, if I could easily code a catch that specified that I would like to catch all exceptions except instances of VirtualMachineError (out of memory conditions, JVM internal errors, etc.), the illegal catch rule would be easier to adhere to.  For example, I'd like to be able to specify a catch clause like:
catch all except(VirtualMachineError|LinkageError ex) {
    handleException(ex);
}
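Until something like that syntax exists, the closest approximation I know of is to catch the excluded types first and re-throw them untouched.  This is a sketch only; CatchAllExcept and runGuarded are hypothetical names, and returning the Throwable stands in for whatever handleException would do.

```java
// Hypothetical helper approximating "catch all except" with current syntax.
class CatchAllExcept {

    // Runs the task; returns the handled Throwable, or null if the task succeeded.
    static Throwable runGuarded(Runnable task) {
        try {
            task.run();
            return null;
        } catch (VirtualMachineError | LinkageError fatal) {
            throw fatal; // let JVM-level failures propagate untouched
        } catch (Throwable t) {
            return t; // a real handler would log or wrap here
        }
    }
}
```

It works, but the exclusion list has to be repeated at every catch site, which is the bloat the proposed syntax would remove.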

It would also help if it were possible to specify a default Thread.UncaughtExceptionHandler at JVM startup and supply notification logic for memory conditions and low-level virtual machine errors.  The default behavior of logging to standard error isn't wanted in many cases.  Note that it is possible to set the default exception handler for the entire JVM via Thread.setDefaultUncaughtExceptionHandler().

Wednesday, May 28, 2014

Handling External Processes in Java made Easy

Handling external processes in Java has never been easy.  The JDK provides a way to start and manage external processes and it does work, but it is awkward.  If the process hangs for some reason, getting at the input and output streams to figure out what’s going wrong is a pain.
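For comparison, here is roughly what the plain JDK approach looks like with ProcessBuilder.  This is a sketch, not production code: PlainJdkProcess and run are hypothetical names, and it has no timeout, so a hung process would block the caller forever.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper showing the JDK-only way to run a process.
class PlainJdkProcess {

    // Runs the command and returns its combined standard output and error.
    static String run(List<String> command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // merge stderr into stdout
            Process process = builder.start();
            StringBuilder output = new StringBuilder();
            // The output must be drained, or the child can block on a full pipe.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Exit code " + exitCode + ": " + output);
            }
            return output.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for process", e);
        }
    }
}
```

Stream draining, exit code checks, and interruption handling all have to be written by hand, and termination of a long-running process isn't even covered here.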

I ran into this issue recently with my work on the Transform4J project, an open source Java ETL Transformation API, where my test cases need to start and stop databases such as Cassandra and MongoDB.  Getting these test cases to work as far as starting up and shutting down these databases was very irritating and took more time than I would like to admit.  This prompted me to look for an open source product that makes process management easier.  I found one.

The Apache Commons Exec product makes external process management from Java very, very easy.  To illustrate with a simple example, let’s start up a Cassandra database for a series of unit tests.  

Executor executor = new DefaultExecutor();
executor.execute(new CommandLine(cassandraStartupCommand));

In my case, I was having trouble getting Cassandra to start up properly at one point, so I added a StreamHandler to capture any output so I could more easily debug the issue.  By default, this output goes to standard out and standard error (this is configurable).  I just added one line before execution:

executor.setStreamHandler(new PumpStreamHandler());

As it happens, I need to shut the database down after unit tests have completed.  It turns out that this is easy to do.  Cassandra initiates a graceful shutdown when the process is terminated.  To accomplish this with Commons Exec, you need an ExecuteWatchdog.  Adding one to the execution  is relatively simple:

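// The constructor argument is a timeout in milliseconds; the watchdog kills the process once it elapses.
ExecuteWatchdog watchdog = new ExecuteWatchdog(5000);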
executor.setWatchdog(watchdog);

When the unit tests are complete, terminating the process is simple:
watchdog.destroyProcess();

Note that there are additional features in this API that look useful (but that I didn’t happen to need for my unit tests).  With an ExecuteResultHandler, you can optionally throw an exception should the sub-process fail; I may incorporate this into my testing process.  You can very easily associate a ProcessDestroyer with the execution to terminate the executing process via shutdown hook when your JVM terminates.

I wish I had found this product sooner; it’s been around for five years or so.  Those looking for a more in depth introduction should check out the tutorial here.

Thursday, March 20, 2014

A JSON Library Evaluation for Java EE applications

I need to enhance a set of Java EE applications to support mobile development projects.  As mobile developers seem to prefer JSON formats to XML for passing data to/from mobile devices, I needed to evaluate current Java JSON libraries for use by the supporting Java EE applications.  The client prefers open source products to vended products.  The most prevalent open source product choices seem to be the following:

  • Gson
  • Jackson
  • The JSR 353 reference implementation
  • Json.org

This evaluation was conducted in March, 2014.  The evaluation was conducted using the methodology described in chapter 13 of the Java EE Architect's Handbook.

Evaluation Criteria

As with all product evaluations, it’s important to establish the criteria by which product choices will be graded and on which a product decision will be made.  The criteria used in such a product choice would obviously vary per project and organization.  I’ve settled on the following criteria:

  • Level of community activity - measured by the average number of releases per year.  Active projects are more likely to be enhanced in future.  Should the product not be maintained and become obsolete, it may need to be replaced in future and may incur additional costs of ownership.
  • Market share – measured by the number of downloads as a proxy.  I would prefer direct market share information instead of a proxy, but that isn’t usually available for open source products. Solutions for problems and issues with the product are more likely to be posted on the web for more popular products; such increased posting speeds development and indirectly lowers cost of ownership.  
  • License – The license for the product must conform to a specified list of legal requirements; most common open source licenses are acceptable.
  • Ease of Use – subjective measurement based on code samples and the quality of documentation.  Ease of use features greatly speed development and indirectly lower the cost of ownership.
  • Performance – measured by speed and required memory footprint.  Lower footprints reduce hardware requirements and lower cost of ownership.  The same test case was used for all products so that performance can be more easily compared.
  • Versioning Support – I expect that as features are added, existing JSON formats will be changed and enhanced.  With a mobile application distributed to the public, forcing upgrades to accommodate JSON format changes isn’t feasible.  A JSON library that provides support for legacy formats is preferred.  It’s possible that a versioning solution will involve additional products and not solely reside within the JSON library itself.
Not all criteria are necessarily weighted equally; weights given to each item will vary per project and enterprise.  It’s also possible that individual projects might have additional criteria not listed here.

Evaluation Results and Product Ranking

The evaluation criteria were prioritized by the needs of my upcoming project.  It's possible that your priorities might be different.  All criteria were assigned a numerical score from one to ten with ten being the best.

Scores for the criteria were weighted by priority.  High priority scores were multiplied by 2.  Medium priority scores were multiplied by 1.5.  Low priority scores were left alone.  Weighted scores for all criteria were then added together for the overall ranking.  Reasons for the ratings given are discussed in more detail in the product observations section below.
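To make the arithmetic explicit, here is a small sketch of the weighting calculation.  The class name and the sample scores in the usage note are invented for illustration; they are not the scores from this evaluation.

```java
// Hypothetical helper illustrating the weighting scheme described above.
class WeightedScore {

    enum Priority {
        HIGH(2.0), MEDIUM(1.5), LOW(1.0);

        final double multiplier;

        Priority(double multiplier) {
            this.multiplier = multiplier;
        }
    }

    // Multiplies each 1-10 score by its priority multiplier and sums the results.
    static double total(int[] scores, Priority[] priorities) {
        if (scores.length != priorities.length) {
            throw new IllegalArgumentException("scores and priorities must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] < 1 || scores[i] > 10) {
                throw new IllegalArgumentException("score out of range: " + scores[i]);
            }
            sum += scores[i] * priorities[i].multiplier;
        }
        return sum;
    }
}
```

With made-up scores of 9 (high priority), 8 (medium), and 7 (low), the total is 9 × 2 + 8 × 1.5 + 7 = 37.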


Product Observations


Ease of Use

I view this as the most important of the criteria as it most directly lowers costs of ownership.  With both Gson and Jackson, creating and reading JSON-formatted data were two-liners.  I’ve provided the source for my prototypes in the reference section below.  With my prototype, there was no need to write custom serialization/deserialization logic; both products do have robust options to do this should it be needed for your projects.  Both products by default handled escaping special characters like quotes and carriage returns.  The prototype source code and all dependencies can be downloaded from here.

One reason Gson slightly edges out Jackson is because it has better documentation.  The Gson documentation is better organized and more concise.  As a result of that documentation, constructing the Gson prototype took a little less time than writing the Jackson prototype.
It’s also notable that Jackson has a clumsy distribution: They need a one-zip download that contains binary jars, source jars, Javadoc jars, license and documentation.  Instead, all of these are downloaded separately and the product is structured so that it’s not obvious which jars you need for your particular project.

Note that coding using the classes provided with the JSR 353 spec takes much longer and it’s much easier to create bugs.  My prototype (I re-coded the same prototype with each of the products) required over 100 lines of code using the JSR 353 construct as opposed to the handful required by either Gson or Jackson.

I did not code a prototype for the Json.org product as it’s not a real product.  There’s no formal distribution bundle.  For distribution, you download class source individually and build it.  There are no unit tests, so you can’t validate your build very easily.  Essentially, product source becomes additional source in your project that you must maintain.  It became clear that this product isn’t a valid option, despite the fact it appeared near the top of internet searches for JSON libraries.

Level of Community Activity

Community activity is important for open source projects as products will become obsolete without care and feeding.  Being part of the Java EE specification means that the JSR 353 specification has or will have support from all the Java EE implementers, both commercial and open source.  It’s hard to compete with that.

It should be noted that both Gson and Jackson typically have multiple releases per year.  Both products are used in several open source products.  Both Gson and Jackson have ample community support and will no doubt be enhanced for some time to come.

Market Share

The number of downloads was only available for one of the four products:  Gson had 143,904 downloads as of March 14, 2014.  Somebody built the binaries for the Java classes on Json.org; that unofficial distribution had 10,821 downloads on March 14, 2014.  Download statistics for Jackson aren’t published as far as I can find.  For JSR 353, it’s not possible to identify download statistics; it’s not clear that everybody who downloads a Java EE application server with JSR 353 support in it will use that section of the container for JSON processing; they could use Gson, Jackson, or some other library.

Performance

The performance test was identical for all products.  The test consisted of starting with a complex value object with data and using that object to produce JSON-formatted data.  That JSON formatted data was also read and marshaled into the value objects.  This is a very common usage scenario.  Source for the performance evaluation can be downloaded from here.

Memory was measured by noting consumed memory (total memory less free memory) both before and after the test.  The test was run for 100,000 iterations to make time and memory usage more noticeable.  Results are in the table below.
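The measurement approach can be sketched as follows.  MemoryProbe is a hypothetical name and this is not the actual evaluation harness; it only illustrates the "total memory less free memory" calculation taken before and after a timed loop.

```java
// Hypothetical helper illustrating the before/after measurement technique.
class MemoryProbe {

    // Consumed heap as reported by the JVM: total memory less free memory.
    static long usedMemory() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    // Runs the work repeatedly; returns { elapsed milliseconds, change in used bytes }.
    static long[] measure(Runnable work, int iterations) {
        long bytesBefore = usedMemory();
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            work.run();
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long bytesAfter = usedMemory();
        return new long[] { elapsedMillis, bytesAfter - bytesBefore };
    }
}
```

The byte delta can come out small or even negative if a garbage collection runs during the loop, so treat memory numbers gathered this way as rough indications only.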

The JSR353 reference implementation is at the extreme; it was a lot slower than the other two libraries, but appears to have a much smaller footprint.  It’s possible that the JSR test took so long that there was a garbage collection during the test, which means that the memory measurement is understated.

Comparing Gson and Jackson, Gson was faster and had a smaller memory footprint for reading JSON data and marshalling that data into value objects.  Jackson was faster and had a smaller memory footprint producing JSON data from value objects.

Versioning Support

Gson versioning support uses annotations.  When coding value objects that will correspond to the JSON data read or produced, you use the @Since annotation to record the version in which the field appeared.  When you instantiate Gson to read or produce JSON data, you have the option to specify the version of JSON that will be used.  Any fields from later versions will be ignored.

None of the other products appear to have versioning support.

License

All three viable product options have licenses that are acceptable to most organizations.  Gson uses Apache 2.0.  Jackson’s license appears to have changed over time; Jackson was LGPL up to version 2.1 and an Apache license thereafter.  Jackson does not specify clearly which version of the Apache license it uses.  Furthermore, its distribution method doesn’t place a copy of the license in what you download.

Concluding Thoughts

Either Gson or Jackson is a reasonable choice.  Both products are very easy to use for common use cases and have options for customized serialization/deserialization logic if needed; for most uses custom logic won’t be needed.

My choice for my upcoming project is Gson.  I’m lured by the superior documentation and versioning support.

I hope you’ve found this entry helpful.  Thanks for taking time to read it.

References