Monday, March 23, 2015

Using Docker to deploy Java Web Applications

Docker has fast become a favorite deployment format.  For those of you who haven't used Docker, its promise is "package once, run anywhere".  Docker does for software what shipping containers do for tangible goods.  It has standard connection mechanisms for disk and networking.  Additionally, it's easy to link Docker 'containers' together (e.g. link a database container with your application container).

Advantages and Disadvantages

Not only is it a win for system administrators, as it makes deployment management easier and more easily automated, but it's a win for Java developers and architects too.  As the application's environment is portable and is packaged with your application, environment-specific problems and defects are greatly reduced.  What runs in production is more like what you test with.  In addition, since your application runs with its own version of Java and its own operating system libraries, you can upgrade your JDK and any native code you rely on with less fear of impacting other applications, and without involving system administrators to make the native software change.

Getting past the hype, there are costs and disadvantages to using Docker.  Running Docker and its associated tools is much easier on Linux; Windows has limited support.  This creates inconveniences in the many corporate environments where most development occurs on Windows.  Yes, there is boot2docker, but that solution is far from complete or convenient.  Platform differences also cause problems when adding Maven functionality to produce Docker images; Maven builds are supposed to be platform independent.  Additionally, Docker does have a learning curve.

Some may fear performance issues with an additional virtualization layer.  IBM did a good study showing the performance impact to be negligible.   All in all, the advantages of using Docker outweigh the costs.

A Java EE / Docker Example

As it turns out, producing a deployable Docker image for a Java web application is easy.  I leverage a packaging product such as Dropwizard or Spring Boot to make such deployments even easier.  That said, Tomcat and many other containers have official Docker images available, so this step isn't strictly required.

As an example, I'll use Moneta, a microservice I'm writing for a series of presentations, named after the Roman goddess of memory.  Moneta provides a RESTful web service interface to selected portions of a relational database.

Artifacts for the Moneta product include a Dropwizard deployment for which I've defined a Docker image, which I've open sourced (here).  I'll briefly go through the example as you might find it useful.

Sample Docker File

FROM java:7-jre

MAINTAINER Derek C. Ashmore

#  External volume definitions
RUN mkdir /jarlib
VOLUME /jarlib
RUN mkdir /config
VOLUME /config
RUN mkdir /logs
VOLUME /logs

#  MONETA_URL must be defined by an earlier ENV instruction
RUN curl -SL "$MONETA_URL" -o moneta-dropwizard.jar

ENV CLASSPATH /config:/jarlib/*.jar:moneta-dropwizard.jar

EXPOSE 8080 8081

ENTRYPOINT ["java", "-classpath", "$CLASSPATH", "-jar", "moneta-dropwizard.jar", "server", "/config/moneta-dropwizard.yaml"]

Dockerfiles are scripts that produce runnable Docker images.  A couple of items to note.  In a Dockerfile, you expose ports and disk volumes to the outside world.  I've exposed certain ports within the Docker container, but system administrators can map those ports differently when the container is run.  For example, admins might map 8080 to 9125.  In other words, the admins still need to coordinate ports; that is still managed outside Docker.

Additionally, I've created and exposed three file mount points.  Admins map those volumes to physical disk when they run the container.  I've elected to make the product configurable and extensible.  The configuration and even additional jars can be supplied when the container is run.  I've also exposed a file system for logs so they can be processed by a log management application, such as Splunk or Logstash.

Docker Application Run Example

Bottom line, it's possible to run the same Docker container in multiple contexts.  I can run different instances of Moneta against different databases and can horizontally scale it ad infinitum.

These mappings become evident in the command used to run the Moneta image.

docker run -d -p 8080:8080 -p 8081:8081 \
  -v /c/Users/TheAshmores/moneta/dropwizard/config:/config \
  -v /c/Users/TheAshmores/moneta/dropwizard/logs:/logs \
  -v /c/Users/TheAshmores/moneta/dropwizard/jarlib:/jarlib \

There are other possibilities.  Docker recently announced a product called 'Compose' (here) that allows you to specify and configure multi-container applications.  This makes coordination of multiple containers much easier.  A Compose discussion is best left for another post.

Thursday, March 12, 2015

Project Estimation Tactics

I am frequently asked for advice on how to estimate tasks or projects.  It’s a question architects get asked by project managers as we’re often in the best position to provide a realistic estimate.  Estimation is as much art as science, but there are some things you can do to reduce error over time.  Some may argue that it’s not the architect’s job to estimate; it’s the project manager’s job.  In my experience, it’s common to combine the architect and project manager roles, so the line between the two is blurry in some organizations.

I typically determine estimates in terms of the number of labor hours or days required.  Calculating delivery dates and working different scenarios (e.g. different numbers of developers, altering the items delivered, etc.) is just math after that.

Tactic 1:  Keep a history log for past estimates

For the projects where I fill both the manager and architect roles, I track the hours spent on tasks as well as the estimates I provided for them.  For estimates that differed significantly from the time actually taken, I do a private post-mortem.  This is valuable feedback that you can use to refine estimates for future projects.  I realize that tracking this information is boring.

Tactic 2:  Look for comparable projects

I price (a.k.a. estimate) projects the same way people price real estate.  When an assessor determines the value of your home, they measure the square footage and take various specifics (e.g. number of bedrooms, bathrooms, etc.) and compare them to recent sales in your area with about the same square footage and features.  They get a range of sales prices back and they use those to derive the value of your home.  We can use the same tactic for technology projects.

If you follow tactic one, over time you will have a history of the labor used for different tasks and projects.  You can mine that library for “comparables” for the project or task you are estimating.  For example, on a recent assignment, creating, formatting, and deploying a new report took approximately 24-36 hours, depending on the complexity of the report.  Were I still at that assignment, that’s the range I would use for the report components of projects I needed to estimate.  You can apply the same tactic to estimating other types of tasks.
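A minimal sketch of how such a history log could be mined for a range.  The class name ComparableEstimator, the "report" task type, and the sample hours are all made up for illustration; a spreadsheet would do the same job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derives an estimate range from logged actual hours. */
final class ComparableEstimator {
    // Actual hours from past work, keyed by task type (e.g. "report").
    private final Map<String, List<Double>> history = new HashMap<>();

    /** Adds one completed task to the history log. */
    void record(String taskType, double actualHours) {
        history.computeIfAbsent(taskType, k -> new ArrayList<>()).add(actualHours);
    }

    /** Returns {low, average, high} hours over all comparables of the given type. */
    double[] estimate(String taskType) {
        List<Double> hours = history.get(taskType);
        if (hours == null || hours.isEmpty()) {
            throw new IllegalArgumentException("No comparables for " + taskType);
        }
        double low = hours.get(0);
        double high = hours.get(0);
        double sum = 0;
        for (double h : hours) {
            low = Math.min(low, h);
            high = Math.max(high, h);
            sum += h;
        }
        return new double[] {low, sum / hours.size(), high};
    }

    public static void main(String[] args) {
        ComparableEstimator log = new ComparableEstimator();
        log.record("report", 24);
        log.record("report", 30);
        log.record("report", 36);
        double[] range = log.estimate("report");
        System.out.printf("report: %.0f-%.0f hours (average %.0f)%n",
                range[0], range[2], range[1]);
    }
}
```

The low and high values give the range to quote; the spread between them hints at how much variance to expect.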

Tactic 3:  Turn the “big” problem into several little problems

We apply this tactic when designing applications all the time.  It can help you with providing estimates as well.  
You might not have a comparable for the entire project.  However, if you separate that project into smaller portions, you might have history for comparable projects/tasks for at least some of the smaller components.  At the very least, you’ll have a better feeling for which portions of your estimate are likely to have the largest variance.

Sunday, March 8, 2015

Preparing Public Maven Repository Releases

For a product as mature as Maven, it *should* be easier to prepare a release bundle for Maven artifacts.  I had a hard time, both with a product that used Maven builds and one that did not.  However, the purpose of this blog entry isn’t to complain; it’s to document the process.  To be honest, my motivation to write this blog entry is really for my personal reference rather than public consumption.

Resource Requirements

Sonatype login to submit group creation requests (login available here) and deploy artifacts.
A GPG signer tool (I use Gpg4win).

One-time Workstation Set-up Tasks

Create a public / private key pair.  Instructions using Gpg4win tool Kleopatra here.

Project  Build Set-up (Maven project)

Add creation of the “sources” jar to your maven build.  Maven makes this easy via a plug-in.  Pom addition here:

Add creation of the “javadoc” jar to your maven build.  Maven makes this easy via a plug-in.  Pom addition here:

Execute your Maven build with the repository:bundle-create goal.  This will verify that you’ve provided all needed information for your project (e.g. description, SCM url, project url, etc.) in the pom file.  It will also create an unsigned deployment bundle (which is useless).

Add bundle creation and signing to your maven build.  My sample requires that you define an environment variable GPG_HOME that denotes the location of where you installed your GPG software (example C:\Software\GNU\GnuPG).  I use the ant plugin for Maven to accomplish this.  This really should be easier.  Pom addition here:

     <property environment="env" />
     <fail unless="env.GPG_HOME" message="GPG_HOME environment variable not defined." />
     <mkdir dir="target/mavenrepo" />
     <copy file="pom.xml"
      tofile="target/mavenrepo/${}-${project.version}.pom" />
     <copy todir="target/mavenrepo">
      <fileset dir="target" includes="*.jar" />
     </copy>

     <!-- Sign the pom -->
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
      <arg line="${}\mavenrepo\${}-${project.version}.pom" />
     </exec>
     <!-- Sign the main jar -->
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
      <arg line="${}\mavenrepo\${}-${project.version}.jar" />
     </exec>
     <!-- Sign the javadoc jar -->
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
      <arg line="${}\mavenrepo\${}-${project.version}-javadoc.jar" />
     </exec>
     <!-- Sign the sources jar -->
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
      <arg line="${}\mavenrepo\${}-${project.version}-sources.jar" />
     </exec>
     <!-- Bundle the artifacts together with their signatures -->
     <jar destfile="target/${}-${project.version}-bundle.jar">
      <fileset dir="target/mavenrepo" includes="*.jar" />
      <fileset dir="target/mavenrepo" includes="*.pom" />
      <fileset dir="target/mavenrepo" includes="*.asc" />
     </jar>

Run your build and produce a signed bundle jar.  Example contents of a properly created and signed bundle are as follows:

Submit a ticket on Sonatype (here) to setup your artifact.  An example ticket can be found here.

Manual Artifact Deployment

After successful project setup, you’ll receive an email notification.  Then you can issue deployments.  It’s possible to incorporate deployment of the bundle in your build script, but that’s a battle I haven’t fought; I deploy my artifacts manually.  This is fine as I don’t need to deploy very often.

To manually upload your deployment bundle and release the artifact, log into Sonatype (here), select the “Staging Upload” link, and select the “Artifact Bundle” upload mode.

Then you’re off to the races.   My hope is that this process gets *much* easier over time and that this post becomes useless and obsolete.  Please let me know if I can make any of these instructions clearer.

Friday, March 6, 2015

Tracking Multi-Service Transactions with Correlation IDs

One problem that’s arisen with the use of Micro-service architectures is that there’s a need to correlate related transactions.  With micro-services, it’s common for a business action of some type to require multiple service calls.  If one of those service calls fails, sometimes it’s helpful to have context.  That is, information about service calls for that action that preceded the error.

The best way I’ve found for doing this is to use correlation ids.  A correlation id identifies an action (e.g. user action or business process).  For example, drawing on recent experience in the higher education field, accepting a student is a complicated process.  In a micro-service world, it will likely result in several service calls.  Should an acceptance fail, having contextual information about each of those service calls might be useful.

It turns out that product support for this concept in the Java EE world is lacking.  Spring Boot support for this has been requested (ticket is here), but isn’t a reality yet.  Needing a product to implement correlation id tracking, I elected to write one and posted it on my GitHub account (here).

The objectives for a product like this are:
  • Ensure that a correlation id is assigned for each transaction (will be generated if not on service request).
  • Ensure that the correlation id is documented on all log messages so that it can be correlated across services.
  • Insulate micro-service code from having to be concerned with correlation ids at all.

I’m using the tactic of documenting the correlation id in request headers.  I use a servlet filter to interrogate the header and see if a correlation id is already defined.  If not, I generate a unique one.  I then add the correlation id to a RequestCorrelationContext class that stores the correlation id in a ThreadLocal manner.  This way, it’s accessible to any class that needs it.
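The mechanics can be sketched with nothing but the JDK.  This is an illustration of the idea, not the source of the product itself: the class name CorrelationContextSketch and the header name X-Correlation-Id are assumptions, and the servlet filter is represented only by the calls it would make around chain.doFilter().

```java
import java.util.UUID;

/** Illustrative sketch only; names here are hypothetical. */
final class CorrelationContextSketch {
    // Hypothetical header name; a real product may use a different one.
    static final String HEADER_NAME = "X-Correlation-Id";

    // Holds one correlation id per thread.
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    /** Uses the incoming header value if present; otherwise generates a unique id. */
    static String begin(String incomingHeaderValue) {
        String id = (incomingHeaderValue == null || incomingHeaderValue.trim().isEmpty())
                ? UUID.randomUUID().toString()
                : incomingHeaderValue.trim();
        CURRENT.set(id);
        return id;
    }

    /** Returns the id for the current thread, or null if none is set. */
    static String currentId() {
        return CURRENT.get();
    }

    /** Clears the id so pooled threads don't carry it into the next request. */
    static void end() {
        CURRENT.remove();
    }

    public static void main(String[] args) {
        // A filter would pass request.getHeader(HEADER_NAME) here; null means "not supplied".
        begin(null);
        try {
            System.out.println("Handling request with correlation id " + currentId());
        } finally {
            end();
        }
    }
}
```

Clearing the ThreadLocal in a finally block matters in a servlet container, where threads are reused across requests.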

Yes, you must make certain to put the correlation id on the header for any services called for a given transaction.  A similar problem exists for AMQP message producers.  Putting thought into how the architecture can help with that issue is a problem for another day.  
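For an outgoing HTTP call, forwarding the id amounts to setting one request header.  A minimal sketch using the JDK's HttpURLConnection follows; the header name X-Correlation-Id and the service URL are hypothetical, and any HTTP client with a header API would work the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Illustrative sketch only; header name and URL are hypothetical. */
final class OutboundCorrelationSketch {
    static final String HEADER_NAME = "X-Correlation-Id";

    /** Prepares (but does not send) a request that carries the correlation id. */
    static HttpURLConnection prepare(String serviceUrl, String correlationId) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(serviceUrl).toURL().openConnection();
            connection.setRequestProperty(HEADER_NAME, correlationId);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection c = prepare("http://localhost:8080/students", "testId");
        System.out.println(HEADER_NAME + ": " + c.getRequestProperty(HEADER_NAME));
    }
}
```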

Yes, if transactions are started by batch job classes or AMQP message receivers, the correlation ids would need to be similarly set there as well.

I’ve also written custom enhancements for the Log4J and Logback products that allow the logging pattern to include the correlation id.  That way, any logging messages can be associated with other service calls (provided they also log the correlation ids with any log messages).  Tackling the logging problem at this level means that no micro-service code needs to be involved in the tracking of correlation ids.  Installing this feature should be a one-time setup task.

Log4J V1.x example

With log4j, correlation id tracking can be implemented by including the Log4J companion jar for the product in your classpath and configuring the log4j configuration.  Note that I’ve specified a custom layout which allows me to use the ‘%I’ pattern marker that will determine the placement of the correlation id.

log4j.rootLogger=INFO, stdout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %I %-5p %c{1}:%L - %m%n

The correlation id ‘testId’ then comes out with any logging message.

2015-03-06 15:17:55 testId INFO  CorrelationPatternLayoutTest:49 - Hi there!

Logback example

With Logback, correlation id tracking can be implemented by including the Logback companion jar for the product in your classpath and configuring the logback configuration.  Note that I’ve specified a custom layout encoder that allows me to specify the ‘%id’ pattern marker that will indicate the placement of the correlation id.

 <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
   <encoder class="org.force66.correlate.logback.CorrelationPatternLayoutEncoder">
     <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{5} %id - %msg%n</pattern>
   </encoder>
 </appender>

The correlation id ‘testId’ then comes out with any logging message.

15:22:11.024 [main] INFO  o.f.c.l.CorrelationPatternLayoutEncoderTest testId - Hi there!

Sunday, March 1, 2015

The Difference between Architecture and Design

Architecture is more about 'what' is being built (e.g. interface definitions, which apps and modules will be developed or used and what they are responsible for).  Design is more code level and is more about 'how'.

Objectives for the architect:

  •   Enhance technology support for business change.
  •   Optimize developer productivity.
  •   Minimize support infrastructure needed.
  •   Optimize development throughput.
  •   Protect value for technology investments (includes risk mitigation).

Architects aren't the only people charged with these items.  However, technology choices and strategies are a large determining factor in achieving these objectives.

Architects may be uncomfortable with the business responsibilities I list.  This is unfortunate. Architecture, like everything else in business, needs to at least pay for itself if not contribute value to the organization.  If it does not, it will become extinct.

Architects achieve these objectives by employing the following tactics:
  •   Facilitate product selection (including build vs buy and support/testing tools).
  •   Module identification, boundaries and contracts.
  •   Effective communication of the above to developers and management.
  •   Enforcement (with management support) of architecture decisions.

Objectives of senior developers (aka designers) are the following:
  •   Mentor junior developers.
  •   Produce easily supportable code.
  •   Identify and communicate implementation limits and risks.
  •   Identify and communicate opportunities for enhancing the architecture to optimize development and support.
  •   Identify and communicate architecture gaps and inconsistencies.
  •   Identify and communicate code that isn't easily supportable to management.

Many Java architects do a mixture of design and architecture.  Consequently, the two roles are often confused.  Also, some enterprises aren’t large enough to formally separate out the architect role.

I’ve noticed that many developers are good at ‘identifying’ architecture issues, gaps, and inconsistencies, but fail at communicating them.  Either they never communicate them, or they communicate them so poorly that those concerns never effect change.

Do you see any differences between architecture and design that I haven’t listed?

Sunday, February 22, 2015

Continuing Education for Architects

As fast as technology changes, members of the information technology field, no matter what role they fill (developer, architect, project manager, etc.), are effectively in a never-ending continuing education course.  It’s not a formal course and is entirely self-directed.  It doesn’t have formal grades, although your success in the field is effectively the grading system.  Some resist this concept and learn only under duress.  

I’ve received requests from my readers in the past for additional recommendations on how to make their learning process for Java EE technologies more effective.  That’s an excellent question.  It’s also admirable as it means they are accepting responsibility for that learning process and are aggressively pursuing it.

I’ll elaborate on some of the tactics I use to direct my own education; perhaps you might find some of those tactics useful as well. 

1.   Browse open source project source solving similar problems

Drawing on past experience isn’t the only way to solve a design problem.  There are very few new problems in IT these days.  Most design problems have already been solved, with the solutions published.  Given the extremely large number of open source projects with published source, chances are that several of them have solved a problem very close, if not identical, to yours.

I’ll provide an example.  For a micro-service I’m writing (here) in preparation for an upcoming presentation, I need to solve the problem of supporting SQL generation for multiple relational database types (e.g. Oracle, Microsoft SQL Server, PostgreSQL, MySql, and others).  While there is an ANSI standard for SQL, many database software platforms have idiosyncrasies in their syntax and differ from the standard.  In this case, Hibernate, the ORM product, generates SQL under the covers and supports multiple SQL dialects and I found that section of their source enlightening.
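To make the dialect problem concrete, here is a small sketch of one well-known difference: limiting the number of rows returned.  The interface name SqlDialectSketch and its single method are hypothetical and are not Hibernate's API; the sketch only shows why the generated SQL has to vary by database.

```java
/** Hypothetical dialect abstraction; not Hibernate's or any product's API. */
interface SqlDialectSketch {
    /** Amends a simple SELECT statement so it returns at most maxRows rows. */
    String limit(String select, int maxRows);

    // MySQL and PostgreSQL both accept a trailing LIMIT clause.
    SqlDialectSketch LIMIT_SUFFIX = (select, maxRows) -> select + " LIMIT " + maxRows;

    // Oracle can filter on the ROWNUM pseudo-column of a wrapping query.
    SqlDialectSketch ORACLE_ROWNUM = (select, maxRows) ->
            "SELECT * FROM (" + select + ") WHERE ROWNUM <= " + maxRows;

    // Microsoft SQL Server places TOP right after the SELECT keyword.
    SqlDialectSketch SQL_SERVER_TOP = (select, maxRows) ->
            select.replaceFirst("(?i)^\\s*SELECT", "SELECT TOP " + maxRows);
}

final class SqlDialectDemo {
    public static void main(String[] args) {
        String query = "SELECT name FROM customer"; // made-up table and column
        System.out.println(SqlDialectSketch.LIMIT_SUFFIX.limit(query, 10));
        System.out.println(SqlDialectSketch.ORACLE_ROWNUM.limit(query, 10));
        System.out.println(SqlDialectSketch.SQL_SERVER_TOP.limit(query, 10));
    }
}
```

A production-grade dialect layer handles far more than this (paging offsets, identifier quoting, type mapping), which is exactly why reading an established implementation is instructive.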

2.   Follow Thought Leaders for your topic on Twitter, Slideshare, and YouTube

You can get an early lead on future IT directions by receiving tweets and presentations from thought leaders on your topic.  Typically, tweets are brief, but you can follow them up with additional research if they appear interesting.

3.   Catalog important nuggets of information you receive.

If you’re like me, you can’t remember everything.  I catalog important nuggets of information I run into using the mind mapping software FreeMind.  It’s open source and free.  It’s essentially an outline on steroids.  You can insert hyperlinks, connect many disparate ideas, expand/collapse any portion of the map, and many other useful things.  A portion of a mind map I’m working on for micro-service architects is below.  

Gerald Weinberg calls this the “fieldstone method”.  He likens finding good nuggets of information to finding stones for a stone wall you’re constructing.  When you find the stone, you might not know how you’re going to use it (i.e. on which section of wall), but you keep it anyway and catalog it for future use.  Weinberg provides much more detail on this concept in his book.

4.   Contribute to LinkedIn Groups

LinkedIn has a groups feature where members can post questions and contribute answers.  There are groups for just about any topic of interest.  Yes, there are plenty of useless posts, but you can get some valuable nuggets of information.  Also, it’s a decent test bed for ideas you have.  This is similar to striking up a conversation about a topic with a colleague, but on a much wider scale.

5.   Write Some Code

Pick an open source project of interest to you and contribute to it.  Yes - you should still do this even if you're an architect.  If you’re interested in a product that doesn’t exist yet, write it and post it on GitHub (or one of the other open source project hosting providers).

6.   Write Articles or Give Presentations

Understanding a topic to the point that you can explain it to others forces a level of education beyond the content of your article or presentation.  Also, many people in our field are not nice critics.  If your article gets attention and is crap, you’ll be told that in very short order.

Wednesday, November 5, 2014

Is it a bad practice to catch ‘Exception’?

I recently encountered an organization that religiously enforces the Checkstyle IllegalCatch rule (reference here) and mandates following it for all production code.  This rule seeks to ensure that developers don’t catch Throwable, Exception, or RuntimeException directly. The assertion is that it is always a better practice to catch more specific exceptions.  This rule seems a bit draconian to me, so I did a little surfing to see what others thought.  It seems the question has been debated at length, sometimes heatedly.  In fact, like discussion involving religion or politics, discussion of this issue is sometimes more heated than the strength of the supporting arguments provided.

The common reasons people answer ‘yes’ to this question, and assert that catching Throwable, Exception or RuntimeException is universally a bad practice, seem to distill to the following:

  • Developers can’t expect to handle all kinds of exceptions, which is what catching Exception implies (i.e. your code is not worthy).
  • It’s possible that classes up the stack will have a better ability to handle the unexpected exception thrown.
  • Detailed exceptions can often provide additional detail about the problem.
  • Developers routinely handle exceptions poorly, losing information needed to diagnose reported bugs or environmental issues.

It is interesting that a common “exception” (pun intended) to this rule given by the skeptical is that application entry points need to have a global catch.  Presumably the reasoning is that logging the exception to standard error (which is what the JVM does with uncaught exceptions by default) isn’t what’s wanted in most cases.

I am one of the ‘skeptics’ and disagree with the global catch prohibition, at least for Exception and RuntimeException, for several reasons.  Let's treat Throwable as a separate case that I'll address later in the entry.

There is no reliable way to identify the list of specific exceptions a developer can ‘expect’ and code in catch clauses.  Most of us leverage third-party products in our code very frequently.  Unless the products you use throw checked exceptions, or take the trouble to document the exceptions they throw in the throws specification for all methods, there’s no way to get a specific list of the exceptions you can reasonably expect from product code.  If you don’t have a list of specific exceptions you can ‘expect’, there’s no reasonable way to code specific catches for those exceptions.

Specific catches violate DRY most of the time.  Most of the time, our response to all types of exceptions is identical.   Either we log the exception or convert it to some type of RuntimeException and re-throw it.  If our response to the most specific exceptions will be identical, coding multiple catches with identical logic just to adhere to the illegal catch rule just creates code bloat.  Yes, as of JDK 1.7, there’s a syntax to help eliminate the code bloat (as illustrated below).  Many of us aren’t on 1.7 in production yet.
JDK 1.7 catch example:
catch (IOException|SQLException ex) {
    throw new RuntimeException(ex);
}
Pre-JDK 1.7 catch example with DRY violation:
catch (IOException ex) {
    throw new RuntimeException(ex);
}
catch (SQLException ex) {
    throw new RuntimeException(ex);
}

The JDK itself is replete with instances of catching Exception.  In looking for best practice guidance for general coding questions such as this, I consult the JDK source itself.  Presumably the people who write the JDK are an authority on how the language was intended to be used.  If general catches were a bad practice, why are JDK authors doing it routinely? Don’t take my word for it.  Look for references to Exception in Eclipse and start auditing the search results.  Instances of global catches are too numerous to list here.  To save you time, MessageFormat (and many other java.text classes) and CompletedFuture (and many other java.util.concurrent classes) have global catch code.  Instances of catching RuntimeException, Error, and Throwable exist but appear to occur less often.

The JDK itself provides a way to supply handling logic for uncaught exceptions (as of JDK 1.5).  Classes implementing Thread.UncaughtExceptionHandler can be associated with any Thread or ThreadGroup.  Should a thread experience an uncaught exception, the JVM will automatically invoke the handler for the thread (or its thread group) if one was defined.  This feature can be used to supply logic to log exceptions (to somewhere other than the default standard error) or notify administrators in some other way.  Apart from the syntax difference, this really is a type of global catch.  Why provide this feature if it's a 'bad practice' to use it?  Unfortunately, exceptions logged at this level will have little knowledge of the context surrounding why the exception was generated.
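A minimal sketch of a per-thread handler, using only JDK classes.  The class name UncaughtHandlerSketch and the thread name are made up for illustration; a real handler would log or notify rather than just hold on to the exception.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch only; class and thread names are hypothetical. */
final class UncaughtHandlerSketch {

    /** Runs a task on a new thread and returns the exception that escaped it, or null. */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> captured = new AtomicReference<>();
        Thread worker = new Thread(task, "worker-1");
        // The JVM invokes this handler if an exception escapes the thread's run().
        worker.setUncaughtExceptionHandler((thread, ex) -> captured.set(ex));
        worker.start();
        try {
            worker.join(); // the handler has finished by the time join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return captured.get();
    }

    public static void main(String[] args) {
        Throwable t = runAndCapture(() -> {
            throw new IllegalStateException("Simulated failure");
        });
        System.out.println("Handler received: " + t);
    }
}
```

Note that the handler sees only the thread and the exception, which is the loss of context mentioned above.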

Incidentally, it's not uncommon to farm global catches out to products that execute that global catch for us.  I'm thinking of Spring interceptors and handler adapters that are often configured to effectively issue global catches on our behalf and escort users to a standard error page.  Isn't this type of practice effectively a global catch, just with a different syntax? Isn't it hypocritical to configure products to issue global catches, but deliberately prohibit global catches directly?

Global catches combined with runtime context information have saved me boat-loads of time.  In many cases, handling an exception either means logging that exception or recasting the exception in some way and including a meaningful context that might be useful to developers fixing the issue.  I routinely use ContextedRuntimeException from the Apache Commons Lang product for this purpose.  I’ve provided an illustration of this concept below (from the Apache Doc).

   try {
     // ... post the account transaction ...
   } catch (Exception e) {
     throw new ContextedRuntimeException("Error posting account transaction", e)
          .addContextValue("Account Number", accountNumber)
          .addContextValue("Amount Posted", amountPosted)
          .addContextValue("Previous Balance", previousBalance);
   }

The stack trace output for the ContextedRuntimeException will report context information as well as the root cause.  For most exceptions reported using this technique, I have enough information to diagnose the bug, or at least replicate it.  This technique *only* works when the exception and context are caught close to where the exception was generated.  Also, it's important to record the original exception as a cause as it may have additional information useful to solving the issue.  Letting the JVM log the exception to standard error by default will not capture the information about the problem being captured here.  Yes, it is possible that the additional information captured might not be valuable.  It is better to 'have' and not 'need' than the reverse.

The assertion that developers sometimes handle exception logic (code within a catch block) poorly and swallow information needed to diagnose problems is a fair point.  However, developers can insert poor exception handling logic for specific exceptions as well as general exceptions.  Prohibiting global catches in all cases does not solve poor exception handling coding issues.  Best practices for exception handling logic are a topic that deserves more in-depth treatment, perhaps in another blog entry.

There are reasons to catch specific exceptions.  Sometimes specific exceptions have additional information that should be captured.  Some exceptions represent business process errors or user errors and not unintended application exceptions.  I'm not trying to advise that developers should catch Exception exclusively.  I do believe it should be an option and not outright prohibited.

I would like to see two JDK improvements with exception handling.  Java needs a syntax to list “exceptions” for a global catch.  For instance, if I could easily code a catch that specified that I would like to catch all exceptions except instances of VirtualMachineError (out of memory conditions, JVM internal errors, etc.), the illegal catch rule would be easier to adhere to.  For example, I'd like to be able to specify a catch clause like:
catch all except(VirtualMachineError|LinkageError ex) {
    // handling logic
}

It would also help if it were possible to specify a default Thread.UncaughtExceptionHandler at JVM startup and supply notification logic for memory conditions and low-level virtual machine errors.  The default behavior of logging to standard error isn't wanted in many cases.  Note that it is already possible to set the default exception handler for the entire JVM in code via Thread.setDefaultUncaughtExceptionHandler(); what's missing is a way to specify one at startup without relying on application code.
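A minimal sketch of the in-code approach, assuming the handler is installed as the first statement in main().  The class name DefaultHandlerSketch and the message wording are hypothetical; the point is that one JVM-wide handler can treat VirtualMachineError differently from ordinary application failures.

```java
/** Illustrative sketch only; class name and messages are hypothetical. */
final class DefaultHandlerSketch {

    /** Distinguishes JVM-level errors (e.g. OutOfMemoryError) from application failures. */
    static String describe(Throwable ex) {
        return (ex instanceof VirtualMachineError)
                ? "JVM-level failure: " + ex
                : "Application failure: " + ex;
    }

    /** Installs a JVM-wide handler for threads that have no handler of their own. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, ex) ->
                // A real application might notify administrators here instead.
                System.err.println("[" + thread.getName() + "] " + describe(ex)));
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("Simulated failure");
        }, "worker-1");
        worker.start();
        worker.join();
    }
}
```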