Monday, March 23, 2015

Using Docker to deploy Java Web Applications

Docker has fast become a favorite deployment format.  For those of you who haven't used Docker, Docker is "package once, run anywhere".  Docker does for software what shipping containers do for tangible goods.  It has standard connection mechanisms for disk and networking.  Additionally, it's easy to link Docker 'containers' together (eg link database Docker container with your application).

Advantages and Disadvantages

Not only is it a win for system administrators as it makes deployment management easier and more easily automated, but it's a win for Java developers and architects too.  As the applications environment is portable and is packaged with your application, environment-specific problems and defects are greatly reduced. What runs in production is more like what you test with.  In addition since your application runs with its own version of java and the operating system, you can upgrade your JDK and any native code you rely on with less fear of impacting other applications or even involving system administrators to make the native software change. 

Getting past the hype, there are costs and disadvantages to using Docker.  Running Docker and its associated tools is much easier on Linux; Windows has limited support.  This fact creates inconveniences in most corporate environments where most development occurs on Windows.  Yes, there is boot2docker, but that solution is far from complete or convenient.  Platform differences also present issues adding maven functionality to produce docker images; maven builds are supposed to be platform independent.  Additionally, Docker does have a learning curve.

Some may fear performance issues with an additional virtualization layer.  IBM did a good study showing the performance impact to be negligible.   All in all, the advantages of using Docker outweigh the costs.

A Java EE / Docker Example

As it turns out, producing a deployable docker image for Java Web application is easy.  I leverage a packaging product such as Dropwizard or Spring Boot to make such deployments even easier.  That said, Tomcat and many other containers have official Docker images available, so this step isn't strictly required.

As an example, I'll use a microservice I'm writing for a series of presentations called Moneta, named after the Greek goddess of memory.  Moneta provides a Restful web service interface to selected portions of a relational database.

Artifacts for the Moneta product include a Dropwizard deployment for which I've defined a Docker image, which I've open sourced (here).  I'll briefly go through the example as you might find it useful.

Sample Docker File

FROM java:7-jre

MAINTAINER Derek C. Ashmore

#  External volume definitions
RUN mkdir /jarlib
VOLUME /jarlib
RUN mkdir /config
VOLUME /config
RUN mkdir /logs
VOLUME /logs

RUN curl -SL "$MONETA_URL" -o moneta-dropwizard.jar

ENV CLASSPATH /config:/jarlib/*.jar:moneta-dropwizard.jar

EXPOSE 8080 8081

ENTRYPOINT ["java", "-classpath", "$CLASSPATH", "-jar", "moneta-dropwizard.jar", "server", "/config/moneta-dropwizard.yaml"]

Docker files are scripts that produce runnable Docker images.  A couple of items to note.  In a Docker file, you expose ports and disk volumes to the outside world.  I've exposed certain ports within the Docker container, but system administrators can map those ports differently when the container is run.  For example, admins might map 8080 to 9125.  In other words, the admins still need to coordinate ports; that is still managed outside Docker.

Additionally, I've created and exposed three file mount points.  Admins map those volumes to physical disk when they run the container.  I've elected to make the product configurable and extensible.  The configuration and even additional jars can be supplied when the container is run.  I've also exposed a file system for logs so they can be processed by a log management application, such as Splunk or Logsplash.

Docker Application Run Example

Bottom line, it's possible to run the same Docker container in multiple contexts.  I can run different instances of Moneta against different databases and can horizontally scale it ad infinitum.

These mappings become evident in the command used to run the Moneta image.

docker run -d -p 8080:8080 -p 8081:8081 \ 
-v /c/Users/TheAshmores/moneta/dropwizard/config:/config \ 
-v /c/Users/TheAshmores/moneta/dropwizard/logs:/logs \ 
-v /c/Users/TheAshmores/moneta/dropwizard/jarlib:/jarlib \ 

There are other possibilities.  Docker recently announced a product called 'Compose' (here) that allows you to specify and configure multi-container applications.  This makes coordination of multiple containers much easier.  A compose discussion is best left for another post.

Thursday, March 12, 2015

Project Estimation Tactics

I frequently asked for advice on how to estimate tasks or projects.  It’s a question architects get asked by project managers as we’re often in the best position to provide a realistic estimate.  Estimation is as much art as science, but there are some things you can do to reduce error over time.  Some may argue that it’s not the architect’s job to estimate; it’s the project manager’s job.  In my experience, it’s common to combine the architect and project manager roles, so the line between the two is blurry in some organizations.  

I typically determine estimates in terms of the number of labor hours or days required.  Calculating delivery dates and working different scenarios (e.g. different numbers of developers, altering the items delivered, etc.) is just math after that.

Tactic 1:  Keep a history log for past estimates

For the projects I manage as well as fill the architect role, I track hours spent on tasks as well as the estimate I provided for them.  For estimates that were largely different than the time actually taken, I do a private post-mortem.  This is valuable feedback that you can use to refine estimates for future projects.  I realize that tracking this information is boring.

Tactic 2:  Look for comparable projects

I price (a.k.a. estimate) projects the same way people price real estate.  When an assessor determines the value of your home, they measure the square footage and take various specifics (e.g. number of bedrooms, bathrooms, etc.) and compare them to recent sales in your area with about the same square footage and features.  They get a range of sales prices back and they use those to derive the value of your home.  We can use the same tactic for technology projects.

If you follow tip one, over time you will have a history record of labor used for different tasks and projects.  You can mine that library for “comparables” for the project or task you are estimating.  For example on a recent assignment, the average amount of work required to create, format, and deploy a new report took approximately 24-36 hours depending on the complexity of the report.  We’re I still at that assignment, that’s the time frame I used for the report components of projects I needed to estimate.  You can apply the same tactic to estimating other types of tasks.

Tactic 3:  Turn the “big” problem into several little problems

We apply this tactic when designing applications all the time.  It can help you with providing estimates as well.  
You might not have a comparable for the entire project.  However, if you separate that project into smaller portions, you might have history for comparable projects/tasks for at least some of the smaller components.  At the very least, you’ll have a better feeling for which portions of your estimate are likely to have the largest variance.

Sunday, March 8, 2015

Preparing Public Maven Repository Releases

As a mature product, it *should* be easier to prepare a product release bundle for Maven artifacts.  I had a hard time, both with a product that used Maven builds and one that did not.  However, the purpose of this blog entry isn’t to complain, it’s to document the process.  To be honest, my motivation to write this blog entry is really for my personal reference rather than public consumption.

Resource Requirements

Sonatype login to submit group creation requests (login available here) and deploy artifacts.
A GPG signer tool (I use Gpg4win).

One-time Workstation Set-up Tasks

Create a public / private key pair.  Instructions using Gpg4win tool Kleopatra here.

Command to create a public/private pair with no password (not needed here) are as follows. Just follow the prompts and enter a blank password.

gpg --gen-key
Commands to list keys you already have:
gpg2 --list-keys
gpg2 --list-secret-keys
Commands to delete existing keys should you need them. Note, you must delete the secret key before the public key.
gpg --delete-secret-key myKeyId
gpg --delete-key myKeyId

Project  Build Set-up (Maven project)

Add creation of the “sources” jar to your maven build.  Maven makes this easy via a plug-in.  Pom addition here:

Add creation of the “javadoc” jar to your maven build.  Maven makes this easy via a plug-in.  Pom addition here:

Execute your Maven build with the repository:bundle-create goal.  This will verify that you’ve provided all needed information for your project (e.g. description, SCM url, project url, etc.) in the pom file.  It will also create an unsigned deployment bundle (which is useless).

Add bundle creation and signing to your maven build.  My sample requires that you define an environment variable GPG_HOME that denotes the location of where you installed your GPG software (example C:\Software\GNU\GnuPG).  I use the ant plugin for Maven to accomplish this.  This really should be easier.  Pom addition here:

     <property environment="env" />
     <fail if="${env.GPG_HOME}" message="GPG_HOME environment variable not defined." />
     <mkdir dir="target/mavenrepo" />
     <copy file="pom.xml"
      tofile="target/mavenrepo/${}-${project.version}.pom" />
     <copy todir="target/mavenrepo">
      <fileset dir="target" includes="*.jar" />

     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
       line="${}\mavenrepo\${}-${project.version}.pom" />
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
       line="${}\mavenrepo\${}-${project.version}.jar" />
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
       line="${}\mavenrepo\${}-${project.version}-javadoc.jar" />
     <exec executable="cmd" dir="target/mavenrepo">
      <env key="PATH" path="${env.GPG_HOME}" />
      <arg line="/c" />
      <arg line="gpg2.exe" />
      <arg line="-ab" />
       line="${}\mavenrepo\${}-${project.version}-sources.jar" />
     <jar destfile="target/${}-${project.version}-bundle.jar">
      <fileset dir="target/mavenrepo" includes="*.jar" />
      <fileset dir="target/mavenrepo" includes="*.pom" />
      <fileset dir="target/mavenrepo" includes="*.asc" />
Run you build and produce a signed bundle jar.  Example content of a properly created and sign bundle are as follows:

Submit a ticket on Sonatype (here) to setup your artifact.  An example ticket can be found here.

Manual Artifact Deployment

After successful project setup, you’ll receive an email notification.   Then you can issue deployments.   It’s possible to incorporate deployment of the bundle in your build script, but that’s a battle I haven’t fought; I deploy mine artifacts manually.  This is fine as I don’t need to deploy very often.

Log into Sonatype (here) to manually upload your deployment bundle and release the artifact , select the “Staging Upload” link, and select the “Artifact Bundle” upload mode.

Then you’re off to the races.   My hope is that this process gets *much* easier over time and that this post becomes useless and obsolete.  Please let me know if I can make any of these instructions clearer.

Friday, March 6, 2015

Tracking Multi-Service Transactions with Correlation IDs

One problem that’s arisen with the use of Micro-service architectures is that there’s a need to correlate related transactions.  With micro-services, it’s common for a business action of some type to require multiple service calls.  If one of those service calls fails, sometimes it’s helpful to have context.  That is, information about service calls for that action that preceded the error.

The best way I’ve found for doing this is to use correlation ids.  A correlation id identifies an action (e.g. user action or business process).  For example drawing on recent experience in the higher education field, accepting a student is a complicated process resulting in several service calls.  In a micro-service world, the process of accepting a student will likely result in several service calls. Should an acceptance fail, having contextual information about each of those service calls might be useful.

It turns out that product support for this concept in the Java EE world is lacking.  Spring boot support for this has been requested (ticket is here), but isn’t a reality yet.   Needing a product to implement correlation id tracking, I elected to write one and posted it on my GitHub account (here).  

The objectives for a product like this are:
  • Ensure that a correlation id is assigned for each transaction (will be generated if not on service request).
  • Ensure that the correlation id is documented on all log messages so that it can be correlated across services.
  • Insulate micro-service code from having to be concerned with correlation ids at all.

I’m using the tactic of documenting the correlation id in request headers.  I use a servlet filter to interrogate the header and see if a correlation id is already defined.  If not, I generate a unique one.  I then add the correlation id to a RequestCorrelationContext class that stores the correlation id in a ThreadLocal manner.  This way, it’s accessible to any class that needs it.

Yes, you must make certain to put the correlation id on the header for any services called for a given transaction.  A similar problem exists for AMQP message producers.  Putting thought into how the architecture can help with that issue is a problem for another day.  

Yes, if transactions are started by batch job classes or AMQP message receivers, the correlation ids would need to be similarly set there as well.

I’ve also written custom enhancements for the Log4J and Logback products that allow the logging pattern to include the correlation id.  That way, any logging messages can be associated with other service calls (provided they also log the correlation ids with any log messages).  Tackling the logging problem at this level means that no micro-service code needs to be involved in the tracking of correlation ids.  Installing this feature should be a one-time setup task.

Log4J V1.x example

With log4j, correlation id tracking can be implemented by including the Log4J companion jar for the product in your classpath and configuring the log4j configuration.  Note that I’ve specified a custom layout which allows me to use the ‘%I’ pattern marker that will determine the placement of the correlation id.

log4j.rootLogger=INFO, stdout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %I %-5p %c{1}:%L - %m%n

The correlation id ‘testId’ then comes out with any logging message.

2015-03-06 15:17:55 testId INFO  CorrelationPatternLayoutTest:49 - Hi there!

Logback example

With Logback, correlation id tracking can be implemented by including the Logback companion jar for the product in your classpath and configuring the logback configuration.  Note that I’ve specified a custom layout encoder that allows me to specify the ‘%id’ pattern marker that will indicate the placement of the correlation id.

 <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="org.force66.correlate.logback.CorrelationPatternLayoutEncoder">
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{5} %id - %msg%n</pattern>

The correlation id ‘testId’ then comes out with any logging message.

15:22:11.024 [main] INFO  o.f.c.l.CorrelationPatternLayoutEncoderTest testId - Hi there!

Sunday, March 1, 2015

The Difference between Architecture and Design

Architecture is more about 'what' is being built (e.g. interface definitions, which apps and modules will be developed or used and what they are responsible for).  Design is more code level and is more about 'how'.

Objectives for the architect:

  •   Enhance technology support for business change.
  •   Optimize developer productivity.
  •   Minimize support infrastructure needed.
  •   Optimize development throughput.
  •   Protect value for technology investments (includes risk mitigation).

Architects aren't the only people changed with these items. However, technology choices and strategies are a large determining factor in achieving these objectives.  

Architects may be uncomfortable with the business responsibilities I list.  This is unfortunate. Architecture, like everything else in business, needs to at least pay for itself if not contribute value to the organization.  If it does not, it will become extinct.

Architects achieve these objectives by employing the following tactics:
  •   Facilitate product selection (including build vs buy and support/testing tools).
  •   Module identification, boundaries and contracts.
  •   Effective communication of the above to developers and management.
  •   Enforcement (with management support) of architecture decisions.

Objectives of senior developers (aka designers) are the following:
  •   Mentor junior developers.
  •   Produce easily supportable code.
  •   Identify and communicate implementation limits and risks.
  •   Identify and communicate opportunities for enhancing the architecture to optimize development and support.
  •   Identify and communicate architecture gaps and inconsistencies.
  •   Identify and communicate code that isn't easily supportable to management.

Many java architects do a mixture of design and architecture.  Consequently, the two roles are often confused.   Also, some enterprises aren’t large enough to formally separate out the architect role.

I’ve noticed that many developers are good at ‘identifying’ architecture issues, gaps, and inconsistencies, but fail at communicating them.   Either they never communicate them or communicate them so poorly that those concerns never affect change.

Do you see any differences between architecture and design that I haven’t listed?