Friday, March 6, 2015

Tracking Multi-Service Transactions with Correlation IDs

One problem that’s arisen with the use of Micro-service architectures is that there’s a need to correlate related transactions.  With micro-services, it’s common for a business action of some type to require multiple service calls.  If one of those service calls fails, sometimes it’s helpful to have context.  That is, information about service calls for that action that preceded the error.

The best way I’ve found for doing this is to use correlation ids.  A correlation id identifies an action (e.g. user action or business process).  For example drawing on recent experience in the higher education field, accepting a student is a complicated process resulting in several service calls.  In a micro-service world, the process of accepting a student will likely result in several service calls. Should an acceptance fail, having contextual information about each of those service calls might be useful.

It turns out that product support for this concept in the Java EE world is lacking.  Spring boot support for this has been requested (ticket is here), but isn’t a reality yet.   Needing a product to implement correlation id tracking, I elected to write one and posted it on my GitHub account (here).  

The objectives for a product like this are:
  • Ensure that a correlation id is assigned for each transaction (will be generated if not on service request).
  • Ensure that the correlation id is documented on all log messages so that it can be correlated across services.
  • Insulate micro-service code from having to be concerned with correlation ids at all.

I’m using the tactic of documenting the correlation id in request headers.  I use a servlet filter to interrogate the header and see if a correlation id is already defined.  If not, I generate a unique one.  I then add the correlation id to a RequestCorrelationContext class that stores the correlation id in a ThreadLocal manner.  This way, it’s accessible to any class that needs it.

Yes, you must make certain to put the correlation id on the header for any services called for a given transaction.  A similar problem exists for AMQP message producers.  Putting thought into how the architecture can help with that issue is a problem for another day.  

Yes, if transactions are started by batch job classes or AMQP message receivers, the correlation ids would need to be similarly set there as well.

I’ve also written custom enhancements for the Log4J and Logback products that allow the logging pattern to include the correlation id.  That way, any logging messages can be associated with other service calls (provided they also log the correlation ids with any log messages).  Tackling the logging problem at this level means that no micro-service code needs to be involved in the tracking of correlation ids.  Installing this feature should be a one-time setup task.

Log4J V1.x example

With log4j, correlation id tracking can be implemented by including the Log4J companion jar for the product in your classpath and configuring the log4j configuration.  Note that I’ve specified a custom layout which allows me to use the ‘%I’ pattern marker that will determine the placement of the correlation id.

log4j.rootLogger=INFO, stdout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %I %-5p %c{1}:%L - %m%n

The correlation id ‘testId’ then comes out with any logging message.

2015-03-06 15:17:55 testId INFO  CorrelationPatternLayoutTest:49 - Hi there!

Logback example

With Logback, correlation id tracking can be implemented by including the Logback companion jar for the product in your classpath and configuring the logback configuration.  Note that I’ve specified a custom layout encoder that allows me to specify the ‘%id’ pattern marker that will indicate the placement of the correlation id.

 <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<encoder class="org.force66.correlate.logback.CorrelationPatternLayoutEncoder">
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{5} %id - %msg%n</pattern>

The correlation id ‘testId’ then comes out with any logging message.

15:22:11.024 [main] INFO  o.f.c.l.CorrelationPatternLayoutEncoderTest testId - Hi there!

1 comment:

  1. The images are very helpful to understand this function. I got many useful material about multi service transactions from this post. Android event apps