Wednesday, January 27, 2016

Considerations for adopting Canary releases

A canary release is a tactic for reducing deployment risk. The idea is to deploy a new release to one or two nodes in a service cluster, let them handle some portion of the work load, and see if any unexpected errors result. In effect, this is a kind of "testing" in production. The term comes from the practice of miners to bring canaries with them into the mines; if the canaries died, then the air wasn't safe to breathe and they should evacuate. Like all strategies, there are advantages and disadvantages to using this approach. This blog entry attempts to outline the more important considerations.

This is attractive because it mitigates risk without slowing developer velocity the way other forms of risk mitigation (e.g. additional testing before deployment) do. If you are adopting continuous delivery, where there are far more deployments and changes are deployed at a greatly increased rate, risk mitigation techniques like this are attractive.



Canary deployments mitigate risk by limiting the impact of a defect that accidentally gets deployed. Fewer customers or end users are affected because the canary handles only a fraction of the total load. For example, if the canary deployment handles only 2 percent of the load, the impact of an unintended defect is drastically smaller than if the new release handled 100 percent of the load.

One problem is how to detect unintended defects. Defects that result in a logged exception are easy to detect. Silent defects are defects that don't result in an exception (e.g. they produce an incorrect answer or incorrect data). They are harder to catch and might even cause a derivative error in some other service that processes the incorrect data. A silent defect might not be caught until an end user or customer complains.

Canary deployments do not effectively mitigate the risk of silent defects. The reason is that they are often not caught until far after the defect occurs. Furthermore, they often are detected manually, which introduces a time lag in and of itself.

Canary deployment capabilities increase the complexity of your automated deployment mechanism. There are a few strategies that can be used to manage canary deployments. The available strategies are a complex enough subject that they should be addressed in a separate blog post. All of these strategies introduce additional complexity to automated deployments and back-outs.

Canary deployments require the ability to measure metrics on the canary and compare them to metrics from the other nodes in the cluster. For those operating at a larger scale, this comparison between the canary and the baseline is automated. Canaries whose metrics differ substantially from the baseline are automatically rolled back.
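As a rough illustration of that automated comparison, here is a minimal sketch that compares a canary's error rate to the baseline's. The class name, the choice of error rate as the metric, and the tolerance parameter are all assumptions made for this example; a real deployment pipeline would likely compare several metrics and use whatever thresholds suit the service.

```java
/** Hypothetical sketch: decide whether a canary should be rolled back. */
final class CanaryCheck {

    private CanaryCheck() {
    }

    /**
     * Returns true when the canary's error rate exceeds the baseline's error
     * rate by more than the given tolerance (an absolute difference, so 0.01
     * means one percentage point).
     */
    static boolean shouldRollBack(long canaryErrors, long canaryRequests,
                                  long baselineErrors, long baselineRequests,
                                  double tolerance) {
        if (canaryRequests == 0 || baselineRequests == 0) {
            // Not enough traffic to make a comparison yet.
            return false;
        }
        double canaryRate = (double) canaryErrors / canaryRequests;
        double baselineRate = (double) baselineErrors / baselineRequests;
        return canaryRate - baselineRate > tolerance;
    }
}
```

Note that a check like this only sees defects that show up in the metric being compared; it does nothing for the silent defects described earlier.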

Some use canary deployments instead of capacity testing. The idea is that load is transferred gradually from one release to another, enabling administrators to spot resource consumption issues before 100% of the load is transferred over. With this idea, all deployments are canary deployments; it's just that the deployment happens gradually over a longer period of time.

Database changes can present a problem for canary deployments if the database changes in a way that's not backward compatible, for example, if tables or columns are removed. This is a large topic and should really be addressed in a separate blog post.

Canary deployments are only possible in cases where there are no service contract changes. In an age where we're breaking up monolithic applications into a larger number of smaller deployables, not every release will be a candidate for a canary deployment. Those deployments that introduce a contract-breaking change will almost certainly require an all-or-nothing deployment of both the producing service and all consuming services.





Saturday, January 16, 2016

Exception Throwing Etiquette

Nothing irritates me more than investigating reported exceptions where the developer didn't take the time to record the context with the exception.  That is, you get an error message like "Field XXX value not valid" or something like that.  What value was provided that was invalid?  What are the valid values for that field?  What were the other arguments to the method that threw the exception, so that it's quick and easy to construct a test case for fixing the bug?

These are all items of information that should have been reported with the exception.  Had they been reported with the exception, the developer assigned would have likely spent much less time diagnosing the issue and providing a fix.


You might ask why application architects should be concerned with code-level items such as these.  The reason is that architects are partly responsible for designing applications that are easy to support and help the organization keep support costs at bay.  Promoting and enforcing good exception etiquette is a cheap way to do that. 

Exceptions should contain context information to facilitate support.  This speeds support and problem resolution because it saves developers time.  That time savings allows developers to spend more time on feature development rather than support.  If your enterprise separates development and support, it allows the organization to spend less on support entirely.

Providing context with exceptions is easy for developers to do now.  Tooling exists now that makes this easy and safe.  I think the reason that developers don't take the time to include needed information with exceptions is laziness.  I'm going to present three options I see used most frequently.  That said, I'm sure that there are other tooling options.

Don't write custom formatting code for exceptions.  We've all seen the case where the exception reported was a null pointer thrown by code that was in the process of formatting an exception message.  In other words, the exception that got reported was derivative, not the root exception that needs to be diagnosed.  The tooling options I'll tell you about make including context information with your exceptions both less risky and easier.
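To make the derivative exception problem concrete, here is a small sketch.  The Customer class and both method names are hypothetical and exist only for illustration.  The first method dereferences a field while building the message, so a null value produces a NullPointerException that hides the real problem; the second relies on string concatenation, which renders null as the text "null".

```java
/** Hypothetical sketch of how hand-written message formatting can mask the root error. */
final class DerivativeExceptionDemo {

    /** Minimal stand-in for a domain object; name may be null. */
    static final class Customer {
        String name;
    }

    private DerivativeExceptionDemo() {
    }

    /** Fragile: calling trim() on a null name throws a NullPointerException. */
    static String fragileMessage(Customer customer) {
        return "Invalid customer.  name=" + customer.name.trim();
    }

    /** Safer: concatenation tolerates null, so no new exception is introduced. */
    static String safeMessage(Customer customer) {
        return "Invalid customer.  name=" + (customer == null ? null : customer.name);
    }
}
```

With a Customer whose name is null, fragileMessage fails before the intended exception can ever be thrown, which is exactly the situation the tools below are meant to avoid.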

Apache Commons Lang Validation


A common reason for throwing exceptions is to validate arguments on public and protected methods.  I'll save a discussion on why all arguments for public and protected methods should be validated for another post. One of my favorite tools for validating arguments is the Validate utility from Apache Commons Lang.  Validate makes it very easy to both perform the validation and include context information along with the exception.  Examples follow which should make usage obvious.

Validate.notEmpty(firstName,
    "firstName can't be null or blank.  accountId=%s, accountType=%s",
    accountId, accountType);

Validate.notNull(accountType, "accountType is required.  Allowed values are %s", 
    ArrayUtils.toString(AccountType.values()));

Validate.isTrue(startingBalance >= minimumBalance, "Starting balance must be larger than %s.  balance=%s, accountId=%s, accountType=%s", 
    minimumBalance, startingBalance, accountId, accountType);

Exception messages produced by the above examples are:


java.lang.NullPointerException: firstName can't be null or blank.  accountId=12345, accountType=PERSONAL

java.lang.NullPointerException: accountType is required.  Allowed values are {PERSONAL,BUSINESS}

java.lang.IllegalArgumentException: Starting balance must be larger than 1000.0.  balance=100.0, accountId=12345, accountType=PERSONAL


Validate is null safe, so you don't need to worry about derivative null pointer exceptions.

Some developers prefer the Javax Validator framework as it's more annotation based.  I provide information on that later in the post.

Apache Commons Lang Contexted Exceptions


For exceptions that are not the result of a validation, but a part of something more complex, such as a remote call, I use the ContextedRuntimeException from Apache Commons Lang.  Note that there is a checked ContextedException that operates exactly the same way.

The idea is that when you create the exception you also add context values that will automatically be present in the log when the exception is reported.  The exception is null safe, so you don't need to worry about passing it null context values.  An example should make usage obvious:


throw new ContextedRuntimeException("Error adding customer", rEx)
    .addContextValue("accountId", accountId)
    .addContextValue("accountType", accountType)
    .addContextValue("customerName", customerName);


When the exception is logged, you'll see something like this:



org.apache.commons.lang3.exception.ContextedRuntimeException: Error adding customer
Exception Context:
 [1:accountId=12345]
 [2:accountType=PERSONAL]
 [3:customerName=null]
---------------------------------
 at examples.commons.lang.context.ContextedExceptionTest.test(ContextedExceptionTest.java:18)

Javax Validator Framework


The Javax Validator framework is popular in some organizations.  It's annotation based and is handy for validating inputs for published services automatically.  One of its advantages is that since it's annotation based, it doesn't add code to the code base that needs to be tested.

One of the disadvantages to the Javax Validator framework is that it's much more difficult to include context information along with the exception thrown.  Yes, developers can provide explicit messages that will be used when the validation fails.  An example of including such a message follows:


public class CustomerAccount {
    @NotBlank(message="accountId can't be blank")
    private String accountId;
  
    @NotNull(message="accountType can't be null")
    private AccountType accountType;
  
    @NotBlank(message="customerName can't be blank")
    private String customerName;
  
    @DecimalMin(value="1000.00", message="startingBalance must be at least $1000")
    private BigDecimal startingBalance;
}

Notice that in the examples above, all messages are explicit and hardcoded.  There's no easy way to add context information about other fields in the bean being validated so that a developer could get additional context to assist with diagnosing the issue.  Note that you also don't get the invalid value that violated the constraint.

This presents an interesting question.  Would it be possible to use the Javax Validator framework and provide context information?  At first glance, you would have to write a custom class that implemented the MessageInterpolator interface (details here) and bootstrap it in (details here).  Maybe I'll write one someday if I see enough of my clients utilizing this framework.

Saturday, January 9, 2016

How Apache Mesos Adds Business Value

The Apache Mesos product adds business value.  An incredible amount of it.  But the source of that value isn't what most people think.  Most think that the business value is simplification of deployment for developers so that they spend more time developing features instead of worrying about the deployment infrastructure.  There is value for developers, but there's more.

Mesos eliminates cloud vendor lock-in.  Mesos slaves run on virtual machines with prerequisite software on them.  Those slaves can exist just as easily on Amazon AWS, Google Cloud, Microsoft Azure, or any other cloud vendor.  All cloud vendors provide VPN capability so that hybrid cloud (a mixture of cloud and on-premise) infrastructures are possible; everyone wants that.  You can't exist in today's marketplace without it.  There's nothing getting in the way of Mesos masters running slaves in on-premise or cloud infrastructures in any combination.  Somebody makes a financial decision as to where slaves (or Mesos masters, for that matter) are hosted.


Mesos reduces switching costs between cloud vendors.  One of the cloud "fears" I hear from clients is vendor lock-in.  That is, they start with a cloud vendor, that vendor changes its pricing structure or the quality isn't what they need, then they have a large project to switch cloud vendors.  Mesos makes switching cloud vendors or using multiple cloud vendors much cheaper and easier.  Adopting a new cloud vendor consists of establishing a VPN and creating Mesos slaves in your newly consumed cloud. Through the VPN, those newly created slaves can communicate with their Mesos masters wherever they are hosted. From a business perspective, this feature eliminates business risk. Your application runtime environment is effectively decoupled from your cloud vendor.  

Mesos' vendor-agnostic platform introduces a least common denominator problem.  One of the advantages of using the cloud is to capitalize on cloud vendor R&D, which is much larger and often much more focused than what enterprises can do individually.  Cloud R&D is realized by features such as dynamic scaling, which automatically scales your deployment to match the load it's currently servicing.  The web is replete with postings about difficulties in making Mesos dynamically scalable and utilizing other cloud features.  Cloud vendor features will always be ahead of Mesos' ability to utilize them, and cloud vendors will always have features that those adopting Mesos won't easily be able to use.

It should be noted that Mesos is popular enough to get the attention of cloud vendors. For instance, AWS is devoting time and energy to providing a way to dynamically scale Mesos slaves through the ECS Scheduler Driver.  Azure and Microsoft have an effort going to bring Mesos (and Docker) to the Windows world (here).  A quick Google search reveals that there are many other Mesos integration efforts with cloud vendors and there's every reason to expect this to continue.

Saturday, January 2, 2016

Testing Private Fields and Methods in Unit Tests

Covering private code in unit tests can be problematic as it isn't directly executable from unit tests.  I've seen developers take one of two approaches.  One approach is to embed tests for private code in unit tests for protected or public methods.  Another is to escalate the declaration in the code being tested from private to protected, so that it can be more easily handled in unit tests.  Both approaches are problematic.

Testing private methods indirectly through protected or public methods is impractical.  If the private method has conditional logic or manipulates inaccessible fields, it can be hard to test each condition as it requires manipulating inputs to a protected or public method to do so.  In other words, testing of private methods becomes a "bank shot" more or less.  This takes additional time that developers don't always have.  It also makes test code more complex and more difficult to maintain.

Escalating methods and fields to protected status for testing creates a leakage of concerns issue.  Objects work like spies - on a "need to know" basis.  If a method is going to be something other than private, then that escalated status should be needed in production code somehow.  If not, then the function of that private method wasn't really intended to be exposed.  The class might not be designed for that method to be overridden in an extension or called from outside the class in some other context.  Escalating that private method to protected in order to test it loses the documentation for other developers that it's dedicated to a specific purpose and really shouldn't be called from outside the class that defines it.

Apache Commons Lang and Reflection to the rescue.  Through reflection, access to public and private methods is more than possible.  Commons Lang makes that job easier.  In other words, you can have unit tests for private methods and check the value of private fields without escalating those items to protected status or attempting to test them indirectly.

Reading and Writing Private Fields

Interrogating and manipulating the values of private fields (without get and set methods) is fairly easy through the Commons Lang FieldUtils class.  Examples of reading and writing the values of private fields are provided in examples 1 and 2 respectively.  As you can see, FieldUtils gets this down to a one-liner.

Example 1:  Reading the value of a private field

Integer privateInt = (Integer) FieldUtils.readField(myClassInstance, "privateInt", true);
assertEquals(TEST_VALUE, privateInt);

Example 2:  Changing the value of a private field

FieldUtils.writeField(myClassInstance, "privateInt", Integer.valueOf(5), true);
// Run your tests and check for results
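For readers curious about what a call with forced access amounts to, here is a minimal sketch using only java.lang.reflect.  It is not the FieldUtils implementation; PrivateFieldAccess and Counter are hypothetical names for this example, and unlike FieldUtils this sketch only looks at fields declared directly on the target's class, not on its superclasses.

```java
import java.lang.reflect.Field;

/** Hypothetical helper: read and write private fields with plain reflection. */
final class PrivateFieldAccess {

    private PrivateFieldAccess() {
    }

    /** Reads a field declared on the target's own class, ignoring its access modifier. */
    static Object readField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // bypass the private modifier
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read field " + fieldName, e);
        }
    }

    /** Writes a field declared on the target's own class, ignoring its access modifier. */
    static void writeField(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not write field " + fieldName, e);
        }
    }
}

/** Hypothetical class under test with a private field and no accessors. */
final class Counter {
    private int count = 3;
}
```

A test could then call PrivateFieldAccess.readField(new Counter(), "count") and compare the result to the expected value, much as Example 1 does with FieldUtils.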

Executing Private Methods

Executing private methods *should* be just as easy with the Commons Lang MethodUtils class.  In fact, I've proposed a minor enhancement to the Commons team to do just that. Until such a time, private methods can still be executed/tested directly. However, it's three lines of code and not just one.  I've got two illustrations of invoking private methods.  The first, example 3a, shows executing a private method with no arguments.  The second, example 3b, is slightly more complex and assumes two arguments: a primitive int and a string.  Note that you do need to change the accessibility of the method so that it can be executed before you invoke it.

As I said, I hope that Commons Lang provides a one-line alternative for this at some point in the future.

Example 3a: Executing a private method (no arguments).

Method myPrivateMethod = MyClass.class.getDeclaredMethod("myPrivateMethod"); 
myPrivateMethod.setAccessible(true);
Object myResult = myPrivateMethod.invoke(myClassInstance);

Example 3b: Executing a private method (two arguments).

Method myPrivateMethod = MyClass.class.getDeclaredMethod("myPrivateMethod", Integer.TYPE, String.class); 
myPrivateMethod.setAccessible(true);
Object myResult = myPrivateMethod.invoke(myClassInstance, 5, "testValue");
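In the meantime, the three lines from examples 3a and 3b can be wrapped in a small test helper of your own.  The sketch below is one way to do that with plain reflection; PrivateMethodInvoker and Greeter are hypothetical names, and the helper only finds methods declared directly on the target's class.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical test helper that wraps the three reflection steps in one call. */
final class PrivateMethodInvoker {

    private PrivateMethodInvoker() {
    }

    /**
     * Looks up a method declared on the target's class by name and parameter
     * types, makes it accessible, and invokes it with the given arguments.
     */
    static Object invoke(Object target, String methodName,
                         Class<?>[] parameterTypes, Object... args) {
        try {
            Method method = target.getClass().getDeclaredMethod(methodName, parameterTypes);
            method.setAccessible(true);
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            // The private method itself threw; surface its exception as the cause.
            throw new IllegalStateException("Method " + methodName + " threw an exception", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke " + methodName, e);
        }
    }
}

/** Hypothetical class under test with a private two-argument method. */
final class Greeter {
    private String repeat(int times, String word) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < times; i++) {
            result.append(word);
        }
        return result.toString();
    }
}
```

A test can then reduce the invocation to one line, for example PrivateMethodInvoker.invoke(new Greeter(), "repeat", new Class<?>[] {int.class, String.class}, 2, "ab").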




Saturday, September 5, 2015

Management Benefits from using Microservice Architectures

Much has been written on technical costs and benefits of using microservice architectures.  Also, we’re starting to see more management topics, such as prerequisites for implementing microservice architectures.  For an excellent summary of needed prerequisites to implementing microservices, see Fowler’s “Microservice Prerequisites”.  Among the prerequisites for succeeding with microservices is a dev-ops culture where environment setups and deployments are automated.  You also need skills in establishing, defining, and documenting web service operations and contracts; specifying the inputs/outputs of each service.  I believe that there is more to the list of what’s been written about prerequisites, but that will be the subject of another blog post.  Not enough has been written about the management and business benefits to utilizing microservice architectures.  I’d like to change that.

You might question why a description of management/business benefits to using microservice architectures would appear on a blog targeted at Java architects.  The reason is that it often falls to the architects to 'sell' the idea to management.  That makes the topic relevant to architects and worth including on this blog.

Resource Staffing Flexibility and Insulation

Microservice development teams are small and focused.   Developers *only* work on one service at a time.  That service is supposed to have a firm contract those developers are coding to, preferably with containerized deliverables (e.g. Docker).  The point is that they’re mainly heads-down development.   They shouldn’t be burdened (or have to be burdened) with communication overhead beyond that with other developers working on that service.

They are working as an independent team, completely asynchronously from all other service teams.  Combined with the fact that they have low communication overhead and defined deliverables, they can be anyplace in the world in any time zone.  It doesn’t matter whether they are in-sourced in some corporate office somewhere or off-shore.  



Given that the work of the team is confined to the focused requirements for the service, worries about code quality, etc. are at least contained to the small service source code base.  In other words, there’s a limit to how much code that team can mess up and create maintenance issues.  
    
This means that development of microservices can be more easily outsourced.

This also means that you’re better insulated from rogue developers.  We’ve all managed developers who produce unmaintainable code.   As rogue developers are confined to one small and focused code base, there’s a limit to how much damage they can do to that code base.   Furthermore, that code base is small, focused, and well-defined; it’s more replaceable if need be.  You’re not letting rogue developers loose in a 500K line monolithic application to cause havoc on a wider scale.

Increased Development Velocity

Assuming that contracts for needed services are fully defined along with the dependencies between those services, it’s easier to bring a larger number of development teams on for ongoing efforts.  The concern about bringing on additional developers is that it typically means more communication overhead.  As a result of that increased overhead, developer productivity has diminishing returns.  You get less and less productivity out of each additional developer.

While individual developer productivity does increase in microservice teams as they don’t have as much communication overhead, there is a limit to how much productivity increase you’ll get with any individual developer.  Your velocity increase really comes from the ease with which new teams can be added.  Developers never see or are impacted by the size of the overall effort as they are heads-down development in a small, focused, team.   Consequently, it’s easier for management to scale development to larger numbers of developers.  Note that in addition to having all service contracts fully defined, you also have to have identified any dependencies between contracts; some services might require other services be completed first before work on a particular service can finish.  Note that any development team can stub dependent services and start work before dependent services are finished.
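As a small illustration of stubbing a dependent service, consider the sketch below.  Every name in it (PricingService, StubPricingService, OrderCalculator) and the canned price are hypothetical; the point is only that the consuming team codes against the agreed contract and swaps the stub for the real client once the other team delivers.

```java
import java.math.BigDecimal;

/** Hypothetical contract for a service that another team is still building. */
interface PricingService {
    BigDecimal priceFor(String productId);
}

/** Stub that honors the contract with canned data until the real service exists. */
final class StubPricingService implements PricingService {
    @Override
    public BigDecimal priceFor(String productId) {
        return new BigDecimal("9.99"); // canned value, an assumption for this sketch
    }
}

/** Code owned by the consuming team; it depends only on the contract. */
final class OrderCalculator {
    private final PricingService pricing;

    OrderCalculator(PricingService pricing) {
        this.pricing = pricing;
    }

    /** Multiplies the unit price from the pricing service by the quantity. */
    BigDecimal total(String productId, int quantity) {
        return pricing.priceFor(productId).multiply(BigDecimal.valueOf(quantity));
    }
}
```

Because OrderCalculator only knows about the interface, work on it can proceed and be tested before the pricing service is finished.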

What really happens here is that the constraint to scaling development shifts to management and the enterprise architects responsible for identifying individual services and specifying the service contracts.  With larger efforts will come more coordination points.  As some services call other services, those dependent services need to be built first.  The order in which work flows down to development teams needs to be coordinated.



Technical Stack Freedom

In a microservices world, services are containerized.  That is, the deployment artifact has an operating system and other software that it needs (e.g. a JVM, PHP, Python, GoLang, etc.).  Each service can easily operate at different levels of these technologies.  The corollary is that new technologies can be more easily introduced or upgraded.   That makes it easier for you to capitalize on newer technologies much more quickly than you can with traditional monolithic web applications.

With traditional web applications, the code base is much larger.  Therefore, the impact of upgrades will be much larger and require more extensive regression testing.  Moreover, switching development languages is much more difficult with the larger code base.

This is a technical benefit, but I believe it to be a management benefit as well.  Managers are always looking for new ways to deliver software faster.  One of the avenues for increased developer productivity is technology advances with the available tool sets.  Your ability to utilize newly developed tool sets and bring about productivity increases the business wants is better with microservice architectures than with traditional web deployments.

A Word about Microservice Prerequisites

This blog post doesn’t address the prerequisites you need to have in place before you can succeed with microservice architectures on a large scale.  That list of prerequisites is a long litany.  However, there are benefits available to offset the costs.

Thursday, August 6, 2015

Exception Handling Issues for SOAP Faults with CXF Clients

The Apache CXF product is a wonderful product and for the most part it is easy to use.  It's very common to use CXF to generate clients for SOAP services.  Supporting those clients when the SOAP service specifies faults (the SOAP version of an exception) is not easy.

When CXF generates exceptions for SOAP faults, it embeds details of the fault in a field on the exception that isn't reported in either the message or the stack trace.  This information is critical for support developers diagnosing problems and issues.  For example, I generated CXF clients for a public SOAP service that utilizes faults.  CXF generated the following exception:

Generated Exception excerpt:
public class BatchException_Exception extends Exception {
    private com.postini.pstn.soapapi.v2.automatedbatch.BatchException batchException;

    public com.postini.pstn.soapapi.v2.automatedbatch.BatchException getFaultInfo() {
        return this.batchException;
    }

}

The meat of the exception - the information you need to actually solve the issue - is in that BatchException class that contains a useful message.  Unfortunately, the generated CXF exception doesn't provide this information in either getMessage() or printStackTrace().  Here's what the trace looks like; it doesn't contain the embedded CXF diagnostic information:

com.postini.pstn.soapapi.v2.automatedbatch.BatchException_Exception: exception message
at org.force66.cxfutils.CxfSoapFaultRuntimeExceptionTest.testBasic(CxfSoapFaultRuntimeExceptionTest.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)


If you want this information, you have to programmatically dig it out.  This isn't hard, but a typical SOAP service will have numerous exceptions. There's no consistent interface implemented, so it's hard to write generic code that will provide the useful fault information when errors occur.  It's a lot of custom code to handle each type of fault.  What a pain.  I've figured out a way to solve this problem.

Fortunately, CXF does allow you to specify the base exception it uses.  java.lang.Exception is the default.  I've created an exception that is meant to be used as a root exception by CXF.  It uses reflection to dig the embedded information out of the CXF exception and make sure it's in the exception message and printStackTrace.  In addition, I leverage the ContextedRuntimeException from Apache Commons Lang as that exception makes it easy to attach useful information to exceptions.  Here's a sample of the same exception output with the new base exception.  Note that the embedded CXF diagnostic information is present.

com.postini.pstn.soapapi.v2.automatedbatch.BatchException_Exception: exception message
Exception Context:
[1:cxfclient.message=embedded cxf info]
---------------------------------
at org.force66.cxfutils.CxfSoapFaultRuntimeExceptionTest.testBasic(CxfSoapFaultRuntimeExceptionTest.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

The exception that provides this functionality extends ContextedRuntimeException.  It then populates the exception context with the information CXF embeds so that it will appear in your logs without any manual programming effort.  

Information on how to specify the root exception CXF uses can be found here.

This code has been open sourced and is available on github here.  It also has been deployed to Maven and is easily includable in your projects.

/**
 * Exception meant to be extended by Apache CXF when generating exceptions for
 * SOAP Faults. This exception will ensure that the embedded information CXF
 * places on exceptions will be logged.
 *
 * @author D. Ashmore
 *
 */
public abstract class CxfSoapFaultRuntimeException extends
        ContextedRuntimeException {

    // constructors and the contextAdded flag omitted for brevity

    // Copies the fault details CXF embeds into the exception context (once).
    protected void checkExceptionContextInfo() {
        if (!contextAdded) {
            Object embeddedInfo = ReflectUtils.safeInvokeExactMethod(this,
                    "getFaultInfo");
            ReflectUtils.reflectionAppendContextValues(this, embeddedInfo,
                    "cxfclient");
            contextAdded = true;
        }
    }

    // Make sure the context is populated before the message is built.
    @Override
    public String getMessage() {
        checkExceptionContextInfo();
        return super.getMessage();
    }

    @Override
    public String getRawMessage() {
        checkExceptionContextInfo();
        return super.getRawMessage();
    }

}
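
For readers who want to see the core idea without the Commons Lang and ReflectUtils dependencies, here is a stripped-down sketch using only the JDK.  It is not the open-sourced code; FaultInfoAwareException and SampleFault are hypothetical names, and instead of populating an exception context it simply appends whatever getFaultInfo() returns to the message.

```java
import java.lang.reflect.Method;

/**
 * Hypothetical base exception: looks for a public getFaultInfo() method on the
 * concrete subclass and appends its value to the exception message.
 */
abstract class FaultInfoAwareException extends RuntimeException {

    FaultInfoAwareException(String message) {
        super(message);
    }

    @Override
    public String getMessage() {
        return super.getMessage() + " [faultInfo=" + describeFaultInfo() + "]";
    }

    /** Uses reflection so that no common interface is required of subclasses. */
    private String describeFaultInfo() {
        try {
            Method getter = getClass().getMethod("getFaultInfo");
            return String.valueOf(getter.invoke(this));
        } catch (ReflectiveOperationException e) {
            // No usable getFaultInfo(); never let reporting throw a new exception.
            return "unavailable";
        }
    }
}

/** Hypothetical stand-in for a generated fault exception. */
final class SampleFault extends FaultInfoAwareException {

    SampleFault(String message) {
        super(message);
    }

    public String getFaultInfo() {
        return "embedded cxf info";
    }
}
```

Since printStackTrace() reports the exception through getMessage(), the fault details show up in the trace as well.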


Saturday, July 11, 2015

Writing Microservices in Java

I've been selected as a presenter at JavaOne in San Francisco (Oct 25-29) this year.  My session description is below.  I hope to see you there!