Wednesday, May 28, 2014

Handling External Processes in Java made Easy

Handling external processes in Java has never been easy. The JDK provides a way to start and manage external processes, and it works, but it is awkward. If the process hangs for some reason, getting at the input and output streams to figure out what's going wrong is a pain.

I ran into this issue recently with my work on the Transform4J project, an open source Java ETL transformation API, where my test cases need to start and stop databases such as Cassandra and MongoDB. Getting these test cases to start up and shut down the databases reliably was very irritating and took more time than I would like to admit. This prompted me to look for an open source product that makes process management easier. I found one.

The Apache Commons Exec product makes external process management from Java very, very easy.  To illustrate with a simple example, let’s start up a Cassandra database for a series of unit tests.  

// cassandraStartupCommand holds the command line that launches Cassandra
Executor executor = new DefaultExecutor();
executor.execute(new CommandLine(cassandraStartupCommand));

In my case, I was having trouble getting Cassandra to start up properly at one point, so I added a PumpStreamHandler to capture any output so I could more easily debug the issue. By default, this output goes to standard out and standard error (this is configurable). I just added one line before execution:

executor.setStreamHandler(new PumpStreamHandler());

As it happens, I need to shut the database down after the unit tests have completed. It turns out that this is easy to do: Cassandra initiates a graceful shutdown when its process is terminated. To accomplish this with Commons Exec, you need an ExecuteWatchdog. Adding one to the execution is relatively simple:

// INFINITE_TIMEOUT: the watchdog is used only to destroy the process on
// demand, not to enforce a time limit on it
ExecuteWatchdog watchdog = new ExecuteWatchdog(ExecuteWatchdog.INFINITE_TIMEOUT);
executor.setWatchdog(watchdog);

When the unit tests are complete, terminating the process is simple:
watchdog.destroyProcess();
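
Putting the pieces together, a minimal test fixture might look like the sketch below. The class and method names are my own invention; the Commons Exec calls are the ones discussed above. Note that the single-argument execute() shown earlier blocks until the process exits, so the sketch uses the asynchronous overload that takes a result handler, which is what you want when the child process is a long-running database.

import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecuteResultHandler;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.ExecuteWatchdog;
import org.apache.commons.exec.PumpStreamHandler;

public class ExternalProcessFixture {

    private ExecuteWatchdog watchdog;

    // Starts the process without blocking the calling (test) thread.
    public void start(String startupCommand) throws Exception {
        DefaultExecutor executor = new DefaultExecutor();
        // Echo the child's stdout/stderr for easier debugging.
        executor.setStreamHandler(new PumpStreamHandler());
        // The watchdog is used only to destroy the process on demand.
        watchdog = new ExecuteWatchdog(ExecuteWatchdog.INFINITE_TIMEOUT);
        executor.setWatchdog(watchdog);
        // The two-argument execute() returns immediately; the handler
        // collects the exit value (or exception) when the process ends.
        executor.execute(new CommandLine(startupCommand),
                new DefaultExecuteResultHandler());
    }

    // Called after the unit tests complete.
    public void stop() {
        if (watchdog != null) {
            watchdog.destroyProcess();
        }
    }
}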

Note that there are additional features in this API that look useful (but that I didn't happen to need for my unit tests). With an ExecuteResultHandler, you can optionally throw an exception should the sub-process fail; I may incorporate this into my testing process. You can also easily associate a ProcessDestroyer with the execution to terminate the executing process via a shutdown hook when your JVM terminates.

I wish I had found this product sooner; it's been around for five years or so. Those looking for a more in-depth introduction should check out the tutorial here.

Thursday, March 20, 2014

A JSON Library Evaluation for Java EE applications

I have a need to enhance a set of Java EE applications to support mobile development projects. As mobile developers seem to prefer JSON formats over XML for passing data to and from mobile devices, I needed to evaluate current Java JSON libraries for use by the supporting Java EE applications. The client prefers open source products to vended products. The most prevalent open source product choices seem to be the following:

  • Gson
  • Jackson
  • The JSR 353 (Java API for JSON Processing) reference implementation
  • The Json.org reference classes

This evaluation was conducted in March 2014, using the methodology described in chapter 13 of the Java EE Architect's Handbook.

Evaluation Criteria

As with all product evaluations, it’s important to establish the criteria by which product choices will be graded and on which a product decision will be made.  The criteria used in such a product choice would obviously vary per project and organization.  I’ve settled on the following criteria:

  • Level of community activity – measured by the average number of releases per year. Active projects are more likely to be enhanced in the future. Should a product cease to be maintained and become obsolete, it may need to be replaced, incurring additional costs of ownership.
  • Market share – measured by the number of downloads as a proxy. I would prefer direct market share information instead of a proxy, but that isn't usually available for open source products. Solutions for problems and issues are more likely to be posted on the web for popular products; such postings speed development and indirectly lower the cost of ownership.
  • License – the license for the product must conform to a specified list of legal requirements; most common open source licenses are acceptable.
  • Ease of Use – a subjective measurement based on code samples and the quality of documentation. Ease-of-use features greatly speed development and indirectly lower the cost of ownership.
  • Performance – measured by speed and required memory footprint. Lower footprints reduce hardware requirements and lower the cost of ownership. The same test case was used for all products so that performance could be easily compared.
  • Versioning Support – I expect that as features are added, existing JSON formats will be changed and enhanced. With a mobile application distributed to the public, forcing users to upgrade to accommodate JSON format changes isn't feasible, so a JSON library that supports legacy formats is preferred. It's possible that a versioning solution will involve additional products and not reside solely within the JSON library itself.
Not all criteria are necessarily weighted equally; weights given to each item will vary per project and enterprise.  It’s also possible that individual projects might have additional criteria not listed here.

Evaluation Results and Product Ranking

The evaluation criteria were prioritized by the needs of my upcoming project.  It's possible that your priorities might be different.  All criteria were assigned a numerical score from one to ten with ten being the best.

Scores for the criteria were weighted by priority: high-priority scores were multiplied by 2, medium-priority scores by 1.5, and low-priority scores were left alone. For example, a raw score of 8 on a high-priority criterion contributes 16 points to a product's total, while the same raw score on a low-priority criterion contributes 8. Weighted scores for all criteria were then added together for the overall ranking. Reasons for the ratings given are discussed in more detail in the product observations section below.


Product Observations


Ease of Use

I view this as the most important of the criteria, as it most directly lowers the cost of ownership. With both Gson and Jackson, creating and reading JSON-formatted data were two-liners. I've provided the source for my prototypes in the reference section below. With my prototype, there was no need to write custom serialization/deserialization logic; both products do have robust options for this should your projects need it. Both products handle escaping special characters such as quotes and carriage returns by default. The prototype source code and all dependencies can be downloaded from here.
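
To give a sense of the two-liners, here is a minimal sketch; the Customer value object is a hypothetical stand-in for my prototype's classes, and the calls shown are the standard Gson and Jackson databind APIs (checked exceptions omitted for brevity).

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;

// Hypothetical value object standing in for the prototype's classes.
public class Customer {
    public String name;
    public int id;
}

Customer customer = new Customer();

// Gson: producing and reading JSON are one line each.
Gson gson = new Gson();
String gsonJson = gson.toJson(customer);
Customer fromGson = gson.fromJson(gsonJson, Customer.class);

// Jackson databind: the equivalent pair of calls.
ObjectMapper mapper = new ObjectMapper();
String jacksonJson = mapper.writeValueAsString(customer);
Customer fromJackson = mapper.readValue(jacksonJson, Customer.class);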

One reason Gson slightly edges out Jackson is its better documentation. The Gson documentation is better organized and more concise. Thanks to that documentation, constructing the Gson prototype took a little less time than writing the Jackson prototype.
It's also notable that Jackson has a clumsy distribution: there is no single zip download containing the binary jars, source jars, Javadoc jars, license, and documentation. Instead, all of these are downloaded separately, and the product is structured so that it's not obvious which jars you need for your particular project.

Note that coding with the classes provided by the JSR 353 specification takes much longer, and it's much easier to create bugs. My prototype (I re-coded the same functionality with each of the products) required over 100 lines of code using the JSR 353 constructs, as opposed to the handful required by either Gson or Jackson.
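
For comparison, here is a taste of the JSR 353 style, a minimal sketch using the standard javax.json builder API with hypothetical field names. There is no data binding, so every field is assembled and read back by hand:

import javax.json.Json;
import javax.json.JsonObject;

// Producing JSON: each field is added explicitly.
JsonObject obj = Json.createObjectBuilder()
        .add("name", "Jane")
        .add("id", 42)
        .build();
String json = obj.toString();

// Reading is equally manual; nested structures multiply the code quickly.
String name = obj.getString("name");
int id = obj.getInt("id");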

I did not code a prototype for the Json.org product as it's not a real product. There's no formal distribution bundle; for distribution, you download the class source files individually and build them yourself. There are no unit tests, so you can't easily validate your build. Essentially, the product source becomes additional source in your project that you must maintain. It became clear that this product isn't a valid option, despite the fact that it appears near the top of internet searches for JSON libraries.

Level of Community Activity

Community activity is important for open source projects, as products become obsolete without care and feeding. Being part of the Java EE specification means that JSR 353 implementations have, or will have, support from all the Java EE implementers, both commercial and open source. It's hard to compete with that.

Both Gson and Jackson typically have multiple releases per year and are used by several other open source products. Both have ample community support and will no doubt be enhanced for some time to come.

Market Share

Download counts were directly available for only one of the four products: Gson had 143,904 downloads as of March 14, 2014. Somebody has built binaries for the Json.org classes; that unofficial distribution had 10,821 downloads as of the same date. Download statistics for Jackson aren't published, as far as I can find. For JSR 353, download statistics are meaningless: not everybody who downloads a Java EE application server with JSR 353 support will use that part of the container for JSON processing; they could use Gson, Jackson, or some other library instead.

Performance

The performance test was identical for all products. The test consisted of starting with a complex value object populated with data and using that object to produce JSON-formatted data. That JSON-formatted data was then read back and marshaled into value objects. This is a very common usage scenario. Source for the performance evaluation can be downloaded from here.

Memory was measured by noting consumed memory (total memory less free memory) both before and after the test.  The test was run for 100,000 iterations to make time and memory usage more noticeable.  Results are in the table below.
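
A sketch of the measurement approach described above; the loop body stands in for each product's read/write code:

// Rough before/after memory measurement around the timed test loop.
Runtime rt = Runtime.getRuntime();
long before = rt.totalMemory() - rt.freeMemory();
long start = System.currentTimeMillis();
for (int i = 0; i < 100000; i++) {
    // product-specific serialize/deserialize of the test value object here
}
long elapsed = System.currentTimeMillis() - start;
long after = rt.totalMemory() - rt.freeMemory();
System.out.println("elapsed ms: " + elapsed
        + ", approx bytes used: " + (after - before));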

The JSR 353 reference implementation is the outlier; it was far slower than the other two libraries, but appears to have a much smaller memory footprint. It's possible that the JSR 353 test took so long that a garbage collection occurred during the test, which would mean the memory measurement is understated.

Comparing Gson and Jackson: Gson was faster and had a smaller memory footprint when reading JSON data and marshalling it into value objects; Jackson was faster and had a smaller memory footprint when producing JSON data from value objects.

Versioning Support

Gson supports versioning through annotations. When coding the value objects that correspond to the JSON data being read or produced, you use the @Since annotation to record the version in which each field appeared. When you instantiate Gson to read or produce JSON data, you have the option to specify the format version to use; any fields from later versions will be ignored.
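
A minimal sketch of the mechanism (Customer and email are hypothetical names):

import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.Since;

class Customer {
    String name;                 // present since format version 1.0
    @Since(2.0) String email;    // field added in format version 2.0
}

// A Gson instance pinned to version 1.0 ignores @Since(2.0) fields,
// both when producing and when reading JSON.
Gson gson = new GsonBuilder().setVersion(1.0).create();
String legacyJson = gson.toJson(new Customer());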

None of the other products appear to have versioning support.

License

All three viable product options have licenses that are acceptable to most organizations. Gson uses Apache 2.0. Jackson's license has changed over time: it was LGPL up to version 2.1 and an Apache license thereafter. Jackson does not clearly specify which version of the Apache license it uses. Furthermore, its distribution method doesn't place a copy of the license in what you download.

Concluding Thoughts

Either Gson or Jackson is a reasonable choice. Both products are very easy to use for common use cases and have options for customized serialization/deserialization logic if needed; for most uses, custom logic won't be needed.

My choice for my upcoming project is Gson.  I’m lured by the superior documentation and versioning support.

I hope you’ve found this entry helpful.  Thanks for taking time to read it.

References


Saturday, October 6, 2012

Are Commercial J2EE Application Servers worth their cost?


If the market is any indication, the answer is a resounding no! Check out market share research conducted earlier this year and published on Silicon Angle. The two largest commercial application servers, WebSphere and WebLogic, have a whopping 2.17% market share between them. The lion's share of the market is going to open source application servers such as Tomcat and JBoss.
As an architect and developer, I've always thought open source application servers easier to support. In most organizations, access to software vendor support staff is tightly controlled and requires a bureaucratic effort to utilize. Because of this, developers are often left attempting to resolve issues on their own anyway. 
Self-serve resources for commercial application servers, such as documentation, web postings for similar problems, and searchable bug lists, are often not much better than what you find with the open source alternatives. Furthermore, getting to knowledgeable support staff often takes time I don't have. Often, my support calls need to be routed from first-level support to second- or third-level support.
Most problems that start out being blamed on the "application server" are actually application code defects. Rarely are outages or production defects resolved at the application server level. This makes sense, as application server code is usually better and more thoroughly tested than application code.
As a manager, I've never found the "security" of having vendor resources available particularly comforting. It certainly doesn't assuage clients who are experiencing some type of outage or defect they need help with for very long.
Centralize and standardize the use of sophisticated software (whether it's commercial or open source) throughout the enterprise. Standardize the use of application server software to the point where all the typical issues are solved once and do not need to be continually revisited. For application servers, for example, I standardize all build and deployment scripts and container configurations. With the exception of memory allocation and port assignments, I standardize other feature usage (e.g. management console usage) so that it is the same for all deployed applications.
These choices are not usually revisited by each application developer or team for each application. When the need for changes arises, those changes are centrally evaluated and the configuration standards updated. The change is then deployed on a planned basis throughout the enterprise. This may seem a bit excessive, but I'd rather developers spend their time adding needed software features to applications and better supporting the business than on low-level application server configuration concerns. As a result of this standardization, mysterious problems occurring in some environments and not others rarely happen. I've written more about the benefits of this type of standardization here.
Some organizations see liability benefits in commercial software. For instance, it's another firm to possibly shift blame to should problems and issues arise. Maybe it's where I've worked in the past, but I've never seen blame-shifting strategies of this type work over the long term.
Another inference from this market share study is that use of Enterprise Java Beans (EJBs) has largely disappeared. However, that should be a separate discussion.

Thursday, March 15, 2012

Design Tips for Integrating Your Java/J2EE Applications with 3rd Party Software Products


Those of us writing Java/J2EE applications are commonly asked to interface with other applications we don't control. Sometimes, these are other custom applications written and managed by other teams. Sometimes, these are vended applications. Often, these applications are on a different platform (e.g. .Net) and sometimes not even designed to be integrated easily with custom applications. I refer to these types of interfaces as external [application] interfaces. External interfaces like these are usually an unpleasant source of support issues. There are ways Java architects can design external interfaces so that they minimize these support headaches and the resources needed to support them.
The key to minimizing support costs for external interfaces is insulating your Java applications from them. That is, limit and contain the number of direct dependencies between your Java applications and 3rd party applications. The insulation strategy I usually use is depicted in the graphic below. As an example 3rd party application, let's use a document management system (DMS). This type of product is frequently purchased (or obtained open source) rather than custom built. Furthermore, there are several DMS vendors, and an organization may well want to upgrade the DMS product or change DMS vendors at some point.
Figure 1

Establish a generic operational data store for needed external application data. This data store will be the source of external application data for all your custom Java applications. This means that your Java applications do not need to understand the internals of the external application and will not be affected if it is upgraded or enhanced. Consider a DMS as an example: DMS product upgrades happen no matter which vendor you choose. Using this strategy, your Java applications will not be affected by product upgrades; only the extracts populating the data store might be.
The operational data store must be vendor neutral. That is, your operational data store should not contain vendor-specific tables or fields. You should be able to populate the operational data store from a different product without changing it. You should be able to upgrade the external product without changing this data store. In the case of a DMS, you might have Document and Document_Type tables. However, no fields or tables should be specific to the DMS you are using.
Only populate data needed by your custom applications. The only purpose of the operational data store is to insulate your custom applications from external product changes and upgrades; copying data not needed by your applications just makes work for no benefit. You can always enhance the data store when new requirements arise.
Establish a generic Java API to process actions and information updates. The classes and methods in this API must be vendor-neutral so that your Java applications are not affected by product upgrades and changes. Of course, the code implementing the API will need to adapt to changes in the underlying external application. Using a DMS as an example, you might have a generic Document interface with methods like addDocument() and the like; this API should be product-neutral.
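
A minimal sketch of such an API, with hypothetical names; only the vendor-specific implementation knows the product's own classes:

// Vendor-neutral value object; no DMS-specific fields.
public class Document {
    private String id;
    private String title;
    private String documentType;
    private byte[] content;
    // getters and setters omitted for brevity
}

// Vendor-neutral facade used by all custom Java applications.
public interface DocumentManager {
    // Stores the document and returns its vendor-neutral identifier.
    String addDocument(Document document);
    Document findDocument(String documentId);
}

// One implementation per product; only this class changes if the DMS
// is upgraded or replaced.
public class AcmeDmsDocumentManager implements DocumentManager {
    public String addDocument(Document document) {
        // call the vendor's API here, recording request/response for support
        throw new UnsupportedOperationException("vendor call omitted in sketch");
    }
    public Document findDocument(String documentId) {
        throw new UnsupportedOperationException("vendor call omitted in sketch");
    }
}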
Record all actions and information updates made through the interface API. For example, if the DMS product exposes functionality via web services, I'll record the SOAP request and response text for each API call. Should a defect be reported, support developers will have the information they need to contact the vendor right away.
I hope you find this strategy useful. As always, your input is welcome.

Sunday, February 19, 2012

Four Tips for Reducing J2EE Application Costs

Much has been made of J2EE application complexity and what managers perceive as high development and support costs. I don't want to spark a religious war over choosing J2EE vs. .Net or LAMP. But, there are ways that managers can minimize J2EE application development and support costs (or decrease them over time if you have a large J2EE investment currently).
Adopt one J2EE web framework and standardize its use for all J2EE applications throughout the enterprise. Web framework product choices are confusing and many. Product choices include Java Server Faces, Spring MVC, and Struts, to name a few. There are many articles and blogs that compare and contrast the different framework products; I don't intend to get into this debate. I merely assert that you can decrease costs by standardizing on one web framework choice across the enterprise, no matter which framework you choose.
Web frameworks are complex and typically have a high learning curve. Letting web framework choice vary by application causes the following problems:
  • Development staff must become proficient in multiple web frameworks.
  • Managers incur larger burn-in time when re-assigning developers between applications.
  • Managers have a more difficult time finding developers in the labor pool who already know all the web frameworks in use.
In addition, when starting new applications, developers typically rehash the arguments as to which web development framework is best. As a manager, you can save money on new development by taking the choice of web framework products off the table. Similar points can be made for other aspects of J2EE development.
Adopt one persistence framework and standardize its use for all J2EE applications throughout the enterprise. Like web framework products, Object-Relational Mapping (ORM) products (e.g. Hibernate, iBATIS) are every bit as complex as web frameworks and present the same issues for the same reasons; I won't repeat the points already made. Even if you don't adopt an ORM product and use native JDBC instead, there are companion products (e.g. Apache Commons DbUtils) that, when used consistently throughout the enterprise, can greatly speed up development.
Adopt a common technical stack and standardize its use for all J2EE applications throughout the enterprise. I go further than standardizing the web framework and ORM product choices; standardize the entire technical stack and manage it via source code control. This allows economies of scale for supporting processes such as build management and deployment management. One build script and deployment script can be used for all applications. Improvements in the technical stack can be more easily leveraged across the enterprise.
If all applications share a common technical stack, common code can be developed that speeds development and support for all J2EE applications. For instance, it's not uncommon to have a base set of classes to manage database transactions (e.g. commits and rollbacks), as sketched below. Common utilities can also be developed or adopted to provide Ajax capabilities or perform common UI tasks such as error handling.
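
As an illustration of the kind of shared code meant here, a minimal sketch of a transaction template, assuming plain JDBC and hypothetical class names:

import java.sql.Connection;
import java.sql.SQLException;

// Base class shared by all applications on the common technical stack.
public abstract class TransactionalOperation<T> {

    // Runs doWork() inside a transaction, committing on success and
    // rolling back on failure.
    public T execute(Connection conn) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try {
            T result = doWork(conn);
            conn.commit();
            return result;
        } catch (SQLException | RuntimeException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }

    // Application-specific work goes here.
    protected abstract T doWork(Connection conn) throws SQLException;
}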
While I recommend implementing a common technical stack, it does need to evolve over time. I version it (e.g. 1.0, 1.1, 1.2, etc.). If you decide to upgrade to the next version of Hibernate or your web framework product, create a new version for that work. Upgrades to the version of the common technical stack used can be decided and scheduled individually for each application.
Adopt a common instrumentation and error reporting protocol and standardize its use for all J2EE applications throughout the enterprise. J2EE application support developers have common concerns, such as obtaining alerts for exceptions and memory issues, obtaining runtime performance metrics or managing log levels at runtime to investigate reported defects. I typically leverage the open source product Admin4J for this purpose. This provides economies of scale for application support staff as the alerts and capabilities available to support developers are identical for all production J2EE applications.
The underlying principle for all these recommendations is that consistency provides more value than minor incremental improvements one product may provide when compared to another.
Managers, be forewarned! Some developers will resist these ideas. All developers have personal preferences with regard to technical product choices; the odds are high that some will not agree with the specific choices made. Developers may also perceive that this standardization limits their freedom. It does; no doubt about it. But it also makes support activities easier and eases transitions to other applications within the same enterprise. Developers still exercise creativity, but it's applied when new business needs arise that aren't provided for in the existing technical stack.
As always, I'm interested in your thoughts on the topic.

Tuesday, January 31, 2012

How to Reduce External Dependencies for your Java Libraries

For people who write or contribute to Java open source products, external dependencies are a blessing and a curse. They are a blessing in that they provide needed functionality that shortens development. I couldn't imagine writing code without the benefit of Apache Commons Lang, Commons Collections, Commons IO, Commons BeanUtils, and many more. They shorten development tremendously, but for open source libraries, they also present problems.

The first problem is that external dependencies can cause class conflicts. For example, it's possible that the library you release works perfectly fine under Commons Lang 2.6 but doesn't run properly with Commons Lang 2.1. Yes, you can run your unit tests using previous versions of your dependent products, but there's no guarantee that this will catch everything. Furthermore, it takes time and effort that is often better spent adding new features and enhancements to your library.

The second problem is that new releases of your external dependencies can cause runtime problems in the future. There's no way to test against unreleased versions of these products: just because you work fine with Commons Lang 3.1 doesn't mean you will run properly with upcoming releases. This is also a problem for the users of your library. Typically, web applications depend on a vast assortment of libraries, each with its own dependency list, and it's possible for these dependency lists to conflict. Yes, there are tools to help you identify these conflicts. Yes, we try to choose dependencies wisely, favoring products with a good history of maintaining backward compatibility. But these measures aren't going to completely keep users out of trouble.

With an open source product I'm involved with, Admin4J, we took a different approach. Yes, we leverage other products, but we do so differently: we repackage most of the products we use. That is, we slightly refactor their underlying source to have a unique package structure. For example, Apache Commons Lang's main package is org.apache.commons.lang3; we refactor it so that the package Admin4J relies upon is net.admin4j.deps.commons.lang3. We make no other changes.
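
In practice, that means Admin4J source imports the relocated packages. A sketch:

// Admin4J code imports the relocated copy of Commons Lang...
import net.admin4j.deps.commons.lang3.StringUtils;
// ...rather than the original package, which the user's application
// remains free to include at any version:
// import org.apache.commons.lang3.StringUtils;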
The advantages of this approach are the following:
  • We benefit from the functionality provided by these other products.
  • We have a more consistent runtime environment; we don't need to worry about dependency version differences with the versions we develop and test with.
  • Our users don't have to be concerned that our dependency list conflicts with the list of one of their other dependent products.
The disadvantages are the following:
  • We consume additional memory (PermGen space) for additional copies of classes that might already be in the user's classpath.
  • Some products don't work well with this strategy; it didn't work well for Freemarker and SLF4J, so we still list those two products as external dependencies.
To give credit where credit is due, we borrowed this technique from Tomcat, which uses it quite successfully to utilize Apache Commons Logging and Commons DBCP. The secret sauce that accomplishes this refactoring is the replace Ant task. We use Ant to perform the package refactoring, compile the resulting code, and package it either for development or as part of our deployed runtime jar. An excerpt from our build script illustrates:

<!-- Perform package refactoring -->
<replace dir="${temp.src.dir}/net/admin4j/deps/commons" >
     <replacefilter token="org.apache.commons.lang3"
           value="net.admin4j.deps.commons.lang3" />
     <replacefilter token="org.apache.commons.mail"
           value="net.admin4j.deps.commons.mail" />
     <replacefilter token="org.apache.commons.fileupload"
           value="net.admin4j.deps.commons.fileupload" />
     <replacefilter token="org.apache.commons.io"
           value="net.admin4j.deps.commons.io" />
     <replacefilter token="org.apache.commons.dbutils"
           value="net.admin4j.deps.commons.dbutils" />
     <replacefilter token="org.apache.commons.beanutils"
           value="net.admin4j.deps.commons.beanutils" />
     <replacefilter token="org.apache.commons.collections"
           value="net.admin4j.deps.commons.collections" />
     <replacefilter token="org.apache.commons.logging"
           value="net.admin4j.deps.commons.logging" />
</replace>

For those of you who write or contribute to open source libraries, I'm interested in any other strategies you might have encountered and how they worked out.

Monday, January 16, 2012

The Benefits of a Standardized Application Architecture

There is much literature on software architecture and design. Most of that literature focuses on coding patterns and best practices. That is, it focuses on an application's internal structure and on improving quality at the code level, usually with a single application as the intended scope. In fact, most application architectures are created and deployed for a small number of applications. It's time we looked at the larger picture and considered the benefits of deploying a standardized software architecture across multiple custom applications in the enterprise. Over the past several years, I've had an opportunity to do just that and have observed several benefits worth documenting.
Let's start by defining terms so we're all on the same page. I define application architecture as the internal structure of an application. That is, an application architecture details the software products an application uses internally and the programming patterns employed. For example, it details whether the application uses a layered architecture or one of the other architecture patterns, and it specifies the exception, logging, and transaction management strategies used, etc. This is different from "applications architecture" (with an "s"), a term used in some EA circles to describe what business data is produced and consumed by each application in business terms.
Most organizations will standardize the hardware platform and operating system to be used. Most will also standardize on the significant vendors involved, such as Microsoft (in the case of .Net), or on which J2EE container vendor and relational database will be used. Some organizations will provide base services and methods for securing, deploying, and monitoring applications. Many organizations won't standardize much more than that, leaving the application architecture up to individual teams.
Over the past several years, I've had the opportunity to guide the development of a standard application architecture in the Java/J2EE space that is used for over a dozen applications. The application architecture defines the entire technical stack, including the web framework and ORM used. The architecture specifies a package structure and a specific software layering paradigm, along with coding conventions. It provides a standard method for instrumenting applications for performance, memory, logging, and exception management purposes. It also provides base build and deployment procedures.
This standardization of application architecture has provided a consistency between the applications that allows for several benefits:
All applications can easily consume architectural improvements. For example, we developed strategies (with several open source products) to measure the performance of each application, provide run-time log-level management, provide memory shortage alerts and low-watermark history, and much more. All custom applications in the enterprise were easily configured to consume these features. This also lowers the price tag should it be necessary to replace one of the products in the tech stack: the solution and migration procedures are developed once and merely reused for all applications.
It is much easier to switch developers between applications. In most organizations, J2EE applications are different enough that there is significant burn-in time for new developers or for moving developers from one application to another. As these applications are written in very similar ways, having experience with one application equates to having experience with all of them.
Developer time is optimized in several ways. First, a standard technical stack puts limits on the list of products a developer needs to be current in; in most organizations, the combined technical stack across all applications is much larger. Second, development speed for new applications is improved, as all basic technical decisions have already been made.
The common architecture leads to a significant base of code that is shared between applications. This implements DRY and keeps developers from having to re-invent the wheel.
The benefits are numerous and have definitely resulted in lower support costs. While I can't divulge exact numbers publicly, the number of support personnel needed for this collection of applications is far less than at other organizations developing J2EE applications that I'm aware of. It's clear to me that the benefits stem from the consistency between the applications, not from any specific technical advantages of the individual products used.
There are some challenges to this approach. We've become very careful about what code makes it into the 'common' code base that's shared across the applications: once it's published and used, the impact of change can be widespread. As a result, nothing is published to the 'common' code base until more than one application needs it.
We're a bit slower to consume newly released versions of the open source products used. Any new product release carries the potential for breaking something. We've mitigated this risk by effectively "versioning" the technical stack and common code base so that applications can consume new versions of the architecture at different times.
Some developers consider themselves artists. This type of developer won't like the idea of standardizing the application architecture, as it limits their creativity. In this world, creativity (or artistry) comes into play when a new type of technical problem or need surfaces that hasn't already been addressed by the standard application architecture. After several years, the rate at which new technical problems and needs surface has decreased substantially.
While my primary experience with deploying a standard architecture of this type is in the J2EE space, there's every reason to believe that other development platforms would see similar benefits to standardizing the application architecture and deploying it across multiple applications.
I'm curious about your thoughts and experiences. I'm particularly interested to hear if you've experienced benefits or costs of an approach like this that aren't already listed.