Tuesday, January 31, 2012

How to Reduce External Dependencies for your Java Libraries

For people who write or contribute to Java open source products, external dependencies are a blessing and a curse. They are a blessing because they provide needed functionality and shorten development considerably. I couldn't imagine writing code without Apache Commons Lang, Commons Collections, Commons IO, Commons BeanUtils, and many more. For open source libraries, however, they also present problems.

The first problem is that external dependencies can cause class conflicts. For example, it's possible that the library you release works perfectly fine under Commons Lang 2.6 but doesn't run properly with Commons Lang 2.1. Yes, you can run your unit tests against previous versions of your dependencies, but there's no guarantee that this will catch everything. Furthermore, it takes time and effort that is often better spent adding new features or enhancements to your library.

The second problem is that new releases of your external dependencies can cause runtime problems in the future. There's no way you can test against unreleased versions of these products. Just because you work fine with Commons Lang 3.1 doesn't mean that you will run properly with upcoming releases. This is also a problem for the users of your library. Typically, web applications depend on a vast assortment of libraries, each with its own dependency list. It's possible for these dependency lists to conflict. Yes, there are tools to help you identify these conflicts. Yes, we try to choose dependencies wisely and choose products with a good history of maintaining backward compatibility. But these measures aren't going to completely keep users out of trouble.
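When a conflict like this is suspected, one quick check is to ask the JVM where a class was actually loaded from. The following is a minimal sketch, not something taken from Admin4J; the class name `WhichJar` and the method `locationOf` are hypothetical, and it relies only on the standard `Class.getProtectionDomain()` and `CodeSource.getLocation()` calls.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Admin4J): reports where the
// JVM loaded a class from, which helps show which copy of a library is
// actually in use on the classpath.
class WhichJar {

    // Returns the jar or directory a class was loaded from, or a marker
    // string when the JVM reports no code source (typical for core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return (location == null) ? "(no code source reported)" : location.toString();
    }

    public static void main(String[] args) {
        // Swap in a class from the library you suspect, for example
        // org.apache.commons.lang3.StringUtils if it is on your classpath.
        System.out.println(locationOf(WhichJar.class));
    }
}
```

Running this against a class from the suspect library in a deployed web application shows which jar won the classpath race, which may not be the one you tested with.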

With an open source product I'm involved with, Admin4J, we took a different approach. Yes, we leverage other products, but we do so differently. We repackage most of the products we use. That is, we slightly refactor their underlying source to have a unique package structure. For example, Apache Commons Lang's main package is org.apache.commons.lang3. We refactor that so that the package Admin4J relies upon is net.admin4j.deps.commons.lang3. We make no other changes.

The advantages of this approach are the following:
  • We benefit from the functionality provided by these other products.
  • We have a more consistent runtime environment; we don't need to worry about dependency version differences with the versions we develop and test with.
  • Our users don't have to be concerned that our dependency list conflicts with the list of one of their other dependent products.

The disadvantages are the following:
  • We consume additional memory (PermGen space) for additional copies of classes that might already be in the user's classpath.
  • Some products don't work well with this strategy. It didn't work for us with FreeMarker and SLF4J, so we don't repackage them and still list these two products as external dependencies.

To give credit where credit is due, we borrowed this technique from Tomcat, which uses it quite successfully. Tomcat uses this technique to utilize Apache Commons Logging and Commons DBCP. The secret sauce to accomplish this refactoring is Ant's replace task. We use Ant to perform the package refactoring, compile the resulting code, and package it either for development or as part of our deployed runtime jar. An excerpt from our build script to illustrate follows:

<!-- Perform package refactoring -->
<replace dir="${temp.src.dir}/net/admin4j/deps/commons" >
     <replacefilter token="org.apache.commons.lang3"
           value="net.admin4j.deps.commons.lang3" />
     <replacefilter token="org.apache.commons.mail"
           value="net.admin4j.deps.commons.mail" />
     <replacefilter token="org.apache.commons.fileupload"
           value="net.admin4j.deps.commons.fileupload" />
     <replacefilter token="org.apache.commons.io"
           value="net.admin4j.deps.commons.io" />
     <replacefilter token="org.apache.commons.dbutils"
           value="net.admin4j.deps.commons.dbutils" />
     <replacefilter token="org.apache.commons.beanutils"
           value="net.admin4j.deps.commons.beanutils" />
     <replacefilter token="org.apache.commons.collections"
           value="net.admin4j.deps.commons.collections" />
     <replacefilter token="org.apache.commons.logging"
           value="net.admin4j.deps.commons.logging" />
</replace>
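
For readers who don't use Ant, the effect of those replacefilter entries on a single source file can be sketched in plain Java. This is only an illustration of literal token replacement, not Admin4J's build code; the class name `PackageRenamer` and the method `rename` are hypothetical, and only three of the token/value pairs are reproduced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Admin4J: mimics what the Ant replace
// task does to the text of one source file, using literal string replacement.
class PackageRenamer {

    // Token/value pairs mirroring a few of the replacefilter entries above.
    private static final Map<String, String> FILTERS = new LinkedHashMap<>();
    static {
        FILTERS.put("org.apache.commons.lang3", "net.admin4j.deps.commons.lang3");
        FILTERS.put("org.apache.commons.io", "net.admin4j.deps.commons.io");
        FILTERS.put("org.apache.commons.logging", "net.admin4j.deps.commons.logging");
    }

    // Applies every filter to the given source text and returns the result.
    static String rename(String source) {
        String result = source;
        for (Map.Entry<String, String> filter : FILTERS.entrySet()) {
            result = result.replace(filter.getKey(), filter.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String original =
              "package org.apache.commons.lang3;\n"
            + "import org.apache.commons.logging.Log;\n";
        // Both the package declaration and the import are rewritten.
        System.out.print(rename(original));
    }
}
```

Because the replacement is purely textual, package declarations, imports, and fully qualified names inside string literals are all rewritten the same way.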

For those of you who write or contribute to open source libraries, I'm interested in any other strategies you might have encountered and how they worked out.
