There are many different types of application testing. This article is entirely about system-level testing, the most outer-level user experience testing. System-level testing is also the most difficult to automate.
I'm talking about integration or testing of the application from an end-user perspective only. Many use the term system-level testing for this activity. Other types of testing such as unit, performance, exploratory, and usability, are not a part of this article. Other forms of testing are essential, but not the focus here.
System-level automated testing has too much friction. It can't keep up. It's the most challenging type of testing to automate. Because of that, many still perform system-level testing manually. The cost-benefit of automating these types of tests is elusive. This test automation certainly can't support high-performing DevOps teams' high-frequency change rates.
System-level automated tests are fragile. The slightest change or refactoring at the outer web layer breaks a large percentage of system-level tests. Automated testing at this level usually relies on labels programmers use for parameters and control identifiers. Programmers usually consider them free to refactor these labels for clarity without notice.
The lack of automated system-level testing impedes the firm's ability to implement continuous delivery. In turn, manual system-level testing lowers the lead-time, which is one of the DORA metrics, we seem to be using these days.
What's the Alternative?
Let's outsource system-level testing to end users. Rather, let's enlist a small percentage of end users to use a release candidate in production and measure their error rate. Additionally, those errors can be provided to the development team for remediation.
Instead of writing system-level tests, implement canary deployments and provide a release candidate version that is considered production and uses production databases and resources. The release candidate is production in every way, except that it's used by a small percentage of users. If the application is hosted in the cloud, it's possible to create a "sister" installation of an application in production that uses production resources in the same way the active version of the application does.
Remediate the release candidate until the error profile is acceptable for mainstream release. In other words, fail forward, don't roll back when errors are discovered. At some point, the release candidate will be considered stable and is made active for 100% of users. At this point, a new release candidate is created for new features and changes to be tested in the same way.
This solution avoids the problem of automating system-level tests and all its problems in terms of friction and fragility. Sometimes, the winning move is not to play! What I propose doesn't skip testing. It just changes the paradigm under which that testing is conducted.
The testing that end users do is going to be more comprehensive than any test plan can provide. Moreover, testing by end users will concentrate on the most frequently performed tasks.
There are diminishing returns to increasing the number of users directed to the release candidate. That is, you will discover more defects increasing the number of users on the release candidate from 0% to 2% than you will from 25% to 50%.
If you monitor error rates on the application, automation can be built to support continuous delivery. In other words, if the release candidate reveals no increase in error rates over the current live version, automation can make the switch based on thresholds you configure.
This concept sounds scary, but is it functionally different from what we experience today? We all see defects deployed to production despite our best testing efforts. I'm just suggesting we use what we experience rather than pretend we can avoid it.
Thanks for taking the time to read this article. I'm always eager for feedback.