Sunday, August 30, 2020

For Managers: DevOps Automation and Unintended Consequences

Most organizations adopting the cloud have adopted DevOps automation to some degree or another. The primary reason is that continued manual maintenance isn't possible with the same staffing level and increased demand for a faster change rate. Many haven't yet achieved 100% automation but are striving for it. By "automation", I refer to Infrastructure as Code (IaC), automated builds and deployments (CI/CD pipelines), machine image creation, security enforcement functions, etc. Most organizations struggle with the unexpected and unintended effects of automation on their existing technology silos. I've seen similar issues with most of my cloud adoption and DevOps/automation clients for the past few years.

The goals most organizations have for consuming the cloud and adopting DevOps automation practices are several:
  • Increased speed to market for application capabilities
  • Increased productivity for IT staff
  • Increased scalability and performance of applications
  • Cost-effectiveness as footprint can dynamically scale to load

Steel Copy of a Wooden Bridge
All organizations initially view cloud adoption and DevOps automation as just a technology change. Consequently, they adopt automation toolsets and keep all business management processes in place (e.g., request forms, manual approvals, the internal team structure that governs who does what, etc.). Unfortunately, the paradigm shift to cloud infrastructure and full automation doesn't really permit that. The new world is just too different.

Using existing business processes without change will make it difficult to achieve increased speed to market and consistency between environments.

Pre-automation business processes don't fit the cloud or DevOps automation. DevOps automation is commonly introduced with cloud consumption. Typically, the business is looking for ways to provide additional business capabilities faster and more cost-effectively. Consequently, the number of applications and the number of supported infrastructures increases. For many organizations, the business processes in place either can't easily support a larger software footprint or don't support the increased speed of change demanded by the business.

The structure of automation often doesn't match the existing organizational structure. For example, setting up a cloud landing pad usually involves not only defining cloud networks, but configuring on premises connectivity, defining and enforcing security policies, defining and enforcing cloud service usage, and much more. From a strictly technology/coding perspective, the automation for these items is tightly coupled and a large portion of it usually belongs in the same automated source code project. Most organizations will have broken responsibility for these items into several teams, usually in separate departments, with people who don't usually work closely together. 

As another example, it's typical for application developers to augment their responsibilities to include IaC automation to meet application needs. That is, virtual machines, application subnets, and allowed network ingress and egress for an application are managed by application development teams. Pre-automation, these items would have been managed by other teams.

The implementation of infrastructure and application hosting drastically changes when consuming the cloud. New cloud consumers quickly find out that the new world is different and consequently, existing business processes for allocating infrastructure and hosting applications in the cloud no longer apply. For example, existing business processes don't accommodate cloud vendors 

Patching the Steel Copy
On realizing the problems created by automation described above, most organizations attempt to "patch" their existing organization and supporting business processes. That is, they adopt a series of minor changes to mitigate some of the problems described above. Examples I've seen are:
  • Establish a manual review for security changes by the security team
  • Assume the cloud is "untrusted" and establish cumbersome firewall rules to guard on premises networks
  • Establish silos for networking and security changes
  • Establish tight restrictions for use of cloud options and services
Any manual review will slow down velocity and productivity. The perception is that manual reviews increase safety, but they also slow everything down. To this extent, manual reviews throw the baby out with the bathwater. A major benefit of DevOps and cloud consumption is increased speed to market. That is, both DevOps and cloud consumption should allow companies to make business capabilities available to end-users faster and increase competitive advantage. Manual reviews decrease if not eliminate this business benefit.

Organizational silos and restrictions create process bottlenecks and discourage innovation. The logic for silos is that they help companies achieve economies of scale for specialized skillsets. The trouble is that these silos can't keep up with application team demand. Application teams recognize the bottlenecks and adjust their designs to accommodate and streamline silo navigation rather than use the design they would like. In other words, they are discouraged from using new techniques that don't fit how the silos operate. While most companies provide an "exception" process that allows for a review of new tools, techniques, or procedures, exception processes are often cumbersome and time-consuming. In the end, organizational silos and restrictions depress productivity and slow the release of new business capabilities to end-users.

DevOps and cloud capabilities of companies often lag behind their needs. It takes time to get up to speed on cloud capabilities and DevOps practices. Consequently, the following often happens:
  • Initial environment set-ups and application deployments are much slower than expected.
  • Security vulnerabilities are discovered at an increasing rate due to staff inexperience.
  • The frequency of change for both management and staff is higher and more difficult than expected.
All the difficulties above depress productivity and reduce if not eliminate the benefits of DevOps and cloud consumption.

By now the reader might be second-guessing their decisions to adopt DevOps practices and the cloud. That's not where I'm headed. They are definitely good decisions, but 

Re-Write Management Processes from the Ground Up
By now, it should be obvious that patching existing management oversight and procedures has limitations. In fact, it won't really work to anyone's satisfaction. DevOps and cloud consumption require a management paradigm shift in many ways. Let's face it. Management oversight methods and procedures that worked for a smaller on premises footprint simply don't work well for DevOps and the cloud. This section will highlight many paradigm shifts managers face and highlight things that need to change.

Acknowledge that DevOps and cloud consumption require a change in the way you think about management and oversight. This is difficult for many to do and is resisted at first. Once the paradigm shift is recognized, it's much easier to objectively evaluate alternative means and methods. You won't achieve the benefits of consuming the cloud otherwise. Without that shift, cloud adoption simply expands your footprint while relying on existing management and oversight processes that don't easily scale.

Automate management oversight for cloud assets. Since everything in the cloud is "software", management oversight policies can be automated so that they no longer require manual oversight. Automated enforcement, once established, is much more consistent and doesn't require labor in the same way. Yes, this automation will require enhancement and maintenance just like any other software, but it increases the productivity of your security and cloud specialists exponentially. This is a body of work that will take planning and implementation effort - this isn't a costless option.  That said, in the long run, this is the most cost-effective option available currently.
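
As a rough illustration of what "automated oversight" can look like, here is a minimal sketch in Python. It assumes planned cloud resources are available as simple dictionaries and that each policy is a function returning True when a resource complies. The policy names, resource fields, and the evaluate function are all hypothetical and chosen only for illustration; real enforcement would normally rely on your cloud vendor's or tooling's policy features.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    name: str
    # Returns True when the resource complies with the policy.
    check: Callable[[dict], bool]


# Hypothetical policies; real ones would come from your security specialists.
POLICIES = [
    Policy(
        "storage-must-be-encrypted",
        lambda r: r.get("type") != "storage" or r.get("encrypted") is True,
    ),
    Policy(
        "no-open-ssh",
        lambda r: not (
            r.get("type") == "firewall_rule"
            and r.get("port") == 22
            and r.get("source") == "0.0.0.0/0"
        ),
    ),
]


def evaluate(resources: list[dict], policies: list[Policy] = POLICIES) -> list[str]:
    """Return one violation message per (resource, policy) failure."""
    violations = []
    for resource in resources:
        for policy in policies:
            if not policy.check(resource):
                violations.append(f"{resource.get('id', '?')}: {policy.name}")
    return violations
```

Run against every proposed change, a check like this gives the security team the same answer every time without anyone filling out a request form. The specialists' effort moves from reviewing individual changes to maintaining the policy list.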

Management oversight automation will also allow the company to migrate to continuous deployment and continuous delivery someday. In fact, continuous delivery is not possible without automating approvals and eliminating manual steps.
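
To make "automating approvals" concrete, here is a small sketch of an approval gate, again in Python. It assumes the pipeline collects named check results and that management has agreed on a set of required checks. The CheckResult type, the approve function, and the check names are hypothetical, not part of any particular CI/CD product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool


def approve(results: list[CheckResult], required: set[str]) -> tuple[bool, list[str]]:
    """Approve only if every required check ran and passed.

    Returns (approved, blockers), where blockers lists the required
    checks that either failed or never ran.
    """
    passed = {r.name for r in results if r.passed}
    blockers = sorted(required - passed)
    return (not blockers, blockers)
```

For example, approve([CheckResult("unit-tests", True)], {"unit-tests", "security-scan"}) is not approved, because the security scan never ran. A required check that is missing is treated the same as one that failed, which is the conservative choice for a gate that replaces a human sign-off.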

Don't try to transition to DevOps and the cloud without help. Yes, you retain smart people and they will make the transition eventually. That said, it will take them a lot longer and you will experience "rookie" mistakes and accrue technical debt along the way. Keep in mind that you need help from a strategy perspective at a management level in addition to ground-level skills. Companies that look at DevOps and cloud consumption as strictly a technology change run into the management troubles I've outlined above.

In Conclusion
This article comes from my experiences in the field. I help companies consume cloud technology and adopt DevOps tactics on a daily basis. That said, I'm always interested in hearing about your experiences. I hope that you find this entry useful and hope for many insightful comments. Thanks for your time.

Friday, May 29, 2020

Design Patterns for Cloud Management and DevSecOps

With the cloud (it doesn't matter which cloud vendor), truly all infrastructure and application management is software-based now. Consequently, most organizations manage their cloud footprint through code. Some organizations are further along that path, but most strive to achieve 100% infrastructure as code. Additionally, application infrastructure and releases are also managed as code. 

Having written code to manage cloud infrastructure, application infrastructure, and application build and release pipelines for years now, I frequently experience déjà vu. That is, I feel that I'm solving the same problem over and over again. Sometimes with different technologies or cloud vendors, but really repeating the same patterns over and over again.

It's time we start thinking of infrastructure code and the various forms of CI/CD pipelines in terms of software design patterns. Patterns that are repeatable and don't need to be "re-invented" for every application, every cloud vendor, or every enterprise.

What is a Software Design Pattern?

This concept was popularized by a book published in 1994 entitled Design Patterns: Elements of Reusable Object-Oriented Software. The book was written by four authors usually referred to as the "Gang of Four" (GOF). While the book originally targeted object-oriented software languages, the "pattern" concept was incredibly successful and has gone on to be applied to many other types of technologies.

Software design patterns usually have the following components:
  • Problem Statement -- a description of the problem being solved
  • An Example -- a real-world example to help explain the reason the pattern exists
  • Applicability Statement -- a description of when this pattern should be considered
  • Structure -- a description of the pattern in clear enough terms that somebody could implement it
  • Consequences -- Listing of the advantages and disadvantages of using the pattern. This section also includes any limitations
The GOF book and many academic papers include some more sections and a more precise and detailed explanation for each component. I prefer a more practical approach.
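
One way to keep pattern write-ups consistent is to capture the components listed above in a simple record. The sketch below is only an illustration in Python; the DesignPattern class, its field names, and the summary method are my own hypothetical naming, not something taken from the GOF book.

```python
from dataclasses import dataclass, field


@dataclass
class DesignPattern:
    name: str
    problem: str        # Problem Statement
    example: str        # An Example
    applicability: str  # Applicability Statement
    structure: str      # Structure
    # Consequences are split into advantages and disadvantages/limitations.
    advantages: list[str] = field(default_factory=list)
    disadvantages: list[str] = field(default_factory=list)

    def summary(self) -> str:
        """One-line overview suitable for a pattern catalog index."""
        return (
            f"{self.name}: {self.problem} "
            f"({len(self.advantages)} advantages, "
            f"{len(self.disadvantages)} disadvantages)"
        )
```

Each documented pattern then becomes one instance of this record, which makes it easy to spot a write-up that is missing its applicability statement or its consequences.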

What are the Design Patterns for Cloud Management and DevSecOps?

I'm currently dividing patterns into these categories:
  • Build Patterns
  • Application Release Patterns
  • Infrastructure Patterns

Build Patterns describe how source code is compiled, packaged, and made available for release. Additionally, many organizations apply automated testing as well as gather quality metrics. Build patterns currently identified are:
  • Packaging -- Includes any needed compilation. The output is something that can be included in a software release.
  • Automated Testing -- Includes any unit and/or integration testing needed to validate packaged software.
  • Metric Analysis -- Includes any static code analysis that analyzes code quality and complexity.
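
The three build patterns above typically run as ordered stages, where a failure stops the build before anything is made available for release. Here is a minimal sketch of that idea in Python. The run_build function and the stage names are hypothetical; in practice each stage would invoke your actual compiler, test runner, or analysis tool.

```python
from typing import Callable

# A stage is a name plus an action that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_build(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break
        completed.append(name)
    return completed
```

For instance, if the stages are packaging, automated testing, and metric analysis, and automated testing fails, run_build returns only ["packaging"] and metric analysis never runs.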

Application Release Patterns are patterns used to safely deploy packaged software produced by a build pattern. Application release patterns currently identified are: 
  • All at Once (Spray and Pray) -- Pattern to deploy software without concern for an outage
  • Rolling Deployment -- Pattern to deploy software incrementally to minimize user outage time.
  • Blue / Green -- Pattern to utilize cloud technologies to minimize user outage time and provide easy back-out.
  • Canary -- Variant of Blue/Green that incrementally directs users to a new version of software to minimize the impact of deployments with defects.
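
To show how two of these release patterns differ mechanically, here is a short sketch in Python. It assumes a rolling deployment works through servers in fixed-size batches and a canary release shifts traffic in percentage steps. The function names, server names, and step values are hypothetical; real deployments would drive a load balancer or orchestration tool rather than plain lists.

```python
def rolling_batches(servers: list[str], batch_size: int) -> list[list[str]]:
    """Split servers into batches that are updated one batch at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [servers[i:i + batch_size] for i in range(0, len(servers), batch_size)]


def canary_weights(steps: list[int]) -> list[tuple[int, int]]:
    """For each step, return (old_version_percent, new_version_percent)."""
    for percent in steps:
        if not 0 <= percent <= 100:
            raise ValueError("each step must be between 0 and 100")
    return [(100 - percent, percent) for percent in steps]
```

With five servers and a batch size of two, rolling_batches yields three batches, the last containing a single server. With steps of 5, 25, and 100, canary_weights yields (95, 5), (75, 25), and (0, 100), so a defect found at the first step affects only a small share of users.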

Infrastructure Patterns are patterns that create or update cloud infrastructure including networking, security policies, on premises connectivity, monitoring, logging, etc. Infrastructure patterns currently identified are:
  • Infrastructure Maintenance -- Includes network, security, monitoring, logging, infrastructure and much more
  • Image Production -- Create hardened virtual machine images often used by multiple applications or business units.
  • Mutable Infrastructure Maintenance -- Managing configuration updates for virtual machines that can't easily be destroyed and re-created at will.

Next Steps

Over the coming weeks, I'll document the patterns identified in this post. I'm always interested in patterns I might have missed.  Please feel free to contact me with questions, comments, and suggestions. Thanks for reading.