Organizing a DevOps Transformation

Introduction

In 2009, John Allspaw and Paul Hammond gave a presentation titled “10+ Deploys Per Day”. They explained how Flickr changed its technology, organization, and culture to increase velocity for the benefit of its customers. The term “DevOps” refers to a set of practices that allow Software Engineering and IT Operations to work together. Research has shown that organizations which adopt DevOps practices significantly outperform their peers in throughput. High-performing DevOps organizations have better employee loyalty, recover from failures 24 times faster, and spend 50% less time remediating security issues. Creating a DevOps culture requires involvement from everyone in an Engineering organization, but anyone can become a champion of the practices that have been shown to improve velocity.

In this three-part series, we’ll cover the technology and processes that are critical to DevOps, and how each role in the organization can play its part.

DevOps Practices

The “State of DevOps” report found that the following practices are highly correlated with software delivery performance:

  • Trunk-based development

  • Deployment automation

  • Automated testing

  • Monitoring

Trunk Based Development

A simple branching strategy ensures that new work is integrated as soon as possible and that merge conflicts are kept to a minimum. A great resource on Trunk Based Development (TBD) is trunkbaseddevelopment.com, whose authors describe the process as:

“A source-control branching model, where developers collaborate on code in a single branch called ‘trunk’, resist any pressure to create other long-lived development branches by employing documented techniques. They therefore avoid merge hell, do not break the build, and live happily ever after.”

[Figure: Trunk Based Development. Source: trunkbaseddevelopment.com]

Contributors should create short-lived “feature branches” and submit them for peer review as soon as possible, usually within a few days. This provides quick feedback and reduces the chance of a merge conflict. Smaller changes are also easier to understand and less risky to deploy.

The master, or trunk, branch should track Production as closely as possible. Update master immediately before or after deploying to Production.
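One way to keep branches short-lived is to flag any that have stayed open longer than the team considers acceptable. The sketch below assumes a three-day limit (a hypothetical value standing in for “a few days”) and that you have already collected when each branch was started, for example from the committer date of the oldest commit listed by `git log master..<branch>`. The function and constant names are illustrative, not from any tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical team policy standing in for "a few days".
MAX_BRANCH_AGE = timedelta(days=3)


def stale_branches(created_at, now, max_age=MAX_BRANCH_AGE, trunk="master"):
    """Return the names of non-trunk branches that have been open longer than max_age.

    `created_at` maps branch name -> timezone-aware datetime when the branch was
    started (e.g. the committer date of its oldest commit not on trunk).
    """
    return sorted(
        name
        for name, started in created_at.items()
        if name != trunk and now - started > max_age
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    branches = {
        "feature/login": datetime(2024, 1, 9, tzinfo=timezone.utc),
        "feature/reports": datetime(2023, 12, 20, tzinfo=timezone.utc),
    }
    print(stale_branches(branches, now))  # ['feature/reports']
```

Running a check like this on a schedule and posting the result to the team channel is a lightweight nudge toward merging or splitting long-running work.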

Deployment Automation

Deployment automation, often discussed under the banner of CI/CD, refers to the tools and practices that let teams deploy new versions of software with minimal human intervention. In its 2019 State of DevOps report, DORA found that 69% of elite performers have automated deployments to Production. Elite performers lead on all four of DORA's key metrics: deployment frequency (throughput), lead time for changes, time to restore service after an incident, and change failure rate.

Deployment Automation is made up of two components: Continuous Integration and Continuous Deployment. Continuous Integration refers to the process of compiling, building, and testing an application every time new code is merged into a branch. Continuous Deployment refers to the process of deploying an application to an operating environment every time a new version of the application is created.

Continuous Integration

Continuous Integration (CI) is the minimum bar for every software project, even if you’re the only contributor. Most hosted version control platforms can run a script whenever new code is merged into a branch. This script should compile and build the code (if applicable) and run the unit tests. CI scripts should run on the same pool of build servers so that every contributor’s code is built with the same compiler version and configuration.
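A CI script does not need to be elaborate. Below is a minimal sketch of a fail-fast runner that a CI server could invoke on every merge. The step commands (byte-compiling a hypothetical `src` directory and running `unittest` discovery on a hypothetical `tests` directory) are placeholders for your project's real build and test commands; the important part is that the script stops at the first failing step and exits non-zero, which is how the CI server learns the commit is broken.

```python
import subprocess
import sys

# Hypothetical pipeline: replace these commands with your project's build and test steps.
STEPS = [
    ("build", [sys.executable, "-m", "compileall", "-q", "src"]),
    ("unit tests", [sys.executable, "-m", "unittest", "discover", "-s", "tests"]),
]


def run_pipeline(steps):
    """Run each (name, command) step in order and stop at the first failure.

    Returns 0 if every step succeeds, otherwise the failing step's exit code.
    """
    for name, command in steps:
        print(f"==> {name}: {' '.join(command)}", flush=True)
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"FAILED: {name} (exit code {result.returncode})", flush=True)
            return result.returncode
    print("All steps passed", flush=True)
    return 0


if __name__ == "__main__":
    sys.exit(run_pipeline(STEPS))
```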

Even better is to run CI before new code is merged. Most hosting platforms can be configured to run CI scripts whenever new commits are pushed to any branch, so the CI script can report build errors or failing tests before a merge is actually attempted.
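Many hosting platforms can enforce this as a merge rule (GitHub, for instance, calls these required status checks on protected branches). The logic behind such a gate can be sketched as follows. `CheckResult`, `can_merge`, and the required check names are hypothetical; the key assumption is that only results for the branch's latest commit count, since a green build on an earlier push says nothing about the code being merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical record of one CI check run."""
    commit: str   # SHA the check ran against
    name: str     # e.g. "build" or "unit tests"
    passed: bool


def can_merge(head_commit, results, required=("build", "unit tests")):
    """Allow a merge only if every required check passed on the branch's head commit.

    Results for other commits are ignored; if a check ran more than once on the
    head commit, the last result in `results` wins.
    """
    latest = {r.name: r.passed for r in results if r.commit == head_commit}
    return all(latest.get(name, False) for name in required)
```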

Continuous Deployment

Many teams use Continuous Deployment (CD) to a development environment even if Production deploys include manual steps and approval gates. The negative impact of a failed deploy in a development environment is fairly low and deployments can easily be rolled back. Automating deployment to a development environment whenever new code is merged has several advantages:

  • Test the deployment process

  • Share your work with other team members

  • Run integration and manual tests against a live environment
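The difference between the two environments can be sketched as a small piece of bookkeeping: every merge deploys to development automatically, production additionally requires an explicit approval, and a failed post-deploy check rolls back to the previous version. The `Environment` class and `ApprovalRequired` exception below are hypothetical and only model the state; assigning `self.current` stands in for whatever actually ships the build.

```python
class ApprovalRequired(Exception):
    """Raised when a gated environment is deployed without sign-off."""


class Environment:
    """Hypothetical model of one deployment target and its live version."""

    def __init__(self, name, requires_approval=False):
        self.name = name
        self.requires_approval = requires_approval
        self.current = None  # version currently serving traffic

    def deploy(self, version, healthy, approved=False):
        """Deploy `version` and roll back if the post-deploy check fails.

        `healthy` is a callable (for example, a smoke test) that receives the
        version and returns True if it looks good. Returns True if `version` stays live.
        """
        if self.requires_approval and not approved:
            raise ApprovalRequired(f"deploys to {self.name} need manual approval")
        previous = self.current
        self.current = version  # stand-in for actually shipping the build
        if healthy(version):
            return True
        self.current = previous  # roll back to the previous version
        return False


dev = Environment("dev")  # deployed automatically on every merge
production = Environment("production", requires_approval=True)
```

On each merge, a CI job might call `dev.deploy(commit_sha, smoke_test)`, while a production pipeline would pass `approved=True` only after a human signs off.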

Setting up Continuous Deployment to production environments can be much more onerous because a failed deployment can impact your customers. Much more care is required to safely deploy a new version to production in a fully automated fashion. Canary deployments and feature flags have become common practices to reduce the impact and risk of failed deployments.
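Both techniques come down to exposing a change to a small slice of traffic first. Below is a minimal sketch, assuming a percentage-based feature flag and a canary judged purely on error rate; real canary analysis usually compares several metrics. The hypothetical `in_rollout` hashes the flag name and user ID with SHA-256 so each user lands in a stable bucket, and the hypothetical `canary_is_healthy` promotes the canary only if its error rate stays within a tolerance (one percentage point by default, an arbitrary choice) of the baseline.

```python
import hashlib


def in_rollout(flag, user_id, percent):
    """Deterministically decide whether a user sees a flagged feature.

    Hashing flag and user together gives each user a stable bucket in [0, 100),
    so a user who is in at 10% stays in as the rollout grows to 50%.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def canary_is_healthy(canary_errors, canary_requests,
                      baseline_errors, baseline_requests, tolerance=0.01):
    """Promote the canary only if its error rate is at most `tolerance` above baseline."""
    if canary_requests == 0:
        return False  # no traffic yet: not enough evidence to promote
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    return canary_rate <= baseline_rate + tolerance
```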

What about Continuous Delivery?

Not all products are delivered in a Software as a Service (SaaS) model. When you don’t control where your software runs, Continuous Delivery, making a releasable build available on a regular basis, takes the place of Continuous Deployment. For example, a nightly build of a desktop application could be published every 24 hours.

Automated Testing

Automated testing is crucial to any functioning CI/CD system, but test suites must be fast and consistent so they don’t become a hindrance to getting work done. The basic rules of thumb are:

  • Faster tests should execute earlier in the development process

  • Tests should be nearly 100% reliable

To achieve these goals, tests can be organized into several different categories.
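Both rules of thumb can be checked against CI history rather than left to intuition. As a sketch, assuming you record each test's outcome per commit and each suite's run times (hypothetical input formats), a test that both passed and failed on the same commit is flaky, and suites can be ordered by mean duration so the fastest give feedback first.

```python
from collections import defaultdict
from statistics import mean


def flaky_tests(runs):
    """Return tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit, passed) tuples from CI history.
    A test whose outcome changes while the code does not is unreliable.
    """
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return sorted({test for (test, _), seen in outcomes.items() if len(seen) == 2})


def fastest_first(durations):
    """Order suites by mean run time so quick feedback comes first.

    `durations` maps suite name -> non-empty list of run times in seconds.
    """
    return sorted(durations, key=lambda suite: mean(durations[suite]))
```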

Unit Tests

Unit tests should need nothing but the code itself to run: no databases, network services, or deployed environments. Ideally, unit tests run every time code is committed locally, or even continuously within a developer’s IDE (Integrated Development Environment).

Some teams follow Test Driven Development (TDD) and write unit tests before writing the code they exercise. The idea is that once the tests pass, the code is functioning as intended.
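A minimal unit test in Python's built-in `unittest` framework, written TDD-style, might look like this. The `slugify` function is a hypothetical example; note that it needs nothing but its own code to run, so the tests finish in milliseconds.

```python
import unittest


def slugify(title):
    """Turn a post title into a URL slug (hypothetical function under test)."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    # In TDD these tests are written first, fail, and then drive the implementation above.
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Organizing a DevOps Transformation"),
                         "organizing-a-devops-transformation")

    def test_strips_punctuation(self):
        self.assertEqual(slugify("10+ Deploys Per Day"), "10-deploys-per-day")


if __name__ == "__main__":
    unittest.main()
```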

Contract Tests

Contract tests confirm the behavior of external dependencies. One way to get extra value out of contract tests is to build them into the application as a “health check” endpoint.

Example of a health check implementation:

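using System.Collections.Generic;

// Register one checker per external dependency. _HealthCheck, _HealthCheckFactory,
// and IHealthChecker are defined elsewhere in the application (not shown here).
_HealthCheck.EnsureInit(() =>
{
    var healthCheckers = new List<IHealthChecker>();
    healthCheckers.Add(_HealthCheckFactory.Sql("Users_DB"));     // user database
    healthCheckers.Add(_HealthCheckFactory.Sql("Session_DB"));   // session store
    healthCheckers.Add(_HealthCheckFactory.Redis("Cache_DB"));   // Redis cache
    healthCheckers.Add(_HealthCheckFactory.HTTP("Stripe_API"));  // third-party payment API
    return healthCheckers;
});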

 
Example of querying the health check results:

$ curl https://dev_server/health
{
  "Status": "Pass",
  "Messages": []
}
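The aggregation behind such an endpoint can also be sketched with Python's standard library. This assumes each dependency check is a zero-argument callable that raises on failure, and it reuses the `Status`/`Messages` shape of the response above. The `"Fail"` status value, the HTTP 503 response, the port, and the empty `CHECKERS` registry are assumptions, not part of the original example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry: name -> callable that raises if the dependency is unreachable.
CHECKERS = {}


def run_health_checks(checkers):
    """Run every dependency check and return {"Status": ..., "Messages": [...]}."""
    messages = []
    for name, check in checkers.items():
        try:
            check()
        except Exception as exc:  # any failure marks this dependency as unhealthy
            messages.append(f"{name}: {exc}")
    # "Fail" is an assumed counterpart to the "Pass" status shown above.
    return {"Status": "Fail" if messages else "Pass", "Messages": messages}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        result = run_health_checks(CHECKERS)
        body = json.dumps(result).encode("utf-8")
        # 503 lets load balancers and monitors treat a failing check as unhealthy.
        self.send_response(200 if result["Status"] == "Pass" else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("", 8080), HealthHandler).serve_forever()
```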

End To End Tests

Sometimes called Critical User Journey tests, End To End (E2E) tests exercise the system holistically by simulating a real-world workflow. E2E tests can run during a canary deploy or after a deploy has completed. They are fairly expensive to run and can be somewhat brittle, but they are invaluable because they simulate a real user experience as closely as possible.

[Figure: E2E Tests. Source: freecodecamp.org]
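E2E tests are usually driven through a browser automation tool, but the shape of a journey can be shown at the HTTP level. The sketch below assumes a hypothetical checkout journey made of GET requests and stops at the first step that returns an unexpected status. `fetch` is injectable so the runner can be tested without a live server; connection errors raised by `urlopen` are not caught and will fail the run.

```python
import urllib.error
import urllib.request

# Hypothetical journey: each step is (description, path, expected HTTP status).
CHECKOUT_JOURNEY = [
    ("load home page", "/", 200),
    ("view a product", "/products/42", 200),
    ("open the cart", "/cart", 200),
]


def fetch_status(url):
    """GET `url` and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_journey(base_url, steps, fetch=fetch_status):
    """Walk the journey in order; return a failure message, or None on success."""
    for description, path, expected in steps:
        status = fetch(base_url + path)
        if status != expected:
            return f"{description}: expected {expected}, got {status}"
    return None


if __name__ == "__main__":
    failure = run_journey("https://dev_server", CHECKOUT_JOURNEY)
    print(failure or "journey passed")
```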

Monitoring

Monitoring is crucial from an operations perspective but is also a valuable way to gain feedback on how users are interacting with your product. The book Site Reliability Engineering defines the most important monitoring metrics as “the four golden signals”:

  • Latency

  • Traffic

  • Errors

  • Saturation (how close the service is to its capacity)

Monitoring these metrics requires thinking about monitoring and observability during development. For example, applications should count errors and expose those counts to a monitoring system (or at least log errors to a central location).

[Figure: Golden Signals. Source: deniseyu.io]
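To make the four signals concrete, here is a minimal sketch that summarizes one time window of request records. Several choices are assumptions: 99th-percentile latency uses the nearest-rank method, only HTTP 5xx responses count as errors, and saturation is approximated as in-flight requests over provisioned concurrency (CPU, memory, or queue depth are common alternatives). `RequestRecord` and `golden_signals` are hypothetical names; in practice these numbers usually come from a metrics library.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical record of one served request."""
    latency_ms: float
    status: int  # HTTP status code returned to the client


def golden_signals(records, window_seconds, in_flight, max_concurrency):
    """Summarize one window of request records into the four golden signals."""
    n = len(records)
    latencies = sorted(r.latency_ms for r in records)
    # Nearest-rank 99th percentile: the value at rank ceil(0.99 * n), in integer math.
    p99 = latencies[(99 * n + 99) // 100 - 1] if n else 0.0
    errors = sum(1 for r in records if r.status >= 500)  # assumption: only 5xx are errors
    return {
        "latency_p99_ms": p99,
        "traffic_rps": n / window_seconds,
        "error_rate": errors / n if n else 0.0,
        "saturation": in_flight / max_concurrency,
    }
```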

Next Up

In the next part of this series, we'll cover processes which are critical to DevOps.

References