3 DevOps Techniques for Stress-free Release Management

Your release management team is sold on DevOps. Now how do you make it work? Thanks to an explosion of DevOps tools and techniques, releasing new features no longer needs to be a stressful, all-weekend slumber party.

What follows are a few release management techniques that you can use to take the stress out of releases and get a good night’s sleep. Use these techniques, and you will regain confidence in your release process.

What is your process like? Signs that things could be better include:

Everyone involved is stressed out.
Releases happen after hours, often on weekends, early in the morning, or late at night.
You have several error-prone manual steps that everyone needs to follow.
Your team relies on a “deployment hero,” one person who actually knows how to get the code deployed.
You launch every release with a silent prayer — you hope it works, but you’re not really sure it will.
Users are often negatively affected by new releases.
Releases are infrequent, occurring only a few times a year.

Do any of these scenarios sound familiar? They’re all symptoms of a broken release management process and, unfortunately, they’re all too familiar for most software developers. We approach releases in this way because we don’t have confidence in our changes or in our release processes. To gain that confidence, you can set up a deployment pipeline and improve it over time to ensure that the changes you’re making will work as expected. In this article, I’ll focus on techniques you can use to gain confidence in your release process.

Use infrastructure as code

The time for manually crafting servers is over, for two reasons. First, any manual process is error-prone. Humans make mistakes and forget things. How many times have you heard someone say, “Oh, I forgot to change a setting,” after you spent hours debugging a production incident? Second, manual steps are hard to track. People try to document the steps they take, but that documentation quickly gets out of date. And there are always small steps that don’t get documented at all, especially the tweaks you made when something went wrong.

If you don’t know how your server came to be in its current state, how are you going to get it back to that state when something goes wrong? Disaster recovery is a weakness for many organizations. How long would it take you to get your server back up and running if it crashed right now? Manual deployments are slow and difficult to replicate.

Infrastructure as code (IaC) is the practice of describing your servers in source files that you check into version control and apply automatically. Configuration management tools let you declare, in code, what your server should look like. The tool then automatically applies the changes needed to bring your servers to that state.

IaC offers three key advantages:

It leads to more reliable releases. When you automate the process of installing and configuring software, you are reducing the room for error.
It allows for a more repeatable release process. You can run your scripts over and over, in your testing environments and in production, and achieve the same result. Releasing becomes a well-tested and well-rehearsed process, and disaster recovery becomes simpler because the requirements of a replacement server are clearly documented in the code. What’s great about code as a document is that you can run it and get a server that behaves exactly as specified.
Living documentation improves audits. Not only do your scripts tell you exactly how your server is configured, but the fact that they are version-controlled means that you can also tell how they have changed over time. In addition, most configuration management tools come with built-in logging, so you can also see when scripts were applied to a server. If you use a continuous integration tool to kick off your deployments, you’ll also be able to see who initiated the deployment.
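To make the declarative idea concrete, here is a minimal sketch in Python. It is not how any particular configuration management tool works internally; the Server class, the DESIRED dictionary, and the apply function are hypothetical names invented for illustration, and the "server" is just an in-memory object. What it shows is the key property described above: you declare the desired state, the tool works out the changes, and running it a second time changes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """In-memory stand-in for a real machine (hypothetical, for illustration)."""
    packages: set = field(default_factory=set)
    settings: dict = field(default_factory=dict)


# Desired state, declared as data. In practice this would live in version control.
DESIRED = {
    "packages": {"nginx", "postgresql"},
    "settings": {"max_connections": "200", "timezone": "UTC"},
}


def apply(server, desired):
    """Bring the server to the desired state and return the list of changes made."""
    changes = []
    # Install only the packages that are missing.
    for pkg in sorted(desired["packages"] - server.packages):
        server.packages.add(pkg)
        changes.append(f"install {pkg}")
    # Update only the settings that differ from the declaration.
    for key, value in sorted(desired["settings"].items()):
        if server.settings.get(key) != value:
            server.settings[key] = value
            changes.append(f"set {key}={value}")
    return changes


server = Server(packages={"nginx"}, settings={"timezone": "PST"})
first_run = apply(server, DESIRED)
second_run = apply(server, DESIRED)  # nothing left to change on a second run
print(first_run)
print(second_run)
```

Because the returned list records exactly what was changed, a log of these runs doubles as the audit trail mentioned in the third advantage.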

Destroy all your servers, all the time

What happens when your server crashes and you lose everything? Do you have confidence that you’ll be able to recover in a reasonable amount of time? If your servers are configured manually, the answer is most likely no. You might be thinking that you regularly do system backups and that you could just restore those. But how often do you test this technique? Are you sure it will work?

There is a common problem in server administration known as configuration drift. This is when the configuration of a server evolves over time, due to the compound effect of making changes on top of existing changes. This is a big concern with manual configuration, but it even happens when you’re using an automated configuration management tool. How confident are you that a single run of your current scripts will put a new server in the correct state? Does your server work by happy coincidence? Over time, configuration drift results in snowflake servers: servers whose configuration has become so unique that they are impossible to reproduce exactly.
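One way to picture drift is as the difference between what your code declares and what is actually on the server. The sketch below is a simplified illustration in Python, assuming both sides can be represented as flat dictionaries of settings; detect_drift and the sample values are hypothetical and not taken from any real tool.

```python
def detect_drift(declared, actual):
    """Return every setting whose actual value differs from the declared one.

    Settings found on the server but never declared are reported too,
    since undeclared manual tweaks are a typical source of drift.
    """
    drift = {}
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key)  # None if the setting was never declared
        have = actual.get(key)    # None if the setting is missing on the server
        if want != have:
            drift[key] = {"declared": want, "actual": have}
    return drift


# What the version-controlled scripts say the server should look like.
declared = {"max_connections": "200", "timezone": "UTC"}
# What is actually on the server after months of hand-applied fixes.
actual = {"max_connections": "500", "timezone": "UTC", "debug": "on"}

drift = detect_drift(declared, actual)
for key, values in drift.items():
    print(key, values)
```

A non-empty result means that a fresh run of your scripts on a new server would not reproduce the machine you are running today.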

The only remedy for configuration drift is to destroy your servers on purpose. Make sure your servers are virtual machines, so you can easily destroy and recreate them. Ideally, you should do this every time you deploy. Servers that are routinely destroyed and rebuilt this way are commonly referred to as phoenix servers. This technique forces you to ensure that your infrastructure code actually contains an accurate description of how your server needs to be configured. You’ll be amazed at how confident you become in your release process. Suddenly, disaster recovery stops being a big deal, because you rebuild servers all the time. You’ll have a well-rehearsed process to follow when disaster strikes.

Zero in on zero-downtime deployments

Why should your users care that you’re deploying? “Scheduled Downtime” messages are embarrassing. It’s ridiculous to gather your development team for a crucial deployment at 1:00 a.m., when they’re at their most tired, just because that’s when your usage volumes are lowest.

You need to find ways of releasing software without affecting your users. This is particularly important when you’re using phoenix servers, because otherwise your downtime can be unacceptably long. One technique I’ve found to be effective is blue-green deployment. This technique involves having two identical sets of production servers. Set up routing rules using a load balancer such as Nginx or HAProxy, so that all of your user traffic goes to one of the server sets. Let’s call this the active environment. This means that you have another entire production environment that users can’t see or use. Let’s call this the passive environment.

Having this extra set allows you to deploy your changes to servers you’re not using. Destroy and rebuild them with your automated configuration management tool of choice. No one will be affected if it goes wrong, which means you can do it during normal hours and without any of the stress associated with deployments that require downtime. When you’re confident that your passive environment is working as required, you can (automatically) change the configuration of your load balancer to redirect user traffic to your new set of servers. These servers become your active environment, which will serve the latest version to your users.
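The switch-over logic described above can be sketched in a few lines of Python. This is a toy model, not a real load balancer integration: BlueGreenRouter, its methods, and the health_check callback are all hypothetical names, and "rebuilding" an environment is reduced to recording a version string. In practice the rebuild step would call your configuration management tool and the switch would update your Nginx or HAProxy configuration.

```python
class BlueGreenRouter:
    """Toy model of a load balancer sending all traffic to one of two environments."""

    def __init__(self):
        self.versions = {"blue": "v1", "green": "v1"}
        self.active = "blue"

    @property
    def passive(self):
        """The environment that currently receives no user traffic."""
        return "green" if self.active == "blue" else "blue"

    def release(self, version, health_check):
        """Deploy to the passive environment and switch traffic only if it is healthy."""
        target = self.passive
        # Rebuild the passive environment; users are still on the active one.
        self.versions[target] = version
        if not health_check(target):
            return False  # traffic stays where it was, so users never see the failure
        self.active = target  # redirect user traffic to the freshly built environment
        return True

    def serving(self):
        """Version that users currently see."""
        return self.versions[self.active]


router = BlueGreenRouter()
ok = router.release("v2", health_check=lambda env: True)       # healthy: traffic switches
failed = router.release("v3", health_check=lambda env: False)  # unhealthy: no switch
print(router.active, router.serving())
```

Note that the failed release of the hypothetical v3 leaves users on v2: the broken build only ever existed in the environment nobody was using.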

These techniques will help make your release management process more reliable and protect your users from any mistakes you might make. Applications like VegaStack include built-in configuration management tools to rebuild your servers regularly and implement zero-downtime deployments. Stop deploying late at night. Stop stressing about whether your release will work. Instead, focus on improving your release process so that you can finally get a good night’s sleep.