How 3 DevOps Pros Built Their Continuous Delivery Pipelines
As SaltStack senior DevOps engineer Phil Kalm put it, automated deployment is a journey, and no one really arrives.
New projects, trendy tools and growing workloads keep software engineers on their toes when it comes to continuous delivery pipeline set-ups. And that’s not a bad thing — unless engineers start paying more attention to the latest tech than to the purpose of their pipelines.
“CI/CD has been out long enough, you’d think it’d be fairly mature, but it’s constantly evolving,” Kalm said. “People need to take a logical and measured approach to it, and not try to jump on each new, shiny thing. Look at things from an architectural point of view first, and know where you’re trying to go.”
Other engineers, meanwhile, get so comfortable with the tools they’re familiar with that they miss out on opportunities to improve automated deployment, according to StackHawk senior DevOps engineer Zachary Conger.
“It’s worth taking a look outside of the standard tool set you’ve become accustomed to, because there’s been so much development in recent years around the software that’s available,” he said. “Back when I started, Jenkins was pretty much the obvious choice. Now, there are so many tools — both cloud-based and on premises — to choose from.”
But choose wisely, Vulcan Cyber co-founder and CTO Roy Horev warned. Your pipeline may run in the background, but it’s the backbone of your product.
“Yes, the pipeline automates some mundane tasks, but it’s also what pushes new stuff into production,” he said. “You have to take care, because if things break, you will lose credibility with your teammates, the management or the users of the platform. That’s going to be harder to fix than any technical barriers you might encounter along the way.”
Here, Kalm, Conger and Horev walk us through their continuous delivery pipelines: what they’re made of, how they test, and their advice for teams experimenting with automated deployment.
Key Continuous Delivery Pipeline Insights
- Take some time to map your pipeline. Observe the developers who will use the pipeline, and write out all the functionality they need and the ways their deployments currently fail.
- Let developers use the same build scripts at their workstations that run in the pipeline. That way, the deployment process is predictable.
- Consider developers’ user experience. When you add new steps to the pipeline, consider how those impact developers’ workflow.
- Visibility matters. Provide developers with detailed logging so they understand what’s going on behind the scenes.
- Minimize your working parts. Complicated pipelines get messy and hard to maintain. Could what you’re doing with three tools be done with one?
- Think about how your pipeline will scale. From day one, consider what will happen as you grow and your pipeline needs to run in parallel.
- Use an MVP to get some quick wins. Before you fully build out your pipeline, demonstrate its value to stakeholders at all levels.
How Is Your Pipeline Structured, and What Did You Use to Build It?
Roy Horev
Co-founder and CTO, Vulcan Cyber
We use GitLab CI, which is GitLab’s technology for writing pipelines. It’s pretty convenient because it easily talks with the codebase and with the container repositories natively, and you get the same UI wherever you manage the code, the tickets and the pipelines themselves.
The pipeline has eight simple steps. The first one is the preparation step. It sets the groundwork for all the environment variables and makes sure everything is aligned and where it should be. The second step is the build, where we compile the code where needed and make sure everything is ready to be deployed. The third step is the unit testing step. Then we actually push the code into a container registry, and we have a specific container for each specific version.
Then we deploy to a dedicated testing environment where we run a more complete set of tests — regression tests and integration tests. Once that’s finished, there’s a step to prepare the release, which automatically generates the release notes to be viewed by different stakeholders later. Then, we deploy to a staging environment. If that goes well, we go to the final step, which is just cleaning up all the unnecessary files generated during the process.
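As a rough illustration of the eight-step sequence described above, here is a minimal Python sketch of an ordered, fail-fast pipeline. The stage names paraphrase the steps in the text; the stage bodies and the run_pipeline helper are hypothetical placeholders, not Vulcan Cyber's actual GitLab CI jobs.

```python
from typing import Callable, List, Tuple

# Stage names paraphrase the eight steps described in the article.
# They are illustrative labels, not real GitLab CI job names.
STAGE_NAMES = [
    "prepare",
    "build",
    "unit_test",
    "push_image",
    "deploy_test_env",
    "prepare_release",
    "deploy_staging",
    "cleanup",
]

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that reports failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, job in stages:
        if not job():
            print(f"stage '{name}' failed; stopping the pipeline")
            break
        completed.append(name)
    return completed
```

Calling run_pipeline with a list of (name, job) pairs returns the stages that finished, so a failure in unit testing means nothing is ever pushed to the container registry.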
Zachary Conger
Senior DevOps engineer, StackHawk
The bones of it are CodeBuild, which is the AWS CI/CD tool. Then we used Terraform on top of that to build out any infrastructure in AWS that other code projects would require. From there, we added something called Terragrunt to create more scripted inputs into Terraform, which makes it a little bit more manageable.
A lot of shops will have all their Terraform code in one place, but we wanted to make sure each project could be self contained. We broke the Terraform code into individual code repositories. So, if any microservice we developed had any other infrastructure requirements — and we run microservices on Kubernetes — that repository would have its own Terraform code that would build out that infrastructure in AWS. But all of that would still go through CodeBuild. So there’s no manual building of any infrastructure or code. It all goes through the CI/CD process.
Another key aspect of the whole process is that we make use of several different AWS accounts. So all of the building happens in special infrastructure AWS accounts, and they have limited access to runtime accounts where the code actually runs in production or in pre-production. That way, we kept the runtime environments as clean and isolated as they could be. So, if there were any security problems that cropped up, there were various firewalls between pre-production, production and the environments where the code was built.
Phil Kalm
Senior DevOps engineer, SaltStack
On the open-source side, we have thousands of merge requests that need to be validated and approved, and the testing is quite strenuous. We originally had everything on premises, but we needed bigger scalability and burstability, so we’re on Amazon now. We tested multiple clouds, because Salt touches multiple clouds, and, to a certain extent, we still test on premises.
We use Jenkins, as well as anything from pytests to custom-written Python testing scripts. We’re a Python shop, so we try to orbit around that platform as much as we can. We’ve gone through lots of different CI systems. Jenkins is one of them, but we’ve also looked at CircleCI, Drone and GitLab.
I fear I’ve oversimplified things, because I could talk about this forever. We have a site reliability engineer, for example, who has modernized a whole bunch of stuff, but it’s been a full year’s worth of work and he’s still working on it. It’s constant, and there are so many things involved in it.
Are You Experimenting With Any New Methods or Tools?
Horev: The way we move from the staging environment to the production environment is kind of a new idea: deployment trains.
The main concept is that not every feature that gets to the staging environment automatically gets pushed to production. There’s a “train” that comes out three times a day and deploys to production whatever it sees fit from our pre-production environment.
Pre-production has a data set similar to the production environment’s. [The pipeline] deploys the code there and runs an extensive set of unit, regression and integration tests. Once it decides that everything in that environment is stable and good to go, it pushes to production. If anything breaks in the process, you’re going to need manual intervention to cherry pick those issues out.
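The deployment train idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the departure times, the Change record and both function names are hypothetical, since the article says only that a train leaves three times a day and ships what passed testing in pre-production.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Tuple

# Hypothetical schedule: the article says only "three times a day".
TRAIN_DEPARTURES = [time(9, 0), time(13, 0), time(17, 0)]


@dataclass
class Change:
    name: str
    tests_passed: bool  # result of the pre-production test run


def next_departure(now: time) -> time:
    """Return the next scheduled train, wrapping to the first one tomorrow."""
    for departure in TRAIN_DEPARTURES:
        if departure >= now:
            return departure
    return TRAIN_DEPARTURES[0]


def board_train(candidates: List[Change]) -> Tuple[List[str], List[str]]:
    """Split candidates into those that ship and those held for manual review."""
    shipped = [c.name for c in candidates if c.tests_passed]
    held_back = [c.name for c in candidates if not c.tests_passed]
    return shipped, held_back
```

Anything in the held-back list corresponds to the manual cherry-picking described above; the train itself never waits for it.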
What’s the Best Way to Get Started With CI/CD?
Horev: The first step to getting an MVP of the pipeline was to map all the different things someone might need to do when they deploy to production. For example, when you push to production and you introduce a schema change to your database, you might need to run a small script to make that schema change happen. This doesn’t happen in every code change, but you want to map that out, because when you’re running this stuff, you don’t want to have to think, “OK, maybe I need this, [but] maybe I don’t need this.” You want to build the mechanism anyway. That way, it never gets neglected.
We spent one month observing how the different deployments go to make sure we don’t miss anything. During that time, we also observed how people fail when they do deployments. Then, we took that knowledge and converted it to quality controls or “quality gates” to make sure when everything is automated, those things don’t happen.
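One way to picture a quality gate is as a small check that runs on every deployment, whether or not it turns out to be relevant. The Python sketch below uses the schema-change example from the text; the Deployment fields, gate functions and messages are all hypothetical and only illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Deployment:
    changes_schema: bool
    migration_script: Optional[str] = None  # path to a script, if any
    unit_tests_passed: bool = False


# A gate returns an error message when it fails and None when it passes.
Gate = Callable[[Deployment], Optional[str]]


def schema_change_has_migration(deployment: Deployment) -> Optional[str]:
    if deployment.changes_schema and not deployment.migration_script:
        return "schema change without a migration script"
    return None


def unit_tests_green(deployment: Deployment) -> Optional[str]:
    if not deployment.unit_tests_passed:
        return "unit tests did not pass"
    return None


GATES: List[Gate] = [schema_change_has_migration, unit_tests_green]


def check_gates(deployment: Deployment, gates: List[Gate] = GATES) -> List[str]:
    """Run every gate and collect the failure messages (empty list means go)."""
    failures = []
    for gate in gates:
        message = gate(deployment)
        if message is not None:
            failures.append(message)
    return failures
```

Because every gate runs on every deployment, nobody has to remember whether a given change needs the migration step.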
Testing: Any Tips You Can Share?
Conger: For integration testing, we would run components of our code or the microservices as part of the build process. We spin up Docker Compose environments of multiple containers, so we can do testing right in the pipeline’s ephemeral build environment.
That Docker Compose process is really useful in the pipeline, but we’ve also extended it so developers can use that process at their own workstations. We’ve tried to make it possible for developers to use the same sorts of build scripts we use in our automation, so when they push their code to the repository, the process will be very predictable because they’ve gone through it manually.
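A shared entry point is one way to keep workstation and pipeline runs identical. The sketch below builds the same Docker Compose command in both places and only varies the project name. It assumes a Compose file with a service named "tests" (a hypothetical name) and relies on CODEBUILD_BUILD_ID, an environment variable that AWS CodeBuild sets during builds; it is not StackHawk's actual script.

```python
from typing import List, Mapping


def compose_test_command(env: Mapping[str, str], test_service: str = "tests") -> List[str]:
    """Build the integration-test command used on workstations and in CI alike."""
    # CodeBuild sets CODEBUILD_BUILD_ID; on a workstation it is absent.
    build_id = env.get("CODEBUILD_BUILD_ID", "local")
    # Replace anything that is not a letter or digit (such as the colon in a
    # CodeBuild build ID) so the value is safe to use as a project name.
    safe_id = "".join(c if c.isalnum() else "-" for c in build_id.lower())
    project = f"itest-{safe_id}"
    return [
        "docker", "compose",
        "-p", project,                   # isolate this run's containers
        "up", "--build",
        "--abort-on-container-exit",     # stop everything when one container exits
        "--exit-code-from", test_service,  # exit status comes from the test container
    ]
```

A wrapper script could pass the returned list to subprocess.run with os.environ as the env argument, which gives developers and the pipeline one command to run.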
Horev: We use a tool called Testin.io, which allows us to write tests really fast. It records your browser window, so when you click on different buttons, you can tell it what results to expect. You don’t have to write any code, so people from our product team can do it as well.
Anything About Your Approach to Testing You’ve Had to Fix or Change?
Kalm: One thing we found is we actually didn’t have a good way of creating environments for testing. Our QA team uses Cypress, and at the start we didn’t have a good way of deploying our own software, testing it, tearing it down and making that repeatable. So I created an environment that actually leveraged our own product to do that. Other people may use Terraform to spin up machines in Amazon; we use Salt Cloud to spin up machines.
For example, we have a massive Cypress test that kicks off twice a day. And since SaltStack Enterprise has a built-in scheduler, it schedules [the test] and calls to SaltStack to create those machines in Amazon. [Salt] configures the machines the way we need them, because it’s configuration management software. Then, it kicks off Cypress tests against that software, Cypress returns the results, and Salt actually cleans itself up and tears down the machines.
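The create, configure, test and tear-down cycle described above maps naturally onto a context manager, which guarantees the teardown even when the tests fail. This Python sketch is tool-agnostic: the provision, configure, run_tests and teardown callables are hypothetical stand-ins for whatever Salt Cloud, Cypress or another tool would actually do.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def ephemeral_environment(
    provision: Callable[[], List[str]],
    teardown: Callable[[List[str]], None],
) -> Iterator[List[str]]:
    """Create test machines and always destroy them afterwards."""
    machines = provision()
    try:
        yield machines
    finally:
        # Runs on success, on test failure and on unexpected exceptions.
        teardown(machines)


def run_test_cycle(
    provision: Callable[[], List[str]],
    configure: Callable[[List[str]], None],
    run_tests: Callable[[List[str]], bool],
    teardown: Callable[[List[str]], None],
) -> bool:
    """One repeatable cycle: provision, configure, test, tear down."""
    with ephemeral_environment(provision, teardown) as machines:
        configure(machines)
        return run_tests(machines)
```

A scheduler would then only need to call run_test_cycle twice a day with the real implementations plugged in.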
Horev: In the initial phase of building the pipelines, we ran all sorts of tests. This seemed like a good idea, because we wanted to find any problems as soon as we could, but it could take up to two hours to deploy to production.
If you want to do that for every push, the pipelines are going to be extremely long, and extremely long pipelines cause frustrations among the developers and engineers. You want to give them a good user experience. So each time you add a step to the pipelines, bear in mind the user experience of the development team.
Anything About Your Approach to Testing You’d Still Like to Change?
Conger: One of the next big things we want to tackle is creating an artifact repository. Right now, we have a really simple artifact repository in the form of [AWS] S3 buckets. We knew all along that eventually we would need to revisit that, but what we had was good enough to get to general availability. A [better artifact repository] would make lots of other testing scenarios easier than they are today, period.
What’s a Common Mistake You See With CI/CD?
Horev: I was under the assumption that if everything is automated, no one really cares what’s going on behind the scenes. And that was a wrong assumption to make with engineers. I found out that visibility is a very important part of the pipeline. You want the pipeline to be very clear in terms of what each step does and how it does it. You also want to provide detailed logging as much as possible, so people can easily click on the specific steps and see what’s going on behind the scenes.
When we started, we didn’t see the need to break things down into smaller steps. Then we started encountering issues, like people saying: “Hey, this is really bumming me out because the pipelines take too long. Why do I have to wait 15 minutes for the pipelines to run?”
These are good questions, but it’s even better to save the trouble by providing visibility. So we made everything in the [CI/CD] process as detailed as possible. Then, things got really easy, because no one complained, “Hey, my pipeline is taking 15 minutes.” They came and said: “You know, in the testing stage, you have one specific test that takes 10 minutes to run. What can be done about that?” Now, it’s faster to diagnose problems and understand bottlenecks.
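Per-step timing is one simple form of the visibility described above. The Python sketch below times each step, prints the duration, and reports the slowest one. The step names and both helper functions are hypothetical; real CI systems usually expose this through their own job logs.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_timed(steps: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run each step in order and record how long it took, in seconds."""
    durations: Dict[str, float] = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        durations[name] = time.perf_counter() - start
        print(f"[{name}] finished in {durations[name]:.2f}s")
    return durations


def slowest(durations: Dict[str, float]) -> str:
    """Name the step that took the longest, the first place to look for a bottleneck."""
    return max(durations, key=durations.get)
```

With a breakdown like this, "my pipeline takes 15 minutes" becomes "this one test takes 10 minutes", which is a much easier problem to act on.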
What’s Your Advice for Teams Switching to Automated Deployment?
Kalm: Make sure you have enough people to maintain it. When you have many corporate developers doing the enterprise stuff and thousands of people across the world doing the open source side, a couple of people aren’t going to be enough when it comes to staying on top of the maintenance. Even though you have a CI/CD platform, there are enhancements, security and patches that have to be done. And if you have a new project come along, that pipeline will need to be developed as well.
I’d also recommend the “KISS” strategy: Keep it simple, stupid.
When you have a CI system, the pipeline is a lot like a magnet — you drag the pipeline along and things stick to it. And then they become part of it, and then you have to fix it, and you have to maintain it, and you have to troubleshoot it. So if I can have a pipeline that has two working parts compared to 20, that’s what I’m going to do. The fewer moving parts you have, the fewer are going to break.
Horev: As you build the pipeline, think of everything you do as if you may have to run it in parallel someday. From the start, think about the pipelines being scalable and allow for parallel growth. Because if everything is successful, you are going to need it.
To illustrate this simply: When the pipelines are running, there’s got to be something handling them. Usually it’s called a worker or a runner — some kind of container or a server — which actually does the jobs the pipeline has to do, whether that’s uploading the code somewhere or running tests. It’s very easy as you build the pipeline to assume there’s only one of these runners or workers, but you’re not thinking about running one pipeline under another in the future.
Then, you have to make sure no one is stepping on each other’s toes. We have an automated process that writes release notes into a file. But if you have two runners running at the same time, and they both are trying to read and modify the same release notes file at the same time, you’re just going to have a big mess.
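One common way to avoid the shared-file collision described above is to give each runner its own fragment file and merge the fragments in a single later step. The article does not say how Vulcan Cyber solved it, so the Python sketch below is just one possible approach; the directory layout, file naming and function names are hypothetical.

```python
from pathlib import Path


def write_fragment(notes_dir: Path, run_id: str, text: str) -> Path:
    """Each runner writes only its own file, so parallel runs never collide."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    fragment = notes_dir / f"{run_id}.md"
    fragment.write_text(text + "\n", encoding="utf-8")
    return fragment


def merge_fragments(notes_dir: Path) -> str:
    """A single, later step combines all fragments in a stable (sorted) order."""
    fragments = sorted(notes_dir.glob("*.md"))
    return "".join(p.read_text(encoding="utf-8") for p in fragments)
```

The design choice is to remove the shared mutable file rather than to lock it: runners never read or modify each other's output, and only the merge step sees everything.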
Conger: It’s really valuable to start simple and get some quick wins, especially if you’re a newcomer to the CI/CD space. You want to be able to prove a little bit of value as quickly as possible and show other developers and management that the pipeline will increase the velocity of your team. Once you have that foothold in place, a lot of shops find it accelerates really quickly and becomes an indispensable tool.
It’s also really valuable for all of the developers to have a part in creating your CI/CD process — to understand it and be able to fix or improve it as time goes on. They’re the ones that benefit from it. They’re the ones who understand the intricacies of how their code works.