Since its emergence in 2008, DevOps has been instrumental in breaking down the silos between software development and operations. But with the growing adoption of site reliability engineering, or SRE, the way engineering organizations are structured is transforming.
So what’s better, DevOps or SRE?
According to a recent Google Cloud blog post, neither. Rather, authors and tech professionals Seth Vargo and Liz Fong-Jones call them “close friends” designed to break down organizational barriers to deliver better software faster. It’s not uncommon for companies to utilize a combination of both methods. That’s exactly the case at mental healthcare company Mindstrong, which leverages a small SRE team to help the DevOps team build tools, automate playbooks and more.
Below, Chief Technology Officer Miguel Alvarado shares his thoughts on the nuances between DevOps and SRE, and why a combination of both works best for his organization.
“At the end of the day, the question should not be, ‘Do we use DevOps or SRE?’ The right question is: ‘How can we deliver the most seamless and reliable experience?’” Alvarado said.
What are the key differences between DevOps and site reliability engineering?
DevOps and SRE are practices with a high level of variability when it comes to what they practically mean from one company to another. At Mindstrong, it isn’t about choosing one or the other, but rather understanding these practices well and synthesizing them in a way that makes the most sense for our teams.
Looking at the origin of DevOps in 2008 and 2009, when the term was coined, it was a set of practices developed in response to a very inefficient “throw it over the fence” model. Back in those days, it was common to have a development team, QA team and operations team working separately. When development was ready to release to production, the operations team would get instructions so they could start working on getting the code deployed. This process could take weeks, given that the operations team might not have seen or understood the code before.
The DevOps movement brought about the idea that in the same way a developer should write tests and continually integrate their code with the latest changes, they should also have more direct control and visibility over the code in production. Instead of relying on someone else to deploy, monitor, and respond to alerts, the development teams themselves would be the first line of defense. When there are problems triggered by service and application code vs. infrastructure, they would have the independent ability to deploy changes or new code.
Eventually, the DevOps movement pushed the idea that in order to do all of this correctly, you need to be able to continuously integrate and deploy your software independently. CI/CD is at the center of DevOps working the way it was intended.
DevOps gives developers the responsibility and independence to operate their own code using common tools. Giving developers this responsibility and independence maximizes their ability to deliver reliable online software.
Moving away from traditional models
Companies like Amazon were running with DevOps-like processes way before DevOps as a term existed. For many years, Amazon’s philosophy has been to give developers a lot of, if not full, production access. The best software companies in the world, like Amazon and Google, were already running with their own flavors of DevOps way before the term was coined. For Google, it was the SRE model.
The SRE model is basically a concrete implementation of DevOps. SRE practices care about the same things that DevOps cares about. DevOps is more loosely defined, while the SRE model is more prescriptive. It introduces some interesting new concepts, such as service-level indicators, service-level objectives, risk and error budgets, toil and toil budgets, etc. In the SRE model, the team understands these concepts very well, knows how to automate measurement to enable them, and is able to work with all dev teams to use the constructs and implement them as part of the services.
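To make the SLI, SLO and error budget concepts mentioned above a little more concrete, here is a minimal sketch in Python. It is only an illustration, not Mindstrong's or Google's tooling: the function name evaluate_slo, the SloReport fields and the choice of a request-based availability SLI are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloReport:
    """Hypothetical summary of one measurement window."""
    sli: float               # measured fraction of good requests
    allowed_failures: float  # error budget for the window, in requests
    budget_remaining: float  # unspent share of the budget (negative if overspent)


def evaluate_slo(good: int, total: int, slo_target: float) -> SloReport:
    """Compare a request-based availability SLI against an SLO target.

    Assumes the SLI is simply good requests / total requests and that
    slo_target is a fraction such as 0.999 (i.e. "three nines").
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= good <= total:
        raise ValueError("good must be between 0 and total")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good / total
    # The error budget is the share of requests the SLO permits to fail.
    allowed_failures = (1.0 - slo_target) * total
    actual_failures = total - good
    budget_remaining = 1.0 - actual_failures / allowed_failures
    return SloReport(sli, allowed_failures, budget_remaining)


if __name__ == "__main__":
    # 500 failed requests out of one million against a 99.9% target
    # spends roughly half of the window's error budget.
    print(evaluate_slo(good=999_500, total=1_000_000, slo_target=0.999))
```

In a sketch like this, a team might slow down risky releases once budget_remaining approaches zero and resume them when the budget recovers, which is the kind of decision the error budget concept is meant to support.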
Does your team use DevOps, SRE or both?
At Mindstrong, we’re in the middle of evolving our DevOps world, and in our case, there is more nuance added when thinking about MLOps, DataOps, etc. At the end of the day, the question should not be, “Do we use DevOps or SRE?” What’s important is to be informed of the history of the practices, why they were designed, what problems they try to solve and how.
Understand all the constructs they offer, and then figure out the model that your company needs — but always keep it anchored on the user experience. The right question is: “How can we deliver the most seamless and reliable experience?”
What impact has SRE had on your engineering organization?
At Mindstrong, we’ve been building a small SRE team that helps the developer teams in building tools, CI/CD, automated playbooks, telemetry (logs and monitoring), distributed tracing, security monitoring and more. They’re providing the low-level fiber for the rest of the teams, but we expect developers to deploy and support their own software. To enable the latter, we’re creating a support rotation that all senior engineers have to participate in. The SRE team will also own the consultancy to help teams form SLIs and SLOs that tie into our company SLA and OKRs. We are also making sure that the SRE team does not become the “dumping ground” for busywork production requests. This team needs to be focused on high-value work. The support rotation can ensure that the busy work is also evenly distributed.