Software is the sum of its parts, and containerization is the process of bringing an application’s most important pieces together into one neatly wrapped package. By containerizing, developers bundle a program’s code, runtime engine, tools, libraries and settings into a portable “container.” That way, the software requires fewer resources to run and is much easier to deploy in new environments.
What Does Containerization Mean?
Tim Hynes, a software engineer at cloud data management company Rubrik, defined containerization this way when speaking with Built In in 2020: “Containers are a way of packaging an application so that it’s easy to get the application and run it in any kind of environment. So, a lot of the complexity of installing and configuring an application is taken away. Containers let a developer abstract all of that and make a very simple package that’s easy to consume.”
What Is Containerization in DevOps?
First appearing as “cgroups” within the Linux kernel in 2008, containers exploded in popularity with the introduction of Docker Engine in 2013. Because of that, “containerizing” and “Dockerizing” are often used interchangeably.
On the surface, containerizing software is a relatively straightforward process: a “container file” describing the software and its dependencies is built into a lightweight container image, which becomes a running container at runtime through a runtime engine. Any image or engine built to Open Container Initiative (OCI) standards will work with any other OCI-compliant image or engine.
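In practice, that “container file” is usually a Dockerfile. Here is a minimal, illustrative sketch; the base image, file names and port are hypothetical examples, not drawn from any particular project:

```dockerfile
# Illustrative Dockerfile: the "container file" that describes the image.
FROM python:3.12-slim

WORKDIR /app

# Copy the dependency list first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and declare how the container starts
COPY app.py .
EXPOSE 8000
CMD ["python", "app.py"]
```

Running `docker build -t myapp .` converts this file into an image, and `docker run myapp` turns that image into a live container via the runtime engine.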
But why do containers exist, and what do they achieve?
One way to view containers is as another step in the cloud technology journey.
Before the advent of virtualization and cloud technology, software ran on individual, physical machines. Each machine came with its own operating system, which often led to broken programs and downtime as developers tried, for example, to deploy software written on a Windows system to a machine running Linux. Building test environments that perfectly mimicked production environments was time-consuming, so developers needed a better way.
Virtual machines, or virtual computing environments that run on top of physical hardware with the help of a piece of software called a hypervisor, provided that better way. Software running on a virtual machine comes packaged with its own guest operating system, which makes it less likely to break due to incompatibilities with the host OS.
Then, people figured out how to combine and share resources among virtual machines, and the cloud was born.
But virtual machines still had some hang-ups: Running all those guest operating systems took up a lot of computing resources. That’s where containers come into play.
Containers don’t come bundled with their own guest operating systems. Instead, they use software called a runtime engine to share the host operating system of whatever machine they’re running on. That makes for greater server efficiency and faster start-up times — most container images are tens of MB in size, while a VM generally needs between four and eight GB to run well.
Containerized Applications Led to the Explosion of Microservices and Kubernetes
Before containers, developers largely built monolithic software with interwoven components. In other words, the program’s features and functionalities shared one big user interface, back-end code and database.
Containers made it a lot easier to build software with service-oriented architecture, like microservices. Each piece of business logic — or service — could be packaged and maintained separately, along with its interfaces and databases. The different microservices communicate with one another through a shared interface like an API or a REST interface.
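To make the idea concrete, here is a minimal sketch of one such service exposing a tiny REST-style interface over HTTP, using only the Python standard library. The service name, endpoint paths, port and data are all illustrative, not from the article:

```python
# One hypothetical microservice: a pricing lookup with its own private data.
# Other services call it over HTTP rather than importing its code directly.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

PRICES = {"sku-1": 9.99, "sku-2": 4.50}  # this service's own, private data

class PricingService(BaseHTTPRequestHandler):
    def do_GET(self):
        sku = self.path.strip("/")
        if sku in PRICES:
            body = json.dumps({"sku": sku, "price": PRICES[sku]}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "unknown sku"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port=8000):
    # In a containerized deployment, this process would be the whole container.
    HTTPServer(("", port), PricingService).serve_forever()
```

Because the only contract between services is the HTTP interface, the pricing service can be rebuilt, redeployed or scaled on its own without touching its callers.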
With microservices, developers can adjust one component without worrying about breaking the others, which makes for easier fixes and faster responses to market conditions. Microservices also improve security, as compromised code in one component is less likely to open back doors to the others.
Containers and microservices are two of the most important cloud-native tech terms. The third is orchestration or, more specifically, Kubernetes: an orchestration platform that helps organizations get the most from their containers.
Containers are infrastructure agnostic — their dependencies have been abstracted from their infrastructures. That means they can run in different environments without breaking. Cloud-native organizations began taking advantage of that portability, shifting containers among different computing environments to scale efficiently, optimize computing resources and respond to changes in traffic.
Through its user interface, Kubernetes offered unprecedented visibility into container ecosystems. It’s also entirely open source, which helped organizations adopt containerization without getting locked in with proprietary vendors. Last, it aided the widespread transition to DevOps, which boosts agility by marrying the development and maintenance of software systems.
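In practice, organizations describe that desired state to Kubernetes declaratively. A minimal Deployment manifest might look like the following sketch, where every name and value is an illustrative example:

```yaml
# Illustrative Kubernetes Deployment: names, image and counts are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3            # Kubernetes keeps three copies of the container running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:1.0      # any OCI-compliant container image
          ports:
            - containerPort: 8000
```

Kubernetes continuously reconciles the cluster toward this declared state, restarting failed containers and rescheduling them across machines as conditions change.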
“The way things are moving, and how application support works now, I think more and more engineers are expected to be responsible for their code running day to day,” Hynes told Built In.
Containers answered the question, “How do we simply package software with everything it needs to deploy seamlessly?” Kubernetes — as well as other orchestration systems like Helm — answered another: “How do we simply understand a piece of software’s requirements, connectivity and security so it can interact seamlessly with third-party services?”
“The service-mesh-type technologies that have come up around containers really help with packaging the other requirements that application has, beyond just things like code libraries,” Hynes added.
Containerization in Cloud Computing
In the past 15 years or so, software development has focused intently on improving stability, or avoiding broken code and downtime, Hynes said. Test-driven development and other agile principles, like YAGNI, helped make software simple, adaptable and stable. But that stability came at a price.
“One thing that was lost in creating all of that stability was the ability to scale quickly and to change parts of the application, to pivot what the company does and address market trends,” Hynes said.
Fast scaling and quick pivots are what so-called cloud-native technology is known for. Usually, containers are a foundational element of what we mean when we say “cloud native.”
“[Containers and cloud native] are definitely synonymous, I think, in people’s minds,” Hynes said.
But plenty of applications built to run in cloud environments don’t use containers at all, he added. A public cloud can host a monolithic application just as easily as a collection of microservices. Furthermore, containers can run in almost any computing environment. Whether developers want to run containerized software on on-premises physical machines, private virtual machines or public clouds depends on their security, scalability and infrastructure management needs.
Advantages and Disadvantages of Containerization
In a 2021 survey by the Cloud Native Computing Foundation, 93 percent of respondents reported they already use or plan to use containers in production.
But if containers offer such advantages in terms of portability, scalability and responsiveness, why don’t all organizations containerize their software?
One reason is the speed of technological advancement. Containers haven’t been popular for that long. In a relatively short amount of time, leaders across industries also had to wrap their minds around public cloud offerings, DevOps, edge computing, artificial intelligence and other innovations. Sometimes, it’s tough to keep up.
Hynes noticed this dynamic as he helped Rubrik customers navigate cloud and DevOps transformations.
“It’s quite hard for customers to follow what’s going on, where it’s at today and how that compares to a year ago, for example,” he said. “It’s definitely a challenge for customers, especially when they’re not in the application development space.”
Another reason companies avoid containerization is that containerizing legacy software is not as easy as flipping a switch.
Large organizations frequently rely on revenue-driving software systems that are 10, 15 or 20 years old, Hynes said. Their huge, back-end databases may be running on database engines that have been around for decades, and the front ends often haven’t been touched in years.
“They’ve just maintained it over the years, rather than investing heavily in it, because they really don’t want to break it,” he explained.
For companies like that, the transition to containers presents big risks without clear benefits. The legacy systems are stable, and everyone knows how to use them. Why mess with a good thing? So the thinking goes.
And that thinking is often correct. Even if the transition goes smoothly, a few hours of downtime for a mission-critical system at a big company could cost millions of dollars. And legacy companies might not have people on staff with the skills to administer sweeping container environments.
Instead, companies in that position should adopt what Gartner calls bimodal IT, Hynes said, with some teams maintaining the steady-state, long-tail applications responsible for large portions of revenue, and other teams focusing on innovative, cloud-native product development.
Containerization 201: Common Misconceptions About Containerized Applications
The number of developers with containers in production has skyrocketed in recent years. In some areas, the popularity of containers has outpaced the education and training surrounding them. Here are some common container myths, busted:
CONTAINERS ARE ONLY FOR BUILDING NEW APPLICATIONS
Just because an application already exists doesn’t mean it can’t or shouldn’t be containerized, Red Hat senior director of product strategy Brian Gracely told Built In in 2020. Early buzz around containers made many believe this tech is only for developers building cloud-native apps from the ground up. But that’s simply not true.
“Over the last couple of years we’ve built a lot of capabilities that allow you to take an existing application and use containers to really get a lot of additional benefit out of it,” Gracely said. “We’re seeing people using containers as part of a really broad modernization strategy. In some cases, it’s to help 10- or 20-year-old systems.”
But modernizing takes work — and nobody likes when perfectly functional systems start feeling like technical debt, Gracely added. That said, applications that stick around for years usually stick around because they’re important — and letting them stagnate doesn’t benefit anyone. The key for companies like Red Hat, he said, is to help customers distinguish between new technology that’s going to be genuinely helpful and new technology for technology’s sake. Containers often fall into that first category.
“We always want to remind people, like: ‘Don’t feel bad for not wanting to talk about [legacy apps], but they are the most important ones for your business. We need to make sure that we help you either take cost out of them, or modernize them and make them run better,’” he said. “We don’t just get enamored with the headlines about the newest shiny thing.”
CONTAINER SYSTEMS MAKE EVERYTHING EASIER
Container orchestration brings a steep learning curve of its own, which is why developers with Kubernetes skills are in such high demand. As of late 2020, just 15,000 people had received Kubernetes certifications from the CNCF. (“By the way, that’s only the people who have passed the exam,” CNCF general manager Priyanka Sharma told Built In. “It’s a difficult exam, and it’s not multiple choice.”)
CONTAINERS ARE FOR STATELESS APPLICATIONS ONLY
When Docker’s container runtime first hit the scene, the broad expectation was that containers were only useful for stateless applications, which don’t store data from one server-client transaction to the next.
Some people still associate containers with stateless apps, Hynes said. But due to advancements in container technology, it’s now possible to create stateful, containerized applications that store persistent data.
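In Kubernetes, for instance, a stateful container typically requests durable storage through a PersistentVolumeClaim, so its data outlives any individual container. The sketch below is illustrative; all names, images and sizes are hypothetical:

```yaml
# Illustrative PersistentVolumeClaim plus a Pod that mounts it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data   # survives container restarts
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: db-data
```

The container itself remains disposable; the claim is what ties it to persistent data wherever it is scheduled.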
“With containerization, what’s really interesting is how you bind those two together, how you can make the data as portable as the application,” NetApp senior director of product management McClain Buggle told Built In.
NetApp is one provider of portable data planes that customers can access and orchestrate through Kubernetes. That means an application’s data could follow it as it moves through different computing environments in a public cloud, for example.
“Application orchestration is kind of a misnomer,” Buggle said. “The apps are totally portable, but the data may not be. So the evolution we are seeing is that you can really make a stateful application portable if you can make the data plane portable as well.”
5 Examples of Containerized Applications
What companies are now deploying containers, and how? At this point, a comprehensive roundup of containerized apps would be large enough to have its own congressional district. But a sample of how a few notable early adopters implemented containers illustrates why the technology was so transformative, and why — more than five years after container excitement first ratcheted up — it continues to grow more widespread.
SPOTIFY

Spotify recognized the value of containers early on. The audio streaming platform began using containers in 2013, when Docker was new, and even built its own in-house orchestration system, called Helios. (The company open-sourced Helios one day before Kubernetes was announced, Spotify software engineer Matt Brown told Google Cloud Tech. Spotify began transitioning to Kubernetes shortly after.)
The shift cut the time it took to get an operational host for a new service from an hour down to minutes or seconds, and it improved CPU utilization two- to threefold, site reliability engineer James Wen said in 2019. More recently, Spotify developed and open-sourced Backstage, a developer portal that includes a Kubernetes monitoring system.
THE NEW YORK TIMES
The New York Times, another early adopter of containers, similarly saw deployment times nosedive after moving from classic virtual machines to Docker. What previously took up to 45 minutes fell to “a few seconds to a couple of minutes,” Tony Li, a staff engineer at the paper, said in 2018, about two years after the Times decided to move from private data centers to the cloud and dive into cloud-native tech.
BUFFER

Different social media apps have different image aspect ratios, which means social scheduling platforms like Buffer have to resize a given image to fit well across the various channels linked to a user’s account.
After Buffer began ramping up the number of applications running on Docker in 2016, its resizing feature was one of the first services the company fully built with a modern container orchestration system. Containerizing the application allowed for the kind of continuous deployment that would quickly become table stakes in DevOps. “[W]e were able to detect bugs and fix them, and get them deployed super fast. The second someone is fixing [a bug], it’s out the door,” Dan Farrelly, chief technology officer at Buffer, said about a year after the migration kicked off in earnest.
SQUARESPACE

Squarespace began migrating from virtual machines to containers around 2016. The website-hosting platform was experiencing the same computing-resource pinch as others in the virtual machine era. Developers spent a lot of time provisioning machines whenever a new service was ready for production or an existing service needed to be scaled, Squarespace principal software engineer Kevin Lynch told the CNCF in 2018.
The shift allowed developers to ship services without any site reliability engineering involvement and reduced deployment times as much as 85 percent, Lynch said.
GITLAB

GitLab has described Kubernetes adoption as one of the “biggest tailwinds” driving long-term growth for its business, alongside the likes of remote work and open source (no surprise, given that GitLab and Kubernetes are both cornerstone systems in the DevOps toolkit).
In 2020, GitLab director of platform infrastructure Marin Jankovski shared some numbers on how the migration from traditional virtual machines to containers had affected the company’s infrastructure size, performance and deployment speed. In short, applications run faster on fewer machines: workloads run on three nodes, down from 10; processing is three times faster; and deployments are 1.5 times faster, Jankovski wrote.
Stephen Gossett contributed reporting to this story.