I went on sabbatical earlier this year, and the great thing about sabbaticals is that there’s never a shortage of things to learn. The question of how the sabbatical would benefit my career as a software engineer never strayed far from my mind, and I felt that pressure increase as the world slid into a recession. That’s why I chose DevOps as my primary area of specialization. As a software engineer, your ability to convert intellectual capital into financial capital is bottlenecked by your ability to ship working products to paying customers; in other words, by how sophisticated your DevOps pipeline is. Of all the challenges that building a great DevOps pipeline presents, the one I have spent the most time tackling is resource orchestration.
I don’t have much experience shipping large-scale, old-school systems to the cloud, but I have shipped smaller projects, such as my tech blog, to AWS. In my experience, getting just three different cloud resources (S3 + CloudFront + Route 53) to work together nicely can get quite hairy if you don’t know what you’re doing. That’s where a tool like AWS CloudFormation comes into play. If your stack definition doesn’t work, just tear it down, update it, and re-deploy. If you have another resource to add, throw it into the stack definition. If you need to deploy the same stack with different configuration variables, write another bash script or add a Makefile target. Systems of arbitrary complexity can be described, deployed, and managed this way. As one data point, I ship my own landing pages for various business ideas and whatnot instead of using a third-party site, and I’ve cut my personal time-to-ship from a few days to one hour. It might not be as well-optimized or come with fancy default landing page themes, but it’s effectively free. Even better, whatever I ship this way, I own completely.
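For instance, the “same stack, different configuration variables” case might look like the following. This is a minimal Python sketch, not a prescribed workflow. The template file name, the stack names, and the `Environment`/`DomainName` parameters are hypothetical. `--template-file`, `--stack-name`, and `--parameter-overrides` are real options of `aws cloudformation deploy`. The script only prints the commands.

```python
import shlex

# Hypothetical template file; any CloudFormation template path would do.
TEMPLATE = "landing-page.yaml"

# Same stack definition, different configuration variables per environment.
# The DomainName values are placeholders.
ENVIRONMENTS = {
    "staging": {"DomainName": "staging.example.com"},
    "prod": {"DomainName": "example.com"},
}


def deploy_command(env: str, params: dict[str, str]) -> list[str]:
    """Build the argv for deploying TEMPLATE as the stack for `env`."""
    overrides = [f"{key}={value}" for key, value in sorted(params.items())]
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", TEMPLATE,
        "--stack-name", f"landing-page-{env}",
        "--parameter-overrides", f"Environment={env}", *overrides,
    ]


if __name__ == "__main__":
    for env, params in ENVIRONMENTS.items():
        # Print rather than run; subprocess.run(cmd, check=True) would deploy.
        print(shlex.join(deploy_command(env, params)))
```

Adding an environment means adding one dictionary entry rather than another script or Makefile target.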
That being said, after some time you start to see how a tool like CloudFormation, or its open-source alternative Terraform, could be improved. The core insight is that they’re less infrastructure-as-code frameworks and more infrastructure-as-data frameworks. Instead of describing stacks in a programming language, every stack is a data file, such as a JSON or YAML template (or an HCL configuration, in Terraform’s case). That file gets handed to an engine (the CloudFormation service, or the Terraform CLI) that makes the remote API calls on your behalf. Neither tool uses a general-purpose programming language to define infrastructure stacks.
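To make the “infrastructure as data” point concrete, here’s a minimal Python sketch that builds a CloudFormation template as an ordinary dictionary and serializes it to JSON. The logical IDs and description are made up. The top-level layout (`AWSTemplateFormatVersion`, `Resources`, `Outputs`) and the `Ref` intrinsic function are CloudFormation’s own.

```python
import json

# A complete (if tiny) CloudFormation template, written as plain Python data.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Bucket for a static site (illustrative)",
    "Resources": {
        # Logical ID -> resource declaration.
        "SiteBucket": {"Type": "AWS::S3::Bucket"},
    },
    "Outputs": {
        # Even the intrinsic function Ref is just a nested mapping.
        "BucketName": {"Value": {"Ref": "SiteBucket"}},
    },
}

# This JSON text is all the engine receives.
document = json.dumps(template, indent=2)
print(document)
```

Whatever produces that document, whether a text editor or a script, the engine only ever sees the resulting data. Any loops or functions used to build it are gone by the time it arrives.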
Without a programming language, you end up resorting to interesting workarounds as your stack scales. For example, I wanted to divide my stack into multiple substacks that could be deployed independently of each other, since stack creation is bottlenecked by the slowest resource to provision. Unfortunately, CloudFormation has no “import” statement like Python or TypeScript. Instead, it has intrinsic functions that you combine to get the behavior you want. In my case, I exported a value from one stack’s Outputs section (typically the result of !Ref) and read it into another stack with Fn::ImportValue. This process gets quite tedious as your stack scales and the number of resource dependencies across substacks grows.
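Here’s roughly what that export/import pair looks like, wrapped in a hypothetical helper (`cross_stack_ref`) of the kind you’d want if templates were code. The stack name, the logical IDs, and the export-naming scheme are assumptions. The emitted shapes follow CloudFormation’s template syntax: an `Outputs` entry with `Export.Name` in the producer stack, and `Fn::ImportValue` in the consumer stack. In plain YAML, you write both halves by hand for every shared resource.

```python
def cross_stack_ref(producer_stack: str, logical_id: str) -> tuple[dict, dict]:
    """Return (outputs_entry, import_expression) for one shared resource.

    Hypothetical helper: export names are "<producer_stack>-<logical_id>".
    """
    export_name = f"{producer_stack}-{logical_id}"
    outputs_entry = {
        f"{logical_id}Export": {
            "Value": {"Ref": logical_id},
            "Export": {"Name": export_name},
        }
    }
    import_expression = {"Fn::ImportValue": export_name}
    return outputs_entry, import_expression


# Producer stack ("storage") owns the bucket and exports a reference to it.
outputs, bucket_ref = cross_stack_ref("storage", "SiteBucket")
storage_template = {
    "Resources": {"SiteBucket": {"Type": "AWS::S3::Bucket"}},
    "Outputs": outputs,
}

# Consumer stack drops the import wherever it needs the bucket name, e.g. the
# Bucket property of an AWS::S3::BucketPolicy (PolicyDocument omitted here).
policy_properties = {"Bucket": bucket_ref}

print(storage_template["Outputs"])
print(policy_properties)
```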
Another situation I encountered was trying to keep these cross-stack imports both secure and transparent. Every value you export from a CloudFormation stack is exported in plain text. So if you need to reference your database URI when standing up your containers, you’re publishing your password to anybody with IAM permissions to read the stack’s exports. As a workaround, you can store those stack outputs in AWS Secrets Manager for other stacks to use. However, that introduces an extra step you need to template for every variable you want to keep secret, which can cost up to a few hours on every stack update.
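One way to consume a value kept in Secrets Manager is a CloudFormation dynamic reference. This is a string of the form `{{resolve:secretsmanager:secret-id:SecretString:json-key}}` that CloudFormation resolves at deploy time. The sketch below assumes a hypothetical secret named `prod/app/db` with `username` and `password` keys. It builds the references for an RDS instance’s properties, so only the secret’s name appears in the template.

```python
def secret_ref(secret_id: str, json_key: str) -> str:
    """Build a CloudFormation dynamic reference to one key of a secret's JSON."""
    return "{{resolve:secretsmanager:" + secret_id + ":SecretString:" + json_key + "}}"


SECRET_ID = "prod/app/db"  # hypothetical secret name

# Fragment of an AWS::RDS::DBInstance's Properties (other required
# properties omitted). CloudFormation substitutes the real values at deploy
# time, so the template itself only contains the secret's name.
db_properties = {
    "Engine": "postgres",
    "MasterUsername": secret_ref(SECRET_ID, "username"),
    "MasterUserPassword": secret_ref(SECRET_ID, "password"),
}

print(db_properties["MasterUserPassword"])
```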
CloudFormation and Terraform are great tools in their own right, but this experience did leave me wondering whether there might be a better way to approach this problem.
Recently, I had the opportunity to work with Pulumi, an open-source infrastructure-as-code framework built on general-purpose, Turing-complete languages. I first learned about Pulumi from this Hacker News post. Instead of using a data file like JSON or YAML with optional DSL extensions, you write a real program in a language like TypeScript, Python, or Go, importing Pulumi and any third-party SDKs to describe your infrastructure stack. When you run `pulumi up`, the Pulumi CLI executes that program locally and compares the resources it declares against the stack’s last recorded state. It then calls the cloud providers’ APIs to reconcile the difference. The Pulumi service (or a self-managed backend) stores that state. I was both skeptical of and intrigued by Pulumi, and a few points stuck out.
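That reconciliation step is the same basic idea behind CloudFormation and Terraform updates: diff what you declared against what was recorded last time. The toy Python model below is a simplification for illustration, not Pulumi’s engine. The real engine also tracks dependencies between resources and decides when a change requires replacing a resource rather than updating it in place. The resource names and properties are made up.

```python
# resource name -> input properties
Resources = dict[str, dict]


def plan(recorded: Resources, desired: Resources) -> dict[str, list[str]]:
    """Diff desired resources against recorded state (simplified model)."""
    shared = desired.keys() & recorded.keys()
    return {
        "create": sorted(desired.keys() - recorded.keys()),
        "update": sorted(n for n in shared if desired[n] != recorded[n]),
        "delete": sorted(recorded.keys() - desired.keys()),
    }


# What the backend remembers from the previous deployment.
recorded = {
    "site-bucket": {"kind": "bucket"},
    "old-queue": {"kind": "queue"},
}
# What the program declares this time.
desired = {
    "site-bucket": {"kind": "bucket", "tags": {"env": "prod"}},
    "cdn": {"kind": "cdn-distribution"},
}

print(plan(recorded, desired))
# {'create': ['cdn'], 'update': ['site-bucket'], 'delete': ['old-queue']}
```

In Pulumi’s case, `desired` is whatever your program constructs when it runs. That is why ordinary loops and functions can shape a stack.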
The first thing I noticed was how many dependencies Pulumi had. Many of its provider SDKs, such as `@pulumi/aws`, are built on the corresponding Terraform providers through a bridge layer, so I assumed the tool would only be as useful as the tools it depended on. This hasn’t turned out to be a significant issue. Because Pulumi is open source, you can effectively add your own support for features that don’t exist yet. This contrasts with AWS CloudFormation, which is proprietary and can take months to catch up with new AWS features.
The second thing that raised my eyebrows was Pulumi’s pricing. AWS CloudFormation is free, as indicated on its pricing page. So is Terraform, and its hosted Terraform Cloud offering is free for up to five users. Pulumi is free for one project stack and one user, and then rapidly increases in cost for any meaningful org-based workflow. I’m still a little taken aback by the sticker shock. Still, for larger engineering organizations this cost is probably trivial compared to, say, hiring another DevOps engineer. Pulumi also offers professional services for converting from Terraform to Pulumi, if that helps.
Lastly, I wondered what customer support looked like for Pulumi, since AWS CloudFormation, at least, is covered by AWS customer support. Because CloudFormation is an AWS service, I had also figured it got first dibs on learning what new products AWS was shipping. When I was working with Pulumi, this wasn’t a huge issue. However spotty the official documentation might be, the answer to a general Pulumi question is usually published and searchable somewhere among Google, Stack Overflow, the community Slack, and GitHub Issues.
I think Pulumi’s architecture addresses the issues I mentioned earlier far better than CloudFormation can. With a programming language, you can write your own higher-order primitives to address most problems. For instance, secrets management comes out of the box in Pulumi, and it can also use a dedicated secrets provider (HashiCorp Vault, AWS KMS, etc.). Pulumi also documents different patterns for structuring stacks, from monolithic stacks to micro-stacks. It even offers best-practice guides and libraries such as Pulumi Crosswalk, which may lend themselves well to creating templates and reference architectures for solution architects.
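As an example of the kind of higher-order primitive a real language makes possible, here’s a minimal Python sketch of a secret wrapper. It redacts itself when printed or serialized, so a password can’t slip into logs or exported outputs by accident. `Secret` and `to_json` are hypothetical names. Pulumi’s actual mechanism differs: it tracks secrets through its `Output` type (for example via `pulumi.Output.secret` in the Python SDK) and encrypts them in the stack’s state.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)
class Secret:
    """Wraps a sensitive value; reading it requires an explicit reveal()."""

    _value: str

    def reveal(self) -> str:
        return self._value

    def __repr__(self) -> str:
        # Used by print(), f-strings, and container reprs, so logs stay clean.
        return "Secret([redacted])"


def _redact(obj: object) -> str:
    # json.dumps calls this for objects it can't serialize natively.
    if isinstance(obj, Secret):
        return "[redacted]"
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def to_json(data: dict) -> str:
    """Serialize stack outputs, replacing any Secret with a placeholder."""
    return json.dumps(data, default=_redact, sort_keys=True)


outputs = {"db_host": "db.internal", "db_password": Secret("hunter2")}
print(outputs)
# {'db_host': 'db.internal', 'db_password': Secret([redacted])}
print(to_json(outputs))
# {"db_host": "db.internal", "db_password": "[redacted]"}
```

Because redaction lives in the type, every stack that passes a `Secret` around gets it for free. The per-variable templating step described earlier is not needed.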
I’ll probably have more to say about Pulumi once I’ve used it for a while, but suffice it to say that it’s a promising new technology that I’ll be keeping an eye on.