Talking about process evokes images of complex workflows, endless forms, and mind-numbing tedium. Perhaps the TPS reports of Office Space come to mind, and with them the drab, gray cubicles of an office environment that would be better left in the 20th century. Contrast that hellscape with the bright, airy open spaces of the modern, well-funded startup (pre-lockdown, at least).
The thing is, though, unsexy as it may be, process is the key to releasing software that’s maintainable, stable, and secure.
Writing and releasing software without any process whatsoever is possible, but it quickly leads to an unstable codebase that begs for a rewrite. I've been there at least a couple of times in past jobs, watching this kind of debacle unfold before my very eyes. Frantic post-release firefighting, preventable security blunders, and functionality haphazardly bolted on under pressure are the hallmarks of such an approach.
On the other hand, a rigid, overly detailed process that codifies every single step of the software development journey is just as counterproductive. Team velocity grinds to a halt as engineers are consumed by bureaucratic tasks instead of building software. Alternatively, they might end up repeatedly breaching the established procedures in order to get things done, which leaves you in much the same situation as having no process at all.
Ideally, you should find a balance. You want to define a process that reduces the risk of mistakes or outright disaster but also leaves enough leeway for creativity and the individual expression of each team member’s talents. In other words, you want to set up and enforce guidelines to lead the team toward the so-called “right way of doing things” without being overly prescriptive.
This is all well and good, but it's also abstract and theoretical. Let's see how to put it into practice.
Analyze and Optimize Before Rebuilding
Chances are you aren’t building a new engineering team from scratch. Unless you’re literally founding a startup as you read this, you’re probably either coming into or already working with an existing team that has its own processes in place.
Every team has good and bad habits, regardless of whether it has a formal process or workflow describing how things should be done. A good place to start is to understand where the bad habits come from and then rework the guidelines in those areas first. For example, I was once brought on to lead a team that, by its own admission, had serious problems with the quality of its work, which made releases unpredictable.
The first step is always to observe, and what I observed was sloppy practice across the board. The team tended to release with little testing and patch problems afterward. More alarmingly, this patching usually happened directly on the production servers, live, while customers were using the SaaS platform the team had developed.
In situations such as this, you might be tempted to go in and replace whatever process is in place with a standard, by-the-book procedure. Doing so doesn't guarantee that these problems will be solved, however. That's because standard procedures are, by their nature, generic. They don't take into account the nuances of real teams, so you need to adapt them to the team in front of you.
A better approach is to understand why the team would act so hastily. What led them to a point where their procedure was so scattered? In the case above, although there were a few influencing factors, I found that one was so critical that it required immediate attention.
The team's release procedure was simply a checkout of the whole codebase from source control directly onto the production servers. This took far too long because the source control system was a centralized solution that was obsolete even by the standards of the time. The team would rather take the risk of hot-patching the live production environment than wait for a deployment that could easily take up to an hour to move from development to staging to production. On top of that, the source control system had no real branching or tagging mechanism for marking releases, so even redeploying a known version was risky.
And so, what looked like a problem with the post-release maintenance process was in fact a symptom of a deeper issue. Source control needed a revamp, and that's exactly where we started making changes. We switched to Git and adopted Gitflow to manage releases properly within source control. The team took some time to adapt to the new process but was generally supportive from the start, buying into the idea of no longer having to patch the live servers directly. Within a couple of months, that extremely risky behavior was a thing of the past.
Learning From Our Mistakes
Alongside Git and Gitflow, we also deployed Gerrit to make code reviews easier. The team hadn't previously had a formal code review process, so this was new to them. Now they had a structured way of approving or rejecting changes to the software.
That in itself raised a few questions: Who should have the authority to approve changes? What if someone approves a change that breaks the system? Why should Engineer X have a say about the quality of my code? It’s scary enough for an engineer who’s new to this way of working to have a transparent codebase where anybody can see who made what changes and when. You can imagine the added pressure of having your name permanently associated with a decision to either approve or reject a change.
Now, in such a situation, you can spend a lot of time coming up with a highly precise code review guideline. For instance, you can define groups with approval rights and groups that can comment on or reject code but not approve it. You can assign different people to these groups for different parts of your software, establish a minimum number of reviews before a change can be accepted, and so on. Such a process takes time to design, and it's difficult to get right. Also, the more detailed you get, the more likely you are to frustrate your team and see process exceptions popping up here and there.
The other option is to go the opposite way and let the process grow organically. You start with a generic guideline. For instance, you might say, “We need at least two senior engineers to review and approve any change submitted,” and then grow from there. This approach will likely lead to more mistakes. That's fine, though, as long as the team learns from them.
The important thing here is to recognize when these mistakes call for a change in the process. A serious mistake is probably reason enough for an immediate process change. For more trivial issues, though, a knee-jerk change to the process is counterproductive. The team is likely to get frustrated by continuous process changes, and you'll end up with a hot mess. In these cases, you're much better off observing and monitoring, and changing the process only if the same mistake recurs often enough.
In short, the key is to keep the process fluid and change it only when it makes sense to do so. To better illustrate this point, let's look at an example that shows how even simple, well-intentioned guidelines can be misinterpreted once applied in the field, and what we can do about that.
On the team I mentioned earlier, we went for the organic approach when setting up the code review process. I used a guideline not too dissimilar from the “two senior engineers” one I mentioned above. That didn’t work perfectly right off the bat, however.
Often, we found that the second senior engineer rubber-stamped the code for submission without really thinking critically about the change. Perhaps this was due to lack of time or unfamiliarity with the part of the system being modified. Regardless of the cause, the redundancy built into the system wasn't fulfilling its purpose.
We then reviewed this issue as a team at our end-of-week meeting. After some discussion, we changed the guideline to: “We need two engineers to review and approve any change submitted. If you approve a change, you commit to fixing the code if it breaks the system.” In other words, we dropped the “senior” requirement but added an element of accountability to ensure that approvers truly understood what the code was doing before signing off on it.
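To make the difference between the two guidelines concrete, here is a minimal Python sketch that models each one as a submit check. Everything in it is hypothetical: `Approval`, `Change`, `meets_original_rule`, `meets_revised_rule`, and `fix_owners` are illustrative names, not part of Gerrit or any other tool, and ignoring an author's approval of their own change is an assumption (self-review isn't really review). A check like this can only count approvals; whether approvers actually understood the change still depends on the accountability the revised rule adds.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Approval:
    """One reviewer's approval of a change (hypothetical model, not Gerrit's API)."""
    reviewer: str
    senior: bool


@dataclass
class Change:
    """A submitted change together with the approvals it has collected so far."""
    change_id: str
    author: str
    approvals: list[Approval] = field(default_factory=list)


def _independent_approvals(change: Change) -> list[Approval]:
    # Assumption: an author approving their own change does not count.
    return [a for a in change.approvals if a.reviewer != change.author]


def meets_original_rule(change: Change) -> bool:
    """First guideline: at least two distinct senior engineers approve."""
    seniors = {a.reviewer for a in _independent_approvals(change) if a.senior}
    return len(seniors) >= 2


def meets_revised_rule(change: Change) -> bool:
    """Revised guideline: any two distinct engineers approve, seniority aside."""
    reviewers = {a.reviewer for a in _independent_approvals(change)}
    return len(reviewers) >= 2


def fix_owners(change: Change) -> list[str]:
    """Under the revised guideline, every approver commits to fixing the change if it breaks."""
    return sorted({a.reviewer for a in _independent_approvals(change)})


if __name__ == "__main__":
    change = Change(
        "I42",
        author="dana",
        approvals=[Approval("ari", senior=True), Approval("sam", senior=False)],
    )
    print(meets_original_rule(change))  # False: only one senior approver
    print(meets_revised_rule(change))   # True: two engineers approved
    print(fix_owners(change))           # ['ari', 'sam']
```

The revised rule is actually easier to satisfy mechanically; the part that changed behavior on the team, approvers owning the fix, lives in `fix_owners` only as a record of who signed up, not as something code can enforce.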
The Secret of Process Optimization
In conclusion, process is a necessary evil for mitigating the risks associated with making and changing software. A good process is balanced: you don't want an extremely detailed workflow that stifles creativity, but you also don't want a generic guideline that is open to misinterpretation and leads to the same mistakes being repeated.
Processes need to be reviewed regularly and changed when necessary. The secret behind process optimization is really that we need to ask the right questions after a thorough observation of the process in operation: Why is the team behaving in a certain way? What led them there? What can we do, together, to fix this going forward?
The process isn't a set of laws. It isn't there to punish people who breach it. It's a tool that defines the right way of doing things and helps people excel at their jobs without having to constantly reassess well-understood risks.
In parting, these are the questions I'd like you to ask yourself: How much of my team's process is redundant and purely bureaucratic? And, on the other hand, where in our operations could the team benefit from a well-defined guideline instead of making things up every time that specific scenario comes up? The answers to these two questions should drive your process optimization efforts as you strive to find the balance between too strict and too lax, so that each team member's creativity can translate into great work.