Balancing speed with precision is no easy task for a product team, but there are a couple of rules that tech leaders at Caxy and McMaster-Carr swear by.
At Caxy, COO Hannah Deason said the size of the release dictates the size of the safety net.
“A hotfix to one client’s staging branch doesn’t need the same gates as a coordinated multi-tenant production push, and pretending otherwise is how teams either release slow or release scared,” she said.
That same principle governs where AI shows up at Caxy: the bigger the surface area, the more humans are needed in the loop.
Akhil Patel, a senior engineering manager at McMaster-Carr, takes a similar approach to evaluating blast radius. That’s why his team keeps changes small, reversible and observable.
“Small changes are easier to understand, test and fix,” Patel said. “When something breaks, there’s less to reason about.”
For teams trying to balance velocity with confidence, Deason and Patel share their secrets to successful releases — and the metrics that prove their methods stand up.
Caxy is a Chicago-based software consulting and custom software development agency.
What’s your rule for fast, safe releases — and what KPI proves it works?
Our rule is a marriage of two beliefs.
The size of the release dictates the size of the safety net. A hotfix to one client’s staging branch doesn’t need the same gates as a coordinated multi-tenant production push, and pretending otherwise is how teams either release slow or release scared. We size the review, the testing and the human eyes to the blast radius of what’s changing. That same principle governs where AI shows up: the bigger the surface area, the more humans in the loop.
Releases that sit are releases with risk. We want releases at least monthly and no work sitting stale more than six weeks. Stale code creates as much risk as rushed work. Two sides of the same coin.
The KPI that proves it: We tell the story of these two rules by pairing defect escape rate with cycle time. Either one alone lies. When velocity is high and escape rate is low, we know we’ve crushed it. We target an escape rate under five percent and a rolling cycle time of 30 days or less. Fast releases with bugs are just technical debt. Slow releases with clean output erode competitive edge and product experience. Watching them together tells us if the system is actually working or just feeling like it is.
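To make the pairing concrete, here is a minimal Python sketch of reading the two metrics together. It is not Caxy's tooling. It assumes escape rate means the share of all known defects that were found only after release, and cycle time means the average number of days from start of work to release. The WorkItem record and every function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Targets quoted in the answer above.
MAX_ESCAPE_RATE = 0.05      # escape rate under five percent
MAX_CYCLE_TIME_DAYS = 30    # rolling cycle time of 30 days or less


@dataclass
class WorkItem:
    """Hypothetical record of one shipped piece of work."""
    started: date
    released: date


def escape_rate(escaped_defects: int, total_defects: int) -> float:
    """Share of all known defects found only after release (assumed definition)."""
    if total_defects == 0:
        return 0.0
    return escaped_defects / total_defects


def average_cycle_time_days(items: list[WorkItem]) -> float:
    """Mean number of days from start of work to release."""
    if not items:
        return 0.0
    return sum((item.released - item.started).days for item in items) / len(items)


def release_health(escaped: int, total: int, items: list[WorkItem]) -> str:
    """Read the two metrics together rather than one at a time."""
    quality_ok = escape_rate(escaped, total) < MAX_ESCAPE_RATE
    speed_ok = average_cycle_time_days(items) <= MAX_CYCLE_TIME_DAYS
    if quality_ok and speed_ok:
        return "fast and clean"
    if speed_ok:
        return "fast but buggy"
    if quality_ok:
        return "clean but slow"
    return "slow and buggy"
```

With two work items that took 18 and 26 days and two escaped defects out of 50, release_health returns "fast and clean". Raising the escaped count to 10 flips the result to "fast but buggy" even though cycle time is unchanged, which is the case a speed metric alone would hide.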
Which standard or metric defines “quality” in your stack?
Our standard is that quality is a panel, never a single number. Any metric in isolation will lie to you, and the team that manages to one number eventually games it without meaning to. We watch several together and the combination is what tells the truth.
The panel includes production outages and issues per year, with a target under two; test coverage on anything high-blast-radius; escape rate to production; cycle time under a month so features don’t go stale; and the ratio of bugs and regressions to new features, where under 10 percent is good and under five percent is great.
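The numeric targets in that panel can be sketched as a single check that reports every miss at once instead of collapsing to one score. This Python sketch is illustrative only. The QualityPanel class and its field names are hypothetical, "under a month" is assumed to mean under 30 days, and test coverage and escape rate are left out because this answer gives no numeric target for them.

```python
from dataclasses import dataclass


@dataclass
class QualityPanel:
    """Hypothetical snapshot of the metrics with stated targets."""
    outages_per_year: int
    cycle_time_days: float
    bugs_and_regressions: int
    new_features: int

    def bug_ratio(self) -> float:
        """Bugs and regressions relative to new features."""
        if self.new_features == 0:
            # No features shipped: any bug at all makes the ratio unbounded.
            return float("inf") if self.bugs_and_regressions else 0.0
        return self.bugs_and_regressions / self.new_features

    def bug_ratio_grade(self) -> str:
        """Under five percent is great, under 10 percent is good."""
        ratio = self.bug_ratio()
        if ratio < 0.05:
            return "great"
        if ratio < 0.10:
            return "good"
        return "needs attention"

    def failing_checks(self) -> list[str]:
        """Names of the checks that miss their target; empty means all pass."""
        failures = []
        if self.outages_per_year >= 2:      # target: under two per year
            failures.append("outages_per_year")
        if self.cycle_time_days >= 30:      # assumed meaning of "under a month"
            failures.append("cycle_time_days")
        if self.bug_ratio() >= 0.10:        # anything not at least "good"
            failures.append("bug_ratio")
        return failures
```

Returning a list of failing checks rather than a combined score keeps the panel idea intact: a strong number in one place cannot mask a miss in another.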
There’s one more in the panel and it’s the one most people miss or misunderstand. It’s evidence your tests are actually finding failures. A 100 percent pass rate isn’t a win — it’s alarming. It usually means tests are exercising code without being written to catch anything — happy paths only, because happy paths are easier to confirm. That’s how blind spots get baked in. To combat this, you need all testing represented: integration, functional and load tests in a combination of automated and manual methods. The mix, balanced to the needs of the product, is what gives you the win.
Name one recent AI/automation that shipped and its impact on the team or business.
For Caxy, AI earns its value in our arsenal by proving itself on a real workflow first. Some recent examples do that well.
For a client, we built an AI proposal generator. Their sales pricing workbook goes in and a fully formatted Word proposal comes out, with scope, language and tone pulled from a RAG knowledge base of their own historical proposals. They’re now sending two to three times as many proposals per week with better consistency. More time in the field, bigger funnel, more sales.
Internally, we built an AI layer over our internal PM tool, PulseCheck. The AI helps forecast risk, suggests process improvements, drafts dynamic status updates, generates reports and surfaces channel sentiment to flag burnout weeks before it shows up in a client meeting. Team leads can act faster and leadership gets involved earlier — issues are anticipated and solved before they can surface.
On a smaller scale, we use AI across our SDLC for test writing, debugging, smaller feature dev, tech documentation, cherry-picking and initial code review against defined standards — all freeing the team for work that actually needs human judgment.
A one-stop source for industrial supplies, McMaster-Carr is an e-commerce company offering more than half a million products used to keep business in motion.
What’s your rule for fast, safe releases — and what KPI proves it works?
Make changes small, reversible and observable.
Small changes are easier to understand, test and fix. When something breaks, there’s less to reason about and a smaller blast radius. Reversible changes (e.g., those using feature flags, safe rollbacks and backward‑compatible schemas) allow us to recover quickly and gracefully when problems occur. Observable changes have clear signals that tell us whether they’re working, so issues can be detected and contained early.
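As a rough illustration of how one small change can be both reversible and observable, the Python sketch below puts a new code path behind a feature flag and counts how often each path runs. It is not McMaster-Carr's implementation. The in-memory FeatureFlags store, the flag name and the label-formatting functions are all hypothetical, and a real system would typically read flags from a shared flag service rather than process memory.

```python
import logging
from collections import Counter

logger = logging.getLogger("release_sketch")
path_counts = Counter()  # observable signal: how often each code path runs


class FeatureFlags:
    """Hypothetical in-memory flag store used only for this sketch."""

    def __init__(self) -> None:
        self._enabled: set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def format_label_old(name: str) -> str:
    """Existing behavior: trim whitespace only."""
    return name.strip()


def format_label_new(name: str) -> str:
    """The small change being released: trim and title-case."""
    return name.strip().title()


def format_label(flags: FeatureFlags, name: str) -> str:
    """Route to the new path only while the flag is on."""
    if flags.is_enabled("new_label_format"):
        path_counts["new"] += 1
        logger.info("label format path=new")
        return format_label_new(name)
    path_counts["old"] += 1
    logger.info("label format path=old")
    return format_label_old(name)
```

Turning the flag off restores the old behavior without a redeploy, and the counter shows which path is actually serving traffic.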
The KPI that proves this works is Change Failure Rate.
Small changes introduce fewer defects, observable changes surface problems quickly and reversible changes reduce the impact and duration of failures. Together, they lower the likelihood that a change results in user‑visible issues while enabling fast delivery.
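For readers who want to track the same metric, Change Failure Rate is commonly computed as the share of deployed changes that led to a failure in production. The Python sketch below assumes that common definition. The Deployment record and the rule for what counts as a failure (a rollback, a hotfix or a user-visible issue) are hypothetical and would need to match a team's own incident process.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of one change released to production."""
    change_id: str
    caused_failure: bool  # assumed: needed a rollback or hotfix, or hit users


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Failed changes divided by all changes deployed in the period."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d.caused_failure)
    return failed / len(deployments)
```

Four deployments with one failure give a rate of 0.25. Because the denominator is the number of changes, shipping more, smaller changes only lowers the rate if each one fails less often.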
Which standard or metric defines “quality” in your stack?
At the risk of double dipping, Change Failure Rate is also how I define quality. Changes exist to benefit the end user, whether that’s an external customer or an internal team. When a change disrupts their ability to use the system, it causes harm, which is the opposite of what we’re trying to achieve. To me, a high‑quality change is one that delivers value while protecting the user experience. CFR is the clearest signal of that, because it measures quality where it actually matters: in production.
Name one recent AI/automation that shipped and its impact on the team or business.
One recent AI/automation we shipped was improving search relevance on mcmaster.com using LLMs. We receive thousands of searches that our traditional search engine struggles to interpret, most commonly manufacturer part numbers or foreign‑language queries. That led to poor search results and customer abandonment.
We introduced an external LLM as an intent‑interpretation layer that maps these queries back to our product catalog before running the search. The LLM augments the system rather than replacing core search logic, which kept the design safe, explainable and easy to reason about for engineers supporting the system.
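The shape of that design can be sketched in a few lines of Python. This is a simplified illustration and not McMaster-Carr's code. The catalog, the item identifiers and the part number are invented, and stub_interpreter is a lookup table standing in for the external LLM call so the example runs on its own.

```python
from typing import Callable, Optional

# Invented catalog: search term -> item identifiers.
CATALOG = {
    "hex bolt": ["ITEM-001", "ITEM-002"],
    "screw": ["ITEM-003"],
}


def keyword_search(query: str) -> list[str]:
    """Stand-in for the existing search engine: exact keyword lookup."""
    return CATALOG.get(query.strip().lower(), [])


def stub_interpreter(query: str) -> Optional[str]:
    """Stand-in for the LLM: maps hard queries to a catalog term, else None."""
    known = {
        "acme-12345": "hex bolt",  # invented manufacturer part number
        "tornillo": "screw",       # foreign-language query
    }
    return known.get(query.strip().lower())


def search(
    query: str,
    interpret: Callable[[str], Optional[str]] = stub_interpreter,
) -> list[str]:
    """Interpret the query first, then run the unchanged core search."""
    interpreted = interpret(query)
    term = interpreted if interpreted is not None else query
    return keyword_search(term)
```

Here search("tornillo") returns ["ITEM-003"], and a query the interpreter does not recognize passes through to keyword_search untouched. Keeping the interpreter as a separate, swappable function is one way to leave the core search logic unchanged and easy to reason about.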
We validated the change through an A/B test. Customers exposed to the LLM‑powered results were more successful at finding relevant products and placed more orders, driving a measurable lift in conversion. It also reduced failed searches, lowering friction and improving the overall customer experience.
