“Speed comes with discipline, not shortcuts.”
That’s the philosophy that drives the tech teams at Sysco LABS, the innovation arm of foodservice provider Sysco, according to Senior Director of Global Platform Engineering and Services Dilhan Manawadu. Those on his team maintain small, automated and observable releases, and ensure changes go into automated testing pipelines that show clear ownership, security and observability.
“If it can’t be safely rolled back or clearly monitored, it’s not ready to ship,” Manawadu said.
In taking a careful approach to fast, safe releases, Sysco LABS’ tech teams never compromise quality when delivering new solutions. Engineering Director of Infrastructure and Productivity Andy Chow and his teammates at fintech company Airwallex also prioritize small, focused changes, while embracing a simple yet effective standard: Every change requires a merge request with human review.
“We make no exceptions, even for code by AI agents,” Chow said.
Below, Manawadu, Chow and employees at seven other companies share their team’s rule for fast, safe releases, how they measure quality and a recent automation that has made a major impact on the team or business.
Fintech company Airwallex offers global businesses fully integrated solutions to manage everything from business accounts and payments to spend management and embedded finance.
What’s your rule for fast, safe releases — and what KPI proves it works?
Our rule is simple: Every change requires a merge request with human review. We make no exceptions, even for code by AI agents. We combine this with automated CI gates and progressive rollouts to ensure speed doesn’t compromise safety.
Our internal platform, AirDev, runs AI agents that autonomously implement well-scoped tasks. These agents submit merge requests that follow the same review and CI pipeline as human-authored code.
Throughput of merged changes proved the system works. Since launching AirDev in late 2025, our agents have contributed over 11,000 merged MRs across 200+ repositories. Monthly volume grew from 60 MRs in the first two months to over 3,700 in a single month by March 2026. That’s roughly 120 agent-authored MRs reaching production every day, all passing the same quality gates as human code.
We now ship previously deprioritized work continuously alongside new features. We track deployment frequency and change failure rate to ensure we ship faster without regressions. Our internal survey gave AirDev a 4.2/5 satisfaction rating, with 91 percent rating it four or five stars.
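For readers who want to see how the two metrics named above are typically derived, here is a minimal sketch that computes deployment frequency and change failure rate from a deployment log. It is not Airwallex's tooling: the `Deployment` record, its fields and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one production deployment."""
    day: date
    caused_incident: bool  # True if the change needed a rollback or hotfix


def deployment_frequency(deploys: list[Deployment], days_in_window: int) -> float:
    """Average number of deployments per day over the window."""
    return len(deploys) / days_in_window


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that led to an incident (0.0 when there are none)."""
    if not deploys:
        return 0.0
    failures = sum(1 for d in deploys if d.caused_incident)
    return failures / len(deploys)


# Made-up sample: four deployments over a two-day window, one of which failed.
sample = [
    Deployment(date(2026, 3, 1), False),
    Deployment(date(2026, 3, 1), False),
    Deployment(date(2026, 3, 2), True),
    Deployment(date(2026, 3, 2), False),
]
frequency = deployment_frequency(sample, days_in_window=2)  # 2.0 per day
failure_rate = change_failure_rate(sample)                  # 0.25
```

Watching both numbers together is the point: a rising frequency only counts as progress if the failure rate stays flat or falls.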
Which standard or metric defines “quality” in your stack?
Our quality principle is simple: AI-generated code must meet the exact same bar as human-written code. Every merge request follows the same path. Whether from an engineer or an AirDev agent, it must pass automated CI checks, static analysis, SonarQube quality gates and human review. If a gate fails, the change doesn’t ship.
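The "same path" rule can be pictured as a gate check that never looks at who wrote the change. The sketch below is illustrative only. The gate names mirror those mentioned in the text, but `MergeRequest`, `failed_gates` and `can_merge` are hypothetical names, not part of AirDev or GitLab.

```python
from dataclasses import dataclass, field

# Gates named in the text; the identifiers themselves are illustrative.
REQUIRED_GATES = ("ci_checks", "static_analysis", "sonarqube_quality_gate", "human_review")


@dataclass
class MergeRequest:
    """Hypothetical merge request; `author_kind` is 'human' or 'agent'."""
    author_kind: str
    passed_gates: set[str] = field(default_factory=set)


def failed_gates(mr: MergeRequest) -> list[str]:
    """Return the required gates this merge request has not passed."""
    return [gate for gate in REQUIRED_GATES if gate not in mr.passed_gates]


def can_merge(mr: MergeRequest) -> bool:
    """A change ships only if every gate passed; the author kind is never consulted."""
    return not failed_gates(mr)


agent_mr = MergeRequest("agent", {"ci_checks", "static_analysis", "sonarqube_quality_gate"})
human_mr = MergeRequest("human", set(REQUIRED_GATES))
```

Here `agent_mr` is blocked because it lacks human review, and a human-authored change missing the same gate would be blocked in exactly the same way.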
This rigor matters because our AI agents have now contributed over 11,000 merged MRs. At that scale, you can’t afford a separate quality bar. Our engineers review every agent-authored MR to verify it follows existing patterns, handles edge cases and avoids unnecessary complexity.
We’ve learned that context beats instructions. Agents that understand why a change is needed produce better code than those following rigid specs. That insight shapes our quality standard. We want solutions that are idiomatic and easy to maintain.
Finally, we prioritize small, focused changes. We’d rather ship 120 focused MRs a day than a few sprawling ones. Small changes are easier to review, safer to deploy and faster to roll back.
Name one recent AI/automation shipped and its impact on the team or business.
We built AirDev to handle software tasks from start to finish. Our AI agents manage everything from reading a Jira ticket to submitting a merge request. These agents work in isolated environments with access to our full toolchain, including Git, GitLab, Jira, Confluence and CI/CD pipelines. They clone repos, analyze existing patterns, implement changes, and open MRs for human review.
Since launching in late 2025, AirDev has produced over 11,000 merged MRs across 200+ repositories, contributing roughly 440,000 lines of code. Monthly output grew 60 times, jumping from 60 merged MRs at launch to over 3,700 in March 2026.
The impact goes beyond volume. Engineers have reclaimed about 20 engineering-years previously spent on repetitive tasks like configuration updates, boilerplate endpoints and test coverage. This frees them to focus on design and mentoring.
The biggest surprise was the shift in priority. When the cost of a well-scoped task approaches zero, we can finally tackle work we used to ignore. We now fix technical debt alongside new features instead of pushing it off. AI boosts our capacity without lowering our standards.
SoFi’s financial products and services are designed to help people borrow, save, spend, invest and protect their money.
What’s your rule for fast, safe releases — and what KPI proves it works?
Code coverage is the single most important metric for assessing code health. It measures how well our code is covered by unit tests. Our mobile repo has over 150 developers, and we merge over 25 changes each day. Unit tests provide the first level of defense and are a primary contributor to our stable, safe releases. Even though our codebase now has over 2 million lines of Dart code, we have been shipping consistently to app stores every week for the past two years.
Which standard or metric defines “quality” in your stack?
Crash-free sessions: Crashes happen on mobile apps when the app gets into an unexpected state and it cannot provide accurate information for the member. The app closes suddenly. We track this metric pretty diligently, and it has been stellar in our Flutter app. Over the past three months, we have had 99.99 percent and 99.98 percent crash-free sessions on iOS and Android respectively, even though our app gets more than 1 million daily active users.
Name one recent AI/automation shipped and its impact on the team or business.
The mobile team has adopted a newer testing framework, using Mocktail over Mockito to reduce the overhead of mock files. Conversion from Mockito to Mocktail is straightforward but tedious, so we created a comprehensive AI skill, Mockito-to-Mocktail, that can take in a single file or multiple files and convert them correctly.
We iterated on this skill with many examples and edge cases, and now we have 98 percent confidence that any given test file can easily be converted. This skill has helped the team reduce our tech debt and increase code quality.
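To give a sense of why the conversion is mechanical but tedious, the sketch below applies two of the rewrites involved: swapping the package import, and wrapping the argument of `when(...)` and `verify(...)` in a closure, which is the form Mocktail expects. This is a deterministic toy in Python, not SoFi's AI skill. The function name `convert_to_mocktail` is hypothetical, and a real conversion also has to deal with generated mock classes, argument matchers and fallback values, which this sketch ignores.

```python
import re

MOCKITO_IMPORT = "package:mockito/mockito.dart"
MOCKTAIL_IMPORT = "package:mocktail/mocktail.dart"

# Matches `when(` or `verify(` unless the argument is already a `() =>` closure.
_CALL = re.compile(r"\b(when|verify)\((?!\s*\(\)\s*=>)")


def convert_to_mocktail(dart_source: str) -> str:
    """Apply two mechanical Mockito-to-Mocktail rewrites to Dart test source."""
    converted = dart_source.replace(MOCKITO_IMPORT, MOCKTAIL_IMPORT)
    # Mocktail takes the stubbed call as a closure: when(() => mock.call()).
    return _CALL.sub(r"\1(() => ", converted)


before = """import 'package:mockito/mockito.dart';

when(repo.fetch(1)).thenReturn(user);
verify(repo.fetch(1)).called(1);
"""
after = convert_to_mocktail(before)
```

Running the function a second time on `after` leaves it unchanged, because the pattern skips calls that already use the closure form.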
InterSystems’ cloud-first platforms enable organizations from various industries, such as healthcare and financial services, to power their applications with clean, accessible data.
What’s your rule for fast, safe releases — and what KPI proves it works?
InterSystems’ rule is standardized, automated releases with built-in validation against service level objectives across all environments, including our own data centers, hosted and private cloud, and public cloud. Every change is delivered through infrastructure as code and consistent pipelines, with clear rollback paths and production-level observability.
We measure success using deployment success rate, change failure rate, mean time to recovery and SLO adherence. The key signal is our ability to increase deployment velocity across all environments while maintaining SLO performance and minimizing customer impact. If we are releasing faster everywhere and still meeting our SLOs, the model is working.
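The "releasing faster and still meeting SLOs" signal can be expressed as a small check. The sketch below is an illustration under assumed inputs, not InterSystems' implementation: the function names, the period-over-period comparison and the sample numbers are all hypothetical.

```python
from datetime import timedelta


def mean_time_to_recovery(recoveries: list[timedelta]) -> timedelta:
    """Average time from detection to restoration across incidents."""
    if not recoveries:
        return timedelta(0)
    return sum(recoveries, timedelta(0)) / len(recoveries)


def model_is_working(deploys_prev: int, deploys_now: int,
                     slo_target: float, slo_measured: float) -> bool:
    """True when deployment volume rose and the measured SLI still meets its target."""
    return deploys_now > deploys_prev and slo_measured >= slo_target


# Made-up sample: two incidents, and a period where deployments grew from 80 to 120.
mttr = mean_time_to_recovery([timedelta(minutes=20), timedelta(minutes=40)])
healthy = model_is_working(deploys_prev=80, deploys_now=120,
                           slo_target=0.999, slo_measured=0.9995)
```

In a multi-environment setup, the same check would presumably be run per environment, so that a regression in one data center or cloud is not hidden by an average.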
Which standard or metric defines “quality” in your stack?
Quality is defined by meeting and sustaining SLOs that underpin our service level agreements consistently across data centers, hosted cloud and hyper-scalers.
We anchor on four things: SLO and service level indicator performance, such as availability, latency and error rates tied directly to customer SLAs; customer-impacting incident rate and severity; consistency of deployment and operations across environments through infrastructure as code; and security and compliance alignment based on NIST and CIS controls.
In practice, quality means we deliver a predictable and reliable experience regardless of where the workload runs, with the same operational standards applied everywhere.
Name one recent AI/automation shipped and its impact on the team or business.
One example is that our team has implemented AI-assisted, agent-driven infrastructure as code automation to standardize deployments across our data centers, hosted cloud environments and public cloud platforms.
By leveraging AI to accelerate development and extend our automation framework, we have scaled platform support across hypervisors and hyper-scalers without linear engineering effort, reduced deployment time and manual configuration across all environments, and improved consistency, which directly supports SLO adherence and SLA commitments.
The impact is a more scalable and unified operating model where we can onboard and operate environments anywhere with the same level of reliability, speed and control.
Huntress’ cybersecurity platform enables organizations to employ endpoint detection and response, protect Microsoft 365 environments and employee identities, offer security awareness training and more.
What’s your rule for fast, safe releases — and what KPI proves it works?
Our rule is that velocity is a function of confidence. If a developer has to wait an hour for a test result or fears a deployment, they’ll move slower. We keep feedback loops tight by tracking test suite runtime and flakiness as first-class citizens. To prove it’s working, we look at the spread between our deploy frequency and change failure rate, essentially measuring how fast we can go without breaking things, while keeping an eye on mean time to detect to ensure that when we do fail, we know it before our users do.
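One common way to put a number on flakiness is to flag any test that both passed and failed on the same commit, since the code did not change between those runs. The sketch below follows that definition. It is an assumption about how such tracking could work, not Huntress' tooling, and `RunResult`, `flaky_tests` and `flakiness_rate` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class RunResult(NamedTuple):
    """Hypothetical record of one execution of one test."""
    test_name: str
    commit: str
    passed: bool


def flaky_tests(runs: list[RunResult]) -> set[str]:
    """Tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit)].add(run.passed)
    return {name for (name, _), seen in outcomes.items() if len(seen) == 2}


def flakiness_rate(runs: list[RunResult]) -> float:
    """Share of distinct tests that are flaky (0.0 when there are no runs)."""
    all_tests = {run.test_name for run in runs}
    return len(flaky_tests(runs)) / len(all_tests) if all_tests else 0.0


runs = [
    RunResult("test_login", "abc", True),
    RunResult("test_login", "abc", False),   # same commit, different outcome
    RunResult("test_search", "abc", True),
    RunResult("test_search", "def", False),  # different commit, so not flaky
]
```

With this sample, only `test_login` is flagged, and the flakiness rate is one test out of two.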
Which standard or metric defines “quality” in your stack?
We use SLOs to keep the lights on for our critical platform components, but we don’t believe in a “one-size-fits-all” definition of quality. The SRE team provides the framework, but we sit down with individual product teams to figure out what “good” actually looks like for their specific users. Quality is really defined by that balance: meeting our global uptime targets while ensuring each team has the right telemetry to uphold their own standards.
Name one recent AI/automation shipped and its impact on the team or business.
We’re currently integrating AI into our observability stack to move toward a more automated diagnosis model. The focus is on in-cluster agents designed to synthesize context across Kubernetes resources, essentially pulling a root cause out of the noise much faster than a human can manually. On the developer side, we’re leaning into Claude Code to bridge the gap between writing code and shipping it. By building internal skills for infrastructure provisioning and testing, we’re making our best practices the “path of least resistance” for the teams.
Sendbird’s AI customer experience platform enables brands to communicate with consumers via voice, video and messaging.
What’s your rule for fast, safe releases — and what KPI proves it works?
Ship behind a validation gate. Every change to a live demo or prospect environment goes through a proxy that intercepts the API call, validates the payload, and rewrites anything that would silently corrupt the data. If it breaks, it never leaves my machine. The KPI I watch is the demo failure rate at runtime, basically how often something breaks when a prospect is on the call. Mine is effectively zero now. Before the proxy and the pre-demo health check skill, I had a last-minute scramble on roughly one in five demos. The trick isn’t going slower; it’s making “broken” impossible to deploy in the first place.
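The core of a validation gate like this is a function that either returns a safe payload or refuses to forward anything. The following is a minimal sketch of that idea, not Sendbird's proxy: the field names, the `InvalidPayload` exception and the whitespace rewrite are all hypothetical stand-ins for whatever rules a real environment needs.

```python
from typing import Any

REQUIRED_FIELDS = ("prospect_id", "message")  # hypothetical schema


class InvalidPayload(ValueError):
    """Raised when a payload cannot be repaired and must not be forwarded."""


def validate_and_rewrite(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a cleaned copy of the payload, or raise if it cannot be made safe."""
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise InvalidPayload(f"missing fields: {', '.join(missing)}")
    cleaned = dict(payload)
    # Example rewrite: strip stray whitespace that would otherwise be stored as-is.
    cleaned["message"] = str(cleaned["message"]).strip()
    if not cleaned["message"]:
        raise InvalidPayload("message is empty")
    return cleaned


safe = validate_and_rewrite({"prospect_id": "p1", "message": "  hello  "})
```

Because the function raises instead of passing a bad payload through, a broken change fails locally rather than in front of a prospect.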
Which standard or metric defines “quality” in your stack?
For me, quality is whether the demo holds up under prospect-specific pressure. The metric I use is an eight-dimension Demo2Win scorecard that runs after every call, scoring things like emotional arc, prospect-language fidelity and whether the tools returned data that actually matched what the prospect cares about. If the agent uses generic data instead of the prospect’s, that fails. If it uses words the prospect never said, that fails. The score correlates with deal progression, which is the only metric that really matters in pre-sales.
Name one recent AI/automation shipped and its impact on the team or business.
End-to-end deal automation from a Gong call to a fully deployed, branded demo: It pulls the transcript, extracts what the prospect actually cares about, designs the demo arc, writes the actionbook, configures the AI agent on our platform, deploys a prospect-branded frontend, and scores my performance after the call. Prep went from two days to under an hour. The bigger impact is that the demo is now built from what the prospect actually said, not what I guessed they’d care about. The output is sharper and more relevant, and the win rate on demos run through this pipeline is materially higher than my pre-pipeline baseline.
Product.ai’s platform is powered by axiomatic intelligence, a proprietary adversarial reasoning methodology that stress-tests product claims against physics, economics and engineering constraints.
What’s your rule for fast, safe releases — and what KPI proves it works?
Our engineering runs through two surfaces: an AI coding agent and GitHub. The spec comes first. Engineers spend most of their time on the shape of what we are about to build — data models, contracts and invariants. The code is generated against that spec, then it has to clear the automated gates before it merges, including the compiler, the type checker, the linter, and the full test suite. If any one fails, the build fails. We also run code review, not as the last line of defense, but to verify the code and the tests themselves are sound. Weekly, we do a deeper review of the codebase to catch anything that crept in. That is expensive, but it is how quality holds over time. We do not wait for QA teams. We use agents to red-team.
Our metric is how much earlier a feature ships than originally planned. A roadmap item scheduled for quarter three that instead ships in April has pulled 170 days of roadmap forward. That is the number that tells us this works. Velocity without that number is noise.
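The arithmetic behind that figure is a simple date difference. The source does not give exact dates, so the sketch below assumes the item was planned for the last day of quarter three and shipped in mid-April. Both dates and the function name are assumptions chosen for illustration.

```python
from datetime import date


def days_pulled_forward(planned: date, shipped: date) -> int:
    """Days between the planned date and the actual ship date (negative if late)."""
    return (planned - shipped).days


# Assumed dates: planned for the last day of Q3, shipped in mid-April.
pulled = days_pulled_forward(planned=date(2026, 9, 30), shipped=date(2026, 4, 13))  # 170
```

A negative result would mean the item shipped after its planned date, which makes the same function usable for tracking slips.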
Which standard or metric defines “quality” in your stack?
Quality is not test coverage. Quality is how much the work moved something real for users. Every goal we set has a falsifiable test attached to it. Traffic crossed the baseline and was held for 30 days. The model stayed fast under load. Users stayed longer. If you cannot state it that way, it is not a goal — it is a slogan. Every commit references one of those goals. Every shipped piece of work gets scored for business impact against the goal it was attached to. The team is ranked by weighted impact, not by volume. Ten commits that move a goal outrank 50 that do not. What does clean code even mean in an AI era? Not beautiful. Not idiomatic. Clean means the decisions, context and dependencies are explicit and obvious. Clean means an agent or a new engineer can read your code and understand why it exists, not just what it does. The aspects of code quality people used to obsess over are getting eaten by better tooling every quarter. The part that matters is whether the work moved something that mattered.
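Ranking by weighted impact rather than volume can be sketched in a few lines. The scoring scheme here is an assumption, since the source does not describe how Product.ai weights goals or scores impact. `ShippedWork`, `goal_weight` and `impact_score` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class ShippedWork(NamedTuple):
    """Hypothetical record of one shipped commit tied to a goal."""
    engineer: str
    goal_weight: float   # assumed importance of the goal the commit references
    impact_score: float  # assumed 0.0-1.0 score for how much it moved that goal


def rank_by_weighted_impact(items: list[ShippedWork]) -> list[tuple[str, float]]:
    """Total weight * impact per engineer, highest first; commit count is ignored."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        totals[item.engineer] += item.goal_weight * item.impact_score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Ten commits that move a goal versus 50 that do not.
work = [ShippedWork("ana", 1.0, 0.5)] * 10 + [ShippedWork("ben", 1.0, 0.0)] * 50
ranking = rank_by_weighted_impact(work)
```

In this sample, the engineer with ten goal-moving commits ranks above the one with 50 commits that moved nothing.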
Name one recent AI/automation shipped and its impact on the team or business.
We built a research engine that runs adversarial deep research on any question our team asks, any time of day. A person scopes the question. The system fires multiple frontier models at it. Each one carries a different learned map of the world. Point those maps at the same surface, and you get a resolution no single model can produce. Then a dedicated adversary attacks the consensus to expose failure modes and blind spots the helpful models gloss over. What used to take a senior researcher a full week of manual work now runs in under 10 minutes and comes back with an evidence-graded answer we can act on. The team uses it for category bets, pricing calls, vetting partners and architecture choices. It is not a chat wrapper. It is how we decide what to build and what to stop building. Our own engineers built it on the same setup they use for everything else.
Parsec Automation’s manufacturing operations management software is designed to optimize efficiency, quality and compliance at manufacturing plants.
What’s your rule for fast, safe releases — and what KPI proves it works?
Humans in the loop and good code review. We can generate a lot of code very quickly these days, so having a process for reviewing PRs and directing energy into the right PRs is essential. Another underrated aspect for us is the team dynamic. We have a close-knit development and quality team that maintains very frequent interactions. Knowing that your actions in review — good or bad — have a real effect on your colleagues is a huge underlying motivation for collective responsibility and diligence. Our KPIs are customer satisfaction and low occurrences of critical issues.
Which standard or metric defines “quality” in your stack?
The most important “standard” or aspect of our quality process is risk assessment. Like most enterprise software, our platform is large and contains a lot of code. We of course track and strive to improve metrics like code coverage and number of unit tests and automated test cases, but simply having “coverage” does not mean that the tests are good. We rely on a process that asks experienced developers/managers to assess the level of testing for each feature or defect fix. This allows us to direct the bulk of the QA energy to the places that really need it, rather than chasing metrics that might look good but hide underlying problems.
Name one recent AI/automation shipped and its impact on the team or business.
We recently shipped TrakSYS IQ Assistant, an AI user interface that allows users to interactively explore the complex data captured in our manufacturing execution system platform. Our implementation is clever in that it does not require large amounts of sensitive manufacturing data to be shipped out-of-network, and it allows a new class of users self-service access to information that was previously difficult to access. This solves several challenges in MES scenarios, increasing the speed to find answers and exposing the complex and useful data we capture to a larger audience.
Sysco LABS is the innovation arm of foodservice provider Sysco. Its teams develop e-commerce solutions that help the company’s customers browse products and place orders, track orders in real time and gain insight into their accounts.
What’s your rule for fast, safe releases—and what KPI proves it works?
Talking releases and KPIs means largely focusing on Sysco Shop, our flagship e-commerce platform, where our mantra is, “Have the application up.”
To get there, our rule is simple: Release fast but never compromise our customers’ confidence. With that as our philosophy, the KPIs that prove it are the change failure rate paired with mean time to recovery.
We aim to deploy frequently, keep failure rates consistently low, and quickly restore services to maintain resiliency — clear signals that velocity and quality are aligned. Speed comes with discipline, not shortcuts. We keep releases small, automated and observable. Changes go into automated testing pipelines that show clear ownership, security and observability. If it can’t be safely rolled back or clearly monitored, it’s not ready to ship.
Deploying so often can cause incidents. We have so many day-to-day microservices powering a huge number of orders through Sysco Shop. Things can fail, right?
Our customer-first mindset targets objectives on latency, error rates and saturation at 99.9 percent, meaning that 99.9 percent of the time not only do you feel no disruption, you won’t even notice a change in speed.
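A 99.9 percent objective leaves a small, countable allowance for things going wrong, often called an error budget. The sketch below works through that arithmetic under two assumptions not stated in the source: that the objective is measured as a share of time, and that the window is 30 days. The function names are hypothetical.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes in the window during which the objective may be missed."""
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget left; negative means the SLO was breached."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - bad_minutes) / budget


budget = error_budget_minutes(0.999)       # about 43.2 minutes per 30 days
remaining = budget_remaining(0.999, 10.8)  # about 0.75
```

Under these assumptions, roughly 43 minutes of degraded service in a month would use up the entire allowance, which is why small, reversible releases matter at this level.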
Which standard or metric defines “quality” in your stack?
Quality and success mean customer and sales colleague satisfaction, and we can measure this through our net promoter score as well as through successful, increasing adoption.
We measure adoption through how many orders get through our ecosystem and how much volume and revenue comes through Sysco Shop. That’s why we keep 99.9 percent as our benchmark. If that service level objective drops, then something is wrong; whether it’s latency or errors, users unable to complete the flow means ordering isn’t happening. If we don’t have ordering happening, then we don’t have increasing adoption. Without our focus on resiliency, we’re essentially any other app, right?
We’ve seen adoption steadily increase over the years. Sysco Shop users have grown, and an increasing share of Sysco’s revenue comes through e-commerce. We believe that comes from an internal measure of quality: cross-team collaboration. Teams that work more closely together and are more closely aligned build stronger, faster and more resilient products — together. As we focus on — and track data around — team collaboration and satisfaction, we also see it grow higher and higher.
Name one recent AI/automation shipped and its impact on the team or business.
AI360, with its newly launched agentic solution, is a dynamic toolkit that helps sales colleagues plan more effectively, with guidance on customer actions, curated product and feature recommendations tailored to customer needs, and risk alerts that enable sales teams to focus more on customers, driving clearer, more targeted approaches to opportunities.
This sits on the foundation of our award-winning SAGE agentic ecosystem, which has matured so much over the last year that today we’re able to make incredible strides in deploying high-trust enterprise grade agents and significantly growing the scale and scope of these colleague- and customer-supporting tools.
Emerging aspects across the agentic era show a paradigm shift in how we evolve software development. As we eliminate mundane tasks, we free up time for engineers to be orchestrators, builders and closer collaborators with product teams, so we can build great products together and ship them out faster. It’s been inspiring from a platform engineering perspective, because we’re empowering engineering teams to build at scale, establishing the foundation layer for agents, and it’s exciting to see how far we can take it.
Analytics8 is a data analytics consultancy that offers services such as data governance, cloud architecture optimization and generative AI implementation.
What’s your rule for fast, safe releases — and what KPI proves it works?
Our rule is to use AI to accelerate execution, not to replace judgment. At Analytics8, we’ve operationalized that principle through Accelr8, our Intelligence Hub that brings together best practice playbooks, a code library, templates and agent orchestration. Heavy lifting happens within Accelr8 itself, rather than relying on individual consultants to carry all the institutional context in every prompt. That reduces the contextual risk of human orchestration and lets our teams move faster.
Tools like Claude Code and Codex have significantly sped up how quickly we can engineer pipelines, model data, and architect environments, but that only works when paired with more rigor before and after the build. We spend more time defining and scoping the problem and reviewing, testing and refining the output once it is generated. We never treat the first pass from an agent as “done.” The biggest benefit is that AI compresses the middle of the process, which gives us more room for thoughtful planning and quality control while still reducing overall delivery time. Margin improvement is a strong indicator of whether an AI approach increases velocity in a sustainable way.
Which standard or metric defines “quality” in your stack?
For us, quality is defined less by a single metric and more by whether what we deliver is scalable, maintainable and fit for purpose. In practice, that means strong problem definition, clear architectural thinking, consistent coding standards and thorough human review before anything goes live. Especially as AI becomes a bigger part of the development process, we are focused on making sure speed does not come at the expense of clarity, reliability or long-term usability. Our standard is not simply speed of delivery, but high-quality solutions we would be confident in maintaining over time or even handing off completely to our clients once we’re done.
Name one recent AI/automation shipped and its impact on the team or business.
One recent example is RADAR, our rapid assessment for data analytics readiness product. RADAR helps us quickly assess a client’s environment across platforms like Databricks, Snowflake and SQL Server using MCP servers, then score it from zero to 100 across areas like security and access controls, job orchestration, coding standards and AI readiness. That allows us to evaluate environments and generate meaningful recommendations in a matter of hours rather than days or weeks of manual analysis. We combine RADAR with human interviews to capture the business and technical context that automation alone cannot provide, but the product has significantly increased the speed at which we can deliver insight and value to clients.
