Product.ai
Product.ai Innovation & Technology Culture
Product.ai Employee Perspectives
What’s your rule for fast, safe releases — and what KPI proves it works?
Our engineering runs through two surfaces: an AI coding agent and GitHub. The spec comes first. Engineers spend most of their time on the shape of what we are about to build — data models, contracts and invariants. The code is generated against that spec, then it has to clear the automated gates before it merges, including the compiler, the type checker, the linter, and the full test suite. If any one fails, the build fails. We also run code review, not as the last line of defense, but to verify the code and the tests themselves are sound. Weekly, we do a deeper review of the codebase to catch anything that crept in. That is expensive, but it is how quality holds over time. We do not wait for QA teams. We use agents to red-team.
Our metric is how much earlier a feature ships than it was originally planned. A roadmap item scheduled for quarter three, which ships in April, pulled 170 days of roadmap forward. That is the number that tells us this works. Velocity without that number is noise.
Which standard or metric defines “quality” in your stack?
Quality is not test coverage. Quality is how much the work moved something real for users. Every goal we set has a falsifiable test attached to it. Traffic crossed the baseline and was held for 30 days. The model stayed fast under load. Users stayed longer. If you cannot state it that way, it is not a goal — it is a slogan. Every commit references one of those goals. Every shipped piece of work gets scored for business impact against the goal it was attached to. The team is ranked by weighted impact, not by volume. Ten commits that move a goal outrank 50 that do not. What does clean code even mean in an AI era? Not beautiful. Not idiomatic. Clean means the decisions, context and dependencies are explicit and obvious. Clean means an agent or a new engineer can read your code and understand why it exists, not just what it does. The aspects of code quality people used to obsess over are getting eaten by better tooling every quarter. The part that matters is whether the work moved something that mattered.
Name one recent AI/automation shipped and its impact on the team or business.
We built a research engine that runs adversarial deep research on any question our team asks, any time of day. A person scopes the question. The system fires multiple frontier models at it. Each one carries a different learned map of the world. Point those maps at the same surface, and you get a resolution no single model can produce. Then a dedicated adversary attacks the consensus to expose failure modes and blind spots the helpful models gloss over. What used to take a senior researcher a full week of manual work now runs in under 10 minutes and comes back with an evidence-graded answer we can act on. The team uses it for category bets, pricing calls, vetting partners and architecture choices. It is not a chat wrapper. It is how we decide what to build and what to stop building. Our own engineers built it on the same setup they use for everything else.
















