In early 2024, I sat in a product review that felt like a magic show. My team had integrated a new large language model (LLM) into our workflow automation platform. The demo was flawless. You typed a vague instruction, and the AI generated a complex, multi-step workflow in seconds.
The room erupted in applause. The engineering velocity was incredible. The user experience was delightful. The Net Promoter Score (NPS) projections were through the roof.
Then I asked the one question that ruins every AI demo.
“What is the margin profile of a heavy user?”
The room went silent. We ran the numbers on the whiteboard. Between the token costs for the prompt, the retrieval-augmented generation (RAG) lookups, and the output generation, a single interaction cost us 14 cents. Our pricing model was a flat monthly subscription. If a user engaged with this “magic” feature more than three times a day, they became unprofitable.
We had built a feature that users would love, but we had broken the business model. We were effectively subsidizing our customers’ productivity with our own capital.
This is the trap of the current AI boom. We’re judging success by technical capability rather than economic viability. This is not a product management problem. It is a capital allocation and governance problem. To survive the shift from “growth at all costs” to “efficient growth,” product leaders need a new framework. We need to stop optimizing for magic and start optimizing for margin.
What Is the Evergreen Ratio?
The Evergreen Ratio is a financial and operational metric used to measure the economic viability of AI products. It represents the proportion of AI interactions served from reusable assets (cached or pre-computed responses) versus those requiring real-time inference from expensive models.
The Formula
Evergreen Ratio = (Cached or Pre-Computed Responses / Total AI Interactions)
Key Insights
- Target Range: A profitable AI feature typically sits between 60 and 80 percent.
- 0 Percent Ratio: Indicates every query is generated in real time, leading to maximum cost volatility and low margins.
- Purpose: It acts as a financial control system to transition AI from growth at all costs to efficient growth by reducing the marginal cost of intelligence.
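The ratio and its target band can be sketched as a small helper. The function names and sample counts below are illustrative, not from the article:

```python
def evergreen_ratio(cached_responses: int, total_interactions: int) -> float:
    """Fraction of AI interactions served from reusable assets."""
    if total_interactions == 0:
        return 0.0
    return cached_responses / total_interactions

def classify(ratio: float) -> str:
    """Bucket a ratio against the 60-80 percent target range."""
    if ratio < 0.60:
        return "below target: heavy real-time inference, volatile costs"
    if ratio <= 0.80:
        return "target range: profitable mix of cached and fresh responses"
    return "above target: verify the feature still needs live AI at all"

ratio = evergreen_ratio(cached_responses=7_500, total_interactions=10_000)
print(f"{ratio:.0%} -> {classify(ratio)}")
# 75% -> target range: profitable mix of cached and fresh responses
```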
The AI Pricing Paradox
The most dangerous trap in AI product management right now is the disconnect between how we pay for intelligence and how we charge for it. In the SaaS era, we got addicted to flat-rate subscriptions because our costs were flat. Adding a new user cost us next to nothing in server load.
AI breaks that model. AI costs are variable and consumption-based. Every time a user hits enter, the meter spins. If an AI feature’s marginal cost rises with usage, it cannot be sold with flat pricing without subsidy. If your costs are variable, your revenue cannot be fixed.
I learned this the hard way with a feature we launched that allowed users to chat with their data. We bundled it into our enterprise tier at no extra cost, viewing it as a retention lever. It worked. Retention went up. But then we noticed a cohort of power users who were generating thousands of queries a month.
When we audited the unit economics, we found that our top 5 percent of users were consuming 40 percent of our total compute budget. We were charging them $50 a month, but they were costing us $120 a month in inference fees. We had accidentally built a business model where our best customers were our biggest liability.
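A minimal version of the unit-economics audit described above, using the article's $50 price point and an assumed per-query inference cost; the user cohorts and query volumes are hypothetical:

```python
FLAT_PRICE = 50.00      # monthly subscription from the article
COST_PER_QUERY = 0.04   # assumed blended inference cost per query

# Hypothetical monthly query volumes per user cohort.
users = {
    "power_user": 3_000,
    "typical_user": 200,
    "light_user": 15,
}

# Flag any cohort whose inference cost exceeds its subscription revenue.
for name, queries in users.items():
    cost = queries * COST_PER_QUERY
    margin = FLAT_PRICE - cost
    status = "LIABILITY" if margin < 0 else "profitable"
    print(f"{name}: cost ${cost:.2f}, margin ${margin:.2f} -> {status}")
```

At these assumed rates, the power user costs $120 a month against $50 of revenue, reproducing the inversion described above.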
To fix this, we had to introduce a fair-use limit and a credit system. The conversation with sales was painful, but it saved the P&L. We shifted from all-you-can-eat to à la carte.
The High Cost of Fresh Intelligence
The root of the problem is that we treat AI like software, but it behaves like a service. In traditional software, the marginal cost of a user action is near zero. In generative AI, every action has a variable cost of goods sold (COGS).
The most expensive part of this cost structure is freshness. When we ask an LLM to reason through a problem from scratch, we’re paying for premium compute. This is like hiring someone with a doctorate to answer a telephone call. It works, but it’s economically inefficient.
In my experience auditing SaaS portfolios, I’ve found that nearly 80 percent of queries are repetitive. Users ask the same questions about documentation. They request similar summaries of meeting notes. They generate variations of the same email copy.
Yet, most product architectures treat every query as unique. They send every prompt to the most expensive model available, paying full price for an answer that has likely been generated a thousand times before. This isn’t innovation. It’s waste.
The Evergreen Ratio
To fix this, I developed a metric I call the Evergreen Ratio, which is a measure of how much AI work is served from reusable assets versus real-time inference. It determines whether an AI feature scales profitably or collapses under variable cost. At scale, this ratio functions as a financial control system, not a performance metric.
The formula is simple: divide the number of cached or pre-computed responses by the total number of AI interactions.
If your ratio is 0 percent, you’re generating everything in real time. You’re exposed to maximum volatility and minimum margin. If your ratio is 100 percent, you aren't using AI; you're using a database. The sweet spot for a profitable AI feature usually sits between 60 and 80 percent.
We applied this to the workflow automation tool. We analyzed the logs and realized that the “magic” workflow generation was mostly producing standard patterns. We didn't need to generate them from scratch every time.
We architected a caching layer between the user and the LLM. When a prompt came in, we first checked if a semantically similar request had been answered before. If it had, we served the pre-generated template. This cost us fractions of a penny. Only if the request was truly novel did we route it to the expensive model.
From Architecture to Accountability
Implementing this structure requires operational discipline. I recommend every product team perform a monthly token audit. This is a line-item review of prompt inputs and outputs to identify where you’re wasting expensive compute on repetitive tasks.
Sit down with your engineering lead and pull the logs of your top 100 most expensive prompts. Look at the inputs. Are users asking the same things? Look at the outputs. Are the answers unique, or are they variations on a theme?
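One way to sketch that audit, assuming the logs are available as (prompt, cost) pairs; the prompts and dollar figures here are made up:

```python
from collections import Counter

# Hypothetical prompt log: (prompt_text, cost_in_dollars) pairs.
log = [
    ("summarize the onboarding doc", 0.14),
    ("summarize the onboarding doc", 0.14),
    ("summarize the onboarding doc", 0.14),
    ("draft a follow-up email", 0.09),
    ("explain the pricing page", 0.11),
    ("summarize the onboarding doc", 0.14),
]

# Aggregate spend per distinct prompt: repetition here is caching opportunity.
spend = Counter()
for prompt, cost in log:
    spend[prompt] += cost

for prompt, total in spend.most_common(3):
    print(f"${total:.2f}  {prompt}")
```

Any prompt that dominates this list is a candidate for a pre-computed response.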
In one audit, we found that a document summary feature was rereading the entire document every time a new user opened it, processing a 50-page PDF hundreds of times. We changed the architecture to summarize the document once upon upload and store the summary as a static asset. That single change reduced our API bill for that feature by 90 percent overnight.
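The summarize-once pattern can be sketched as follows; the function names and in-memory store are illustrative, and the expensive LLM call is simulated:

```python
# Cache summaries keyed by document id: summarize once on upload,
# then serve the stored summary on every subsequent open.
_summaries: dict[str, str] = {}
llm_calls = 0

def expensive_summarize(doc_id: str, text: str) -> str:
    """Stand-in for an LLM call; in production this is the costly step."""
    global llm_calls
    llm_calls += 1
    return f"summary of {doc_id}"

def on_upload(doc_id: str, text: str) -> None:
    _summaries[doc_id] = expensive_summarize(doc_id, text)  # pay once

def on_open(doc_id: str) -> str:
    return _summaries[doc_id]  # static asset: near-zero marginal cost

on_upload("q3-report.pdf", "fifty pages of text...")
for _ in range(100):           # a hundred readers open the same document
    on_open("q3-report.pdf")
print(llm_calls)  # 1 -- not 100
```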
You don't need a doctorate in machine learning to find these efficiencies. You just need the discipline to look at the bill.
Designing for Liquidity
Implementing the Evergreen Ratio requires a shift in design thinking. We have to stop designing for infinite possibility and start designing for guided constraints.
In the previous era, a blank text box was the ultimate user interface. In the AI profit era, the blank text box is a liability. It invites infinite variability, which drives up token costs and lowers result quality.
We replaced our open text field with a smart selector that guided users toward pre-built templates. The AI was still there to customize the edges, but the core of the value was delivered via static, low-cost assets. The user experience actually improved because the tool felt faster and more reliable.
By constraining the input, we controlled the economics. We moved our Evergreen Ratio from 5 percent to 75 percent. Our cost per interaction dropped from 14 cents to under two cents. The feature went from a loss leader to a margin contributor.
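A back-of-envelope model of how the ratio drives blended cost per interaction, using the article's 14-cent fresh-inference figure and an assumed cached-serving cost. This linear model does not capture the full drop to under two cents, since constraining the input also shrank the cost of the remaining fresh calls:

```python
FRESH_COST = 0.14    # per-interaction cost of real-time inference (from the article)
CACHED_COST = 0.002  # assumed: "fractions of a penny" per cached response

def blended_cost(evergreen_ratio: float) -> float:
    """Expected cost per interaction at a given cache-hit ratio."""
    return evergreen_ratio * CACHED_COST + (1 - evergreen_ratio) * FRESH_COST

for ratio in (0.05, 0.75):
    print(f"ratio {ratio:.0%}: ${blended_cost(ratio):.4f} per interaction")
```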
The Profit Pivot
The difference between a tech demo and a business is unit economics. Building something cool is easy if you ignore the bill. It is much harder, and much more valuable, to build something cool that scales profitably.
As you look at your roadmap for the next quarter, audit your AI initiatives. Ask yourself if you are paying for fresh intelligence when cached wisdom would suffice. If you can shift your architecture from dynamic generation to asset reuse, you don't just save money. You build a product that can survive the market. The specific models will change, but variable-cost intelligence will remain an economic constraint.
