Intercom charges $0.99 per resolution. Chargeflow takes 25% of recovered chargebacks. Sierra AI crossed $150M in ARR selling outcomes to enterprise brands.
Outcome-based pricing is the dominant model for AI agents in customer support. The thesis is simple: the vendor gets paid when the buyer gets value. Perfect alignment.
Except one question determines whether the model works or breaks: what counts as an outcome?
The companies getting it right chose outcomes that are binary and measurable. Chargeback won or lost. Ticket resolved or escalated. The companies struggling chose outcomes that are subjective, opaque, or defined by the vendor who profits from counting more of them.
The Binary Test
Outcome pricing works when the outcome passes a two-part test. First, it is binary. It happened or it did not. Second, both sides can verify it independently.
Chargeflow passes cleanly. A chargeback is either won or lost. The credit card network decides, not Chargeflow. The merchant can verify the result in their payment processor dashboard. Chargeflow takes 25% of recovered funds. If the chargeback is lost, the merchant pays nothing. The incentive structure is airtight because the outcome is a fact, not an interpretation.
Sierra negotiates outcomes per contract. A "resolved support conversation" is defined in writing before the deal closes. A "saved cancellation" means the customer stayed. A "revenue event" means money moved. At $150K+ annual commitments, both sides invest in getting the definition right. The outcome is negotiated, but it is still binary once defined.
Intercom's Fin is where the line blurs. A "resolution" can be hard (the customer confirms) or soft (no follow-up within 24 hours). The hard resolution is binary. The soft resolution is an assumption. Silence within an arbitrary window is treated as success.
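As a sketch, the billing logic reduces to a few branches. This is hypothetical code, not Intercom's implementation; the function name, fields, and return values are all assumptions:

```python
from datetime import datetime, timedelta

SOFT_RESOLUTION_WINDOW = timedelta(hours=24)  # Intercom's silence window

def classify_resolution(customer_confirmed: bool,
                        last_ai_reply: datetime,
                        next_customer_message: datetime | None) -> str | None:
    """Classify a conversation under the two resolution definitions.

    Returns 'hard', 'soft', or None (not billable).
    """
    if customer_confirmed:
        return "hard"  # customer explicitly said the answer worked
    if next_customer_message is None:
        return "soft"  # total silence: billed as resolved
    if next_customer_message - last_ai_reply > SOFT_RESOLUTION_WINDOW:
        # The customer came back, but after the window closed. The
        # original conversation was already billed as a soft resolution.
        return "soft"
    return None  # follow-up within the window: not resolved, not billed
```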
That assumption carries a cost.
When Silence Passes for Success
One support team followed up with customers who received soft resolutions from Fin. Only 62% said their issue was actually resolved. A 38% false-positive rate. Every false positive was billed at the full $0.99, which means the effective price per soft resolution that actually resolved something was $0.99 / 0.62, roughly $1.60.
Silence can mean the answer worked. It can also mean the customer gave up, called phone support instead, or tried the solution 25 hours later and found it broken. All four count as billable resolutions under Intercom's 24-hour window.
Community posts have flagged another edge case: Fin counts a resolution even when a human agent takes over the conversation, as long as the AI responded first. The AI's contribution might have been a generic greeting. The resolution fee still applies.
The outcome is no longer binary. It is probabilistic. And the entity assigning the probability is the same entity collecting the fee.
With token-based billing, you can count tokens yourself. With seats, you know how many you have. Resolution billing means trusting the vendor's counting methodology with no independent verification path.
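The closest thing to verification is sampling after the fact. A minimal sketch, assuming you can export the conversation IDs the vendor billed and join them against your own follow-up survey (all data shapes are hypothetical):

```python
# Hypothetical data: billed conversation IDs and survey responses.
billed = {"c1", "c2", "c3", "c4", "c5"}           # IDs invoiced as resolutions
survey = {"c1": True, "c2": False,                # customer: "was it resolved?"
          "c3": True, "c4": False}                # c5: no survey response

confirmed = sum(1 for cid in billed if survey.get(cid) is True)
denied = sum(1 for cid in billed if survey.get(cid) is False)
sampled = confirmed + denied

false_positive_rate = denied / sampled if sampled else 0.0

price = 0.99
effective_price = price / (1 - false_positive_rate)

print(f"false-positive rate (sampled): {false_positive_rate:.0%}")
print(f"effective cost per verified resolution: ${effective_price:.2f}")
```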
Five Vendors, Five Definitions
The word "resolution" sounds like a standard. It is not.
| Provider | Price | What Counts | Verification | Risk |
|---|---|---|---|---|
| Intercom (Fin) | $0.99/resolution | Customer confirms OR 24h silence | 24-hour window | Pays for abandoned conversations |
| Zendesk | $1.50-$2.00/resolution | AI analyzes conversation for relevance | 72-hour AI evaluation | Black-box classification |
| HubSpot (Breeze) | $0.50/resolved conversation | "Conversation successfully completed" | Not disclosed | Lowest price, least transparency |
| Salesforce (Agentforce) | $2/conversation or $0.10/action | Completed conversation or discrete action | 24h inactivity | Simple query costs the same as complex case |
| Sierra | Custom | Contractually defined per deal | Agreed per contract | Requires $150K+ commitment |
Price points vary 4x for superficially similar products. HubSpot charges $0.50. Zendesk charges up to $2.00. Both resolve customer support tickets with AI. The difference is not quality. It is definition breadth and verification rigor.
Zendesk's 72-hour window and AI-based verification are more conservative than Intercom's 24-hour silence rule. But the AI that decides whether a resolution was "satisfactory" is a black box. You cannot audit the classification logic. You can only compare your internal satisfaction scores against the invoice and hope they correlate.
Salesforce's trajectory tells the clearest story. The original $2 per conversation model drew criticism because a simple order status check cost the same as a complex multi-step case. Salesforce responded by introducing $0.10 per action pricing. When the market leader changes its billing unit within 18 months, the model is still finding its footing.
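The difference between the two units is easy to see on a hypothetical traffic mix: under per-conversation pricing, a one-step status check subsidizes a fourteen-step dispute; under per-action pricing, each conversation pays for the work it consumes. Illustrative numbers only:

```python
# Illustrative traffic mix: mostly simple queries, one complex case.
conversations = [
    {"kind": "order status",   "actions": 1},
    {"kind": "order status",   "actions": 1},
    {"kind": "refund dispute", "actions": 14},
]

flat_total = 2.00 * len(conversations)                          # $2/conversation
action_total = 0.10 * sum(c["actions"] for c in conversations)  # $0.10/action

print(f"per-conversation: ${flat_total:.2f}")    # $6.00
print(f"per-action:       ${action_total:.2f}")  # $1.60
```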
The Risk Inversion
Here is the part that matters for vendors building on outcome pricing.
The model shifts risk from the buyer to the vendor. The buyer pays only when value is delivered. That is the selling point. It is also the structural danger.
The vendor absorbs the cost of every failed attempt. If 30% of AI conversations end in human escalation, the vendor eats the inference cost on that 30% with zero revenue. The model inputs, the compute, the API calls to the underlying LLM: all consumed, none recovered.
For the buyer, this is elegant. Pay for results, not effort. For the vendor, every failed resolution is a sunk cost that does not appear on the invoice but absolutely appears on the margin report.
The margin dynamics compound. Better AI resolves more tickets, which is good. But better AI also attempts more complex tickets, which increases the failure rate on hard cases. The resolution rate improves on average while the cost per failed attempt increases at the tail. The aggregate number looks healthy. The per-outcome unit economics tell a different story.
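A toy model of that tail dynamic, with every number invented: the blended resolution rate and blended margin look fine, while the hard tier loses money on each win.

```python
# Toy model: the vendor is paid only on success, but pays inference on every attempt.
PRICE_PER_RESOLUTION = 0.99

tiers = {
    "easy": {"attempts": 8_000, "success": 0.80, "cost_per_attempt": 0.05},
    "hard": {"attempts": 2_000, "success": 0.30, "cost_per_attempt": 0.40},
}

total_wins = total_attempts = total_cost = 0.0
for name, t in tiers.items():
    wins = t["attempts"] * t["success"]
    cost = t["attempts"] * t["cost_per_attempt"]
    total_wins += wins
    total_attempts += t["attempts"]
    total_cost += cost
    print(f"{name}: cost per won resolution ${cost / wins:.2f}")
    # easy: $0.06 per win. hard: $1.33 per win, against a $0.99 price.

print(f"blended resolution rate: {total_wins / total_attempts:.0%}")   # 70%
print(f"blended cost per billed resolution: ${total_cost / total_wins:.2f}")  # $0.17
```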
One support team improved their help center content, making Fin more effective. Resolution rate jumped from 40% to 65%. Costs rose over 60%, which is exactly what the arithmetic predicts: at flat ticket volume, 65% ÷ 40% means 62.5% more billed resolutions. The AI got better. The invoice got bigger. This is the same quality-cost inversion that hit GitHub Copilot: product improvement drives cost increase.
Forecasting in the Dark
Traditional SaaS costs are predictable. 50 seats at $100/month equals $5,000/month, every month.
Outcome-based costs depend on customer behavior, issue complexity, AI effectiveness, and seasonality. One founder's Intercom bill went from $200/month to $1,400 during a product launch. That is a 7x swing from a single external event. The cost spike is not a bug. It is the model working as designed.
This unpredictability is exactly what drives enterprise buyers to demand forecastable pricing. CFOs need a number for the quarterly plan. "Somewhere between $3,000 and $12,000 depending on how good our knowledge base gets" is not a number.
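To see why the range gets that wide, multiply the three inputs together. A sketch with illustrative values (volume, resolution rate, and price are all assumptions, picked to land near the range above):

```python
# Monthly cost band under resolution pricing; every input is illustrative.
price = 0.99                        # per billed resolution
conversations = (8_000, 20_000)     # seasonal swing in AI-handled volume
resolution_rate = (0.40, 0.60)      # rises as the knowledge base improves

low = conversations[0] * resolution_rate[0] * price
high = conversations[1] * resolution_rate[1] * price
print(f"forecast: ${low:,.0f} to ${high:,.0f} per month")  # $3,168 to $11,880
```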
Resolution costs also compound with growth. More customers means more interactions means more resolutions. Unlike infrastructure, which has economies of scale, there are no volume discounts on resolutions at most providers. Zendesk offers committed rates ($1.50 vs $2.00), but the discount requires pre-purchasing volume, which only works if you can forecast volume accurately: if unused volume expires, committing to a block and using less than 75% of it pushes your effective rate back past the $2.00 on-demand price. The model creates a circular problem: the discount that softens unpredictable costs is only safe to buy when costs are predictable.
For vendors, the forecasting problem is worse. The buyer's unpredictable volume is at least bounded by their customer base. The vendor's cost per outcome is bounded by nothing. A model price increase from the upstream LLM provider, a shift in ticket complexity, a regression in the knowledge base: any of these can move the cost-per-outcome number without the vendor changing a single line of code.
The Dividing Line
Outcome pricing is the future for AI products that deliver measurable results. The alignment between vendor revenue and buyer value is real. No other model ties payment so directly to impact.
But the model has a prerequisite that most vendors skip: cost-per-outcome visibility.
The companies succeeding with outcome pricing (Sierra, Chargeflow) share two traits. Their outcomes are binary and independently verifiable. And they know, precisely, what each outcome costs them to deliver. Sierra negotiates the definition. Chargeflow lets the card network decide. Neither leaves the outcome definition to a probabilistic model controlled by the party collecting the fee.
The vendors struggling (or pivoting, in Salesforce's case) invert both traits. The outcome is fuzzy. The measurement is opaque. And nobody on the vendor side can answer the most important question in the business: what does it cost us to deliver one successful outcome?
Without that number, outcome pricing is a promise without a foundation. You are guaranteeing results to the buyer while flying blind on what those results cost you. When the margin report arrives, the gap between "resolutions billed" and "resolutions that cost less than $0.99 to deliver" is the number that determines whether the model is sustainable or slowly bleeding cash.
Building on outcome pricing? Bear Lumen connects inference costs to individual outcomes, so you can see your true cost per resolution before your margin report does.