A simple API call and a complex multi-step agent workflow cost wildly different amounts. Most AI companies charge the same for both.
OpenAI's GPT-5.2 bills $1.75 per million input tokens. Anthropic's Claude Sonnet 4.6 bills $3 per million. These numbers tell a product manager nothing about what a single user task will cost. "Change this CSS class" and "build a feature flag system from scratch" both look like one prompt in a coding assistant. One costs fractions of a cent. The other burns through dollars. The meter runs either way.
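The spread is easy to see with back-of-the-envelope arithmetic at a $1.75-per-million-token rate. The token counts below are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope input cost at a $1.75-per-million-input-token rate.
# The token counts are illustrative assumptions, not measurements.
RATE_PER_M_INPUT = 1.75  # USD per million input tokens

def input_cost(tokens: int) -> float:
    return tokens / 1_000_000 * RATE_PER_M_INPUT

css_tweak = input_cost(1_500)        # one small prompt plus file context
agent_build = input_cost(4_000_000)  # multi-step agent re-sending context

print(f"CSS tweak:     ${css_tweak:.4f}")   # fractions of a cent
print(f"Feature build: ${agent_build:.2f}")  # dollars
```

Same product, same "one prompt" interaction, three orders of magnitude apart in cost.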
This is the parking meter model. The clock starts, the cost ticks up, and the user finds out what they owe when the session ends. Uber operated the same way for its first four years. Then it stopped.
Last Updated: April 2026
What Uber Figured Out
In June 2016, Uber replaced its running-meter model with upfront pricing. Passengers saw a fixed price before confirming the ride. The algorithm factored in distance, traffic, and demand, then presented one number. Accept or decline.
Riders converted at higher rates. The anxiety of an open-ended meter disappeared. Surge pricing still ran inside the algorithm. But the user never saw the raw mechanics. By 2024, Uber attributed a significant portion of its $4 billion improvement in US profitability to the upfront pricing system.
The insight was not that dynamic pricing was bad. The insight was that exposing raw consumption mechanics to the buyer created friction. A clear quote removed it.
AI products face an identical structural problem. The underlying cost is dynamic: token consumption varies by task complexity, model choice, and context window size. But the user experience of that dynamism does not have to be a running meter.
Three Ways the Parking Meter Breaks
The user cannot estimate cost before committing. Cursor burned through its credibility in mid-2025 when it switched from a flat 500 requests per month to a credit pool. Some developers reported their entire monthly allocation disappearing in two or three heavy prompts. CEO Michael Truell published a public apology on July 4, 2025. Users had no way to know what a prompt would cost until the tokens were already consumed.
Enterprise budgets fail. Zylo's 2026 SaaS Management Index found that 78% of IT leaders reported unexpected charges from consumption-based AI pricing. Mavvrik's research puts it higher: 85% of companies miss AI cost forecasts by more than 10%. A legal team deploying AI for contract drafting cannot predict in January how complex the cases will be in October. The consumption model assumes the buyer can predict usage patterns for a technology they are still learning to use.
Credits create anxiety, not clarity. Midjourney allocates "fast GPU hours" that deplete at different rates depending on resolution and quality. Jasper counts words. Cursor counts requests. All three translate to the same experience: a depleting balance with no clear relationship between the work requested and the units consumed. Netflix does not bill by minutes of watch time for a reason.
The usage-based pricing trap is well documented. What is less discussed is that the trap is not inherent to usage-based models. It is inherent to usage-based models that expose raw consumption mechanics to the buyer.
The Quiet Part
Token-based pricing feels fair because it is transparent. You pay for what you use. That logic breaks the moment the unit of measurement stops mapping to the unit of value.
A customer support ticket resolved in three messages and a ticket resolved in fifteen both deliver one outcome: a resolved ticket. A contract reviewed in 2,000 tokens and a contract reviewed in 40,000 tokens both deliver one outcome: a reviewed contract. The token count varies by 20x. The value delivered is identical.
Transparency about the wrong unit is worse than abstraction over the right one. The parking meter is honest about how many minutes you parked. It tells you nothing about whether the trip was worth taking.
This is the structural gap. The company that prices per token gives customers full visibility into a metric they cannot control or predict. The company that prices per task gives customers a number they can budget, compare, and approve before the work begins. The second company captures more value on complex tasks, absorbs less risk on simple ones, and gives buyers the forecastability they are demanding.
Who Is Already Moving
The shift is not theoretical.
Intercom's Fin charges $0.99 per resolved ticket. Inference cost per ticket ranges from $0.15 for a straightforward FAQ to $0.85 for a complex multi-turn conversation. Intercom absorbs the variance. The buyer's cost is predictable per unit of value delivered.
Salesforce Agentforce bills $2 per conversation. Whether the conversation requires 500 tokens or 5,000, the price is the same.
Devin uses a hybrid: $20/month base plus $2.25 per "Agent Compute Unit." The ACU is not a token count. It bundles the underlying model calls into a unit the buyer can reason about, the same way Uber's fare abstracts away per-mile and per-minute components.
Each product places an abstraction layer between raw inference costs and the buyer's experience. The underlying billing is still usage-sensitive. The user-facing price is anchored to something they understand: a ticket, a conversation, a compute session.
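The economics of that abstraction layer can be sketched in a few lines. The $0.99 price and the $0.15–$0.85 cost range come from the Intercom Fin example above; the individual ticket costs are made-up illustrations within that range:

```python
# Sketch of the abstraction layer: the buyer sees one fixed per-task price,
# the provider tracks variable inference cost and absorbs the variance.
# Individual ticket costs are hypothetical values in the $0.15-$0.85 range.
PRICE_PER_RESOLUTION = 0.99  # fixed, buyer-facing

ticket_costs = [0.15, 0.22, 0.85, 0.40, 0.31]  # variable, provider-facing

revenue = PRICE_PER_RESOLUTION * len(ticket_costs)
cost = sum(ticket_costs)
margin = (revenue - cost) / revenue

print(f"revenue ${revenue:.2f}, cost ${cost:.2f}, margin {margin:.0%}")
```

The expensive $0.85 ticket earns almost nothing; the cheap ones subsidize it. The buyer never sees that cross-subsidy, only a flat, budgetable price per outcome.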
The products that have not made this shift are the ones generating the margin compression and the enterprise budget complaints that define this pricing era.
What Contextual Pricing Requires
Uber did not abandon dynamic pricing. It added an estimation layer between dynamic cost and the consumer. That layer requires three capabilities most AI billing systems lack.
Cost estimation at the task level. Before presenting a quote, the system needs to predict how much compute a given task will consume. Historical data on similar tasks, awareness of which models will be invoked, understanding of how input complexity maps to token consumption. The unit economics framework covers this in detail: cost attribution must go beyond tokens to capture the full expense of a task.
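A minimal sketch of that estimation step, assuming historical per-task costs grouped by task type and a two-sigma safety buffer (both the task types and the numbers are hypothetical):

```python
# Minimal sketch of task-level cost estimation from historical runs.
# Task types, cost records, and the 2-sigma buffer are illustrative assumptions.
from statistics import mean, stdev

history = {  # past inference cost in USD per completed task, by task type
    "contract_review": [2.10, 3.40, 1.95, 2.80, 2.35],
    "faq_answer":      [0.12, 0.18, 0.15, 0.11],
}

def estimate_cost(task_type: str, sigmas: float = 2.0) -> float:
    """Predicted cost ceiling: historical mean plus a variance buffer."""
    costs = history[task_type]
    return mean(costs) + sigmas * stdev(costs)

print(f"contract_review cost ceiling: ${estimate_cost('contract_review'):.2f}")
```

A real system would also condition on input complexity and which models the task will invoke; the point is that the quote comes from observed per-task data, not from a raw token rate.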
Real-time budget tracking. Once the user approves a maximum, the system must track spend against that cap during execution. Straightforward for single-model calls. Complex for multi-model routing pipelines where a single request cascades through inference, embeddings, retrieval, and post-processing. Each step has a cost. The sum must stay within the approved budget.
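The enforcement pattern is simple even when the pipeline is not: accumulate spend per step and refuse any step that would breach the approved cap. Step names and costs below are hypothetical:

```python
# Sketch of budget enforcement during a multi-step pipeline run.
# Step names and per-step costs are hypothetical illustrations.
class BudgetExceeded(Exception):
    pass

class RunBudget:
    def __init__(self, cap_usd: float):
        self.cap = cap_usd
        self.spent = 0.0

    def charge(self, step: str, cost_usd: float) -> None:
        # Reject the step *before* incurring cost past the approved cap.
        if self.spent + cost_usd > self.cap:
            raise BudgetExceeded(f"{step} would push spend past ${self.cap:.2f}")
        self.spent += cost_usd

budget = RunBudget(cap_usd=5.00)
for step, cost in [("retrieval", 0.40), ("inference", 3.10), ("post-process", 0.75)]:
    budget.charge(step, cost)

print(f"spent ${budget.spent:.2f} of ${budget.cap:.2f}")
```

In a real routing pipeline each of those charges would itself be a sum over model calls, but the invariant is the same: no step runs if it would break the quote.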
Margin visibility per task. The provider needs to know whether each quoted price is profitable. If the estimate is $10 and actual inference costs $9.50, the margin is 5%. If the next task estimates at $10 but costs $3, the margin is 70%. Without per-task visibility, pricing decisions rest on averages. Averages hide the tasks that lose money, the same power-user dynamic that caused GitHub Copilot to lose $20 per month on heavy users.
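The per-task arithmetic from that paragraph, made explicit:

```python
# Per-task margin: the numbers are the ones from the text above.
def task_margin(quoted: float, actual_cost: float) -> float:
    return (quoted - actual_cost) / quoted

print(f"{task_margin(10.00, 9.50):.0%}")  # 5%
print(f"{task_margin(10.00, 3.00):.0%}")  # 70%
print(f"{(task_margin(10.00, 9.50) + task_margin(10.00, 3.00)) / 2:.1%}")
```

The average of those two tasks looks comfortable. Only the per-task view reveals that one of them is a few dollars of cost overrun away from losing money.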
Most AI products today lack all three. They know their aggregate model spend. They do not know their cost per customer, per task, or per feature. The providers cannot forecast their own costs. Their buyers cannot forecast theirs.
Where This Lands
The parking meter will persist for API-level products where the buyer is a developer who thinks in tokens. OpenAI's API, Anthropic's API, and Google's Vertex AI will continue billing per million tokens because their buyers operate in those units.
The shift applies everywhere the buyer measures value in outcomes. A CFO does not care how many tokens a contract review consumed. A support lead does not care how many inference calls a ticket resolution required. They care whether the task got done and whether the price was reasonable for the result.
The data-driven pricing framework starts with understanding cost structure at the task level. If you know that a contract review costs $2.40 in inference on average with a standard deviation of $0.80, you can quote $4.50 with confidence and maintain healthy margins. If you do not know your per-task costs, every price is a guess.
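Working through the numbers in that example: a $4.50 quote against a $2.40 mean cost with $0.80 standard deviation holds margin even on an unusually expensive task.

```python
# The quote-vs-cost arithmetic from the paragraph above.
quote, mean_cost, std = 4.50, 2.40, 0.80

expected_margin = (quote - mean_cost) / quote  # margin on the average task
tail_cost = mean_cost + 2 * std                # a roughly 2-sigma expensive task
tail_margin = (quote - tail_cost) / quote      # margin even on that task

print(f"expected margin: {expected_margin:.0%}")      # 47%
print(f"2-sigma task still clears: {tail_margin:.0%}")  # 11%
```

That is what "quote with confidence" means in practice: the price covers not just the average cost but the plausible spread around it.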
The companies building cost estimation and budget approval into their pricing layer will have the flexibility to price on value. The rest will compete on token rates, a race to the bottom where margins compress with every model price drop. That is the difference between pricing like Uber and pricing like a parking meter. One quotes before it charges. The other just runs the clock.
If you are building AI products and need per-task cost visibility, Bear Lumen provides the attribution layer that makes contextual pricing possible. See the cost framework for where to start.