The Free Ride Is Over: AI Economics Is Now Your Most Important Strategy Decision

Three years of subsidised tokens taught us to build without counting the cost. The bills are landing now.

Arpy Dragffy · 9 min read
Photo: Generated via Flux 1.1 Pro
Overview
  • OpenAI will lose an estimated $14 billion in 2026, spending $2.25 for every dollar earned. Anthropic has spent over $10 billion on inference and training against $5 billion in lifetime revenue. The subsidised era is ending by financial necessity, not choice.
  • Adobe just announced outcome-based pricing for its AI agent suite — charging per campaign completed, not per token burned. Salesforce followed with 'agentic work units.' The industry is signalling that usage-based pricing is a transitional model, not the destination.
  • Token prices dropped 280x in two years — but enterprise AI bills rose 320%. Agentic workflows, RAG pipelines, and always-on agents are consuming compute at rates no one planned for. Even when tokens are cheap, the economics break.
  • The executives who survive the next 18 months will be the ones who answered one question before signing another enterprise AI contract: what are we willing to pay per unit of value delivered, and what model achieves that?

Adobe's head of customer experience said it plainly at Adobe Summit last week: "Tokens don't equate to value." He was announcing outcome-based pricing for Adobe's new AI agent suite — charging customers per campaign completed, per interaction resolved, per business task finished. Not per token burned.

I want to dwell on that sentence for a moment, because it is the most honest thing a major software executive has said about AI pricing in three years. Tokens are a unit of compute. They have never been a unit of value. The industry priced them as if they were because it was the easiest abstraction available at launch — and because VC subsidies made the real economics someone else's problem.

That era is ending. Fast.

The numbers behind the free ride

The economics of the current AI landscape are not sustainable, and the companies involved are not hiding it.

OpenAI is projected to lose $14 billion in 2026, with inference costs alone hitting $14.1 billion — up from $8.4 billion in 2025. The company spent $2.25 for every dollar earned last year. Anthropic's CFO disclosed in a March 2026 legal filing that the company has spent "over $10 billion on inference and training combined" while generating cumulative lifetime revenue "exceeding $5 billion." Two dollars out for every dollar in.

This is not a secret. It is published in financial filings, reported in earnings coverage, and discussed openly by the CEOs of these companies. What has been missing is the honest reckoning with what it means for everyone downstream — the enterprises signing contracts, the startups building on these APIs, the developers choosing which models to use for which tasks.

The reckoning arrived this month.

GitHub paused new signups for Copilot Pro, Pro+, and Student plans after the week-over-week cost of running the service nearly doubled since January 2026. Microsoft is shifting GitHub Copilot from request-based billing to token-based billing — a move that will make usage more expensive and more visible simultaneously. On April 21, Anthropic quietly updated its pricing page to remove Claude Code access from the $20/month Pro plan. The change was reversed within hours after developers noticed, with Anthropic's Head of Growth clarifying it was a test on roughly 2% of new signups. But Boris Cherny, Anthropic's Head of Claude Code, had already said the quiet part out loud in April: subscriptions "weren't built for the usage patterns of these third-party tools."

The plan was never built for how people actually use it. That is the confession embedded in that sentence.

Adobe and Salesforce are already moving to what comes next

The transition has a shape. Adobe's CX Enterprise announcement — outcome-based pricing tied to the number of AI campaigns completed, the number of interactions resolved — is not a marketing experiment. Salesforce rolled out "agentic work units" on the same logic: track the work the AI completes, not the compute it consumes.

The destination is results-based pricing. The question is what happens in the messy middle — the 18-month window between the subsidised usage-based model that launched the industry and the outcome-based model that will define the next phase.

In that window, executives who haven't built a clear north star around AI economics are going to make very expensive decisions.

What I'm actually doing — and why

I want to be specific here, because the gap between how people talk about AI costs and how they actually experience them is enormous.

Lovable was my first entry point to building with AI, and it simply was not cost-effective because of its limited tokens and inconsistent outputs. I now use Claude Code constantly. It is, without qualification, the most productive tool I have used for building anything — from software, to Garmin health automations, to replacing all of my content management systems. The experience of working with it is genuinely different from anything that came before. I would recommend it to anyone who builds.

And the economics of using Claude for everything are unsustainable at the rates I work. I am not alone in noticing. Scroll through any developer community right now and the dominant conversation has shifted from "look what I built with AI" to "how do I reduce token burn" and "here's my workflow to stay under the limit." Power users are waking up to the real cost of AI — and the volume of posts about optimising prompts, caching strategies, and cheaper model routing is the clearest demand signal that the subsidised pricing era has run its course.

On Max 5x ($100/month), developers reported depleting the five-hour usage window in as little as 19 minutes during the March quota crisis. One developer summarised it clearly: a single hour of real work used up their entire monthly Max 5x allotment. The Max 20x plan at $200/month is better — but if you're running Opus on complex architectural tasks across a full workday, the API's pay-as-you-go economics can actually beat the subscription, depending on your usage pattern.

So I've started making decisions the way I make decisions about any other infrastructure: by outcome and cost, not by brand loyalty or habit.

For image generation, I've built my own pipeline on Replicate — pay-as-you-go GPU compute, no minimums, composable across models. Nano Banana (Google's Gemini Flash Image) is free for Google Workspace users but extremely slow; through the API it delivers strong production-scale results at $0.045 per 512px image and $0.151 at 4K. For volume output where I need consistency across many images, the economics are hard to beat. For work where text rendering and creative control matter more — mockups, specific typographic treatments — Ideogram 3.0 at $0.03–$0.09 per image gives me more predictable output quality. Neither is perfect for every use case, which is why the pipeline exists.
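
For batch planning, those per-image rates reduce to simple volume arithmetic. A minimal sketch, using the prices quoted above — the dictionary keys are my own shorthand labels, not API model identifiers:

```python
# Per-batch cost comparison for an image pipeline, using the quoted rates.
PRICES = {                        # $/image; keys are shorthand labels, not API IDs
    "nano_banana_512px": 0.045,
    "nano_banana_4k": 0.151,
    "ideogram_turbo": 0.03,
    "ideogram_quality": 0.09,
}

def batch_cost(model: str, n_images: int) -> float:
    """Total dollar cost of generating n_images with the given model tier."""
    return PRICES[model] * n_images

# 1,000 consistent product shots vs 1,000 typography-heavy mockups
print(f"${batch_cost('nano_banana_512px', 1_000):.2f}")   # $45.00
print(f"${batch_cost('ideogram_quality', 1_000):.2f}")    # $90.00
```

At volume the gap compounds: the "cheapest" tool depends entirely on which output dimension — consistency or text rendering — the batch actually needs.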

For security-intensive tasks — anything involving proprietary code, client data, or internal tooling I can't send to a cloud endpoint — I'm using local models. Zero inference cost, zero data exposure, acceptable quality for many tasks that don't require frontier performance.

My decisions are no longer primarily about capability. They're about the intersection of economics, results, and what I'm willing to accept as "good enough" for a given task. That mental model shift is one I think every executive building on AI needs to make.

The paradox that breaks usage-based pricing

Here is the uncomfortable arithmetic that the AI industry is not advertising.

Token prices dropped 280x in two years. GPT-4 launched in March 2023 at $30 per million input tokens. GPT-4.1 Nano runs today at $0.10 per million — a 99.7% decline. Claude Sonnet at $3 per million input tokens offers performance that would have required $30+ models two years ago. The price of intelligence, measured in tokens, has collapsed.

And yet enterprise AI bills rose 320% over the same period.

The reason: agentic workflows trigger 10–20 LLM calls per user task. RAG architectures inflate context windows three to five times. Always-on monitoring agents burn compute continuously. What looks cheap per token looks extremely expensive per workflow, per day, per employee, per month.
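
A back-of-envelope sketch makes the per-workflow blow-up concrete. The call counts and token volumes below are illustrative assumptions in the ranges just described, priced at Sonnet-class rates:

```python
# Why cheap per-token prices still produce expensive workflows.
# Prices are Sonnet-class ($/1M tokens); call counts and volumes are
# illustrative assumptions, not measured data.
PRICE_INPUT = 3.00
PRICE_OUTPUT = 15.00

def task_cost(calls: int, input_per_call: int, output_per_call: int) -> float:
    """Dollar cost of one user task that fans out into several LLM calls."""
    return calls * (input_per_call * PRICE_INPUT
                    + output_per_call * PRICE_OUTPUT) / 1e6

# A single chat turn: one call, modest context.
chat = task_cost(calls=1, input_per_call=2_000, output_per_call=500)
# An agentic task: 15 calls, RAG-inflated context (~4x), longer outputs.
agent = task_cost(calls=15, input_per_call=8_000, output_per_call=1_000)

print(f"chat turn:  ${chat:.4f}")    # $0.0135
print(f"agent task: ${agent:.4f}")   # $0.5850
print(f"per employee/month (40 tasks x 22 days): ${agent * 40 * 22:,.2f}")
```

The token price is identical in both rows; the fan-out is what moves the bill from fractions of a cent per interaction to hundreds of dollars per seat per month.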

DeepSeek V3 runs at $0.14 per million input tokens and $0.28 per million output — compared to Sonnet 4.6 at $3/$15 and Opus 4.7 at $5/$25. That is a 20x cost advantage on comparable tasks for a large portion of enterprise use cases. The capability gap between the cheapest adequate model and the most expensive frontier model is closing faster than the price gap is. Routing 80% of routine inference to cost-optimized models while reserving frontier models for genuinely complex tasks reduces spend by 60–80% with minimal quality impact. That is not a future optimisation. That is an available decision right now.
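
As a sanity check on that savings claim, here is the routing arithmetic using the input-token prices quoted above — an illustrative blend, not a production router:

```python
# Blended input-token cost under tiered routing ($/1M input tokens,
# prices as quoted in the article). Input side only, for simplicity.
FRONTIER_INPUT = 5.00     # Opus-class frontier model
WORKHORSE_INPUT = 0.14    # DeepSeek-V3-class cost-optimised model

def blended_input_cost(cheap_share: float) -> float:
    """Average $/1M input tokens when cheap_share of traffic hits the workhorse."""
    return cheap_share * WORKHORSE_INPUT + (1 - cheap_share) * FRONTIER_INPUT

all_frontier = blended_input_cost(0.0)   # $5.00 per 1M tokens
routed = blended_input_cost(0.8)         # $1.112 per 1M tokens
savings = 1 - routed / all_frontier      # ~0.78
print(f"routed: ${routed:.3f}/M, savings: {savings:.0%}")
```

An 80/20 split lands at roughly 78% input-side savings, squarely inside the 60–80% range — and the output-token gap ($0.28 versus $25 per million) widens it further.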

The companies that will get hurt are the ones that locked into usage-based contracts at 2025 pricing, built their product architecture around a single provider's models, and are about to discover that 30–50% API price increases are coming as vendors move toward sustainable unit economics.

The north star question

The AI industry spent three years asking one question: can we do this with AI? The answer, across almost every domain, is yes. That question is no longer interesting.

The question that matters for the next three years is: what are we willing to pay per unit of value delivered, and what model, architecture, and workflow achieves that at scale?

This is not a technical question. It is a strategy question — and it belongs in the boardroom, not in a sprint planning session with an engineering team.

For startup founders specifically: if your product's unit economics only work because tokens are artificially cheap, you do not have a product. You have a dependency. The companies subsidising your infrastructure are losing billions of dollars doing it, and they will eventually stop. The founders who build with that reality priced in from day one are the ones who will survive the normalisation.

For enterprise executives: the era of "let's experiment and see what the AI budget looks like in a year" is over. The experiment is over. The results are in. The question now is which experiments produced real business value at defensible economics, and which ones were possible only because someone else was absorbing the cost.

What I'm watching

The interesting companies of the next 18 months are not going to be the ones who bought the most tokens. They are going to be the ones who built the most efficient path between a specific outcome and the compute required to reach it — who knew exactly which model to call for which task, who had a clear answer to what they were willing to pay for a unit of value, and who built their product architecture around that answer before the subsidy ended.

Adobe said tokens don't equate to value. They are pricing accordingly. The rest of the industry is about to be forced to follow — and most organisations are not ready.

Building a north star for your AI product is not a product decision. It is a survival decision. The free ride is over. That's why, as a consultant, I work with startups and established organisations to build highly refined definitions of what an AI product needs to deliver and to optimise the context environments around it. That's the only way builders can deliver expanding capabilities while keeping the financials in check.

Frequently asked questions

Why are AI bills rising if token prices keep falling?

Token prices have dropped by as much as 99.7% since 2023. But enterprise AI bills have risen because agentic workflows trigger 10–20 LLM calls per user task, RAG architectures inflate context windows significantly, and always-on AI agents consume compute continuously. The cost per token is cheap; the cost per workflow, per employee, per month is not. Routing routine tasks to cheaper models and reserving frontier models for complex work can reduce spend by 60–80%.

What is outcome-based AI pricing?

Outcome-based pricing charges customers for the business results an AI system produces rather than the compute it consumes. Adobe's CX Enterprise charges per AI campaign completed or per customer interaction resolved. Salesforce's "agentic work units" charge per unit of work the AI completes. The shift moves billing from a measure of compute to a measure of value — addressing the core problem that token volume has no direct relationship to business outcome.

Should I use Claude Code on the Pro or Max plan?

It depends on how intensively you work. At $20/month, Pro includes Claude Code but was briefly tested for removal in April 2026, signalling usage pressure at that tier. Max 5x ($100/month) can be depleted in a single intensive session by heavy users. Max 20x ($200/month) is better for full-time use. For developers who code three or more full days per week using Opus on complex tasks, the API's pay-as-you-go option can sometimes be more economical. Prompt caching — where over 90% of tokens in a typical session are cached reads at 10% of standard price — changes the math significantly.
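
The caching claim reduces to one line of arithmetic. Assuming, per the figures above, that 90% of a session's input tokens are cache reads billed at 10% of the standard rate:

```python
# Effective input-token price under prompt caching.
# Assumptions: 90% of tokens are cache reads, billed at 10% of standard price.
STANDARD = 3.00                  # $/1M input tokens, Sonnet-class
CACHE_READ = 0.10 * STANDARD     # $0.30 per 1M cached input tokens

def effective_price(cached_fraction: float = 0.90) -> float:
    """Blended $/1M input tokens given a cache-read fraction."""
    return cached_fraction * CACHE_READ + (1 - cached_fraction) * STANDARD

print(f"${effective_price():.2f}/M")   # $0.57/M, ~81% below the uncached rate
```

That blended rate is why a caching-aware workflow can make pay-as-you-go API usage competitive with a flat subscription for heavy, repetitive sessions.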

What is the cheapest AI image generation API in 2026?

Nano Banana (Google's Gemini Flash Image) runs at $0.045 per 512px image and $0.151 per 4K image, with a 50% discount for batch processing. Ideogram 3.0's API starts at $0.03 per image at Turbo tier and $0.09 at Quality tier. Replicate offers pay-as-you-go GPU compute for running open-source image models, with no monthly minimums. The cheapest option per image is not always the best option — control, consistency, and text rendering vary significantly across models, which is why custom pipelines combining multiple models are increasingly common for production use cases.

Is usage-based AI pricing sustainable long-term?

Competitively, it faces serious pressure. DeepSeek V3 runs at roughly $0.14 per million input tokens versus $3 for comparable US models — a 20x price gap on many tasks. Open-source and local models eliminate inference costs entirely for organisations with the infrastructure to run them. As the capability gap between frontier and cost-optimised models narrows, the economic argument for premium usage-based pricing weakens. The likely long-term equilibrium is outcome-based or hybrid pricing tied to business value — a direction Adobe and Salesforce are already moving.

Arpy Dragffy

Founder, PH1 Research · Co-host, Product Impact Podcast


Hosted by Arpy Dragffy and Brittany Hobbs. Arpy runs PH1 Research, a product adoption research firm, and leads AI Value Acceleration, enterprise AI consulting.
