OpenAI & Anthropic are charging us way more than we need

You're being defaulted into the most expensive AI available for work a cheaper model handles just as well. Engineers figured this out. Your organization hasn't been told. The model providers are collecting the difference.

Arpy Dragffy · 11 min read
Overview
  • AI is getting cheaper every quarter — inference costs dropped ~90% in 18 months. Dozens of specialized, lower-cost models now exist for specific types of work at a fraction of the price of flagship models.
  • But most organizations aren't seeing those savings. 78% exceeded their 2025 AI budget by an average of 47% (Forrester). The reason: your team is defaulted into the most expensive model available, by tools whose business model depends on you staying there.
  • Engineers are quietly paying 15x less per query for the same output — because they choose their model deliberately. Everyone else in your org is running email drafts, meeting summaries, and document searches on a model built for problems orders of magnitude harder.
  • This article breaks down what each model costs, what each tier is actually for, and what leaders and individuals need to do before the bill lands on the CFO's desk.

The narrative around AI cost in May 2026 has two halves, and almost nobody is putting them in the same sentence.

The first half: AI is genuinely getting cheaper. Stanford's 2026 AI Index documents that inference costs at a given capability level have fallen roughly 90% in eighteen months. More capable models — real ones — now cost a fraction of what last year's flagship did. Engineers building AI products are exploiting this aggressively, routing tasks to cheaper specialized models and shipping more for less than ever before.

The second half is your bill. Forrester's Q1 2026 enterprise AI spend tracker found 78% of enterprises overran their 2025 AI budget by an average of 47%. Gartner's 2026 enterprise AI survey finds spend climbing faster than any value metric anyone is tracking. The Token Economics Crisis report from AI Value Acceleration puts the effective monthly cost per power user at $500–$3,000+, up from $50–$150 two years ago.

Both are true at the same time. The cheap AI revolution is real. It just is not reaching most of the people in your organization. Engineers are routing tasks to the cheapest model that works. The rest of your org is being defaulted into the latest frontier model — Claude Opus 4.7, GPT-5-class, Gemini Ultra — for work a model one-fifteenth the price would handle perfectly. The model providers are making a fortune on that gap. You are funding it out of a budget the business is going to come and audit before the year is out, and the people who built the spend are not the ones who will be in the room to explain it.

AI is genuinely getting cheaper — for the people who know how to use it

AI at a given capability level runs roughly 90% cheaper than eighteen months ago. Engineers who build on AI APIs and choose which model handles which task have been capturing that savings for over a year. Specialized models now handle almost every task knowledge workers actually do — summarizing, drafting, classifying, searching. The orchestration pattern to route tasks to the cheapest capable model is documented, proven, and widely deployed inside any well-built Claude Code, Cursor, or MCP-connected agent stack.

Engineers are not the ones being defaulted into the expensive model. Everyone else in your organization is. The vendors are counting on that asymmetry. They built the cheap option, set the expensive one as default, and waited.

The frontier-model upsell

Watch what the model providers actually do in their consumer and prosumer products, not what they say on stage.

ChatGPT Plus and Enterprise default users to the latest flagship (GPT-5-class) even when the task is "summarize this email." Claude.ai defaults Pro users to Opus 4.7 for the same kind of work. Microsoft Copilot's premium tiers bundle agent SKUs that route through the most expensive available reasoning model. Cursor's "Max" mode and GitHub Copilot's premium model SKUs follow the same script. The UI default in every major consumer-facing AI product in 2026 is the highest-cost model in the lineup.

This is the iPhone playbook applied to AI, and the AI vendors improved on it in the most predatory way possible. Apple spent fifteen years selling each new iPhone on incremental upgrades that most users did not need — better camera, faster chip, thinner bezel — and built a $400B/year business on the FOMO. With Apple, you paid once every two years and you knew what you paid. With AI, you pay per token, per task, every day, forever — at whatever new top-tier rate the vendor sets — and the bill arrives quietly at the end of the quarter, with no warning and no breakdown of how much of it was for work that did not need the expensive model in the first place.

The economic asymmetry is brutal. Anthropic bills extended-thinking tokens as output tokens, which works out to roughly a 2x–5x output multiplier on top of the Opus base rate. A single complex Opus 4.7 + extended-thinking prompt can cost more than a hundred Haiku calls solving an equivalent task at acceptable quality. The user clicking the dropdown has no signal that this is happening. The vendor has every incentive not to mention it. The CFO finds out one quarter later.

The industry is not selling AI capability. It is selling status anxiety about AI capability — and billing you for that anxiety by the token, every day, forever.

What knowledge workers actually do with AI

The frontier-model upsell works because the conversation about "what AI is for" was captured by demos of the hardest possible tasks. Real enterprise AI usage looks nothing like the demo reel.

The Anthropic Economic Index catalogs millions of real Claude conversations, and the top task categories are not "complex multi-step reasoning." They are writing assistance, customer service drafting, information lookup, and summarization. Microsoft's 2025 Work Trend Index surveyed 31,000 knowledge workers and points to the same mix: summarizing meetings, drafting emails, internal search, brainstorming. Stanford's 2026 AI Index consolidates the same pattern across every major workforce study.

The eight most common AI tasks for a typical knowledge worker in 2026 — across roles spanning marketing, sales, operations, HR, finance, project management, and customer success — are:

  • Drafting emails and short-form communications
  • Summarizing meetings, transcripts, and long documents
  • Brainstorming ideas, headlines, angles
  • Searching and synthesizing internal documents
  • Drafting first-pass slide outlines, proposals, briefs
  • Translating or rewriting text for tone or audience
  • Answering "what is the status of X" knowledge-base questions
  • Generating structured outputs from unstructured input — extracting fields, tagging records, classifying

Every one of those tasks runs at acceptable quality on Haiku 4.5, Gemini Flash, or GPT-4o mini. None of them require Opus 4.7. None require extended thinking. None require the latest GPT-5-class reasoning mode.

Without extended thinking, Opus 4.7 costs 15x what Haiku 4.5 costs for the same task — both input and output tokens are priced 15x higher. Turn extended thinking on and that gap jumps to 30–60x: the model bills you for every reasoning token it generates before writing the answer, and those tokens stack fast on even a simple request. The output quality difference on these specific tasks ranges from imperceptible to single-digit percentage points. The cost difference is consistent and large. Across a 5,000-person organization where each person hits these tasks twenty times a day, the unnecessary spend runs into the eight figures annually.

No vendor switches a user to the cheaper option when the expensive one is already the default and the customer isn't watching the meter.

The math, in concrete numbers

A token is roughly three-quarters of a word. Every query your team runs burns tokens going in (your prompt, your document) and coming out (the response). A typical knowledge-worker session runs 5,000–20,000 input tokens and 500–2,000 output tokens. Run the eight tasks above a combined twenty times a day, two hundred working days a year, and the per-user bill looks like this:

On Haiku 4.5 at $1 / $5 per million tokens: roughly $30–$90 per user per year.

On Sonnet 4.6 at $3 / $15: roughly $90–$270.

On Opus 4.7 at $15 / $75: roughly $450–$1,350.

On Opus 4.7 with extended thinking (an effective 2x–5x output multiplier, because thinking tokens bill at the output rate): roughly $900–$6,000+, for the same tasks.

Cheaper alternatives — Google's Gemini 2.5 Flash, OpenAI's GPT-4o mini, Mistral, DeepSeek, open-weight Qwen and Llama — price at or below the Haiku floor.

Across a 5,000-person enterprise, the gap between "everyone defaulted to Opus + extended thinking" and "model tier matched to task" runs into the tens of millions annually. None of it correlates with measurable business outcome. Forrester's 47% overrun number is not a mystery from this angle — it is a dropdown setting your vendor chose for you.
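The per-user arithmetic above can be sketched in a few lines. This is a rough illustration, not a billing tool: the prices are the ones quoted in this article, the session volume (twenty runs a day, two hundred days a year) follows the text, and the function names and the `output_multiplier` parameter are assumptions introduced here to approximate extended thinking.

```python
# USD per million tokens (input, output), as quoted in the article.
PRICES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (15.00, 75.00),
}

SESSIONS_PER_DAY = 20   # assumption: twenty task runs per person per day
WORKING_DAYS = 200      # assumption: two hundred working days per year


def session_cost(model, input_tokens, output_tokens, output_multiplier=1.0):
    """USD for one session. output_multiplier > 1 approximates extended
    thinking, whose reasoning tokens are billed as output tokens."""
    input_price, output_price = PRICES[model]
    billed_output = output_tokens * output_multiplier
    return (input_tokens * input_price + billed_output * output_price) / 1_000_000


def annual_cost(model, input_tokens, output_tokens, output_multiplier=1.0):
    """USD per user per year at the assumed session volume."""
    per_session = session_cost(model, input_tokens, output_tokens, output_multiplier)
    return per_session * SESSIONS_PER_DAY * WORKING_DAYS


# Light sessions: 5,000 input tokens and 500 output tokens each.
for model in PRICES:
    print(f"{model}: ${annual_cost(model, 5_000, 500):,.0f} per user per year")
```

For light sessions this reproduces the low end of each range quoted above ($30, $90, $450). Heavier sessions scale the totals proportionally, and passing an `output_multiplier` above 1 shows how quickly reasoning tokens move the number.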

The bill shocks of 2026

The receipts are now public and they are ugly. Futurism reported in May 2026 that the "dark cloud" — the gap between promised AI efficiency and actual run-rate cost — is now the dominant conversation inside enterprise finance teams. The Register the same week: AWS and Google Cloud customers receiving AI bills materially larger than anything in their forecasts, with no warning from the platform. r/business is tracking per-employee AI spend approaching the salary line. CloudZero's AI Cost Crisis breakdown makes the structural case: token consumption per task is growing faster than per-token cost is falling, and the gap widens every quarter.

I met many people at a conference this month who found out their teams had built dozens of unauthorized agents that were driving up their bills. The mandate to get everyone to use AI is going to run its course this year because we're not training teams on how to get better results at a reasonable ROI.

A consistent theme across Q1 2026 enterprise audits is the discovery of unsanctioned agents: staff shipping their own Claude Code, Cursor, or n8n workflows against production data on personal API keys or in unmanaged sandboxes. One Fortune 500 company we spoke with discovered 41 unregistered agents in a single department audit. The aggregated monthly spend on those agents exceeded the budget for the officially sanctioned AI program.

There have been incidents of models costing $40,000–$200,000 before anyone noticed. The spend often grows faster than anyone can observe it, because the technology is so new and the governance so full of holes.

What business leaders must do now

The models aren't broken. What is broken are the procurement defaults inside your organization, and the answer is not another platform purchase. Here is what changes by Monday.

Override the vendor default everywhere you can. Inside your enterprise tenants of Microsoft Copilot, ChatGPT Enterprise, and any platform you license, set the default model to mid-tier. Make premium models opt-in with a documented reason. The vendor will not do this for you. The vendor profits from every quarter you wait.

Build the context layer once. Context is the actual unlock, and we covered why in April. When your organization builds a proper context layer (connecting AI to your actual systems instead of forcing employees to paste context into every prompt), two things happen at once: output quality improves, and token consumption per task drops. Atlassian's Teamwork Graph, announced at Team '26, is the clearest enterprise example. Agents that can query structured context across Jira, Confluence, and Bitbucket do not have to re-orient on every call. Anthropic's prompt caching puts the math in numbers: reusing cached context cuts repeated-call costs by up to 90%. Skip the context layer and every employee in your org re-pays the context tax on every query, every day, forever, and the vendor keeps the change.
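To make the caching claim concrete, here is a minimal sketch of what re-sending the same context costs with and without a cache. The function and its parameters are hypothetical. The 0.10 read multiplier corresponds to the "up to 90%" figure above; the 1.25 write multiplier on the first call is an assumption modelled on Anthropic's published caching prices, and the sketch assumes the cache stays warm for every call. Check current vendor pricing before relying on either number.

```python
def context_cost(calls, context_tokens, input_price_per_mtok,
                 cached=False, read_multiplier=0.10, write_multiplier=1.25):
    """USD spent sending the same context on `calls` requests."""
    base = context_tokens * input_price_per_mtok / 1_000_000
    if not cached or calls == 0:
        return calls * base
    # First call writes the cache (assumed surcharge); the rest read from it.
    return base * write_multiplier + (calls - 1) * base * read_multiplier


# Twenty calls a day against 10,000 tokens of shared context at $3 per million.
uncached = context_cost(20, 10_000, 3.00)
cached = context_cost(20, 10_000, 3.00, cached=True)
print(f"uncached ${uncached:.4f}, cached ${cached:.4f}, "
      f"saving {1 - cached / uncached:.0%}")
```

Under these assumptions the saving on the repeated context is about 84% rather than the full 90%, because the first call pays to write the cache. The more calls that reuse the same context, the closer the saving gets to the headline figure.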

Register every agent and watch the spend. MCP gives you the technical primitive. The shadow-AI usage signal documented in our coerced-adoption analysis is the leading indicator that your sanctioned tool is being routed around. Audit quarterly. Every unregistered agent on a personal API key is a future bill shock and a future security incident waiting for one bad weekend.
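A quarterly audit can start as something this simple: compare the keys that appear in a usage export against the keys you have registered. The record fields (`key_id`, `cost_usd`) and the function name are hypothetical; real billing exports will differ by vendor, so treat this as a sketch of the check rather than an integration.

```python
from collections import defaultdict


def unregistered_spend(usage_records, registered_keys):
    """Total spend per API key that is missing from the registry,
    largest first."""
    totals = defaultdict(float)
    for record in usage_records:
        if record["key_id"] not in registered_keys:
            totals[record["key_id"]] += record["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative data only.
usage = [
    {"key_id": "team-sanctioned", "cost_usd": 1200.0},
    {"key_id": "personal-a", "cost_usd": 300.0},
    {"key_id": "personal-b", "cost_usd": 950.0},
    {"key_id": "personal-a", "cost_usd": 150.0},
]
registered = {"team-sanctioned"}

for key_id, spend in unregistered_spend(usage, registered):
    print(f"{key_id}: ${spend:,.2f} outside the registry")
```

Sorting by spend puts the agents most likely to cause a bill shock at the top of the list, which is where the audit conversation should start.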

Know what your AI investment is actually producing. PH1's Bullseye framework measures AI product impact across power, speed, impact, and joy — connecting spend to outcome. When your board asks what the AI program produced, cost without outcome is what gets programs cut.

Basic procurement hygiene like this is essential. If it doesn't yet exist in your org, take charge of it and become indispensable.

Improve your use of AI while reducing costs

Leaders set the policy. Individual contributors live with it. Here is what every product manager, engineer, marketer, and analyst on your team should be doing differently this quarter — before the bills land on someone's desk and they come looking for who to blame.

Match the model to the task. In plain terms: Haiku handles anything fast and high-volume — email drafts, meeting summaries, document Q&A, internal search, brainstorming, data extraction. If you would hand it to a capable junior and want it back in thirty seconds, Haiku handles it at a fraction of the cost. Sonnet is your mid-level hire: customer-facing content, code generation, nuanced writing, research synthesis, anything where the output needs to be polished and right on the first pass. Opus is the specialist you book for hard problems — complex reasoning with no template, multi-step orchestration where errors compound, high-stakes analysis where being wrong is costly. Treat it like a consultant: not every meeting, and never the default. Anthropic's model selection guidance maps tiers to tasks if you want the full breakdown.
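The tier guidance above can be written down as a small lookup. This is a sketch, not Anthropic's routing logic: the task labels, the `pick_tier` function, the mid-tier fallback for unknown tasks, and the rule that Opus needs a stated reason are all assumptions chosen to mirror the advice in this article.

```python
# Task labels are illustrative; map your own task types to tiers.
TIER_BY_TASK = {
    "email_draft": "haiku",
    "meeting_summary": "haiku",
    "document_qa": "haiku",
    "internal_search": "haiku",
    "brainstorm": "haiku",
    "data_extraction": "haiku",
    "customer_content": "sonnet",
    "code_generation": "sonnet",
    "research_synthesis": "sonnet",
    "complex_reasoning": "opus",
    "multi_step_orchestration": "opus",
    "high_stakes_analysis": "opus",
}

DEFAULT_TIER = "sonnet"  # assumption: unknown tasks fall back to mid-tier


def pick_tier(task_type, justification=None):
    """Return the tier for a task. The premium tier is opt-in:
    without a documented reason the request drops to the default."""
    tier = TIER_BY_TASK.get(task_type, DEFAULT_TIER)
    if tier == "opus" and not justification:
        return DEFAULT_TIER
    return tier


print(pick_tier("meeting_summary"))
print(pick_tier("complex_reasoning"))
print(pick_tier("complex_reasoning", justification="novel pricing model, no template"))
```

The three calls print `haiku`, `sonnet`, and `opus` in that order. Requiring a justification string is one way to make the expensive tier a deliberate choice instead of a default.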

See what your queries actually cost. A user who can see "this query cost you $0.03 on Opus vs. $0.001 on Haiku" makes different choices. If your team's tool exposes per-query cost, look at it before you run the next query. If it doesn't, demand it. The dropdown that quietly upgrades you to the most expensive model is a UI choice the vendor made for their margin, not a quality decision you made for your work.

Use context, not longer prompts. The most effective knowledge worker in 2026 is not the one writing the longest prompts. It is the one whose tools already know who they are, what project they are on, and what answer they need. Until your org has a proper context layer, paste structured context once and reuse it — every minute spent re-explaining yourself to a model is a minute and a token spent twice.

If your product team wants a deeper operational view of what this looks like in practice, Atlassian's VP of AI Jamil on Season 2, Episode 11 of the Product Impact Podcast goes deep on exactly this. Worth 25 minutes of your team's time to understand where AI is heading.

Don't let LLMs put your job at risk

The bills are coming due. Forrester says 78% of enterprises overran their 2025 AI budget by an average of 47%, and Q1 2026 looks worse, not better. Every quarter your AI spend climbs faster than your AI value metric, the conversation with your CFO and your board gets harder. Once those bills hit, executives are going to put pressure on every team. The teams that can prove outcome per dollar will keep their budgets. The teams that can only show velocity will lose them, and the people on those teams will lose more than that.

Do not let yourself get addicted to the latest frontier model. The dropdown is engineered to do exactly that to you. Focus on improving outputs and especially outcomes — what actually changed in the business because your team used AI, not how many tokens your team burned proving they were using it. A team shipping outcomes on Haiku will outlast a team burning Opus tokens to look productive in a deck.

Remember that organizations are about to be redesigned. We covered the pattern of CEOs publishing layoff notices that signal the new operating model. In that redesign, your personal token count is irrelevant. What will count is how well you elevate the work of everyone around you — and that comes from knowing how to leverage context and tokens deliberately, not from defaulting to the most expensive model on every call.

Frontier models are too expensive and too predatory to stay the default for everyday enterprise work. Within a year we'll see a correction in which the businesses that remain profitable rely on efficient models and open-source models. The rest will get buried beneath expenses that never made sense.







Arpy Dragffy is CEO of PH1 and host of the Product Impact Podcast. He helps enterprise AI leaders measure and accelerate the impact of their AI investments. AI Value Acceleration's full Token Economics Crisis report, the Orchestration Playbook, and the Enterprise AI Value Crisis report are available at aivalueacceleration.com.
