Here's a question that I've been thinking about, and probably you have too if you've built your whole workflow around these tools. Right now Claude Code and Codex are cheap. Twenty bucks gets you in, a couple hundred gets you almost unlimited for most people's real work. It feels almost too good. And anytime something feels too good in tech, the little voice in the back of your head goes: yeah, but for how long? So let me actually dig into the economics, because the answer is more interesting than "prices go up eventually," and there are some genuinely alarming numbers floating around right now that explain exactly why this is all on borrowed time.
What you have to understand first: the AI companies are losing money on you
These companies are not making money on your subscription. They are lighting money on fire to keep you using their product. OpenAI took in 13 billion dollars in 2025 and lost 21 billion doing it. Read that twice. They made thirteen billion and still ended the year down twenty-one billion. That's not a business, that's a fire hose of investor cash pointed at your monthly bill so you don't feel the real cost.
And the reason your cheap plan is so dangerous to them is wild once you see the actual breakeven math. SemiAnalysis ran the numbers and found Anthropic hits zero gross margin on its top-tier plans at roughly 10% utilization, and OpenAI goes negative at around 5.7%. Meaning if you actually use your two-hundred-dollar plan at more than about a tenth of what it technically allows (even less on the OpenAI side), you're already costing them money. A fully maxed-out $200 Claude Max plan would run about $8,000 a month at real API rates, and the equivalent ChatGPT Pro plan would be around $14,000. You're paying two hundred for something that costs them thousands if you lean on it. That gap is the issue, and it's a gap somebody eventually has to close.
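To make the breakeven math concrete, here's a rough sketch in Python. It assumes serving cost scales linearly with utilization, which is my simplification and not something SemiAnalysis states. Under that assumption, the zero-margin thresholds above imply a full-usage serving cost of plan price divided by threshold. The function names are made up for illustration.

```python
PLAN_PRICE = 200.0  # USD per month, top-tier plan price from the text

# Zero-gross-margin utilization thresholds quoted above (SemiAnalysis).
BREAKEVEN_UTILIZATION = {"Anthropic": 0.10, "OpenAI": 0.057}


def implied_full_cost(provider: str) -> float:
    """Monthly serving cost at 100% utilization, ASSUMING cost is linear in usage."""
    return PLAN_PRICE / BREAKEVEN_UTILIZATION[provider]


def monthly_margin(provider: str, utilization: float) -> float:
    """Gross margin in USD on one subscriber at a given utilization (0.0 to 1.0)."""
    return PLAN_PRICE - utilization * implied_full_cost(provider)


if __name__ == "__main__":
    for provider in BREAKEVEN_UTILIZATION:
        for u in (0.05, 0.25, 1.0):
            margin = monthly_margin(provider, u)
            print(f"{provider} at {u:.0%} utilization: {margin:+,.0f} USD")
```

Under this assumption a maxed-out subscriber costs Anthropic about $2,000 a month to serve and OpenAI about $3,500. That's lower than the $8,000 and $14,000 figures, presumably because API list prices are not the same thing as serving cost, but it's still ten to seventeen times what the subscriber pays.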
Why agentic coding makes it so much worse
Here's the part that hits Claude Code and Codex specifically, harder than it hits the regular chatbots. Agentic coding is a token monster. When you ask Claude Code to do a real task, it's not answering one question, it's reading files, running commands, checking output, fixing errors, looping over and over, and every single step burns tokens. Powerful agentic AI can use up to a thousand times more tokens than a basic chatbot query. A thousand times.
So what makes these coding tools magical, that they can grind autonomously through a big multi-step task while you get coffee, is also what makes them ruinously expensive to run. One real coding task can push 400,000 to 2 million tokens through the API, and heavy automation reaches 500 to 2,000 dollars per engineer per month. The chatbot people asking it to write a birthday poem are cheap. We, the people running agents that chew through entire codebases, are the expensive ones. We're the utilization problem the math is scared of. And there are more of us every day.
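Part of why the token counts balloon is that an agent loop typically resends its whole growing context on every step. Here's a toy model of that, with made-up numbers for the step count and context sizes. It ignores prompt caching and output tokens, so treat it as a sketch of the shape of the problem, not a billing calculator.

```python
def agent_input_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total input tokens when every step resends the full context so far.

    Step i (counting from 0) sends the base context plus everything
    the previous i steps appended to it (tool output, diffs, errors).
    """
    return sum(base_context + i * added_per_step for i in range(steps))


# HYPOTHETICAL task: 40 steps, 5,000 tokens of starting context,
# 2,000 tokens of new tool output appended per step.
total = agent_input_tokens(steps=40, base_context=5_000, added_per_step=2_000)
print(f"{total:,} input tokens")
```

With those illustrative inputs the total lands at 1,760,000 tokens, inside the 400,000 to 2 million range above. The growth is quadratic in the number of steps, which is why long autonomous runs get expensive so fast.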
It's already breaking budgets at the top end. One company blew through 500 million dollars in a single month after forgetting to put a usage cap on its employee licenses. Walmart capped employee spend at 1,500 bucks per tool. Microsoft reportedly canceled its Claude Code licenses over cost. These aren't broke startups, these are the richest companies on earth deciding the meter is running too hot.
So they're already quietly raising prices, you just haven't felt it yet
Here's the sneaky part. The price hike isn't coming as a big scary announcement that your twenty bucks is now forty. It's coming sideways, through the stuff that's harder to notice. The whole industry is shifting toward usage-based billing, where heavier users pay more. GitHub moved to a usage system after its monthly allotments, Anthropic has pushed some business customers to actual-usage billing, and OpenAI execs have floated pricing AI more like electricity or water, meaning you pay for what you burn.
That's the move. Not "we raised the price," but "we're changing how billing works," and "we're adding usage caps," and "the unlimited thing is now limited." You already saw it with the Fable 5 launch (before the government took it away): free for two weeks, then it drops to usage credits. That's the template. Hook you on the flat rate, then quietly migrate the expensive behavior, our behavior, the agentic coding, onto pay-per-token where the real cost shows up. Analysts have said for years that these consumer subscriptions are priced for growth, not profit, that the whole point was to acquire users and get the market hooked, and that the acquisition phase is basically done now. When the goal stops being "get everyone addicted" and starts being "let's start making money off the addicts," the price changes. That's just how this always goes.
Okay, it's not all doom
There's a real upside here.
Base model prices have actually been flat or dropping, not rising. Opus 4.8 launched at the same per-token rate as the previous Opus, and its fast mode actually got cheaper, dropping from $30/$150 down to $10/$50 (input/output, per million tokens). The per-token cost of intelligence keeps falling as the tech gets more efficient, which pushes against the rising-cost pressure. So it's a tug of war: models get cheaper to run, but we use way more of them, and which force wins decides your bill.
And there's a real escape hatch that didn't exist a couple years ago: the cheap competition is genuinely good now. DeepSeek V4 Flash runs around 14 cents per million input tokens, something like 54 times cheaper than the frontier models, and Chinese and open-source models keep closing the quality gap. Big firms are already shifting toward Chinese LLMs and open-source models to stretch their budgets. So even if Anthropic and OpenAI crank their prices, there's a floor under how bad it can get, because the second they price too high, everybody bolts to the cheaper option that's almost as good. Competition is your friend here. The model companies can't just gouge you, because DeepSeek is sitting right there.
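Here's what that price gap looks like per task, using the 400K to 2M token range quoted earlier. This is a back-of-the-envelope sketch: it assumes every token is billed at the input rate, and it derives the frontier rate from the "54 times" figure instead of from any published price list.

```python
DEEPSEEK_INPUT_USD_PER_M = 0.14  # DeepSeek V4 Flash input price from the text
FRONTIER_MULTIPLE = 54           # "54 times cheaper" figure from the text

# Per-task token range quoted earlier in the article.
TASK_TOKENS = {"light": 400_000, "heavy": 2_000_000}


def task_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD, ASSUMING every token is billed at the input rate."""
    return tokens / 1_000_000 * usd_per_million


# Implied frontier input rate, about 7.56 USD per million tokens.
frontier_rate = DEEPSEEK_INPUT_USD_PER_M * FRONTIER_MULTIPLE

for label, tokens in TASK_TOKENS.items():
    cheap = task_cost(tokens, DEEPSEEK_INPUT_USD_PER_M)
    frontier = task_cost(tokens, frontier_rate)
    print(f"{label} task: cheap model ${cheap:.2f}, frontier ${frontier:.2f}")
```

So a light task is about six cents versus about three dollars, and a heavy one is 28 cents versus about fifteen dollars. Real bills split input, output, and cached tokens at different rates, so the actual numbers will differ, but the ratio is the point.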
And here's the escape hatch that actually scares the cloud providers, because it takes the meter away entirely: running the model on your own machine. This used to be a toy, the local models were cute but useless for real work. Not anymore. The open-weight Qwen coding models from Alibaba got shockingly good, and they run on hardware you might already own. Someone running Qwen 3.6 27B locally on a 48GB MacBook Pro clocked it at 77.2% on SWE-bench Verified, which puts a model running entirely on a laptop in the same league as the cloud frontier models from a year ago. On a Mac with 32 to 48GB of unified memory, the Qwen 3.6 35B mixture-of-experts model has basically become the default, fast enough to be usable because it only activates a few billion parameters per token. No subscription, no per-token meter, no usage cap, no code leaving your machine. You buy the hardware once and the marginal cost of every task after that is basically electricity.
That's why I think local LLMs running something like Qwen are about to get a lot more popular, especially if cloud pricing tightens the way the math says it will. The moment your monthly Claude Code bill starts creeping toward what a decent GPU or a maxed-out Mac costs, the calculation flips, and a lot of developers are going to do that math. It won't be everyone, the frontier cloud models will stay ahead on the hard stuff and most people won't want to babysit their own inference server. But for the huge pile of everyday coding tasks that don't need the absolute best model, a local Qwen that costs nothing per token starts looking really attractive really fast. The cloud providers know this, which is part of why they can't push prices too far. Every dollar they add to your bill makes buying your own hardware look smarter, and the open models are now good enough that it's a real threat instead of a bluff.
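The "calculation flips" point is easy to sketch. The hardware price and electricity cost below are hypothetical placeholders, and the function ignores the quality gap and the time you spend babysitting the server, so plug in your own numbers before drawing conclusions.

```python
from typing import Optional


def payback_months(hardware_usd: float, cloud_usd_per_month: float,
                   electricity_usd_per_month: float = 0.0) -> Optional[float]:
    """Months until a one-time hardware purchase beats a recurring cloud bill.

    Returns None when local running costs meet or exceed the cloud bill,
    meaning the hardware never pays for itself.
    """
    monthly_saving = cloud_usd_per_month - electricity_usd_per_month
    if monthly_saving <= 0:
        return None
    return hardware_usd / monthly_saving


# HYPOTHETICAL inputs: a 4,000 USD machine and 20 USD/month of electricity.
# The cloud bills are the 200 USD plan and the 500 to 2,000 USD range from the text.
for bill in (200, 500, 2_000):
    months = payback_months(4_000, bill, electricity_usd_per_month=20)
    print(f"{bill} USD/month cloud bill: payback in {months:.1f} months")
```

At today's $200 flat rate the hypothetical machine takes almost two years to pay off. At a metered $500 a month it takes about eight months, and at $2,000 about two. That's the flip.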
The other thing keeping prices down is that nobody wants to be the first to blink. Raising prices or clamping limits risks losing users in a market where everyone's still fighting for share, so they're all stuck subsidizing each other in a standoff. As long as that standoff holds, you keep getting the cheap deal. It only breaks when the investor money runs thin enough that somebody has to actually turn a profit.
So how long have we got
My honest read? The "cheap unlimited" era for heavy agentic users specifically is the part that's living on borrowed time, and I'd guess we're talking one to three years before it meaningfully tightens, not months. The analysts pointing at OpenAI's for-profit conversion and the investor pressure for returns are pointing to the same window. It won't be a single doomsday price hike. It'll be a slow tightening, usage caps getting stricter, the unlimited tiers quietly capped, the expensive agentic stuff nudged onto metered billing, until one day you look up and realize you're paying per task and budgeting your prompts like you budget AWS.
But "unaffordable" is probably too strong, and that's the genuinely good news. Between falling per-token costs and a pack of hungry cheaper competitors, the likely future isn't that AI coding becomes a luxury, it's that the all-you-can-eat flat rate goes away and you get more careful about which model you throw at which task. You'll route the dumb stuff to the cheap model and save the frontier model for the hard problems, which, honestly, you should be doing already. The free-money phase ends, the be-smart-about-spend phase begins.
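Routing can start out very dumb. Here's a toy heuristic, with made-up tier names and thresholds, that sends each task to the cheapest tier that plausibly handles it. It's a sketch of the idea, not a recommendation for where the cutoffs should sit.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    files_touched: int
    needs_design_judgment: bool


def route(task: Task) -> str:
    """Pick a model tier for a task. Tier names and cutoffs are HYPOTHETICAL."""
    # Hard or sprawling work goes to the expensive model.
    if task.needs_design_judgment or task.files_touched > 10:
        return "frontier"
    # Routine multi-file edits go to a cheap hosted model.
    if task.files_touched > 1:
        return "cheap"
    # Single-file chores stay on the local model.
    return "local"


for task in (
    Task("rename a variable", 1, False),
    Task("update call sites after a signature change", 4, False),
    Task("redesign the auth flow", 3, True),
):
    print(f"{task.description}: {route(task)}")
```

In practice you'd tune the cutoffs against your own failure rate: if the cheap tier keeps botching a class of task, bump that class up a tier.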
So here's my actual advice. Don't panic, but don't build your entire business on the assumption that this stays this cheap forever, because it won't. Learn to use the cheaper models for the tasks that don't need the big brain. Keep an eye on the open-source options so you've got a ripcord if pricing turns ugly. And enjoy the subsidized buffet while the investors are still paying for it, because the check is coming, it's just not coming today. That's a topic we'll all be revisiting, probably with smaller usage limits and a more expensive bill, sooner than any of us would like.
Sources
TechSpot: OpenAI and Anthropic Can't Afford to Have Everyone Use AI - The SemiAnalysis findings that OpenAI made $13B and lost $21B in 2025, the zero-margin utilization thresholds (Anthropic ~10%, OpenAI 5.7%), and the $8,000 and $14,000 theoretical full-usage costs of the $200 plans.
Tom's Hardware: AI Costs Spike as Subscriptions Hit Pricing Wall - The point that agentic AI uses up to 1,000x more tokens than a basic query, the company that spent $500M in a month with no usage cap, and the shift toward Chinese and open-source models to stretch budgets.
GuruFocus: AI Pricing Shock Hits OpenAI, Anthropic and Microsoft - The industry-wide move to usage-based billing, GitHub's shift after monthly allotments, Anthropic moving business customers to usage billing, the electricity-and-water pricing comparison, and Walmart's $1,500-per-tool cap.
MindStudio: Why the $20/Month Era Is Ending - The argument that consumer AI subscriptions are VC-subsidized and priced for growth not profit, that the acquisition phase is largely complete, and that meaningful increases are likely within one to three years.
Morph: AI Coding Costs 2026 - The per-task token figures (400K to 2M tokens), the $500 to $2,000 per engineer per month for heavy automation, and DeepSeek V4 Flash pricing at roughly 54x cheaper than frontier models.
Finout: Anthropic API Pricing in 2026 - Current per-token rates showing Opus 4.8 launched at the same rate as the prior Opus with fast mode dropping from $30/$150 to $10/$50, evidence that base model costs are flat to falling.
Igor Kulman: Running a Local LLM Coding Server on MacBook Pro - The real-world test of Qwen 3.6 27B running entirely on a 48GB MacBook Pro and scoring 77.2% on SWE-bench Verified, in the league of cloud frontier models from a year earlier.
InsiderLLM: Best Local LLMs for Mac in 2026 - The breakdown of which Qwen models run on which Mac memory tiers, and why the 35B mixture-of-experts model has become the 2026 default for local coding on 32 to 48GB machines.