
The AI Free Lunch Is (Almost) Over

Free dinners from Foodora ended. Free Claude Code dinners are next.

A Guimard Art Nouveau métro entrance in Paris, with its iconic green cast-iron canopy from the early 1900s. The kind of édicule you walked past every evening on the way home, where someone would hand you another flyer.

A stack of food delivery flyers from mid-2010s Paris

Around 2014, I lived near the Belleville métro in Paris. Every evening on my way home, someone would press a flyer into my hand. Foodora, TakeEatEasy, Deliveroo, Uber Eats, all rotating in front of the exit. Each flyer came with a code: free delivery, ten euros off, fifteen euros off, your first three meals on us.

For a few months, dinner was almost free.

Then Foodora left France. TakeEatEasy went bankrupt. The flyers disappeared. The codes stopped working. The free dinners ended.

I’m burning $200 to $400 worth of Claude tokens a day on a $200 a month plan. Last month, the API-equivalent ran past $8,000.

The pattern is older than the technology

Pick a category where the underlying unit cost is opaque to the consumer. Subsidize it heavily. Get everyone hooked on the low price. Wait for competitors to die or raise prices. Then collect.

Uber did it to taxis. Foodora did it to me and most of Paris. Anthropic, OpenAI, and GitHub are doing it now to anyone who codes.

Anthropic launched Claude Code, then Claude Pro at $20, then Max at $100, then Max 20x at $200 for the heaviest users. Then it tightened weekly rate limits when Claude Code burned through them faster than expected. OpenAI matched with a new $100 ChatGPT Pro tier, giving 5x more Codex capacity than the $20 Plus plan, framed as what “power users have been asking for forever.” GitHub Copilot, which started flat-rate, moved to usage-based billing. Each step is the same step the food delivery apps took, just compressed into months instead of years.

TechCrunch headline announcing the new $100/month ChatGPT Pro plan with expanded Codex access, April 2026. OpenAI’s new $100 tier explicitly targets Anthropic’s existing $100 Claude offering. The price points are converging.

The flyers haven’t disappeared yet. The codes still work. But anyone watching the ratio between what they pay and what they consume can see what’s coming.

Where the analogy breaks

Foodora lost money on every order. The unit economics were broken from day one and only VC subsidy hid it. The category was structurally unprofitable until prices doubled.

AI inference is not that bad. At API prices, the margin is real. So this isn’t bankruptcy talking, it’s monetization. The free lunch ends not because providers lose money on every token, but because they choose to stop subsidizing the heavy users once the market is captured.

A flat-rate consumer plan only works if usage follows a power-law distribution where light users subsidize heavy users. That’s how all-you-can-eat plans work, from gym memberships to mobile data. Anthropic priced Max against an assumed distribution where a few power users would burn the cap and most subscribers would barely touch it.
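The flat-rate math is easy to sketch. Here is a minimal back-of-envelope in Python, where every number (the blended serving cost, the usage buckets, the population split) is an assumption for illustration, not Anthropic's real data:

```python
API_COST_PER_MTOK = 3.0  # assumed blended serving cost, $/M tokens (illustrative)
FLAT_PRICE = 200.0       # the Max plan price

def plan_margin(population):
    """Average per-subscriber margin on the flat plan.
    population: list of (fraction_of_subscribers, monthly_millions_of_tokens)."""
    avg_cost = sum(frac * mtok * API_COST_PER_MTOK for frac, mtok in population)
    return FLAT_PRICE - avg_cost

# The power-law shape the plan was priced against: most users light,
# a thin 1% tail burning ~$8k of tokens near the cap.
priced_against = [(0.80, 5), (0.19, 60), (0.01, 2700)]
# Post-Claude-Code: agent loops push the bulk of subscribers toward the cap.
actual = [(0.30, 5), (0.40, 300), (0.30, 2700)]

print(f"priced-against margin: ${plan_margin(priced_against):+.0f}/subscriber")
print(f"actual margin:         ${plan_margin(actual):+.0f}/subscriber")
```

With the assumed shape, the light users cover the heavy tail and the plan clears a positive margin per subscriber. Shift the bulk of subscribers right and the same $200 price goes deeply underwater, which is exactly the imbalance the rate-limit tightening keeps trying to correct.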

Two distributions of subscribers by token consumption: the assumed power-law decay (most users light, a thin tail of heavy users near the cap) versus the actual distribution after Claude Code, which is shifted right with the bulk of subscribers crowding the plan cap. The shape Anthropic priced against, and the shape Claude Code actually produces.

Claude Code broke that distribution. Agentic coding tools run in loops. They consume tokens whether you’re watching or not. Most developers who install one become heavy users by accident. The light-user subsidy pool the $200 plan was priced against doesn’t exist anymore.

That’s why rate limits keep tightening. Each tightening is the pricing trying to rebalance against actual usage. There is a floor below which it can’t go without making the plan useless to the developers it was designed for. We’re approaching that floor.

The 40x that isn’t going to happen

April was my heaviest month so far: 11.6 billion tokens through Claude Code, mostly cached reads with a Sonnet/Opus mix on the uncached side. Priced against the public API rate card, that’s roughly $8,000 of equivalent consumption against a $200 flat-rate plan. A 40x gap. Earlier months ran lighter ($250 to $850 in token value), not because each session was cheaper but because I ran agents less. The trend is up and to the right because Claude Code is up and to the right. For Max to cover its real power users at API margins, the price would need to land somewhere close to that $8,000.
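For the curious, the arithmetic reconstructs roughly like this. Only the 11.6B total comes from my actual usage; the cached/uncached split and the blended Sonnet/Opus rates below are assumptions chosen to show how a mostly-cached month still lands near that figure:

```python
usage_mtok = {            # millions of tokens; the split is assumed for illustration
    "cache_read": 11_000, # agentic loops re-read the same context constantly
    "input": 500,
    "output": 100,
}
rate_per_mtok = {         # assumed blended Sonnet/Opus $/M token rates
    "cache_read": 0.40,
    "input": 4.00,
    "output": 20.00,
}

total_usd = sum(usage_mtok[k] * rate_per_mtok[k] for k in usage_mtok)
total_tokens = sum(usage_mtok.values())

print(f"{total_tokens / 1_000:.1f}B tokens -> ${total_usd:,.0f} API-equivalent")
print(f"{total_usd / 200:.0f}x the $200 flat plan")
```

With these assumptions the month totals $8,400 of equivalent consumption, a ~42x gap. The exact figure isn't the point; the point is that cache reads dominate the volume, yet the bill still lands at a multiple of the flat price.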

The market for $200 Max isn’t casual users; it’s professionals: developers, freelancers, founders, indie shops. Even they won’t pay $8,000 a month for a coding assistant. They’d cancel. Pros will pay $1,000 for a tool that pays for itself in a week. Not $8,000.

The realized increase for someone like me won’t be 40x. It’ll be 3-5x, in line with what every other tiered tech market has done: AWS On-Demand priced at roughly 3x Reserved, mobile data overage charges typically 3-5x the in-plan rate, broadband “fair-use” caps that throttle to a quarter of the headline speed. Anthropic doesn’t have to invent a new economic mechanism. They just have to copy the one cloud and telecom landed on a decade ago: harder caps, a new mid-tier between flat-rate and metered, and the actual heavy users pushed to the meter. The flat-rate plan doesn’t disappear. It becomes vestigial for serious work, the same way “unlimited” cell data plans still exist but throttle you past a cap buried in the terms.

The one thing that could blunt this: a competitor that holds the line at $200 to steal Anthropic’s heavy users. Google has the cash and a real product (Gemini in Cursor and IDEs is genuinely close on coding), and Google has the strongest history of subsidizing for share. DeepSeek V4-Pro at 80.6% on SWE-bench is essentially free open competition that nobody owns and nobody pays for. Meta has the weights, the GPUs, and a reason to commoditize the layer below their assistants. If any of those refuses to follow the tier-shuffle, the squeeze on Anthropic stalls. It’s the live counterargument and it isn’t weak. I still expect Anthropic to move first because they have the most to lose, and the others to follow within a year because they always do. But this is the question that decides how steep the next twelve months get.

If you run a few prompts a day, you’ll keep your great deal. If you use AI as a daily tool for paid work, you’re moving to a meter.

The escape hatch

The food delivery analogy doesn’t have a third path. You couldn’t self-deliver a Deliveroo. But you can run Llama, DeepSeek, Qwen, or whatever open weights are worth running by then.

On SWE-bench Verified, DeepSeek V4-Pro hit 80.6% earlier this year, two points behind GPT 5.5 (82.6%) and within rounding of Claude Opus 4.6 (80.8%) and Gemini 3.1 Pro (80.6%). A year ago the open-weights frontier was around 55% on the same benchmark while the closed frontier was already past 70%. The gap closed from roughly 18 points to 2 in twelve months.

Top SWE-bench Verified scores, May 2026: GPT 5.5 at 82.6%, Claude Opus 4.7 at 82.0%, Claude Opus 4.6 at 80.8%, DeepSeek V4-Pro at 80.6% (the top open-weights model), Gemini 3.1 Pro at 80.6%. DeepSeek V4-Pro is within two points of the closed frontier. Source: SWE-bench Verified leaderboard.

The hardware story is the part that’s still moving and the part the optimists handwave. Running a serious open model at the latency and quality you want today requires a Mac with 96 GB or more of unified memory ($3-$4k+ upfront), a small GPU rig, or an H100-class cloud GPU rented by the hour (around $2-$3/hour, which is $500-$700 a month at eight hours a day). None of that is cheaper than $200 a month right now. The math closes when open weights reach parity on agent loops and Apple’s unified memory hits a generation that lets a serious model run cool on a desktop.
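The rental arithmetic behind that range is worth making explicit. A quick sketch, using the post's ballpark rates rather than any provider's actual quote:

```python
def monthly_rental(rate_per_hour, hours_per_day=8, days=30):
    """Cloud GPU cost for a daily-driver coding workload."""
    return rate_per_hour * hours_per_day * days

# $2-$3/hour for an H100-class GPU, eight hours a day, every day.
low, high = monthly_rental(2.0), monthly_rental(3.0)
print(f"H100-class at 8h/day: ${low:.0f}-${high:.0f}/month")

# Today, even the low end of that range exceeds the $200 flat plan.
FLAT_PLAN = 200.0
print(f"cheaper than the flat plan today: {high < FLAT_PLAN}")
```

Roughly $480-$720 a month at those rates, matching the $500-$700 ballpark. The comparison flips only when the flat plan's real price rises toward the rental cost, which is the whole point of the section that follows.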

I haven’t switched yet. I’m watching DeepSeek V4 and Qwen on the side, and the day my daily workflow fits on a Mac under my desk is the day I migrate. The migration is its own work, though. Context, prompts, custom commands, agent recipes, muscle memory, all of it transfers slowly. Start before you need to. The escape hatch is being built right now, even if it isn’t a door you can walk through this quarter.

What to do this year

Three things.

Use the subsidy aggressively while it’s there. Code like crazy on Max. Build agents. Ship things. Don’t ration yourself to fit inside an imagined fair-use ceiling. The point of being inside the market-capture window is to extract maximum value from the misprice. The agents shipping code into your repo today are cheaper than they will ever be again. I’ve been stacking commits faster than I can read them back for the last few months, and the math behind that is going to look very different in twelve months.

Budget for the meter. Whatever your team’s AI line item is today, model what it looks like at 5x in a year. If that number doesn’t fit your gross margin, start changing the workflow now. Cache. Use smaller models for the easier turns. Restructure prompts. Batch where you can. The teams that wait for the bill before changing the workflow are going to have a bad quarter.
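One way to run that exercise, with an entirely hypothetical team (the revenue, COGS, and AI spend below are made up for illustration):

```python
def gross_margin_pct(revenue, other_cogs, ai_spend, multiplier):
    """Gross margin % after the AI line item is repriced by `multiplier`."""
    return (revenue - other_cogs - ai_spend * multiplier) / revenue * 100

# Hypothetical team: $100k MRR, $30k other COGS, $5k/month of AI spend today.
for m in (1, 3, 5):
    print(f"{m}x meter: {gross_margin_pct(100_000, 30_000, 5_000, m):.0f}% gross margin")
```

For this made-up team, every step of the multiplier costs ten points of gross margin: 65% today, 45% at 5x. If the 5x number doesn't work for your business, the workflow changes (caching, smaller models, batching) have to land before the repricing does.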

Watch the open models. Not because they are ready today (close, but not for everything). Because the day Anthropic announces the new $500 tier, you want to already know whether your daily workflow can run on a Mac sitting under your desk. Like ARM versus x86, the open option will look obviously inferior right up until it suddenly doesn’t.

The free meal at my desk is ending in 2026. The next twelve months are the last cheap ones. Code fast.
