AI Token Costs: Why Big Tech Is Panic-Capping Spending

AI will replace workers. I am sure you have already heard this somewhere. At the end of 2025 many people lost their jobs as Big Tech cut a lot of roles and replaced them with AI. The hype took over Silicon Valley. The math seemed seductively simple. No need for junior devs. Mid-level devs cost about 150K USD, while an AI subscription was cheap, about 20 to 50 USD, and coded faster. So if you are a top manager, the equation is more productivity for less cost. This did not only affect software engineers. Many more positions were hit by these AI vibes.

By January 2026 it was clear: AI was not just a tool anymore, it had become part of the work. Companies raced to equip their workforces with the most powerful agentic AI tools on the market: Claude Code, Cursor, GitHub Copilot. The belief was that these tools would operate like tireless digital interns, handling the grunt work while human employees moved up the value chain to “strategic oversight.” What nobody built into the models was the meter.

By mid-2026, a different kind of spreadsheet started landing on CFO desks. These weren’t projections. They were invoices. And they told a story that no one in the C-suite had prepared for: the AI tools were costing more than the people using them.

This is the story of the AI Boomerang: the moment when the technology thrown forward to cut costs came spinning back with a bill no one expected.

First, the Uber Runaway Script.

In December 2025, Uber applied this logic. If AI can write code faster, engineering can focus on what really matters. Sounds great, right? So they set a budget that would give their 5,000 engineers access to AI tools like Claude Code and Cursor. It worked, and engineers were very happy. We understand why: this was no longer just coding help but agents able to work independently and debug, and you know how hectic debugging can be. They were able to draft entire microservices. It went so well that 10% of Uber’s code was written by AI.

Then the real AI trouble started. Not trouble in the sense of something breaking, but in the way these tools function. AI is not human. The same bug that takes you 30 minutes to identify will take AI 1 minute. As humans we try solutions one by one until we find the one that works. AI tries multiple solutions at once and keeps the one that works. And AI tools use tokens to measure usage and set the price, so each solution the AI tries costs tokens. A human will try 4 solutions and get the answer on the 4th try. AI will try many more, and even if the correct solution is the first one, tokens are still billed all the way to the 120th try.

As I said, AI tests all possibilities at once, and all of them cost tokens. So when 10% of your code is written by AI, you are using a huge amount of tokens. By April 2026 Uber hit 700 million tokens per week. Yes, that’s correct, 700 million tokens. This is no longer $20 a month per engineer. The price flew to about $500 to $2,000 per engineer on the company account.
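To make the arithmetic concrete, here is a minimal sketch of how the bill grows with the number of attempts. The 4 and 120 attempts come from the example above. The tokens per attempt and the price per million tokens are made-up numbers for illustration, not real vendor pricing.

```python
def attempt_cost_usd(attempts: int, tokens_per_attempt: int,
                     usd_per_million_tokens: float) -> float:
    """Cost of a debugging session when every attempt is billed, successful or not."""
    total_tokens = attempts * tokens_per_attempt
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Assumed values, for illustration only.
TOKENS_PER_ATTEMPT = 20_000      # hypothetical: context plus generated patch
USD_PER_MILLION_TOKENS = 10.0    # hypothetical blended price

human_like = attempt_cost_usd(4, TOKENS_PER_ATTEMPT, USD_PER_MILLION_TOKENS)
agent_like = attempt_cost_usd(120, TOKENS_PER_ATTEMPT, USD_PER_MILLION_TOKENS)

print(f"4 attempts:   ${human_like:.2f}")
print(f"120 attempts: ${agent_like:.2f}")
print(f"ratio: {agent_like / human_like:.0f}x")
```

Under these assumptions the ratio depends only on the number of billed attempts, so the gap stays the same whatever the real price per token turns out to be.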

So you will ask: how can engineers, most of them senior, burn that amount of tokens? If you followed what I wrote above, you will easily understand. Those tools are built to test all possibilities to get answers as quickly as possible. That means they consume a lot of tokens, and tokens cost money. That is how a full year’s budget got consumed in 4 months.
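A year’s budget gone in 4 months means spending ran at three times the planned monthly rate. The sketch below shows that calculation. The annual budget figure is a placeholder I chose so the numbers come out round. It is not Uber’s actual budget.

```python
def months_until_exhausted(annual_budget_usd: float, monthly_spend_usd: float) -> float:
    """How many months a yearly budget lasts at a constant monthly spend."""
    if monthly_spend_usd <= 0:
        raise ValueError("monthly spend must be positive")
    return annual_budget_usd / monthly_spend_usd


ANNUAL_BUDGET_USD = 12_000_000.0          # hypothetical placeholder
planned_monthly = ANNUAL_BUDGET_USD / 12  # what finance expected to spend
actual_monthly = planned_monthly * 3      # three times the planned rate

print(months_until_exhausted(ANNUAL_BUDGET_USD, planned_monthly))  # 12.0
print(months_until_exhausted(ANNUAL_BUDGET_USD, actual_monthly))   # 4.0
```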

So in June 2026 they had to put a limit on usage: $1,500 per engineer. The same month, on a podcast called Rapid Response, Andrew Macdonald, Uber’s COO, admitted that it had cost them a lot and that he was not really sure all this spending actually benefits the customer.

The tools were writing code. The dashboards were green. But was any of it moving the business forward? Uber couldn’t say.

Meanwhile, Microsoft Was Planning a Quiet Retreat

If Uber’s story was one of public chaos, Microsoft’s was one of quiet, calculated retreat. In May 2026, Rajesh Jha, the Executive Vice President overseeing Microsoft’s Experiences and Devices division, the massive organization responsible for Windows, Office, and Teams, issued an internal directive. All engineers in his division had to surrender their Anthropic Claude Code licenses. No exceptions. The official reason given was “tool unification.”

Engineers were told to migrate to Microsoft’s own GitHub Copilot CLI, a product developed in-house. Was that the real reason? Look at the calendar: we are close to the 30th of June. Why does that matter, you may ask? It’s the literal last day of Microsoft’s fiscal year. In corporate finance, this is the deadline when every variable cost gets scrubbed, every unpredictable line item gets reconciled, and every budget that bled gets cauterized before the new financial year begins on July 1.

Claude Code had become “too popular” inside Jha’s division. Internal token billing had exploded. And unlike a flat SaaS subscription, where finance knows exactly what the bill will be every month, agentic AI spending was a runaway variable that Microsoft’s accounting systems couldn’t predict or control.

So Microsoft did what giant corporations do when a cost center spins out of control: they killed it before it could ruin a quarterly report.

The “tool unification” narrative was the polite cover. The real motive was a classic corporate accounting move: wiping a bleeding, unpredictable variable cost off the books before the fiscal reset.

So Why Does This Keep Happening?

AI business models are totally different. The SaaS model is simple: you subscribe and you can use as much as you want. AI models bring tokens. Why, you may ask? Because every token burned means energy consumed and real hardware kept busy. That’s why advanced AI tools work like a taxi meter. Everything counts, even a little hello. When you give AI an instruction like “fix this bug” without clarifying or giving details on the process, you leave it by itself to find solutions, so do not be surprised to see all your tokens gone. The agent will try every possibility until your tokens or your budget run out. This pricing model creates a new kind of corporate risk: “Tokenmaxxing.”
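The taxi-meter behavior can be sketched as a small simulation: a meter that charges for every attempt, and an agent loop that keeps going until the cap stops it. The `TokenMeter` class, `run_agent` function, prices, and cap below are all hypothetical names and numbers for illustration. They do not describe how any real tool enforces its limits.

```python
class BudgetExhausted(Exception):
    """Raised when a charge would push spending past the cap."""


class TokenMeter:
    """Hypothetical per-user meter: every token is billed against a hard cap."""

    def __init__(self, budget_usd: float, usd_per_million_tokens: float) -> None:
        self.budget_usd = budget_usd
        self.usd_per_million_tokens = usd_per_million_tokens
        self.spent_usd = 0.0

    def charge(self, tokens: int) -> None:
        cost = tokens / 1_000_000 * self.usd_per_million_tokens
        if self.spent_usd + cost > self.budget_usd:
            raise BudgetExhausted(f"cap of ${self.budget_usd:.2f} reached")
        self.spent_usd += cost


def run_agent(meter: TokenMeter, tokens_per_attempt: int, max_attempts: int) -> int:
    """Simulate an agent that keeps trying until the meter stops it.

    Returns the number of attempts that were actually paid for.
    """
    completed = 0
    for _ in range(max_attempts):
        try:
            meter.charge(tokens_per_attempt)
        except BudgetExhausted:
            break
        completed += 1
    return completed


# Assumed values: $1.50 cap, $10 per million tokens, 20,000 tokens per attempt.
meter = TokenMeter(budget_usd=1.50, usd_per_million_tokens=10.0)
attempts = run_agent(meter, tokens_per_attempt=20_000, max_attempts=120)
print(f"{attempts} attempts paid for, ${meter.spent_usd:.2f} spent")
```

The agent in this sketch wanted 120 attempts but the cap cuts it off after a handful. The point of a hard limit is that the meter, not the agent, decides when to stop.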

Tokenmaxxing is what happens when employees, with the best of intentions, burn premium, ultra-expensive computing power on trivial tasks. A developer uses a top-tier model to proofread a three-sentence internal email. Engineers let AI agents run on small things that can be fiddly but are easily handled by hand, like centering a div, simply because it is easier when AI does it.

Individually, these moments seem harmless. Collectively, they drain budgets at a rate no CFO can justify.
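A rough back-of-the-envelope sketch shows how small tasks add up across a large team. Only the headcount of 5,000 comes from the Uber example. The prices, tokens per task, and tasks per day are assumptions picked for illustration.

```python
def monthly_cost_usd(engineers: int, tasks_per_day: int, workdays: int,
                     tokens_per_task: int, usd_per_million_tokens: float) -> float:
    """Total monthly bill for one kind of small task across a whole team."""
    total_tokens = engineers * tasks_per_day * workdays * tokens_per_task
    return total_tokens / 1_000_000 * usd_per_million_tokens


ENGINEERS = 5_000                # headcount from the Uber example
TASKS_PER_DAY = 20               # hypothetical
WORKDAYS_PER_MONTH = 21          # hypothetical
TOKENS_PER_TRIVIAL_TASK = 2_000  # hypothetical, e.g. proofreading a short email
PREMIUM_USD_PER_MILLION = 30.0   # hypothetical top-tier model price
CHEAP_USD_PER_MILLION = 1.0      # hypothetical small-model price

premium = monthly_cost_usd(ENGINEERS, TASKS_PER_DAY, WORKDAYS_PER_MONTH,
                           TOKENS_PER_TRIVIAL_TASK, PREMIUM_USD_PER_MILLION)
cheap = monthly_cost_usd(ENGINEERS, TASKS_PER_DAY, WORKDAYS_PER_MONTH,
                         TOKENS_PER_TRIVIAL_TASK, CHEAP_USD_PER_MILLION)

print(f"premium model: ${premium:,.0f} per month")
print(f"cheap model:   ${cheap:,.0f} per month")
```

With these made-up numbers a single trivial task costs cents, yet the monthly total on the premium tier reaches six figures.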

The trap isn’t that AI doesn’t work. The trap is that it works too well: it scales consumption faster than it scales value.

Okama Digital
