Return on Tokens Invested

Slow down AI if you want to move faster and better

Jun 09, 2026

I want to tell you about the best two days and the most expensive two months of my year. Both belong to the same project, which I began two months ago.

On April 7, I sat down with Claude to build a business simulation, a crisis-leadership scenario called EYEWALL that I had been turning over in my mind for some time. It was exhilarating. It seemed that the distance between an idea and a working software product was almost zero. I described what I wanted. Claude wrote the code. I asked for more, and it wrote more. Two days later I had a running simulation with more than twelve thousand lines of code: decisions, branching consequences, a scoring engine, an interface a student could actually click through. The kind of thing that would have taken a small development team the better part of a quarter.

I felt like a child who had wandered into a candy store. And I thought I was almost done.

Boy, was I mistaken! Over the following two months, my partner Subhendu and I rebuilt almost all of EYEWALL. The logic that looked clean on day two unraveled the moment a real user did something I had not imagined. The interface that felt clever to its author confused everyone who was not its author. The scoring was not logically linked to the decisions. Edge cases turned into bugs we chased one at a time, late into many nights. We spent eight weeks repairing what I had generated in forty-eight hours.

Here is the part that still stings. Had I spent two more days at the very start, two slow and deliberate days mapping the user experience, the cognitive load, the decision logic, and the architecture before I let the generation run, I would have ended up with a better simulation in a fraction of the time. The speed I felt in the first forty-eight hours was real. It was also a loan I took out against my future self, and the interest came due across the next eight weeks.

I have come to think of that candy-store binge as a small, personal version of the defining mistake of this moment in business.

The candy store goes corporate

Watch the same impulse at scale and you see the identical pattern, only now with a real invoice attached.

Consider Uber. Late in 2025 it handed its engineers Claude Code, and to drive adoption it ran internal leaderboards ranking teams by how many tokens they consumed. Adoption was a runaway success. Within a few months most of its roughly five thousand engineers were coding with agents every day. By April the company had exhausted its entire annual budget for AI coding tools. A year of runway spent in a third of a year.

Then came the more revealing admission. Uber’s leadership conceded it could not yet draw a clear line from all that spending to the features it was shipping to riders and drivers. Its heaviest users were burning between five hundred and two thousand dollars a month each. Uber had to impose a cap of fifteen hundred dollars per engineer per tool per month.

Microsoft tells a parallel story from the other side of the table. It has been pulling engineers off direct Claude Code licenses and consolidating them onto tooling it owns, partly for governance and partly for cost. And in June it flipped GitHub Copilot from a flat subscription to metered billing, which set off a wave of sticker shock as developers watched a month of credits vanish in a single afternoon of agentic work.

The leaderboard is the tell. Uber did not reward the most valuable work. It rewarded the most consumption. Give engineers a candy store and a scoreboard that counts wrappers, and you should not be surprised when they eat until they are sick. When you incentivize token burn, token burn is what you get.

The meter you cannot read

What makes this different from every earlier wave of software cost is the missing feedback signal.

A per-seat license sets the price in advance and leaves the value entirely to you. Agentic AI inverts that arrangement. The cost is variable: it rises with how much you engage with the AI. Yet the person spending the money receives almost no signal about whether the spending was worthwhile. A four-hour debugging session that consumes two million tokens to produce a three-line fix looks, from inside the tool, identical to a four-hour session that ships something a customer will pay for. The developer feels equally busy in both. The meter is running, and the dial faces away from the driver.

The agent compounds the problem. It will generate as much as you permit. Point it at a directory and tell it to refactor, and it may consume more in a single run than a careful engineer would in a week. The old governor on software cost was human patience and human typing speed. That governor is gone. What remains is your willingness to keep saying “keep going.”

So the invoice arrives, it is large, and the one question that matters goes unanswered: which of this spending created value, and which was simply motion? Throttling usage, capping budgets, pausing rollouts, the reflexes most companies are now reaching for, treat the symptom but ignore the underlying disease. We placed extraordinary generative power into people’s hands without ever teaching them to reason about what a token is for.

A new mindset: Return on Tokens Invested (ROTI)

We already own the discipline this requires, but we have never pointed it at AI.

Every serious decision in a business passes through a return-on-investment (ROI) lens. What is the value of the outcome, and what will it cost to get there? We apply this instinctively to capital, to headcount, to our own hours. We have not applied it to tokens, because for one brief and golden stretch, tokens felt free. The AI platform providers played the drug dealer, handing out generous free credits and subsidized subscriptions to get you hooked on their tools.

Return on tokens invested, ROTI, is that same old instinct turned toward a new kind of spending. Before you summon the agent, you hold two things against each other:

ROTI = the business value of what you are about to do, divided by the tokens and the human and compute effort it will take to do it.

The imprecision in this formula is deliberate. The aim is not to compute a figure to three decimal places. The aim is to become mindful of the trade-off before the work begins, so that it becomes a conscious choice rather than an invisible default. A token spent on a high-value problem that has no cheaper path is among the best investments you will make all year. The same token spent regenerating a document nobody asked for, in the most expensive model on the menu, because the agent happened to be idling, is waste wearing the costume of productivity.
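As a rough illustration of the ratio above, the sketch below puts made-up numbers on two sessions like the ones described earlier: both burn two million tokens and four hours, but only one ships something a customer pays for. Every figure here (the price per million tokens, the hourly rate, the dollar values) and every name (`Task`, `total_cost`, `roti`) is an assumption invented for the example, not data from any company or provider. Compute effort is folded into the token price for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    business_value_usd: float  # assumed estimate of what the outcome is worth
    tokens: int                # tokens the agent is expected to consume
    human_hours: float         # time spent prompting, reviewing, and fixing


# Hypothetical rates, chosen only to make the arithmetic concrete.
USD_PER_MILLION_TOKENS = 15.0
USD_PER_HUMAN_HOUR = 100.0


def total_cost(task: Task) -> float:
    """Token spend plus human time, both expressed in dollars."""
    token_cost = task.tokens / 1_000_000 * USD_PER_MILLION_TOKENS
    human_cost = task.human_hours * USD_PER_HUMAN_HOUR
    return token_cost + human_cost


def roti(task: Task) -> float:
    """Business value divided by everything it took to get there."""
    return task.business_value_usd / total_cost(task)


three_line_fix = Task("three-line fix", 200.0, 2_000_000, 4.0)
paid_feature = Task("feature a customer pays for", 5_000.0, 2_000_000, 4.0)

for task in (three_line_fix, paid_feature):
    print(f"{task.name}: cost ${total_cost(task):.0f}, ROTI {roti(task):.2f}")
```

The two sessions cost exactly the same. Only the numerator differs, and the numerator is the one thing the tool's meter never shows.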

To practice ROTI discipline, ask a few questions before you reach for the AI tool, not after the bill lands. The first and most important is simply what outcome you are actually after, because if you cannot name the value you have no hope of earning a return on it. From there the questions turn practical. Does this task need a frontier model at all, when a cheaper model, a template, or two minutes of clear thinking would be free and exact? If it genuinely needs intelligence, are you sending the hard reasoning to your best model and the routine work to a cheaper one, which is plain housekeeping, the kind that keeps the lights on? Is there a tighter scope, a smaller context, a single well-formed prompt that would reach the same place as ten exploratory ones, given that effort spent shaping the request is repaid many times over in tokens saved? And when the output arrives, is it worth keeping at all, since an agent will produce volume on command and the judgment about what to ship and what to discard is the one part of the work no model can do for you?
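The routing question in particular lends itself to a small sketch. The version below is a minimal illustration, not a recommendation: the tier names, the per-million-token prices, the task list, and the `needs_model` and `needs_reasoning` flags are all hypothetical, and in practice a person still has to judge which tasks count as hard.

```python
# Hypothetical model tiers and prices (USD per million tokens).
PRICE_PER_MILLION = {"none": 0.0, "small": 1.0, "frontier": 15.0}


def choose_tier(needs_model: bool, needs_reasoning: bool) -> str:
    """Pick the cheapest tier that can plausibly do the job."""
    if not needs_model:
        return "none"      # a template or two minutes of clear thinking
    if needs_reasoning:
        return "frontier"  # reserve the best model for the hard problems
    return "small"         # routine work goes to the cheaper model


def estimated_cost(tier: str, tokens: int) -> float:
    """Dollar cost of running `tokens` through the given tier."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[tier]


# (description, needs_model, needs_reasoning, expected tokens), all illustrative
tasks = [
    ("rename a config key", False, False, 0),
    ("summarize meeting notes", True, False, 500_000),
    ("redesign the scoring engine", True, True, 400_000),
]

routed = sum(estimated_cost(choose_tier(m, r), t) for _, m, r, t in tasks)
all_frontier = sum(estimated_cost("frontier", t) for _, _, _, t in tasks)
print(f"routed: ${routed:.2f}, everything on frontier: ${all_frontier:.2f}")
```

With these assumed prices, sending only the hard task to the expensive tier costs roughly half of running everything on it. The exact ratio matters less than the habit of asking the question before the run starts.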

None of this is exotic. It is the same care a good carpenter brings to a board before the saw ever touches it. You can use a bazooka to kill a fly, but a fly-swatter does the same task for far less money!

The tax you do not see

There is a second cost to ignoring all of this, and it never appears on the invoice.

Hand people powerful generation tools without the discipline to use them well, and you do not merely overspend. You produce. Code that was never carefully scoped, documents nobody needed, work product created because the model was right there and generating felt like progress. It accumulates in repositories and shared drives, every artifact looking industrious and contributing almost nothing. A lot of output, very little outcome. A lot of heat, very little light.

This is the part the budget conversation always misses. The token bill is the visible cost, and it is the smaller one. The hidden cost is the slop tax: the cognitive overhead of reviewing, maintaining, and eventually untangling all that low-value output, paid later and with compound interest. Careless generation is expensive twice, once when you make it and again when some unlucky person has to live with it. My twelve thousand lines were a slop tax I levied on myself.

Measure twice

Which brings me back to the candy store, and to the uncomfortable conclusion I have reached.

To move fast with AI, you have to slow down first. Not slow down in the sense of using these tools less, because I use them more than ever, and EYEWALL exists at all because of them. Slow down in the sense of thinking before you spend. It means naming the value before you start, choosing the right model for the task, scoping the work tightly, and deciding in advance what would be worth keeping.

A minute of deliberation before you invoke the agent is what makes the hour that follows actually pay. Skip it and you get speed without direction, which is just expensive motion.

This is the oldest discipline in any craft, wearing new clothes. Measure twice, cut once. The carpenter who skips the first measurement does not save time. He buys himself a second board and a wasted afternoon. I skipped my measurement and bought myself two months.

The organizations that win the next phase of AI will not be the ones that consumed the most tokens. They will be the ones that knew where their tokens went and which ones were worth it, the ones that retired the leaderboard rewarding consumption and replaced it with the only score that matters, the return on what they invested. The candy store is not the danger. The danger is forgetting that someone always pays the bill, and that the someone is usually you, two months from now.

To move fast, learn to move slow. That is the paradox of productivity, and for now it is the whole game.
