You type a message. Claude responds. Feels simple.
What's actually happening behind the scenes? Claude isn't just reading your latest message. It's re-reading everything — every message, every file, every tool definition loaded in that session — from the very beginning, every single time.
That's the part most people don't realize. And it's why your Claude usage disappears faster than you expect, why API bills look higher than the work you got done, and why power users seem to do so much more on the same quota.
I've cut my token usage by over 60% without changing what I actually accomplish. These are the seven habits that made the difference.
1. Treat Long Conversations Like a Leaky Bucket
Every time you send a new message in an existing conversation, Claude re-reads the entire thread. Message one. Message twenty. Message forty-seven. All of it, again, every single turn.
That 10,000-line log you pasted in message three? It's riding along silently in every message you send after it — costing tokens you can't see.
The fix: Start fresh conversations more aggressively. Once you've finished a task or changed direction, don't keep building on the same thread. A new conversation resets the context to zero. Long chat threads are one of the biggest hidden token drains — every new message makes Claude re-read the entire conversation, including old instructions and outdated code.
If you're using Claude Code, set your compact override to 70% instead of the default 95%. Claude compacts near 95% capacity by default — setting an override to 70% for normal work, or 50% for noisy workflows, helps manage token usage before it compounds.
2. Batch Your Requests — Stop Sending One-Liners
This one habit alone can cut your usage dramatically.
Most people send messages like this:
"Fix the headline." (waits) "Now make the intro shorter." (waits) "Also change the CTA."
That's three separate messages, each forcing Claude to reprocess the entire conversation from the start.
The fix: Combine everything into one clear message. "Fix the headline, shorten the intro to two sentences, and change the CTA to focus on urgency." One message. One reprocess. Same result.
Breaking work into multiple steps feels natural, but it is expensive in token usage — batching is one of the easiest ways to cut usage without losing quality.
3. Stop Pasting Everything — Paste Only What's Relevant
Here's a habit that silently burns tokens every session: copying in entire documents, codebases, or long articles when Claude only needs a fraction of them.
Claude processes everything you send, even if most of it is not useful. That 4,000-word report you pasted to ask one question about the conclusion? Claude read all 4,000 words. Every time you follow up in that thread, it reads them again.
The fix: Trim before you paste. If you need Claude to fix a specific function, paste that function — not the entire file. If you need Claude to summarize a section, paste that section — not the full document. Less input = fewer tokens = faster responses = lower cost.
The context window fills up faster than most people realise — every file, message, and tool definition competes for the same space.
4. Disable Connectors You're Not Using
This one surprises almost everyone the first time they hear it.
Every MCP connector — Google Drive, Slack, Calendar, and others — and every tool you have enabled loads its full tool definition into your context window on every single message, whether you use it or not.
If you have five connectors enabled and you're using none of them for the task at hand, you're burning thousands of tokens per message just on definitions Claude doesn't need.
The fix: On claude.ai, click the "+" button → Connectors → Tool access and disable the tools you don't need for your current session. Turn them back on when you need them. It takes ten seconds and the savings compound across every message in that session.
5. Use the Right Model for the Task
Not every task needs the most powerful model. Simple tasks do fine with lighter models — formatting, quick edits — while complex reasoning benefits from the full model. Using a heavy model for everything leads to unnecessary token burn without real benefit.
Think of it like transport: you don't take a long-haul flight to go two kilometers. Claude Haiku handles quick, well-defined tasks fast and cheap. Claude Sonnet handles most substantive work. Save Opus for genuinely complex reasoning tasks.
Matching model to task is one of the highest-leverage changes you can make, especially if you're using the API.
6. Use /context to Diagnose Before You Optimize
Most token optimization advice skips this step. Don't.
The /context command is your diagnostic tool. Before changing your whole workflow, look at what is actually being loaded or repeatedly re-sent. In many cases, the biggest improvement doesn't come from better prompting — it comes from spotting one "quiet offender" that has been riding along in every turn.
A big file Claude read early in a session. Accumulated tool output. A heavy memory file. These silent passengers inflate every subsequent message. You can't fix what you can't see — so look first, optimize second.
For API users, run /cost at the end of a session to see your token count and estimated spend. For claude.ai users, Settings → Usage shows your remaining quota and reset window.
7. Write Specific Prompts — Vague Prompts Cost More
This is counterintuitive but true: vague prompts cost more tokens, not fewer.
When you write "improve this," Claude has to spend tokens interpreting what "improve" means, generating options, hedging its choices, and often asking clarifying questions. The output is longer, the follow-ups multiply, and the thread grows.
When you write "rewrite this intro to be under 50 words, lead with the stat, and remove the second sentence," Claude executes it directly. One pass. Clean output. No back-and-forth.
Giving Claude unambiguous boundaries between different types of information reduces misinterpretation and wasted follow-up turns. Specificity isn't just good prompting practice. It's token efficiency.
The Honest Total
Reducing Claude token usage by up to 90% is not about one trick — it comes from changing how you work with the system.
None of these changes require new tools, new subscriptions, or new workflows. They require understanding one thing: Claude isn't reading your latest message. It's reading everything, every time.
Once that clicks, the fixes are obvious. Start fresh conversations sooner. Batch your requests. Paste only what matters. Disable connectors you're not using. Match your model to the task. Diagnose before you optimize. Write with specificity.
Do all seven consistently and you'll ship the same quality work — for a fraction of what you're spending today.
Which of these are you already doing — and which one surprised you most? Drop it in the comments. Let's compare notes.
Discussion