Cédric Rittié

8 min read

The quota wall: why your plan runs out mid-week

Claude Code context management: the 4 moves that keep you from hitting the quota wall and keep responses sharp Monday through Friday.

claude code · productivity · context · tokens · quota
Phase 3 · Automate · Article 2 of 4

When Claude tells you to stop

Tuesday, 5pm. You're deep in a session that's flowing. You send the next message.

Revisit the pricing section and tighten the CTA.
Quota limit reached
You've hit the limit on your Claude Max plan.
Reset at 10:00pm.

Cut off. Right in the middle of something you can't resume tomorrow morning. And you don't get why. You've only used Claude once this week.

The reason isn't your message count. It's the weight of each message. Every time you send a question, Claude re-reads everything that's been said before. A 3-hour conversation that's accumulated 150K tokens means 150K tokens charged to your quota at every new turn. The counter emptied in silence while you worked.

The weight that piles up with every turn
Stacked bar chart: at every conversation turn, Claude re-reads all previous messages. Turn 1 processes 5K tokens, Turn 5 processes 100K.
Legend: new message · messages re-read on every turn
5 turns, ~200K tokens processed in total. A clean conversation would've used 25K.
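The arithmetic behind the chart fits in a few lines of Python. The message sizes below are illustrative assumptions, not measured values; the point is the shape of the curve, not the exact numbers:

```python
def tokens_processed(turn_growth_k):
    """Each turn, the model re-reads the full history: turn i processes
    the cumulative sum (in thousands of tokens) of everything so far."""
    history = 0
    per_turn = []
    for growth in turn_growth_k:  # tokens (K) added to the context each turn
        history += growth
        per_turn.append(history)  # what this turn actually processes
    return per_turn, sum(per_turn)

# Illustrative sizes: a 5K opening message, then ~20-25K per turn
# (your prompts + pasted files + Claude's replies piling into history).
per_turn, total = tokens_processed([5, 20, 25, 25, 25])
print(per_turn)  # [5, 25, 50, 75, 100] -> turn 5 processes 100K
print(total)     # 255 -> vs 100 if each turn only processed its own content
```

The quadratic-looking growth is the whole story: the cost of a turn depends on everything before it, not on what you just typed.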

Same root cause, different symptom: Claude starts misremembering a convention you set 2 hours ago. It re-suggests something you already fixed. Its answers turn strangely generic. Not a coincidence: every LLM loses precision as its context grows (Chroma's 2025 study found it across all 18 models tested, no exception).

The paradox: the tool lets you load everything. Your instinct pushes you to do it. You pay twice. In quota burned, and in quality drifting.

This article builds on CLAUDE.md and MCP servers. The frame is set. Here we learn to keep it up over time without getting blocked.

Double jeopardy: quota burning, quality drifting

Both problems share one cause: context inflating without control. But you feel them at different moments.

Same plan, two usages
Two vertical gauges: on the left, the 'no discipline' gauge is 95% full and touches the 'quota limit' line (blocked Tuesday 5pm). On the right, the 'with discipline' gauge is 30% full (you make it to Friday).
Without discipline, you hit the ceiling mid-week. With it, you keep headroom through Friday evening.

Problem #1: the quota that snaps.

Claude Pro, Max, ChatGPT Plus, Cursor Pro: they all have usage limits, not just prices. Messages per 5 hours, sessions per week, monthly "fast requests", each with its own metric. A conversation that's piled up 150K tokens costs 5 to 10 times more than a clean 20K one. You feel like you've done a normal morning. Your counter knows you've just burned three days of runway.

Problem #2: the quality that slides.

The longer the conversation, the more the model drifts. It forgets a convention you set at 9am. It re-proposes a thing you fixed at 11am. Responses get smoother, more generic, more cautious. The degradation starts well before the technical limit: you feel it by 60 to 70% fill.

A 4-person meeting decides. A 20-person meeting discusses. An AI's context works the same way: the more voices in the room, the less each one counts.

If you're on the API with usage-based billing, it's the same story in cash: every re-read token gets fully billed. The principle doesn't change, only the counter does.
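On the API, the same arithmetic converts directly into cash. A back-of-envelope sketch; the per-token price below is a placeholder assumption, not a real quote, so check your provider's current pricing:

```python
PRICE_PER_M_INPUT = 3.00  # USD per million input tokens -- placeholder, not a real quote

def session_cost(per_turn_context_k):
    """Each turn's full context is billed as input tokens (thousands)."""
    total_k = sum(per_turn_context_k)
    return total_k * 1_000 * PRICE_PER_M_INPUT / 1_000_000

# A bloated 5-turn session vs a disciplined one
bloated = session_cost([5, 25, 50, 75, 100])  # ~255K tokens re-read in total
clean = session_cost([5, 5, 5, 5, 5])         # ~25K tokens
print(round(bloated, 3), round(clean, 3))     # 0.765 vs 0.075
```

Ten times the bill for the same five questions. On a subscription plan the counter is quota instead of dollars, but it drains at the same rate.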

The 4 moves to manage your context

Context discipline comes down to four verbs. AI engineers use them every day, but they'll make sense to anyone who's ever managed an overflowing folder.

The 4 pillars of context management: 1. Write - externalize your memory, 2. Select - choose what enters, 3. Compress - summarize before continuing, 4. Isolate - delegate to a sub-agent.

1. Write: externalize your memory

Permanent rules, baseline decisions, recurring briefs shouldn't live in the conversation. They should live in files Claude re-reads on demand. Three layers stack together. The third one, the one most people never activate, is what really changes the game.

Project CLAUDE.md ./CLAUDE.md
The permanent brief. Loaded automatically at every session. Your positioning, your conventions, your red lines. Dedicated article.
+
Workflow Skills ~/.claude/skills/
Recurring workflows encapsulated. One Skill to draft an X post, another to audit a site, another to prep a meeting. Each has its own instructions and permissions.
+
Key leverage Second brain Obsidian · Notion · Drive
The real differentiator. You connect your Obsidian, Notion or Drive to Claude through a dedicated MCP, or by @-referencing your markdown files. Claude then taps into years of notes, product decisions, client briefs, editorial conventions. A memory far bigger than any CLAUDE.md can carry, and one that follows you from project to project.

Written once, read on demand. The pivot is the third layer: most users stop at CLAUDE.md. Those who connect their second brain are talking to a Claude that knows their history.
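To make the first layer concrete, here is what a minimal project CLAUDE.md might look like. Every name, path, and rule below is a hypothetical example, not a prescribed format:

```markdown
# CLAUDE.md — marketing site project (hypothetical example)

## Positioning
- B2B SaaS, tone: direct, no jargon, second person.

## Conventions
- All copy in English; prices always shown monthly.
- CTAs: one verb, max 4 words.

## Red lines
- Never promise uptime figures or legal compliance.

## Where to look (instead of pasting)
- Editorial conventions: @notes/editorial.md
- Q2 strategy: @notes/strategy-q2.md
```

Note the last section: the file doesn't carry the knowledge, it points at where the knowledge lives. That's what keeps it small enough to load on every session.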

2. Select: choose what enters the context

Every token you paste into the prompt gets re-read on every turn. So you pick what enters; you don't dump everything and hope Claude sorts it out. Three levers, one concrete example each.

@notes/strategy-q2.md analyze the strategic gaps
Targeted read via @ · 2K tokens instead of 25K in copy-paste
/mktg:cro-audit https://my-site.com/pricing
Targeted Skill loaded · the 30 others stay dormant
Pull this week's pageviews via the PostHog MCP
Data loaded on demand · no context pollution upstream

Going further: a Skill that identifies the 2-3 useful files before starting a task. You run it first, you walk in with the right material, you avoid the "dump everything, we'll see" reflex.

3. Compress: the commands to know

Compression is what lets you go long without losing everything or paying for everything. Three commands to master in Claude Code.

/compact
Automatic summary · 150K → 22K tokens
Conversation compacted. We can continue.
/compact keep the editorial conventions and the brief, summarize the rest
Piloted compact · important detail stays intact
/clear
Context cleared · CLAUDE.md reloaded · counter reset

Auto-compact triggers at 80%. That's "I did nothing" mode: you've already burned 80% of your quota, and you have no control over what it chooses to summarize.

4. Isolate: delegate to a sub-agent

Noisy tasks (reading a long file, broad web research, exploring a codebase) have no business in your main conversation. They inflate the context for a result that fits in three lines. A sub-agent does the work in its own context and hands you only the conclusion.

Diagram: the main conversation sends a task to the sub-agent (orange arrow). The sub-agent processes 50K tokens of noise (reads, cross-references, synthesis) in its own isolated context. Only the result (3 lines, ~800 tokens) returns to the main conversation.

The noise stays in the sub-agent. Only the result crosses the boundary back to the main conversation. Your quota only sees the 800 tokens that matter.

Covered in depth in an upcoming article on the path.
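In Claude Code, a sub-agent of this kind is typically declared as a markdown file with YAML frontmatter under .claude/agents/. A minimal sketch; the name, description, and tool list are illustrative assumptions, not a recommended setup:

```markdown
---
name: doc-digester
description: Reads long documents and returns a 3-line conclusion.
tools: Read, Grep
---

You are a research sub-agent. Read the requested files in full,
cross-reference them, and return ONLY a conclusion of at most
three lines. Never paste raw excerpts back to the caller.
```

The last instruction is the one doing the work: it forces the 50K tokens of noise to die inside the sub-agent's context instead of leaking back into yours.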

The four moves work together. In a good session, you use them all, at different moments.

Your daily Claude Code routine

The 4 moves are the theory. Here's what it looks like in practice when you finish a phase and move on.

We're done with the client brief, moving to the landing page.
Want me to keep the brief context or start fresh?
/clear
Context cleared. CLAUDE.md reloaded.
Ready. What are we working on for the landing?

Three characters, two seconds, a quota saved. Repeat it every time the topic shifts. That single move is the difference between a week that holds and getting blocked at 5pm on Tuesday.

3 mistakes that blow up your quota

Three traps that burn quota without giving anything back.

Letting auto-compact do the job
Passive · 80%

You wait for Claude to say "context full". Auto-compact fires at 80% fill. By then, you've already burned 80% of your quota, and it summarizes according to its own logic, not yours.

Piloted · 50-60%

/compact keep the editorial conventions and the brief, summarize the rest

Triggered manually, with explicit instructions on what must stay in detail.

Stacking two topics in the same conversation
Everything in one

You finish a client brief. You go straight into a site audit. Same session, same conversation. The brief is useless for the audit, but it's re-read on every turn anyway.

One topic, one conv

/clear between two unrelated topics.

Your CLAUDE.md reloads, your context starts clean, your quota too.

Pasting a big file instead of referencing it
Copy-paste

You want Claude to analyze a 30-page document. You paste it in the prompt. On every subsequent turn, the 30 pages get re-read, even when you've moved on to another subject.

@ reference

@q2-report.md in Claude Code or Cursor, or accessible through an MCP.

Claude reads the file once, extracts what it needs, and leaves it alone.

ChatGPT, Cursor, Claude Code: the same principle

The frame doesn't change when you change tools. What changes is how much control you get.

| Tool | CLAUDE.md equivalent | /clear equivalent | Fine control |
| --- | --- | --- | --- |
| Claude Code (Pro, Max) | CLAUDE.md + Skills | /clear | Compact, sub-agents, caching |
| ChatGPT (Plus, Team, Pro) | Projects + Instructions | New conversation | No manual compact |
| Cursor (Pro) | .cursorrules + @Codebase | New conversation | Limited to select (@file) |

ChatGPT: the GPT-5 usage limit snaps silently, often after 2-3 hours of intense sessions. No compression tool. The discipline lives in deciding when to open a new conversation, and in keeping your Projects instructions dense.

Cursor: the monthly "fast requests" counter drains fast if you leave @Codebase on permanently. Use targeted @file references instead of defaulting to @Codebase.

Claude Code: the most granular of the three. It's also the one that shows your weekly quota in real time if you're on Max. Training yourself to glance at it before a long session is the equivalent of checking the fuel gauge before a long drive.

The reflex to keep

Context is a resource, not a dumping ground.

Before piling a file, a long excerpt, or a new topic into your conversation, ask yourself one simple question: does this really need to be here? If the answer is no, route it through CLAUDE.md, a Skill, an @ reference, or a sub-agent. Anything but pasting.

Tonight, one concrete action

Open your latest big Claude Code conversation. If it's past 50% of the context, run /compact with explicit instructions on what must stay.

Tomorrow morning, you'll start from 20K tokens instead of 150K. And you'll immediately notice the difference in response quality.

After a week, it becomes a reflex. You stop thinking about it. You just notice that you finish the week without hitting the wall, and that Claude stays sharp Friday evening the way it was Monday morning.

Next step on the path: agents and sub-agents, which formalize the fourth move (isolate) and turn delegation into a repeatable workflow.

If this article saved you time, it'll save time for someone in your network.

