Two posts on Reddit this month independently measured MCP's token overhead. Both reached the same number: 30–40% more tokens than the CLI equivalent.
"I added Notion, Sentry and Shortcut MCPs and was surprised to see every session starting off with 40% of the context used."
— NoSlicedMushrooms (28 upvotes), r/ClaudeAI
"A batch job with 4 MCP servers blew through our token budget in 2 hours. The schema injection on every turn is the killer."
— tom_mathews, r/ClaudeAI
The "MCP is dead, just use CLI" take followed immediately. But three users, in three different threads across r/mcp and r/ClaudeAI, independently arrived at the same conclusion: the problem isn't MCP. It's using MCP for the wrong job.
"MCP for the main orchestrator, CLI for sub-agents. Both hit the same backend."
— raphasouthall, r/mcp (48 upvotes)
"MCP makes sense for discovery, not for known workflows."
— tom_mathews, r/ClaudeAI
"Development Tool versus Production Tool. MCP the shit you serve to clients and CLI while building."
— mat8675, r/ClaudeAI
They're all describing the same architecture. And it's the architecture Tap has used from day one.
    Layer 1: MCP (Authoring)
      capture → AI inspects the site, picks the strongest structural address,
                emits a bare v2 Plan. With site+name, persisted to disk.
      verify  → snapshot equivalence check; 4-arm verdict.
      AI participates during capture. Tokens consumed. One-time cost.
    ─────────────────────────────────────────────
    Layer 2: Execution
      <site>.<name>     → saved tap auto-projects as MCP tool; runs deterministically
      tap <site>/<name> → same plan, run from CLI
      Zero AI. Zero tokens. Deterministic. Forever.
MCP is the authoring layer. It's where AI inspects what the site looks like, what API endpoints are available, which structural address (JSON-LD / RSS / OpenAPI / OpenGraph / HTML list) carries the answer, and how to structure the extraction. This is a one-time process — capture — that produces a .plan.json file.
After that, the saved tap auto-projects as the MCP tool <site>.<name> and replays at zero AI tokens. No re-inspection. No schema injection on every call. No token overhead. The plan is bare JSON. It runs in less than a second.
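A minimal sketch of the capture-once, replay-forever split described above. Everything here is an assumption made for illustration: the plan keys (`path`, `field`), the functions `capture` and `replay`, and the token figure are hypothetical and are not Tap's actual `.plan.json` schema or API. The point is only that the expensive step writes a small JSON file, and the cheap step reads it back and applies it with no model in the loop.

```python
import json
import tempfile
from pathlib import Path


def capture(plan_path: Path) -> int:
    """Authoring step (hypothetical): stands in for a model inspecting a site
    and deciding where the answer lives. Returns an illustrative token cost."""
    plan = {
        "site": "example",
        "name": "titles",
        "path": ["itemListElement"],  # keys to walk to reach the item list
        "field": "name",              # field to pull from each item
    }
    plan_path.write_text(json.dumps(plan, indent=2))
    return 1_300  # made-up one-time cost, not a measurement


def replay(plan_path: Path, document: dict) -> list[str]:
    """Execution step: load the saved plan and apply it. No model call is made."""
    plan = json.loads(plan_path.read_text())
    node = document
    for key in plan["path"]:
        node = node[key]
    return [item[plan["field"]] for item in node]


with tempfile.TemporaryDirectory() as tmp:
    plan_file = Path(tmp) / "titles.plan.json"
    tokens_spent = capture(plan_file)  # paid once, at authoring time
    page = {"itemListElement": [{"name": "First"}, {"name": "Second"}]}
    results = [replay(plan_file, page) for _ in range(3)]  # no tokens spent
    print(tokens_spent, results[0])
```

Because `replay` depends only on the saved file and the input document, repeated runs on the same input return the same output, which is the property the execution layer relies on.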
raphasouthall measured MCP overhead precisely for a 21-tool server:
| | MCP capture | Saved-tap replay |
|---|---|---|
| Upfront cost | ~1,300 tokens (schema injection) | 0 |
| Per-call cost | ~800 tokens | ~750 tokens |
| After 10 calls | ~930 tokens/call (amortized) | 750 tokens/call |
For a single forge session (one-time), ~1,300 tokens of overhead is nothing. For 1,000 daily executions? It's the difference between $0 and $135/month.
Tap's architecture makes this explicit: pay the MCP overhead once during forge, then run at zero overhead forever.
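The amortization behind the table can be checked with a few lines of arithmetic. This is a sketch that assumes the simplest model, where the one-time schema cost is spread evenly over the number of calls; the function name is made up for illustration, and the inputs are the approximate figures from the table.

```python
def amortized_tokens_per_call(upfront: int, per_call: int, calls: int) -> float:
    """Average tokens per call once a one-time upfront cost is spread over `calls`."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return per_call + upfront / calls


# Approximate figures from the table above.
MCP_UPFRONT, MCP_PER_CALL = 1_300, 800
REPLAY_UPFRONT, REPLAY_PER_CALL = 0, 750

for n in (1, 10, 100, 1_000):
    mcp = amortized_tokens_per_call(MCP_UPFRONT, MCP_PER_CALL, n)
    replay = amortized_tokens_per_call(REPLAY_UPFRONT, REPLAY_PER_CALL, n)
    print(f"{n:>5} calls: MCP {mcp:7.1f} tokens/call, replay {replay:7.1f} tokens/call")
```

Under this model the upfront cost fades as the call count grows, while the per-call gap stays constant, so at high volume it is the per-call difference that dominates.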
Tap exposes a deliberately small MCP surface: 3 meta verbs (capture / verify / mark) plus N saved-tap projections (one MCP tool per saved <site>/<name>.plan.json). The meta verbs cost a fixed ~600 tokens of schema; saved-tap projections only list the taps you've authored.
    # Meta verbs (always available, ~600 tokens schema)
    capture
    verify
    mark

    # Per-tap projections (one entry per saved plan)
    github.trending
    hackernews.hot
    arxiv.search
    reddit.hot
    douban.top250
    ...
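The projection rule (one tool per saved `<site>/<name>.plan.json`, named `<site>.<name>`) can be sketched as a directory scan. This assumes plans live one level deep under a single root directory; the function names and the directory layout are hypothetical, not Tap's implementation.

```python
import tempfile
from pathlib import Path

META_VERBS = ["capture", "verify", "mark"]
PLAN_SUFFIX = ".plan.json"


def projected_tool_names(plans_root: Path) -> list[str]:
    """Map every <site>/<name>.plan.json under plans_root to '<site>.<name>'."""
    names = []
    for plan in plans_root.glob(f"*/*{PLAN_SUFFIX}"):
        site = plan.parent.name
        name = plan.name[: -len(PLAN_SUFFIX)]
        names.append(f"{site}.{name}")
    return sorted(names)


def tool_surface(plans_root: Path) -> list[str]:
    """Fixed meta verbs first, then one entry per saved plan."""
    return META_VERBS + projected_tool_names(plans_root)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for site, name in [("github", "trending"), ("hackernews", "hot")]:
        (root / site).mkdir()
        (root / site / f"{name}{PLAN_SUFFIX}").write_text("{}")
    surface = tool_surface(root)
    print(surface)
```

The surface grows only with the plans a user has actually saved, so an empty directory yields just the three meta verbs.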
This is the same pattern the community arrived at independently:
"Splitting tools into a tiny default set and a second on-demand pack, because dumping every possible tool into session start was where the waste really showed up."
— Organic-Bid-8298, r/mcp
MCP stays in the authoring layer because authoring requires tool discovery. When AI is figuring out how to scrape a site it's never seen before, it needs typed parameters, rich descriptions, and structured responses. That's what MCP does well.
"The one thing MCP does well is when it's tightly integrated (like Claude Code's built-in tools) — that feels natural because they control both sides."
— SmartYogurtcloset715 (8 upvotes), r/ClaudeAI
Tap controls both sides. The MCP server and the CLI are the same binary. The MCP tools call the same functions the CLI calls. The difference is when each is used: MCP during authoring, the CLI or a saved-tap projection during execution.
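A minimal sketch of that shared-core design, assuming one core function with two thin adapters in front of it. The names `run_tap`, `cli_main`, and `mcp_call` are hypothetical and the core is a placeholder; this is not Tap's code, only an illustration of why both entry points return the same result.

```python
import argparse
import json


def run_tap(site: str, name: str) -> dict:
    """Shared core (placeholder): both entry points end up here."""
    return {"site": site, "name": name, "rows": []}


def cli_main(argv: list[str]) -> str:
    """CLI adapter: parse '<site>/<name>' and return JSON text."""
    parser = argparse.ArgumentParser(prog="tap-sketch")
    parser.add_argument("target", help="<site>/<name>")
    args = parser.parse_args(argv)
    site, name = args.target.split("/", 1)
    return json.dumps(run_tap(site, name))


def mcp_call(tool_name: str) -> dict:
    """MCP-style adapter: tool '<site>.<name>' maps to the same core call."""
    site, name = tool_name.split(".", 1)
    return run_tap(site, name)


cli_result = json.loads(cli_main(["hackernews/hot"]))
mcp_result = mcp_call("hackernews.hot")
print(cli_result == mcp_result)
```

Keeping the adapters this thin means any behavior difference between the two paths has to come from when they are invoked, not from what they execute.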
Most browser MCP tools are execution-layer tools. They run in the browser on every call. That's where the token cost comes from: not just schema overhead, but the entire page state (accessibility tree, screenshot bytes, console output) flowing into the context window on every interaction.
"Every browser_navigate + browser_snapshot call costs ~1,500 tokens in JSON schema framing — even though the actual useful output is just a few lines of text."
— BagNervous, r/ClaudeAI (Browser CLI author)
Tap's browser tools exist in MCP for authoring only. During forge, AI uses tap.nav, tap.eval, tap.screenshot to understand the page. After forge produces a .tap.js, execution calls the browser directly — no MCP framing, no token overhead, no context window pollution.
The 1,500-token-per-call problem doesn't exist for tap.run. It's not an MCP call. It's a function call.
    curl -fsSL https://taprun.dev/install.sh | sh

    # MCP for authoring — one time
    tap forge https://news.ycombinator.com

    # CLI for execution — zero tokens, forever
    tap hackernews hot