MCP Is the Authoring Layer. Execution Should Cost Zero Tokens.

April 13, 2026 · Leon Ting · 6 min read

Two posts on Reddit this month independently measured MCP's token overhead. Both landed in the same range: 30–40% more tokens than the CLI equivalent.

"I added Notion, Sentry and Shortcut MCPs and was surprised to see every session starting off with 40% of the context used."
— NoSlicedMushrooms (28 upvotes), r/ClaudeAI

"A batch job with 4 MCP servers blew through our token budget in 2 hours. The schema injection on every turn is the killer."
— tom_mathews, r/ClaudeAI

The "MCP is dead, just use CLI" take followed immediately. But three users, in three different threads across r/mcp and r/ClaudeAI, independently arrived at the same conclusion: the problem isn't MCP. It's using MCP for the wrong job.

"MCP for the main orchestrator, CLI for sub-agents. Both hit the same backend."
— raphasouthall, r/mcp (48 upvotes)

"MCP makes sense for discovery, not for known workflows."
— tom_mathews, r/ClaudeAI

"Development Tool versus Production Tool. MCP the shit you serve to clients and CLI while building."
— mat8675, r/ClaudeAI

They're all describing the same architecture. And it's the architecture Tap has used from day one.

The Two-Layer Model

Layer 1: MCP (Authoring)
capture     → AI inspects the site, picks the strongest structural address,
              emits a bare v2 Plan. With site+name, persisted to disk.
verify      → snapshot equivalence check; 4-arm verdict.

AI participates during capture. Tokens consumed. One-time cost.

─────────────────────────────────────────────

Layer 2: Execution
<site>.<name>   → saved tap auto-projects as MCP tool; runs deterministically
tap <site>/<name>  → same plan, run from CLI

Zero AI. Zero tokens. Deterministic. Forever.

MCP is the authoring layer. It's where AI inspects the site: what it looks like, which API endpoints are available, which structural address (JSON-LD / RSS / OpenAPI / OpenGraph / HTML list) carries the answer, and how the extraction should be structured. This one-time process, capture, produces a .plan.json file.

After that, the saved tap auto-projects as the MCP tool <site>.<name> and replays at zero AI tokens. No re-inspection. No schema injection on every call. No token overhead. The plan is bare JSON. It runs in less than a second.
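
A minimal sketch of the capture-once, replay-many split, in Python. The plan shape (`address`, `path`, `fields`) and the `replay` helper are hypothetical and exist only for illustration; they are not Tap's actual .plan.json schema or API. The point is narrower: once a plan is plain JSON, replaying it is ordinary data lookup with no model in the loop.

```python
import json

# Hypothetical plan, standing in for what an authoring step might persist.
# These keys are illustrative assumptions, not Tap's real .plan.json schema.
PLAN_JSON = """
{
  "site": "example",
  "name": "headlines",
  "address": "json-ld",
  "path": ["itemListElement"],
  "fields": {"title": "name", "link": "url"}
}
"""


def replay(plan: dict, document: dict) -> list[dict]:
    """Apply a saved plan to already-fetched structured data.

    Pure function: no model call, no network, same input gives same output.
    """
    node = document
    for key in plan["path"]:  # walk down to the list the plan points at
        node = node[key]
    # Keep only the fields the plan asks for, renamed to the plan's output keys.
    return [
        {out: item.get(src) for out, src in plan["fields"].items()}
        for item in node
    ]


plan = json.loads(PLAN_JSON)
page_data = {
    "itemListElement": [
        {"name": "First story", "url": "https://example.com/1", "position": 1},
        {"name": "Second story", "url": "https://example.com/2", "position": 2},
    ]
}
rows = replay(plan, page_data)
print(rows)
```

Running `replay` twice on the same input returns the same rows, which is the property that lets execution skip the model entirely.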

The Numbers

raphasouthall measured MCP overhead precisely for a 21-tool server:

                  MCP capture                        Saved-tap replay
Upfront cost      ~1,300 tokens (schema injection)   0
Per-call cost     ~800 tokens                        ~750 tokens
After 10 calls    ~930 tokens/call (amortized)       ~750 tokens/call

For a single, one-time forge session, ~1,300 tokens of overhead is nothing. For 1,000 daily executions? It's the difference between $0 and $135/month.

Tap's architecture makes this explicit: pay the MCP overhead once during forge, then run at zero overhead forever.
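
The amortization in the table can be sketched in a few lines of Python. This assumes the schema is injected once and then spread over every call in the session; in a setup that re-injects the schema on every turn, the upfront cost would not shrink this way. The function name is illustrative, and the token figures are the approximate ones quoted above.

```python
def amortized_tokens_per_call(upfront: float, per_call: float, calls: int) -> float:
    """Average tokens per call once a one-time upfront cost is spread over `calls`."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    return upfront / calls + per_call


# Approximate figures from the table above.
MCP_UPFRONT, MCP_PER_CALL = 1300, 800
REPLAY_UPFRONT, REPLAY_PER_CALL = 0, 750

for n in (1, 10, 100, 1000):
    mcp = amortized_tokens_per_call(MCP_UPFRONT, MCP_PER_CALL, n)
    replay_cost = amortized_tokens_per_call(REPLAY_UPFRONT, REPLAY_PER_CALL, n)
    print(f"{n:>5} calls: MCP {mcp:7.1f} tokens/call, replay {replay_cost:7.1f} tokens/call")
```

The upfront term fades as the call count grows, but the per-call gap never closes, which is why the split matters most for high-volume execution.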

How Tap's MCP Surface Stays Small

Tap exposes a deliberately small MCP surface: 3 meta verbs (capture / verify / mark) plus N saved-tap projections (one MCP tool per saved <site>/<name>.plan.json). The meta verbs cost a fixed ~600 tokens of schema; saved-tap projections only list the taps you've authored.

# Meta verbs (always available, ~600 tokens schema)
capture   verify   mark

# Per-tap projections (one entry per saved plan)
github.trending     hackernews.hot     arxiv.search
reddit.hot          douban.top250      ...

This is the same pattern the community arrived at independently:

"Splitting tools into a tiny default set and a second on-demand pack, because dumping every possible tool into session start was where the waste really showed up."
— Organic-Bid-8298, r/mcp
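
As a rough illustration of how such a surface could be derived, the Python sketch below maps saved plan paths of the form `<site>/<name>.plan.json` to tool names of the form `<site>.<name>` and lists them after the three meta verbs. The helper names are hypothetical, and this is not Tap's implementation; only the naming convention comes from the text above.

```python
from pathlib import PurePosixPath

META_VERBS = ["capture", "verify", "mark"]  # the three meta verbs named in the text
PLAN_SUFFIX = ".plan.json"


def projected_tool_name(plan_path: str) -> str:
    """Map '<site>/<name>.plan.json' to the tool name '<site>.<name>'."""
    path = PurePosixPath(plan_path)
    site = path.parent.name
    if not path.name.endswith(PLAN_SUFFIX) or not site:
        raise ValueError(f"expected <site>/<name>{PLAN_SUFFIX}, got: {plan_path}")
    name = path.name[: -len(PLAN_SUFFIX)]
    return f"{site}.{name}"


def tool_surface(plan_paths: list[str]) -> list[str]:
    """Meta verbs first, then one projected tool per saved plan."""
    return META_VERBS + sorted(projected_tool_name(p) for p in plan_paths)


saved = [
    "github/trending.plan.json",
    "hackernews/hot.plan.json",
    "arxiv/search.plan.json",
]
print(tool_surface(saved))
```

The listed surface grows only with the plans that have actually been saved, so an empty plan directory exposes just the three meta verbs.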

Why Not Just Use CLI for Everything?

Because authoring requires tool discovery. When AI is figuring out how to scrape a site it's never seen before, it needs typed parameters, rich descriptions, and structured responses. That's what MCP does well.

"The one thing MCP does well is when it's tightly integrated (like Claude Code's built-in tools) — that feels natural because they control both sides."
— SmartYogurtcloset715 (8 upvotes), r/ClaudeAI

Tap controls both sides. The MCP server and the CLI are the same binary. The MCP tools call the same functions the CLI calls. The difference is when each is used:

Forge (one-time): MCP tools, because AI needs to discover and iterate
Run (every time): CLI, because the program already exists
Doctor (periodic): either — MCP for interactive diagnosis, CLI for scheduled health checks
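
The "same binary, same functions" idea can be sketched as one core function with two thin front ends. Everything here is a hypothetical stand-in in Python: `run_tap` is a stub, `mcp_call` only mimics a tool dispatch, and no MCP SDK is involved. It is not Tap's code.

```python
import json
import sys


def run_tap(site: str, name: str) -> dict:
    """Shared core. A real implementation would load and replay the saved plan;
    this stub only echoes what it was asked to run."""
    return {"site": site, "name": name, "rows": []}


def cli_main(argv: list[str]) -> int:
    """CLI front end, invoked as: <prog> <site>/<name>. Returns an exit code."""
    if len(argv) != 2 or "/" not in argv[1]:
        print("usage: tap <site>/<name>", file=sys.stderr)
        return 2
    site, name = argv[1].split("/", 1)
    print(json.dumps(run_tap(site, name)))
    return 0


def mcp_call(tool_name: str) -> dict:
    """MCP-style front end: tool '<site>.<name>' dispatches to the same core."""
    site, name = tool_name.split(".", 1)
    return run_tap(site, name)


# Both entry points reach the same function with the same arguments.
exit_code = cli_main(["tap", "hackernews/hot"])
via_mcp = mcp_call("hackernews.hot")
print(exit_code, via_mcp)
```

Because both front ends only parse their input and delegate, a plan behaves the same whichever door it comes through; the choice between them is about who is calling, not what runs.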

The Implication for Browser Automation

Most browser MCP tools are execution-layer tools. They run in the browser on every call. That's where the token cost comes from — not just schema overhead, but the entire page state (accessibility tree, screenshot bytes, console output) flowing into the context window on every interaction.

"Every browser_navigate + browser_snapshot call costs ~1,500 tokens in JSON schema framing — even though the actual useful output is just a few lines of text."
— BagNervous, r/ClaudeAI (Browser CLI author)

Tap's browser tools exist in MCP for authoring only. During forge, AI uses tap.nav, tap.eval, tap.screenshot to understand the page. After forge produces a .tap.js, execution calls the browser directly — no MCP framing, no token overhead, no context window pollution.

The 1,500-token-per-call problem doesn't exist for tap.run. It's not an MCP call. It's a function call.

Try it

curl -fsSL https://taprun.dev/install.sh | sh

# MCP for authoring — one time
tap forge https://news.ycombinator.com

# CLI for execution — zero tokens, forever
tap hackernews hot