You launch Claude with Playwright MCP. Three tool calls in, you're at 80,000 tokens. Five calls in, the context window is half full. You scroll the log and realize the model just spent 27,000 tokens reading the DOM of one page — to click a single button.
You are not alone.
"I got tired of playwright-mcp eating through Claude's 200K token limit. 114,000 tokens through MCP vs 27,000 tokens through CLI — for the same task."
— syntax-sherlock, Hacker News (189 points, "Playwright Skill" launch)
Every tool call the model issues, whether page.click(selector), browser_snapshot, or a locator lookup, returns a DOM payload, an accessibility snapshot, and a status response. The LLM reads all of it. Then it plans the next action. Then it reads another snapshot. The loop compounds.
For a single 10-step workflow you pay for the model to re-ingest the page 10 times, reason over 10 planning prompts, and emit 10 tool calls. The page barely changed. The plan barely changed. You paid anyway.
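As a rough sketch of how this compounds, the arithmetic below multiplies the step count by a per-step cost. The 11k–27k per-step range is the one given in the comparison table in this post; the function name and the assumption that every step costs the same are illustrative, not measured.

```python
def workflow_tokens(steps: int, tokens_per_step: int) -> int:
    """Total tokens if every step re-ingests the page at the same cost."""
    return steps * tokens_per_step

# Per-step range taken from the comparison table (~11k-27k tokens per step).
low = workflow_tokens(10, 11_000)
high = workflow_tokens(10, 27_000)

print(f"10 steps: {low:,} to {high:,} tokens")
# prints: 10 steps: 110,000 to 270,000 tokens
```

At the upper end of that range, a single 10-step workflow would not fit in a 200K-token context window.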
Here's a user on r/AI_Agents who figured this out without an MCP spec in front of them:
"The AI runs the workflow once, learns the pattern, then it executes without the LLM — making it 100x cheaper and way more reliable. My monthly LLM costs went from $200 to $2."
— u/Omega0Alpha, r/AI_Agents
That's the architecture. AI participates at authoring time. Runtime is deterministic. Taprun ships this as two commands.
    # 1. Forge: AI participates once, writes a program
    $ tap forge "publish a post to X"
    ✔ Saved: x/publish.tap.js (157 lines)

    # 2. Run: zero AI, zero tokens, deterministic
    $ tap x publish --title "hello" --body "first post"
    ✔ Posted (1.8s, $0.00, 0 tokens)
The first command uses AI the way MCP expects — DOM snapshot, planning, selector discovery. It does this once and writes a .tap.js program to disk. Every subsequent execution runs the program directly. No LLM round-trip. No tokens. No planning loop.
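The forge-once, run-many split can be sketched in a few lines. This is a minimal illustration of the pattern, not Taprun's implementation: `plan_with_llm`, the step format, the selectors, and the JSON file are all hypothetical stand-ins (Taprun itself writes a .tap.js program), and the runner only logs what it would do instead of driving a browser.

```python
import json
import tempfile
from pathlib import Path

llm_calls = 0  # counts how often the (stubbed) model is consulted

def plan_with_llm(goal: str) -> list[dict]:
    """Hypothetical stand-in for the expensive authoring step."""
    global llm_calls
    llm_calls += 1
    return [
        {"action": "fill", "selector": "#title", "arg": "title"},
        {"action": "fill", "selector": "#body", "arg": "body"},
        {"action": "click", "selector": "#publish"},
    ]

def forge(goal: str, path: Path) -> None:
    """Authoring time: ask the model once and save the plan to disk."""
    path.write_text(json.dumps(plan_with_llm(goal)))

def run(path: Path, **args: str) -> list[str]:
    """Runtime: replay the saved plan. No model call happens here."""
    log = []
    for step in json.loads(path.read_text()):
        value = args.get(step.get("arg", ""), "")
        log.append(f'{step["action"]} {step["selector"]} {value}'.strip())
    return log

program = Path(tempfile.mkdtemp()) / "publish.json"
forge("publish a post", program)

# Two executions, one model call in total.
first = run(program, title="hello", body="first post")
second = run(program, title="hello", body="first post")
```

After both runs, `llm_calls` is still 1 and `first == second`: the model was consulted only while forging, and replaying the saved plan with the same arguments gives the same result.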
If you use Claude Code, Cursor, or any MCP host, Taprun plugs in the same as Playwright MCP:
.mcp.json:

    {
      "mcpServers": {
        "taprun": {
          "command": "tap",
          "args": ["mcp"]
        }
      }
    }
The difference is what happens after the tool call. Playwright MCP hands the model a DOM snapshot and says, "you figure it out." Taprun hands the model the name of a pre-compiled program and says, "I already figured it out; here's the result." A tap.run call returns rows, not raw HTML: 27k → 0 tokens per execution.
| | Playwright MCP | Taprun |
|---|---|---|
| One-off exploration | ✔ Best fit | tap forge works, but is overkill |
| Repeated workflow (>3 runs) | Burns tokens each run | ✔ Forge once, $0/run |
| Long agent sessions | Context fills fast | ✔ Tool calls return rows |
| Deterministic output | LLM-variable | ✔ Same input → same output |
| Token cost per step | ~11k–27k | 0 |
Playwright MCP is a fine prototype tool. For anything you do more than three times, you want a compiled program.
Forging has a one-time token cost, typically 2–5k tokens depending on site complexity. Against Playwright MCP at ~27k tokens per invocation, forging pays for itself within the first run. After that, every execution costs $0 and 0 tokens.
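A back-of-the-envelope version of that break-even, using only the figures quoted above (2–5k tokens to forge, ~27k tokens per Playwright MCP invocation, 0 tokens per compiled run). The constant and function names are illustrative, and real costs will vary by site.

```python
FORGE_TOKENS_WORST = 5_000   # upper end of the 2-5k forging estimate
MCP_TOKENS_PER_RUN = 27_000  # per-invocation figure quoted above

def cumulative_mcp(runs: int) -> int:
    """Tokens spent if every run goes through the model."""
    return runs * MCP_TOKENS_PER_RUN

def cumulative_compiled(runs: int) -> int:
    """One-time forge cost, then 0 tokens per run."""
    return FORGE_TOKENS_WORST

# Fraction of a single MCP run that the forge cost represents.
break_even_runs = FORGE_TOKENS_WORST / MCP_TOKENS_PER_RUN

for runs in (1, 3, 10):
    saved = cumulative_mcp(runs) - cumulative_compiled(runs)
    print(f"{runs:>2} runs: saves {saved:,} tokens")
# prints:
#  1 runs: saves 22,000 tokens
#  3 runs: saves 76,000 tokens
# 10 runs: saves 265,000 tokens
```

Even with the worst-case forging estimate, `break_even_runs` comes out below 1, so under these figures the forge cost is recovered within the first execution.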
Sites change, and that's the one thing a compiled program can't control. When a site does change, tap doctor emits a structural diff: exactly which selectors moved, which API endpoints disappeared, and what the new DOM looks like. You hand that diff to your agent and it patches the program. You never re-enter the 27k-token-per-step loop to rediscover the same page from scratch.
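The output format of tap doctor is not shown here, so the sketch below only illustrates the general idea of a structural diff over selectors. The function, the dictionary keys, and the selector names are all made up for the example.

```python
def structural_diff(recorded: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare the selectors a program relies on with those on the live page."""
    return {
        "missing": sorted(recorded - current),    # program expects these, page lacks them
        "new": sorted(current - recorded),        # candidates for a patched selector
        "unchanged": sorted(recorded & current),  # nothing to rediscover here
    }

# Hypothetical selectors: what the program was forged against vs. the page today.
recorded = {"#title", "#body", "#publish"}
current = {"#title", "#body", "button[data-testid=post]"}

diff = structural_diff(recorded, current)
```

Here `diff["missing"]` is `["#publish"]` and `diff["new"]` is `["button[data-testid=post]"]`. A diff this small is what gets handed to the agent, in place of a full page snapshot.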