rtrvr.ai is a polished entrant in the browser-agent space. Their architecture is genuinely interesting: “DOM-native” processing with “Smart DOM Compression”, a claimed 25× cost advantage over vision-based alternatives, and 81% SOTA accuracy on their reported benchmark. They ship a Chrome extension, Cloud dashboard, API, MCP server, CLI, and even a WhatsApp bot. Their landing page lists 10 named competitors. The pricing mirrors Taprun almost exactly: $9.99 / $29.99 / $99.99 / $499.99 per month.
So when I first read their docs, the obvious question was: do they do what Taprun does? Because if they do, Taprun is in trouble.
They don’t. And the reason sits on a single architectural line every browser-agent tool has to pick a side of.
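There are two fundamentally different ways to point an LLM at a browser:
1. Call the LLM at runtime, on every task, to interpret the page.
2. Call the LLM once, at authoring time, to write a program that then runs without it.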
Browser Use, Stagehand, Playwright MCP, and rtrvr.ai all sit on side (1). They differ in how they call the LLM — vision vs DOM, big model vs small model, whole page vs compressed — but not in whether they call it.
Taprun sits on side (2). tap forge runs the LLM once to author a .tap.js file. tap.run executes that file forever with zero inference.
This distinction isn’t marketing. It’s Python vs compiled C. Both turn source into results; one interprets the source on every run, the other translates it once ahead of time. You pick based on whether the workload repeats.
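The two sides can be sketched with a stand-in model that only counts how often it is invoked. Everything here is hypothetical: `CountingLLM`, `write_extractor`, and the toy "extractor" are illustrative names, not part of rtrvr.ai or Taprun.

```python
from typing import Callable, List


class CountingLLM:
    """Stand-in for a model; it only counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def write_extractor(self, page: str) -> Callable[[str], List[str]]:
        self.calls += 1
        # Hypothetical "authored program": keep the non-empty lines of the page.
        return lambda html: [line for line in html.splitlines() if line.strip()]


def run_interpreted(llm: CountingLLM, page: str, runs: int) -> List[List[str]]:
    """Side (1): the model is consulted on every run."""
    results = []
    for _ in range(runs):
        extractor = llm.write_extractor(page)
        results.append(extractor(page))
    return results


def run_compiled(llm: CountingLLM, page: str, runs: int) -> List[List[str]]:
    """Side (2): the model is consulted once; the saved program is reused."""
    extractor = llm.write_extractor(page)
    return [extractor(page) for _ in range(runs)]


PAGE = "row one\nrow two\n\nrow three"
interp_llm, comp_llm = CountingLLM(), CountingLLM()
interp_out = run_interpreted(interp_llm, PAGE, runs=100)
comp_out = run_compiled(comp_llm, PAGE, runs=100)
print(interp_llm.calls, comp_llm.calls)  # 100 1
```

The stub is deterministic, so both sides return the same rows here; the only thing the sketch measures is how many model calls each side needs for 100 runs.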
Credit where it’s due. rtrvr gets a lot right:
If your use case is agentic exploration — new sites, unknown tasks, one-off interactions — rtrvr is a serious tool. I’d reach for it myself.
The ceiling isn’t quality. It’s structural.
Per-run cost scales linearly with runs. 26K tokens per task × 1,000 runs/day = 26M tokens/day. At Gemini Flash Lite rates that’s real money; at Gemini Pro rates it’s ~$260/day (taking roughly $10 per million tokens). rtrvr’s own pricing acknowledges this: the Basic tier is 1,500 credits/month, which at “5 credits/task” is 300 tasks. A single production workflow running every 5 minutes (288 runs/day) eats that budget in about a day.
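The arithmetic behind those figures, as a small sketch. The token count and credit numbers are the ones quoted above; the per-million-token rate is an assumption, chosen because it is the rate that reproduces the ~$260/day figure.

```python
TOKENS_PER_TASK = 26_000        # rtrvr's claimed figure, as quoted above
RUNS_PER_DAY = 1_000
USD_PER_MILLION_TOKENS = 10.0   # assumption: the rate implied by "~$260/day"

tokens_per_day = TOKENS_PER_TASK * RUNS_PER_DAY
cost_per_day = tokens_per_day / 1_000_000 * USD_PER_MILLION_TOKENS

BASIC_CREDITS_PER_MONTH = 1_500
CREDITS_PER_TASK = 5
tasks_per_month = BASIC_CREDITS_PER_MONTH // CREDITS_PER_TASK

print(f"{tokens_per_day:,} tokens/day")      # 26,000,000 tokens/day
print(f"${cost_per_day:,.2f}/day")           # $260.00/day
print(f"{tasks_per_month} tasks/month")      # 300 tasks/month
```

Swap in a different rate to see how the daily bill moves; the token volume stays the same because it depends only on runs.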
Output variance is by design. When the same page, same prompt, same task produces slightly different extractions across runs, you can’t build monitoring around it. Row count fluctuation isn’t “a bug” when the system is designed to re-interpret the page every time. The 81% SOTA accuracy number is a fine benchmark result, but it also means roughly 19% of benchmark tasks came out wrong in some way, and in production you don’t know which runs those are.
“Self-healing” still pays tokens to heal. Every browser-agent tool in this category markets “self-healing”. What they mean is: when the selector breaks, the LLM re-runs to figure out the new one. That’s real, and it’s useful — but it is reactive. The task has already failed (or silently returned garbage) before healing kicks in, and every heal is another inference pass.
Taprun moves the LLM to authoring time. Once.

    # Authoring: LLM inspects once, emits deterministic code
    $ tap forge https://reddit.com/r/programming
    ✓ Inspected: REST API detected at oauth.reddit.com
    ✓ Verified: 25 rows, score 95/100
    ✓ Saved: reddit/hot.tap.js (pure JavaScript, on your disk)

    # Runtime: no LLM, no tokens, same output every time
    $ tap reddit hot   # 25 rows, ~200 ms, $0.00
    $ tap reddit hot   # 25 rows, ~200 ms, $0.00
    $ tap reddit hot   # 25 rows, ~200 ms, $0.00

Because the output is deterministic, monitoring is tractable. Because execution is deterministic, row count is a health signal. Because the program is on your disk, it works offline and doesn’t depend on anyone’s cloud.
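One way to read "row count is a health signal" is as a simple assertion on each run's output. The following is a minimal sketch under that assumption; `check_row_count` and `HealthResult` are hypothetical names and not part of the Taprun CLI.

```python
from dataclasses import dataclass


@dataclass
class HealthResult:
    ok: bool
    message: str


def check_row_count(actual: int, expected: int, tolerance: int = 0) -> HealthResult:
    """Flag a run whose row count drifts from the expected value."""
    drift = abs(actual - expected)
    if drift <= tolerance:
        return HealthResult(True, f"ok: {actual} rows")
    return HealthResult(False, f"alert: expected {expected} rows, got {actual}")


healthy = check_row_count(25, expected=25)
broken = check_row_count(0, expected=25)
print(healthy.message)  # ok: 25 rows
print(broken.message)   # alert: expected 25 rows, got 0
```

With a tolerance of zero, a check like this only makes sense when the extractor is deterministic; if row counts fluctuate by design, the threshold has to be loosened until it stops catching real breakage.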
And the “self-healing” axis flips from reactive to proactive:

    $ tap doctor --auto reddit hot
    ✗ selector div.thing — gone since last run
    ⚠ fingerprint diff: ↑ 2 structural changes
    ✓ heal bundle ready — current code + git history + page snapshot

tap doctor checks a structural fingerprint before the run fires. If the site drifted, the run doesn’t even start — you get a diff of what changed and a bundle your AI agent can patch offline. No retry tokens. No silent bad data.
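The general idea of a pre-run structural check can be sketched with Python's standard `html.parser`. This is not how `tap doctor` is implemented, which the article does not describe; the fingerprint here (a set of tag and class pairs) and the names `fingerprint`, `missing_selectors`, and `REQUIRED` are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Set, Tuple


class _StructureCollector(HTMLParser):
    """Collects (tag, class) pairs; text content is ignored."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: Set[Tuple[str, str]] = set()

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class") or ""
        # Tags without a class are recorded with an empty class name.
        for cls in classes.split() or [""]:
            self.seen.add((tag, cls))


def fingerprint(html: str) -> Set[Tuple[str, str]]:
    collector = _StructureCollector()
    collector.feed(html)
    return collector.seen


def missing_selectors(html: str, required: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the required (tag, class) pairs that are absent from the page."""
    present = fingerprint(html)
    return [sel for sel in required if sel not in present]


REQUIRED = [("div", "thing")]  # hypothetical: what the saved program depends on
old_page = '<div class="thing"><a class="title">Post</a></div>'
new_page = '<article class="post"><a class="title">Post</a></article>'

for name, page in [("old", old_page), ("new", new_page)]:
    missing = missing_selectors(page, REQUIRED)
    if missing:
        print(f"{name}: skip run, missing {missing}")
    else:
        print(f"{name}: fingerprint ok, safe to run")
```

The check needs no model call: it compares page structure against what the saved program expects and refuses to run on a mismatch, which is the proactive behavior described above.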
| | rtrvr.ai | Taprun |
|---|---|---|
| Model | Interpreter (LLM per task, DOM-compressed) | Compiler (LLM at forge time) |
| LLM calls per run | Every task | 0 (after first compile) |
| Tokens per run | ~26K (claimed) | 0 |
| Cost at 1,000 runs/day | ≥ Scale tier ($499.99/mo) or BYOK bill | $9/mo flat |
| Output consistency | 81% SOTA (stated benchmark) | 100% deterministic |
| Break detection | Reactive (heal after failure) | Proactive (fingerprint before run) |
| Offline execution | No (needs LLM) | Yes (pure JS) |
| Program ownership | Tasks saved to rtrvr | .tap.js on your disk |
| Form factors | Chrome ext, Cloud, API, MCP, CLI, WhatsApp, embeddable | CLI, MCP, Chrome ext, JSR executor |
| Pre-built skills | Task library (rtrvr-authored) | 140+ taps across 68+ sites (on your disk) |
Take a workflow that runs every 5 minutes — 288 runs/day, ~8,640 runs/month. Not extreme; this is a single production scraper.
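A quick sketch of what that cadence adds up to, reusing the per-task figures quoted earlier in this article (5 credits and ~26K tokens per task, 1,500 credits/month on the Basic tier). The 30-day month and the one-task-per-run mapping are assumptions.

```python
MINUTES_PER_DAY = 24 * 60
INTERVAL_MINUTES = 5
DAYS_PER_MONTH = 30              # assumption: a 30-day month

runs_per_day = MINUTES_PER_DAY // INTERVAL_MINUTES
runs_per_month = runs_per_day * DAYS_PER_MONTH

CREDITS_PER_TASK = 5             # as quoted from rtrvr's pricing
TOKENS_PER_TASK = 26_000         # rtrvr's claimed figure
BASIC_CREDITS_PER_MONTH = 1_500

# Assumption: each run counts as exactly one task.
credits_per_month = runs_per_month * CREDITS_PER_TASK
tokens_per_month = runs_per_month * TOKENS_PER_TASK
basic_multiples = credits_per_month / BASIC_CREDITS_PER_MONTH

print(f"{runs_per_day} runs/day, {runs_per_month:,} runs/month")
print(f"{credits_per_month:,} credits/month")
print(f"{tokens_per_month:,} tokens/month")
print(f"{basic_multiples:.1f}x the Basic allowance")
```

Under these assumptions the single workflow needs 43,200 credits a month, or 28.8 times the Basic allowance, before any second workflow is added.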
At 10 runs a day, none of this matters. At 10 runs a minute, it’s the only thing that matters.
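Pick rtrvr.ai when the task is exploratory: new sites, unknown tasks, one-off interactions.
Pick Taprun when the workload repeats and you need deterministic output, flat cost, and offline execution.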
They’re not really competitors — they’re different tools for different moments. Use rtrvr to figure out what you want to extract. Use Taprun once you know.
rtrvr made LLM-at-runtime 25× cheaper than the vision-based baseline. Taprun made it zero. Those aren’t points on the same line.

    $ npx -y @taprun/cli --version
    $ tap forge https://news.ycombinator.com   # one-time LLM pass
    $ tap hackernews hot                       # $0 per run, forever
