Run the same browser automation twice with Browser Use or Stagehand and you cannot count on getting the same result. The LLM re-interprets the page on each run, and that interpretation is inherently non-deterministic.
This isn't a bug. It's the fundamental limit of the interpreter model. And it's why the industry is moving toward a compiler model for repeatable automation.
AI browser agents work by sending the page's DOM to an LLM and asking "what should I click?" The LLM responds differently each time — temperature, context window sizing, minor DOM changes all affect the answer.
The practical consequences:
Reliability for AI browser agents lands somewhere between 60% and 95%. That sounds fine until you're running 100 tasks/day and 5 to 40 of them return garbage.
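To make that arithmetic concrete, here is a minimal sketch that computes the expected daily failures at both ends of the range. The function name `expected_failures` is hypothetical and exists only for illustration.

```python
def expected_failures(tasks_per_day: int, reliability: float) -> int:
    """Expected number of failed runs per day at a given success rate."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    # round() absorbs floating-point noise such as 5.000000000000004
    return round(tasks_per_day * (1.0 - reliability))


if __name__ == "__main__":
    for reliability in (0.95, 0.60):
        failures = expected_failures(100, reliability)
        print(f"{reliability:.0%} reliable: ~{failures} of 100 daily tasks fail")
```

The failure count scales linearly with volume, so the same rates at 1,000 tasks/day mean roughly 50 to 400 bad runs.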
Deterministic automation compiles AI understanding into a program once, then runs that program forever:

    # Step 1: AI inspects the website (one-time)
    $ tap capture https://github.com/trending github/trending --intent "trending repos"
    ✓ Deterministic template hit — API endpoint detected, no AI tokens
    ✓ Lint passed
    ✓ Saved: ~/.tap/plans/github/trending.plan.json

    # Step 2: Run forever, same result every time
    $ tap github/trending   # Day 1: 25 rows
    $ tap github/trending   # Day 2: 25 rows
    $ tap github/trending   # Day 365: 25 rows

Same input → same output. Every time. The program doesn't call the LLM. It doesn't reinterpret the page. It executes the same code path deterministically.
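The difference can be sketched with a toy model. Everything here is an assumption made for illustration: `PAGE` is made-up data, `PLAN` is an invented step format (not Tap's actual `.plan.json` schema), and `run_interpreter` uses a random choice as a stand-in for LLM sampling rather than calling a real model.

```python
import random

# Hypothetical page data standing in for a fetched API response.
PAGE = {"repos": [{"name": "alpha", "stars": 120}, {"name": "beta", "stars": 95}]}

# A "compiled" plan: steps decided once and stored as data (illustrative format only).
PLAN = [("get", "repos"), ("pluck", "name")]


def run_plan(plan, page):
    """Execute fixed steps; no model call, so the same page gives the same output."""
    value = page
    for op, arg in plan:
        if op == "get":
            value = value[arg]
        elif op == "pluck":
            value = [row[arg] for row in value]
        else:
            raise ValueError(f"unknown step: {op}")
    return value


def run_interpreter(page, rng):
    """Stand-in for an agent that re-decides which field to extract on every run."""
    field = rng.choice(["name", "stars"])  # models sampling variability, not a real LLM
    return [row[field] for row in page["repos"]]


if __name__ == "__main__":
    rng = random.Random(0)
    print("plan:", run_plan(PLAN, PAGE), run_plan(PLAN, PAGE))
    print("interpreter:", run_interpreter(PAGE, rng), run_interpreter(PAGE, rng))
```

The plan's output depends only on the plan and the page, while the interpreter's output also depends on a decision made fresh on each run.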
The analogy is exact:
| | Software | Browser Automation |
|---|---|---|
| Interpreter | Python, Ruby | Browser Use, Stagehand |
| Compiler | GCC, rustc | Tap |
| Source | Python source code | Website at a point in time |
| Output | Machine code (fast, deterministic) | .plan.json bare Plan (fast, deterministic) |
| Runtime cost | $0 per execution | $0 per execution |
Interpreters are flexible — you can change behavior at runtime. Compilers are fast and reliable — the same binary always produces the same output. For production automation, you want the compiler.
This is the obvious objection: "deterministic programs break when the page changes." Yes — and that's actually a feature.
When a website changes:
1. `tap verify <site>/<name>` returns verdict: drifted; the per-tap CEL snapshot_equivalent predicate names exactly what stopped matching.
2. `tap capture <url> <site>/<name>` against the same site+name overwrites the plan; the next verify rebaselines after human review.

Compare this to AI browser agents: they appear to handle changes because they re-interpret the page each time. But they handle changes inconsistently: sometimes correctly, sometimes not, with no way to tell which.
Deterministic plans either work or they don't — and tap verify tells you which. Non-deterministic agents silently fail for days before anyone notices.
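The idea of an explicit drift verdict can be illustrated with a small shape comparison. This is a sketch of the general technique, not Tap's implementation: it does not use CEL, the `snapshot` and `verify` functions are hypothetical, and the "ok" verdict string is an assumption (only "drifted" appears above).

```python
def snapshot(rows):
    """Reduce an output to its shape: row count plus the sorted field names."""
    fields = sorted({key for row in rows for key in row})
    return {"row_count": len(rows), "fields": fields}


def verify(baseline, rows):
    """Compare the current output's shape to the baseline and name each mismatch."""
    current = snapshot(rows)
    mismatches = [
        f"{key}: expected {baseline[key]!r}, got {current[key]!r}"
        for key in baseline
        if baseline[key] != current[key]
    ]
    # "ok" is an assumed label for the non-drifted case.
    return {"verdict": "drifted" if mismatches else "ok", "mismatches": mismatches}


if __name__ == "__main__":
    baseline = snapshot([{"name": "alpha", "stars": 120}])
    # Same shape, different values: not drift.
    print(verify(baseline, [{"name": "beta", "stars": 95}]))
    # The site renamed a field: drift, with the mismatch named.
    print(verify(baseline, [{"title": "beta", "stars": 95}]))
```

Comparing shape rather than values lets the data change from day to day without raising an alarm, while a renamed or missing field is reported by name.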
Deterministic automation isn't for everything. You still want AI at runtime when:
Tap doesn't replace AI at runtime for these cases — it compiles AI's understanding into programs for the cases where you can eliminate the AI. The more you compile, the less you pay. The more you compile, the more reliable your automation becomes.
| Metric | AI (Interpreter) | Deterministic (Compiler) |
|---|---|---|
| Cost per run | $0.50–$2.00 | $0 |
| Consistency | 60–95% | 100% |
| Execution time | Seconds to minutes | <1s |
| Breakage detection | None | Within the hour |
| Scalability | Linear token cost | Zero marginal cost |
For any task you run more than once, the compiler model wins on every dimension.
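Taking the table's per-run figures at face value, a rough yearly projection looks like this. The workload of 100 runs a day is an assumption chosen for illustration, and `yearly_cost` is a hypothetical helper.

```python
def yearly_cost(runs_per_day: int, cost_per_run: float, days: int = 365) -> float:
    """Total spend over a period when every run costs the same amount."""
    return runs_per_day * days * cost_per_run


if __name__ == "__main__":
    runs_per_day = 100  # assumed workload, not a measured figure
    low = yearly_cost(runs_per_day, 0.50)
    high = yearly_cost(runs_per_day, 2.00)
    compiled = yearly_cost(runs_per_day, 0.0)
    print(f"AI at runtime: ${low:,.0f} to ${high:,.0f} per year")
    print(f"Compiled plan: ${compiled:,.0f} per year")
```

At that volume the per-run range in the table works out to $18,250 to $73,000 a year for the interpreter model.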

    $ npx -y @taprun/cli --version
    $ tap capture https://github.com/trending github/trending --intent "trending repos"   # Compile once
    $ tap github/trending   # Run forever at $0
