Compile your browser-use AgentHistory into a deterministic replay
After your browser-use agent succeeds once, hand its
AgentHistory to taprun.forge(). Tap compiles the
navigation skeleton into a .tap.json plan that replays at
zero LLM tokens — forever, until the page actually
changes.
Why this exists
Every time your agent runs the same flow, it pays the LLM to re-derive the same actions from the same DOM. The model doesn't remember last week's click sequence; it re-decides each step from scratch. For a multi-step authenticated flow that has the same shape on every run, those are wasted tokens.
The compile-once / run-forever pattern is older than LLM agents. Tap just makes it the default for browser flows.
What v1 compiles deterministically
Be precise about scope so you know exactly what you're paying for and what you're not.
Navigation skeleton: always 0 LLM tokens on replay
- go_to_url: open a URL
- click_element: click by selector
- input_text: fill a field
- scroll: scroll viewport
- wait: pause for a condition
- done: terminal step
Extraction: depends on the destination
- extract_content from the AgentHistory itself: browser-use's transcript doesn't capture per-step DOM, so semantic extraction can't be replayed deterministically from the trajectory alone. Run tap forge <url> on the destination to compile the extraction half.
Honest framing: navigation is the easy half; it's always free. Extraction is free only at Tier 0 sites. For a page without a Tier 0 source, the extraction step still costs LLM tokens; only the navigation skeleton saves. Most multi-step flows are dominated by navigation cost, so the saving is real even without Tier 0, but don't believe a "$0/run" headline that doesn't name the split.
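To make the split concrete, here is a minimal sketch that partitions recorded steps into the navigation skeleton and the extraction steps. The action names come from the lists above; the flat list-of-dicts shape and the `split_trajectory` helper are hypothetical simplifications for illustration, not the real AgentHistoryList schema or Tap's compiler.

```python
# Action names come from the scope lists above. The step dict shape is a
# hypothetical simplification, not the real AgentHistoryList schema.
NAVIGATION_ACTIONS = {"go_to_url", "click_element", "input_text", "scroll", "wait", "done"}
EXTRACTION_ACTIONS = {"extract_content"}


def split_trajectory(steps):
    """Partition steps into (navigation, extraction, unknown) lists."""
    navigation, extraction, unknown = [], [], []
    for step in steps:
        action = step["action"]
        if action in NAVIGATION_ACTIONS:
            navigation.append(step)
        elif action in EXTRACTION_ACTIONS:
            extraction.append(step)
        else:
            unknown.append(step)
    return navigation, extraction, unknown


steps = [
    {"action": "go_to_url", "url": "https://example.com/login"},
    {"action": "input_text", "selector": "#email", "text": "me@example.com"},
    {"action": "click_element", "selector": "button[type=submit]"},
    {"action": "extract_content", "goal": "list open invoices"},
    {"action": "done"},
]
nav, ext, unknown = split_trajectory(steps)
print(len(nav), len(ext), len(unknown))  # 4 1 0
```

In this toy trajectory, four of the five steps fall in the navigation bucket, which is the part that would replay without LLM tokens; the single extract_content step is the part whose cost depends on the destination.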
Token math, concretely
Numbers below are typical for a multi-step browser-use flow against a Tier 0 destination (e.g. a dashboard with a JSON API behind it). Your mileage varies; the directional shape doesn't.
| Run | Approach | LLM tokens | Approx cost |
|---|---|---|---|
| 1st | browser-use full LLM loop | ~14,000 | $0.042 |
| 1st (compile) | forge(trajectory=...) | ~1,100 | $0.003 |
| 2nd → ∞ | browser-use re-runs | ~14,000 each | $0.042 each |
| 2nd → ∞ | tap run site/name | 0 | $0.000 |
100 reruns: browser-use path ≈ $4.20 (100 × $0.042);
Tap path ≈ $0.003, the one-time compile. The crossover happens at run 2.
Token figures assume Claude Sonnet pricing as of 2026-04; substitute
your provider's rate if different.
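The arithmetic can be checked with a short sketch. It uses the per-run figures from the table as constants and counts total runs including the first full agent run; the function names are illustrative only, and you would substitute your own measured costs.

```python
# Figures from the table above, assumed constant per run.
AGENT_RUN_COST = 0.042   # USD, one full browser-use LLM loop
COMPILE_COST = 0.003     # USD, one-time forge(trajectory=...) compile
REPLAY_COST = 0.0        # USD, deterministic replay


def agent_total(runs):
    """Cumulative cost when every run goes through the LLM loop."""
    return runs * AGENT_RUN_COST


def tap_total(runs):
    """Cumulative cost: one LLM run, one compile, then free replays."""
    if runs == 0:
        return 0.0
    return AGENT_RUN_COST + COMPILE_COST + (runs - 1) * REPLAY_COST


def crossover_run(max_runs=1000):
    """First total run count at which the Tap path is strictly cheaper."""
    for n in range(1, max_runs + 1):
        if tap_total(n) < agent_total(n):
            return n
    return None


print(crossover_run())             # 2
print(round(agent_total(100), 2))  # 4.2
```

After the first run the Tap path is $0.003 behind because of the compile; from the second run on it is ahead, and the gap grows by one agent run's cost each time.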
Quickstart — Python
pip install taprun
Run your browser-use agent end-to-end at least once to get a working trajectory. Then:
import asyncio

from browser_use import Agent
from taprun import forge, run, doctor

async def main():
    # 1. Standard browser-use run: pays full LLM cost once.
    #    llm=... is a placeholder; pass your own model client.
    agent = Agent(task="apply to Y Combinator", llm=...)
    result = await agent.run()

    # 2. Compile the navigation skeleton.
    forge(
        trajectory=result.model_dump(),  # AgentHistoryList dict, JSON string, or path
        site="ycombinator",
        name="apply",
    )
    # writes ~/.tap/taps/ycombinator/apply.tap.json

    # 3. Replay forever at 0 LLM tokens for navigation.
    rows = run("ycombinator/apply")

    # 4. Optional: drift detection without re-running the agent.
    verdict = doctor("ycombinator/apply")  # 'ok' | 'broken' | 'stale'

# agent.run() is a coroutine, so it needs an event loop.
asyncio.run(main())
Quickstart — TypeScript
npm install taprun
import { forge, run, doctor } from "taprun";
// after your browser-use TS agent finishes:
await forge({
  trajectory: agent.history, // AgentHistoryList object, JSON string, or path
  site: "ycombinator",
  name: "apply",
});
const rows = await run("ycombinator/apply");
const verdict = await doctor("ycombinator/apply");
The SDK is a thin subprocess wrapper around the tap CLI
binary. Each call spawns a process (~30–100 ms warm). For tight loops,
run tap mcp start and call via MCP — the runtime stays
resident and there's no per-call spawn cost.
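To judge whether per-call spawn cost matters for your loop, you can time a trivial subprocess on your own machine. The sketch below is a generic measurement, not part of the SDK: it uses a Python interpreter that exits immediately as a stand-in command, and `mean_spawn_seconds` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(argv, repeats=5):
    """Average wall-clock time to spawn `argv` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(repeats):
        subprocess.run(argv, check=True, capture_output=True)
    return (time.perf_counter() - start) / repeats


# Stand-in command: a Python interpreter that exits immediately.
# Swap in the real CLI invocation to measure it on your machine.
per_call = mean_spawn_seconds([sys.executable, "-c", "pass"])
print(f"{per_call * 1000:.1f} ms per spawn")
```

If the measured figure is small next to the work each call does, the subprocess path is fine; if you call in a tight loop, the resident runtime described above avoids paying it on every call.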
Compiling the extraction half (Tier 0)
If your trajectory ends on a page with a stable data shape exposed via a Layer 1 source — Atom feed, JSON API, JSON-LD product page, agents.json manifest, OpenAPI doc — you can compile that half too:
# In a separate step, after the trajectory is forged:
tap forge https://example.com/dashboard
# inspects the destination, finds a Layer 1 source, emits a deterministic
# extraction at 0 LLM tokens. Saves to ~/.tap/taps/example/dashboard-data.tap.json
Compose the two with tap pipe or chain them in your code.
For pages without a Layer 1 source, extraction falls back to the AI
loop and stays priced; only navigation saves.
Drift detection — the part runtime LLMs can't do
A self-replaying agent doesn't know if the page silently changed
underneath it. doctor independently fetches an
authoritative source for the page (the same Tier 0 source the
extraction uses, or a fingerprint baseline) and diffs against the
plan's expected output. Three verdicts:
- ok: page shape matches the baseline; replay is trustworthy.
- stale: output is still parseable but shape drifted (new column, renamed field, etc.). Run tap refresh to rebaseline.
- broken: replay would produce wrong data. Heal the tap (Pro tier replays a cached patch at 0 tokens; misses fall back to a minimal-patch LLM call).
This is the layer that distinguishes Tap from a one-shot replay cache: the diff is computed against an authoritative source, not against the plan's own past output.
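As a rough mental model of the three verdicts, the sketch below classifies replayed rows by comparing their field names to a baseline. The rule it applies is an assumption made for illustration, not Tap's actual diff algorithm, and `classify_shape` is a hypothetical helper.

```python
def classify_shape(baseline_fields, rows):
    """Return 'ok', 'stale', or 'broken' for a list of row dicts.

    Assumed rule, for illustration only (not Tap's actual algorithm):
    - broken: no rows, or no field overlaps with the baseline
    - ok:     every row has exactly the baseline fields
    - stale:  anything in between (added, missing, or renamed fields)
    """
    baseline = set(baseline_fields)
    if not rows:
        return "broken"
    observed = set()
    for row in rows:
        observed.update(row.keys())
    if not (observed & baseline):
        return "broken"
    if all(set(row.keys()) == baseline for row in rows):
        return "ok"
    return "stale"


baseline = ["id", "title", "price"]
print(classify_shape(baseline, [{"id": 1, "title": "a", "price": 2}]))  # ok
print(classify_shape(baseline, [{"id": 1, "title": "a", "price": 2, "currency": "USD"}]))  # stale
print(classify_shape(baseline, []))  # broken
```

The point of the sketch is where the baseline comes from: it is supplied from outside the replay, so a plan that keeps reproducing its own wrong output would still be flagged.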
FAQ
Does this work for write actions (publishing, comments, transfers)?
Yes — forge respects browser-use's intent declarations.
Write taps run under tap.run like reads, but they're
skipped under tap.doctor (verifying a write would have
side effects). Drift in write taps is caught by health-check failures
on the read side instead.
What about Stagehand and Anthropic computer-use trajectories?
The trajectory schema in v1 is the browser-use
AgentHistoryList shape. Stagehand and raw Anthropic
tool-use trajectories are on the roadmap; the navigation primitives
overlap, so adding them is mostly a schema mapping. Track
issues for
progress, or open one to vote.
Is the SDK open-source?
The taprun SDK packages on npm and PyPI are MIT-licensed
thin clients. The tap CLI binary they invoke is
proprietary. The Chrome extension runtime
(github.com/LeonTing1010/tap)
and the 65+ community taps in
tap-skills
are MIT.
What does it cost?
Free tier covers the deterministic compile path
(Layer 1 sources), run, doctor, and 65+
community taps. Hacker $9/mo unlocks the full forge
pipeline including the AI fallback for Layer 4 (DOM-only) pages, BYOK.
Pro $29/mo adds heal (cached patches
replay at 0 tokens), refresh, scheduling, and team
server. 100% local mode is the default at every tier.
Full pricing.
What if my page has no Tier 0 source?
Run tap forge <url> anyway. The forge pipeline
falls back through Layer 2 (SSR) → Layer 3 (XHR/fetch) → Layer 4
(DOM). The deeper the layer, the more LLM tokens the compile costs
one time, but replay is still deterministic. Layer 4 compile costs
real tokens; Layers 1–3 are mostly free.
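The fallback order can be pictured as a first-match search from the shallowest layer to the deepest. This is a sketch of the idea only: the probe callables are hypothetical stand-ins that you would supply, and nothing here is part of the tap CLI or the taprun SDK.

```python
# Layer order from the answer above. Probe functions are hypothetical
# stand-ins, not part of the tap CLI or the taprun SDK.
LAYERS = [
    (1, "structured source (feed, JSON API, JSON-LD)"),
    (2, "SSR"),
    (3, "XHR/fetch"),
    (4, "DOM"),
]


def pick_layer(probes):
    """Return (number, label) of the shallowest layer whose probe succeeds.

    `probes` maps a layer number to a zero-argument callable that returns
    True when that layer can supply the data. Returns None if none match.
    """
    for number, label in LAYERS:
        probe = probes.get(number)
        if probe is not None and probe():
            return number, label
    return None


probes = {
    1: lambda: False,  # no feed or JSON API found
    2: lambda: False,  # data not present in server-rendered HTML
    3: lambda: True,   # data arrives over an XHR/fetch call
    4: lambda: True,   # DOM scraping would also work, but is deeper
}
print(pick_layer(probes))  # (3, 'XHR/fetch')
```

Because the search stops at the first success, the deeper and more expensive layers are only reached when every shallower one has failed.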
How do I know my replay didn't silently break?
Schedule tap doctor on a daily cron, or call it inline
before consuming run() output in production. Doctor
cross-validates against an authoritative source, so it catches drift
before you or your downstream consumer would notice it.
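The inline variant could look like the sketch below. It assumes doctor returns one of the verdict strings shown in the quickstart comment ('ok', 'stale', 'broken'); `checked_run` and `DriftError` are hypothetical names, and run and doctor are passed in as parameters so the example works with stubs.

```python
class DriftError(RuntimeError):
    """Raised when the health check says the replay cannot be trusted."""


def checked_run(tap_id, run, doctor, allow_stale=False):
    """Call `doctor` first and only call `run` on an acceptable verdict.

    `run` and `doctor` are injected so the sketch stays testable; in real
    use they would be the functions imported from taprun in the quickstart.
    Assumes `doctor` returns 'ok', 'stale', or 'broken'.
    """
    verdict = doctor(tap_id)
    if verdict == "ok" or (verdict == "stale" and allow_stale):
        return run(tap_id)
    raise DriftError(f"{tap_id}: doctor returned {verdict!r}")


# Stub callables standing in for taprun.run and taprun.doctor.
rows = checked_run(
    "ycombinator/apply",
    run=lambda tap_id: [{"status": "submitted"}],
    doctor=lambda tap_id: "ok",
)
print(rows)  # [{'status': 'submitted'}]
```

Raising instead of returning empty rows is a deliberate choice here: a downstream consumer cannot mistake a failed health check for a legitimately empty result.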
Last updated 2026-04-26 against
taprun v0.14 (Python) / v0.1.x (TypeScript). Trajectory
compile (forge(trajectory=...)) shipped in commit
e46c9df; the API is alpha and stabilizes at 0.2 once
integration feedback lands.