What 44 capture traces taught us about SPA shells

May 22, 2026 · Leon Ting · 8 min read

Diagnosing the silent 34% retry-loop tax on first-time captures, and the two ADRs that fixed it

Last week we did a boring audit. ls ~/.tap/traces/ | wc -l on a developer machine: 44 trace files. jq over them, group by source URL, count repetitions. The result was uncomfortable: 15 of the 44 traces were retry clusters on the same URL — three, four, sometimes five attempts each, all bare-fetch, all returning 200 OK with bodies that differed by milliseconds in the timestamps and nothing else.
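The grouping step of that audit can be sketched in a few lines. This is a minimal illustration, not the jq pipeline we actually ran: the `TraceSummary` shape, the function names, and the three-attempt threshold are assumptions chosen to match the description above. On the real corpus the share works out to 15 / 44 ≈ 0.34.

```typescript
// Hypothetical minimal view of a trace; the real schema carries more fields.
interface TraceSummary {
  url: string;
  via: "bare" | "extension";
}

interface RetryCluster {
  url: string;
  attempts: number;
}

// Group traces by URL and keep the URLs captured at least `minAttempts` times.
// The default of 3 is an assumption based on "three, four, sometimes five attempts".
function findRetryClusters(traces: TraceSummary[], minAttempts: number = 3): RetryCluster[] {
  const counts: { [url: string]: number } = {};
  traces.forEach((t) => {
    counts[t.url] = (counts[t.url] || 0) + 1;
  });
  return Object.keys(counts)
    .filter((url) => (counts[url] || 0) >= minAttempts)
    .map((url) => ({ url: url, attempts: counts[url] || 0 }))
    .sort((a, b) => b.attempts - a.attempts);
}

// Fraction of all traces that belong to some retry cluster.
function clusterShare(traces: TraceSummary[], clusters: RetryCluster[]): number {
  if (traces.length === 0) return 0;
  let clustered = 0;
  clusters.forEach((c) => {
    clustered += c.attempts;
  });
  return clustered / traces.length;
}
```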

Three sites dominated: xiaohongshu.com, console.aliyun.com, jike.city. Different products, different teams, same architectural answer to "what is your homepage?" — a Vite-built single-page-app shell that serves a 200 OK with about 800 bytes of HTML, then asks the browser to fetch the real content over XHR after the JS bundle boots.

For a real user this is invisible: Chrome renders the shell, runs the JS, fetches the data, paints. For a bare fetch with no JS runtime, the shell is the response. And our forge — Tap's URL-and-intent-to-plan compiler — was reading the 200 status, finding none of its expected source-class signatures in the body (no JSON, no RSS, no OpenGraph card, no obvious list rendering), and shrugging back a plan: null. The user retried. The forge tried again. Same shell, same shrug. The trace directory grew.

None of this was new. SPA shells have been a known scraping landmine for years — every other thread on r/webscraping includes someone explaining for the hundredth time that "you need a headless browser for this site, not requests." What was new was that we now had structured trace data on disk to measure the problem, and the measurement said it was the single biggest source of first-capture failures across the corpus.

Why the trace files alone weren't enough

Tap persists every capture to ~/.tap/traces/<ts>-<urlhash8>.trace.json. Per ADR 2026-05-15-capture-trace-persistence, the schema records: the URL, the request method, the dispatch path (via: "bare" for direct fetch or "extension" for the authenticated browser peer), and a fixed-cap body preview (1 MiB max, byte-exact). That's enough to re-read what arrived, which is half of what you need when debugging a failure.

It is not enough to read what forge concluded. The forge pipeline runs the bytes through a classifier that emits a source_class — one of nine closed values: json-api, rss, atom, json-ld, opengraph, html-list, spa-rendered, auth_redirect, unknown — plus a reason string explaining why. That decision lived in memory during the capture call and surfaced once in the MCP envelope, then evaporated. The trace file on disk had the input; it didn't have the verdict.

So when we sat down to diagnose the retry clusters, we could see the bytes (<!doctype html><html><head>...<script type="module" src="/assets/index-DQqWfQqj.js"></script></head><body><div id="app"></div></body></html> on every retry — Vite's signature). We could not, from the trace alone, see why forge had concluded the body was unknown rather than spa-rendered. The information existed at decision time; we had thrown it away.

Fix #1: persist the verdict alongside the bytes

ADR 2026-05-22-augment-capture-trace-with-forge-decision ships an additive change to the CaptureTrace interface: a required inspection field with the same shape as the projection the MCP envelope already exposes — { source_class, reason, detected_signals? }. Every code path that writes a trace now reads the same InspectionResult the forge classifier just computed and persists it. Same source of truth, two destinations.

The architecture test that locks this is short: open every trace file, assert .inspection.source_class is one of the nine closed values and .inspection.reason is a non-empty string. No null, no missing. A future refactor that drops the field fails CI before it merges, regardless of whether anyone remembered to add the field to the schema doc.
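A minimal sketch of the per-trace check that test performs, assuming the trace has already been parsed from JSON. The function name `checkInspection` and the error strings are hypothetical; the nine values are the ones listed earlier in this post.

```typescript
// The nine closed source_class values named in this post.
const SOURCE_CLASSES = [
  "json-api", "rss", "atom", "json-ld", "opengraph",
  "html-list", "spa-rendered", "auth_redirect", "unknown",
] as const;

type SourceClass = (typeof SOURCE_CLASSES)[number];

// Returns a list of violations; an empty list means the trace passes.
function checkInspection(trace: unknown): string[] {
  if (typeof trace !== "object" || trace === null) {
    return ["trace is not an object"];
  }
  const inspection = (trace as { inspection?: unknown }).inspection;
  if (typeof inspection !== "object" || inspection === null) {
    return ["inspection is missing or null"];
  }
  const { source_class, reason } = inspection as {
    source_class?: unknown;
    reason?: unknown;
  };
  const errors: string[] = [];
  const known = SOURCE_CLASSES as readonly string[];
  if (typeof source_class !== "string" || known.indexOf(source_class) === -1) {
    errors.push("source_class is not one of the nine closed values");
  }
  if (typeof reason !== "string" || reason.trim().length === 0) {
    errors.push("reason is not a non-empty string");
  }
  return errors;
}
```

Returning a list of violations rather than throwing on the first one lets a test runner report every bad trace in one pass.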

This is half of what we needed. The next capture against an SPA shell now writes a trace that says, in plain text, "source_class": "spa-rendered", "reason": "vite asset bundle reference + empty #app div + no other content". A human reading the trace knows what happened in one glance. So does the agent on the next call.

Fix #2: a per-domain fingerprint cache

Knowing what went wrong on attempt one doesn't help attempt two if attempt two has no memory of attempt one. Bare-fetch is cheap and stateless and that's exactly its problem: forge couldn't remember that xiaohongshu.com had served it a Vite shell five minutes ago, so it tried bare-fetch again.

ADR 2026-05-22-substrate-fingerprint-cache introduces ~/.tap/fingerprints/<hostname>.json — one file per registrable host, atomic write, 30-day mtime TTL. The schema records what worked on this domain and what didn't:

{
  "domain": "xiaohongshu.com",
  "first_seen_at": "2026-05-22T01:14:33Z",
  "last_seen_at": "2026-05-22T08:42:11Z",
  "captures": 7,
  "source_class_distribution": {
    "spa-rendered": 6,
    "unknown": 1
  },
  "last_via": "extension"
}

The capture pipeline reads this file before deciding how to fetch. If a domain has prior spa-rendered evidence at any reasonable frequency, the forge pre-escalates directly to the authenticated extension peer, skipping bare-fetch entirely from the second visit on. The first visit still pays the bare-fetch cost (we need to learn what kind of site this is), but the cost is now amortized: one wasted bare fetch per domain for as long as its fingerprint stays within the TTL, not one per capture call.
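The read-side decision might look roughly like the sketch below. The `Fingerprint` fields mirror the JSON above; the function name `chooseDispatch` and the 0.5 threshold are assumptions, since "any reasonable frequency" is not pinned to a number here.

```typescript
// Field names follow the fingerprint JSON shown above.
interface Fingerprint {
  domain: string;
  first_seen_at: string;
  last_seen_at: string;
  captures: number;
  source_class_distribution: { [sourceClass: string]: number };
  last_via: "bare" | "extension";
}

// Assumed threshold; the post only says "any reasonable frequency".
const SPA_SHARE_THRESHOLD = 0.5;

// Decide the dispatch path before fetching.
function chooseDispatch(fp: Fingerprint | null): "bare" | "extension" {
  // No usable fingerprint: this is the first visit, so pay the bare-fetch cost once.
  if (fp === null || fp.captures <= 0) return "bare";
  const spaCount = fp.source_class_distribution["spa-rendered"] || 0;
  return spaCount / fp.captures >= SPA_SHARE_THRESHOLD ? "extension" : "bare";
}
```

For the xiaohongshu.com record above, 6 of 7 captures were spa-rendered, so this sketch would route the next capture straight to the extension peer.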

Two architectural details worth calling out, because they're load-bearing:

One file per host, not a single index. Concurrency on a developer laptop is racy: two parallel tap capture calls against different domains shouldn't fight for the same JSON file. With per-host files, captures against different domains never contend with each other; only concurrent captures of the same host touch the same file.
30-day TTL, not infinite. A site that was SPA in May might be statically rendered in November after an architectural rewrite. The fingerprint should decay; the cache enumerator excludes files older than 30 days from its read path, so a stale entry self-evicts rather than rotting forever.

The arch test for this is structural: open ~/.tap/fingerprints/, assert filename pattern is exactly <hostname>.json, assert every record has the required fields, assert no file is older than 30 days. Cache divergence becomes a build break, not a Slack thread six weeks later.
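A sketch of the filename and TTL checks, written as a pure function over directory entries so it runs without touching disk. The `FingerprintFile` shape, the function names, and the exact hostname regex are assumptions; only the `<hostname>.json` pattern and the 30-day mtime TTL come from the ADR as described above.

```typescript
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days in milliseconds

// Assumed rule: lowercase dot-separated hostname labels followed by ".json".
const FINGERPRINT_FILE = /^[a-z0-9-]+(\.[a-z0-9-]+)+\.json$/;

// Hypothetical directory-entry view: file name plus modification time.
interface FingerprintFile {
  name: string;
  mtimeMs: number;
}

// A file is fresh when its mtime is no more than 30 days before `nowMs`.
function isFresh(mtimeMs: number, nowMs: number): boolean {
  return nowMs - mtimeMs <= TTL_MS;
}

// Keep only well-named, unexpired entries; stale files drop out of the read path.
function readableFingerprints(files: FingerprintFile[], nowMs: number): FingerprintFile[] {
  return files.filter((f) => FINGERPRINT_FILE.test(f.name) && isFresh(f.mtimeMs, nowMs));
}
```

Passing `nowMs` in explicitly, rather than calling `Date.now()` inside, keeps the expiry logic deterministic under test.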

Five commits, one week

The full shipped sequence, oldest first:

de45b30 — feat(forge): spa-rendered source_class + nav+wait+eval template (#58). Add the ninth source_class value plus the template forge emits when it recognizes one. Without this, "we know it's an SPA" doesn't help — there's still no plan to ship.
0ea398c — fix(forge): detect Vite-style empty SPA shells (Phase 7 from #58). The classifier specifically. Vite's signature is fairly tight: <script type="module" src="/assets/index-<hash>.js"> + an empty #app or #root div + no JSON-LD or OpenGraph cards. Detecting this also catches Vue 3, React 18 + Vite, Svelte 5, and Solid — they all bundle through Vite by default now.
1dfde37 — feat(forge): augment capture trace with forge decision rationale. The disk-persistence fix from ADR 2026-05-22-augment-capture-trace-with-forge-decision.
33756b0 — feat(forge): per-domain fingerprint cache (slice 1 — write-only). Cache the data; nothing reads it yet. (Slicing it write-first means we can verify the disk shape independently of the read consumer.)
348d62c — feat(forge): fingerprint cache READ side + extension pre-escalation. The read consumer that closes the loop.
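The Vite-shell signature described under 0ea398c can be approximated with a few regular expressions. This is a heuristic sketch, not the classifier that shipped: the regexes and the name `looksLikeViteShell` are illustrative, and the bundle pattern assumes `type="module"` appears before `src` in the script tag.

```typescript
// Module script pointing at a hashed /assets/index-<hash>.js bundle.
const MODULE_BUNDLE =
  /<script\b[^>]*type=["']module["'][^>]*src=["']\/assets\/index-[\w-]+\.js["'][^>]*>/i;

// An #app or #root mount point containing nothing but whitespace.
const EMPTY_MOUNT = /<div\b[^>]*id=["'](?:app|root)["'][^>]*>\s*<\/div>/i;

// Signals that the body carries real content after all.
const JSON_LD = /<script\b[^>]*type=["']application\/ld\+json["']/i;
const OPEN_GRAPH = /<meta\b[^>]*property=["']og:/i;

// True when the body has the bundle reference and empty mount point,
// and neither JSON-LD nor OpenGraph metadata.
function looksLikeViteShell(html: string): boolean {
  return (
    MODULE_BUNDLE.test(html) &&
    EMPTY_MOUNT.test(html) &&
    !JSON_LD.test(html) &&
    !OPEN_GRAPH.test(html)
  );
}
```

Regex matching on HTML is fragile in general; it is tolerable here only because the shell is tiny and machine-generated, and a misclassification costs one extra bare fetch rather than a wrong plan.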

One follow-up worth calling out, because it caught a real bug:

01f1f24 — feat(forge): capture-and-verify atomic — smoke-run plans before savePlan. After the fingerprint cache started routing more captures through the extension peer, we noticed forge was happily saving plans that the runtime would reject on first execution (drift between forge's expectation of substrate behavior and the actual peer response). The fix: before savePlan, the forge runs the plan's observe phase end-to-end. If runtime says "this won't work," the plan never lands on disk. Catches the first drift, not the second.
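The verify-before-save ordering reduces to a small guard. In this sketch the `Plan` and `SmokeResult` types and the injected `runObserve` and `save` callbacks are hypothetical stand-ins, and everything is kept synchronous for brevity; a real smoke run against a browser peer would presumably be asynchronous.

```typescript
// Hypothetical stand-ins for the real plan and smoke-run result types.
interface Plan {
  id: string;
}

interface SmokeResult {
  ok: boolean;
  reason?: string;
}

// Run the plan's observe phase first; persist only if it succeeds.
function captureAndVerify(
  plan: Plan,
  runObserve: (plan: Plan) => SmokeResult,
  save: (plan: Plan) => void
): SmokeResult {
  let result: SmokeResult;
  try {
    result = runObserve(plan);
  } catch (err) {
    // A crashing smoke run counts as a failed verification.
    result = { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
  if (result.ok) {
    save(plan);
  }
  return result;
}
```

Injecting the runner and the persistence step as parameters keeps the ordering guarantee testable without a browser or a disk.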

All 749 tests pass under strict typecheck. The audit on a fresh developer machine three days post-ship shows zero retry clusters across the same three domains. The 34% number is a one-time scar, not a recurring one.

What's actually generalizable

The specific fix — augment trace, add fingerprint cache — is Tap-internal. The shape of the bug is not.

Three patterns kept showing up across the audit and the post-mortem discussions:

Persist verdicts, not just inputs. Logs that capture "what arrived" without also capturing "what we decided about it" make every debugging session start from byte-level. The cost of writing the verdict alongside the input is one extra field; the savings are felt every time a future maintainer reads a trace from a system they don't remember.
Cache the cheapest signal that breaks ambiguity. "Was this site SPA last time?" is one bit per domain. Caching it is a kilobyte per host. The alternative — re-running the bare-fetch experiment every call — is dollars per thousand requests across a corpus. The asymmetry is large enough that the cache shouldn't be an optimization, it should be the default.
Decay caches by default. A 30-day TTL is not about freshness in the index sense; it's about not building a system whose memory of failure modes outlives the failure modes themselves. Sites refactor. If your cache doesn't, every refactor becomes a silent re-introduction of the bug you fixed in 2024.

Pattern 1 is the cheap one and the one most teams skip. The relevant question to ask of any logging system is not "what fields does it write?" but "if the worst bug we fixed last quarter happened again, would the new logs let me find the root cause without re-deriving it?" If the answer involves looking at code, the log is missing the verdict.

If you're hitting the same wall

Tap is MIT-licensed (GitHub) and runs as a Chrome extension plus CLI plus MCP server. Install path:

brew install LeonTing1010/tap/taprun
# or
npx @taprun/spec init

For sites in the SPA-shell shape (xiaohongshu, jike, aliyun-console, douyin, many Vue/React SaaS dashboards), tap capture now writes a working plan on the first call, no retry loop. The trace file on disk carries the verdict if you ever need to debug. The per-domain fingerprint at ~/.tap/fingerprints/<hostname>.json is a small JSON file you can cat any time to see what Tap remembers about that site.

None of this is novel as a scraping technique. What's novel is making the verdict and the cache visible — disk-persisted, schema-checked, deletable by hand if you want to start over. Browser automation has historically been a black box of "it worked yesterday, why is it failing today?" The boring fix is to stop building black boxes.

The next time the audit runs and the retry-cluster count goes back up, we'll know within minutes which domain changed shape — because the verdict is on disk, and the verdict has a date.
