
Compile-time AI is safe. Runtime AI deleted a production database.

April 27, 2026 · Leon Ting · 7 min read

The top of Hacker News today, 600+ points and climbing: “An AI agent deleted our production database. The agent's confession is below.”

The reactions split predictably. Half say “better prompts would have prevented this.” Half say “never give an agent write access.” Both miss the actual lever.

The lever is where in the lifecycle the AI is allowed to think. There are exactly two choices, and one of them just dropped a table.

Two places to put the AI

Every AI-driven automation puts the model in one of two positions:

| | Compile-time AI | Runtime AI |
|---|---|---|
| When the model thinks | Once, while you write the program | On every call, forever |
| What it produces | A program a human can read | An action the system executes |
| Review surface | A diff, before anything runs | Logs, after something happened |
| Op space at execution | Frozen at compile time | Whatever the model decides next |
| Token cost at runtime | Zero | Linear in the number of tasks |
| Prompt-injection surface | The author's IDE | Every page the agent visits |

Both architectures are real, both ship today, and the difference is invisible from the outside — until something gets dropped.

What runtime AI actually means in production

When an AI agent has runtime authority over your systems, the surface looks innocent on day one:

# “Help me clean up the staging environment”
agent run --db production_creds.json --task "clean staging"

The credentials are real. The op space is whatever the agent's tool list contains — usually execute_sql, http_request, shell, with no constraint on what arguments those tools receive. The model decides, on each call, which tool to invoke and with which arguments. The decision is influenced by every token of context that scrolled past it — including, notably, anything injected by a hostile page or a hallucinated screenshot caption.
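Here is a minimal TypeScript sketch of that loop's shape. Everything in it is hypothetical. `Model` stands in for an LLM call, `runAgent` is not any particular framework's API, and a scripted stub plays the model. The point is structural: the loop forwards whatever arguments the model emits, so a `DROP TABLE` travels the same path as a benign `DELETE`.

```typescript
// Hypothetical runtime-AI agent loop. `Model` stands in for an LLM call;
// nothing here is a real agent framework's API.
type ToolCall = { tool: string; args: string };

// The model sees the whole transcript and returns its next call, or null when done.
type Model = (transcript: string[]) => ToolCall | null;

// Each tool accepts an arbitrary string argument: no constraint on what it receives.
type Tools = Record<string, (args: string) => string>;

function runAgent(model: Model, tools: Tools, task: string, maxSteps = 10): string[] {
  const transcript: string[] = [`task: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const call = model(transcript); // a fresh decision on every iteration
    if (call === null) break;
    const tool = tools[call.tool];
    if (tool === undefined) {
      transcript.push(`error: unknown tool ${call.tool}`);
      continue;
    }
    // Whatever arguments the model produced go straight to the tool.
    transcript.push(`${call.tool}(${call.args}) -> ${tool(call.args)}`);
  }
  return transcript;
}

// Usage: a scripted stub plays the model; its second decision is destructive.
const executed: string[] = [];
const tools: Tools = {
  execute_sql: (sql) => {
    executed.push(sql);
    return "ok";
  },
};
const scripted: ToolCall[] = [
  { tool: "execute_sql", args: "DELETE FROM sessions WHERE env = 'staging'" },
  { tool: "execute_sql", args: "DROP TABLE users" },
];
const stubModel: Model = (transcript) => scripted[transcript.length - 1] ?? null;

runAgent(stubModel, tools, "clean staging");
console.log(executed); // both statements ran; the loop treated them identically
```

Nothing inside `runAgent` can tell the two SQL strings apart. Any safety would have to come from the model choosing well on every single iteration.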

The HN incident isn't surprising in this architecture. It's the expected failure mode. Give a stochastic decision-maker production credentials and an unbounded action space, and somewhere in the long tail it will choose DROP TABLE for a plausible-looking reason. The next iteration of the model won't fix this; it just shifts the failure to a less common prompt.
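The long-tail argument is simple arithmetic. Assume, as a loud simplification, that each runtime decision is independent and destructive with some small probability p. Then the chance of at least one destructive action over n calls is 1 - (1 - p)^n. The rate below is hypothetical, not a measured figure:

```typescript
// Back-of-the-envelope long-tail model. Loud assumption: each runtime decision is
// independent and destructive with the same small probability p.
// P(at least one destructive action in n calls) = 1 - (1 - p)^n
function probAtLeastOne(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

const p = 1e-4; // hypothetical per-call rate, not a measured figure
for (const n of [100, 1000, 10000, 100000]) {
  console.log(`${n} calls: ${(probAtLeastOne(p, n) * 100).toFixed(1)}%`);
}
```

With p = 0.0001, ten thousand calls already put the probability at about 63%. A better model lowers p, but n keeps growing for as long as the agent stays in the loop.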

What compile-time AI looks like instead

Compile-time AI inverts the timing. The model writes a program once, the program is reviewed by a human, and then the program runs forever without the model. The model is not in the loop at runtime; only the program it wrote is.
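Below is a minimal sketch of that timing, under several assumptions. The "program" is a plain list of two hypothetical step types. `author` stands in for the single model call, and `approve` stands in for human review. The runtime interpreter takes no model parameter at all, so re-running the program can never consult the model again.

```typescript
// Sketch of the compile-time flow. Assumptions: a "program" is a list of two
// hypothetical step types; `author` stands in for the single model call and
// `approve` stands in for human review.
type Step = { op: "fetch"; url: string } | { op: "pick"; field: string };
type Program = readonly Step[];

// Compile time: the model is consulted once, and a reviewer must approve the result.
function compile(author: () => Program, approve: (p: Program) => boolean): Program {
  const program = author(); // the only model call in the whole lifecycle
  if (!approve(program)) throw new Error("program rejected in review");
  return Object.freeze([...program]); // frozen copy: nothing can append steps later
}

// Runtime: a plain interpreter. It has no model parameter, so it cannot call one.
function run(program: Program, fetchPage: (url: string) => Record<string, string>): string[] {
  let page: Record<string, string> = {};
  const out: string[] = [];
  for (const step of program) {
    if (step.op === "fetch") page = fetchPage(step.url);
    else out.push(page[step.field] ?? "");
  }
  return out;
}

// Usage: count model calls across three runs.
let modelCalls = 0;
const approved = compile(
  () => {
    modelCalls++;
    return [
      { op: "fetch", url: "https://example.com/item" },
      { op: "pick", field: "price" },
    ];
  },
  (p) => p.every((s) => s.op === "fetch" || s.op === "pick"),
);
const fakePage = (_url: string) => ({ price: "9.99", title: "Widget" });
for (let i = 0; i < 3; i++) run(approved, fakePage);
console.log(modelCalls); // 1
```

Three runs of the approved program leave `modelCalls` at 1. Run it a million times and the count stays at 1.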

Concretely, the program has three properties the agent loop doesn't:

  1. The op set is closed before runtime starts. A program that says “read these three URLs and project these four fields” cannot, at execution time, decide to issue a DELETE. The op vocabulary was fixed when the program was written.
  2. Side-effecting intent is declared on the envelope. A program that intends to write is labeled write, runs only when explicitly authorized, and is excluded from automatic re-runs. A program that intends to read is labeled read; reading cannot become writing later.
  3. Drift is detected before action, not after. When the site or schema changes, the program's structural fingerprint stops matching before the wrong data flows downstream. The check runs against a known-good baseline, not against an LLM's interpretation of what the page looks like today.
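Property 2 can be sketched at the scheduler level. The names (`Envelope`, `autoRerunnable`, `mayExecute`) are hypothetical. Both decisions read only the declared intent label, never anything the program computes at runtime.

```typescript
// Sketch of declared intent at the scheduler level. All names are hypothetical.
type Intent = "read" | "write";
type Envelope = { id: string; intent: Intent };

// Automatic re-runs (retries, schedules) only ever select read programs.
function autoRerunnable(programs: readonly Envelope[]): Envelope[] {
  return programs.filter((p) => p.intent === "read");
}

// A write program runs only when an explicit authorization names it.
function mayExecute(p: Envelope, authorizedIds: readonly string[]): boolean {
  return p.intent === "read" || authorizedIds.indexOf(p.id) !== -1;
}

const registry: Envelope[] = [
  { id: "prices", intent: "read" },
  { id: "post-reply", intent: "write" },
];
console.log(autoRerunnable(registry).map((p) => p.id)); // [ 'prices' ]
console.log(mayExecute(registry[1], [])); // false
console.log(mayExecute(registry[1], ["post-reply"])); // true
```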

None of this requires the model to be smarter, more aligned, or better-prompted. It requires the model to be elsewhere — specifically, not in the runtime path.

This is the architecture Tap is built on

Tap's plan format is a closed 11-op union. The full vocabulary is fixed; a plan cannot, at execution time, introduce an op the runtime doesn't know. The plan runtime is mechanically prevented from loading new code paths: a static guard rejects new Worker, new Function, eval, and dynamic .js imports inside the plan-runtime module. Beyond the reviewed plan itself, the model that wrote it has no way to influence what runs.
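The sketch below shows one way a closed op set and a static guard could look in TypeScript. The three op names are hypothetical placeholders, not Tap's actual vocabulary. The regex scan is also a crude stand-in: the post does not say how Tap's guard is implemented, and a real one would more plausibly inspect the AST.

```typescript
// Sketch of a closed op set. The three op names are hypothetical placeholders,
// not Tap's actual vocabulary.
type Op =
  | { op: "navigate"; url: string }
  | { op: "extract"; selector: string }
  | { op: "click"; selector: string };

// Compile-time closure: every member of Op needs a case, or the `never` line fails.
function execute(step: Op, log: string[]): void {
  switch (step.op) {
    case "navigate":
      log.push(`goto ${step.url}`);
      return;
    case "extract":
      log.push(`read ${step.selector}`);
      return;
    case "click":
      log.push(`click ${step.selector}`);
      return;
    default: {
      const unreachable: never = step;
      throw new Error(`unknown op: ${JSON.stringify(unreachable)}`);
    }
  }
}

// Runtime closure: plans arrive as data, and anything outside the vocabulary is rejected.
function parsePlan(json: string): Op[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) throw new Error("plan must be an array");
  return raw.map((s: any): Op => {
    switch (s?.op) {
      case "navigate":
        if (typeof s.url === "string") return { op: "navigate", url: s.url };
        break;
      case "extract":
        if (typeof s.selector === "string") return { op: "extract", selector: s.selector };
        break;
      case "click":
        if (typeof s.selector === "string") return { op: "click", selector: s.selector };
        break;
    }
    throw new Error(`rejected step: ${JSON.stringify(s)}`);
  });
}

// Crude stand-in for a static guard over the runtime module's source text.
const FORBIDDEN: RegExp[] = [
  /\bnew\s+Worker\b/,
  /\bnew\s+Function\b/,
  /\beval\s*\(/,
  /\bimport\s*\([^)]*\.js/,
];
function guardViolations(source: string): string[] {
  return FORBIDDEN.filter((re) => re.test(source)).map((re) => re.source);
}
```

The `never` assignment closes the union at compile time: adding a fourth member to `Op` without handling it breaks the build. `parsePlan` covers the other direction by rejecting an unknown op that arrives as data.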

Read and write are different types, so a plan with the wrong shape for its intent cannot be represented at all. The Plan TypeScript type is a discriminated union: the read variant has act?: never; key?: never, and the write variant requires both. A read tap cannot quietly become a write tap; that would be a different shape, so it fails compile and fails lint.
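Here is a sketch of that shape. Only the split itself comes from the post: `act?: never; key?: never` on reads, required `act` and `key` on writes. The `steps` field, the field types, the example values, and the `isWrite` guard are assumptions for illustration.

```typescript
// Sketch of the read/write split. The `act?: never; key?: never` versus required
// `act`/`key` shape is from the post; `steps`, the field types, the example
// values, and `isWrite` are assumptions.
type ReadPlan = { steps: string[]; act?: never; key?: never };
type WritePlan = { steps: string[]; act: string; key: string };
type Plan = ReadPlan | WritePlan;

// A plan is a write plan exactly when it carries `act`.
function isWrite(plan: Plan): plan is WritePlan {
  return plan.act !== undefined;
}

const readPlan: Plan = { steps: ["open", "extract"] };
const writePlan: Plan = { steps: ["open", "fill"], act: "submit", key: "order-42" };

// Neither of these compiles:
// const sneaky: ReadPlan = { steps: [], act: "submit" }; // act must be absent on a read
// const partial: Plan = { steps: [], act: "submit" };    // a write needs both act and key

console.log(isWrite(readPlan), isWrite(writePlan)); // false true
```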

And drift detection runs before data flows. tap verify takes a Snapshot and applies the per-tap CEL snapshot_equivalent predicate against the prior baseline. If the predicate returns false, the verdict is drifted and the plan stops returning data, instead of returning wrong data. The verification arm sees what the plan was supposed to read, not what an LLM thinks the page now says.
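Here is a sketch of verify-before-return, under stated assumptions. A snapshot is reduced to field names plus a row count. The `ok` verdict name is a placeholder, since the post only names `drifted`. A plain TypeScript predicate stands in for the per-tap CEL `snapshot_equivalent` expression, and no CEL evaluator is involved.

```typescript
// Sketch of verify-before-return. Assumptions: a snapshot is reduced to field
// names plus a row count; a plain predicate stands in for the per-tap CEL
// `snapshot_equivalent` expression (no CEL evaluator here); "ok" is a placeholder
// verdict name, since the post only names "drifted".
type Snapshot = { fields: string[]; rowCount: number };
type Verdict = "ok" | "drifted";
type Equivalent = (prior: Snapshot, current: Snapshot) => boolean;
type VerifiedResult<T> = { verdict: "drifted" } | { verdict: "ok"; rows: T[] };

function verify(prior: Snapshot, current: Snapshot, equivalent: Equivalent): Verdict {
  return equivalent(prior, current) ? "ok" : "drifted";
}

// Extraction runs only after a passing verdict; drift yields no rows, not wrong rows.
function runVerified<T>(
  prior: Snapshot,
  current: Snapshot,
  equivalent: Equivalent,
  extract: () => T[],
): VerifiedResult<T> {
  return verify(prior, current, equivalent) === "drifted"
    ? { verdict: "drifted" }
    : { verdict: "ok", rows: extract() };
}

// Example predicate: same fields in the same order, and at least one row.
const sameShape: Equivalent = (prior, current) =>
  prior.fields.join("|") === current.fields.join("|") && current.rowCount > 0;

const baseline: Snapshot = { fields: ["title", "price"], rowCount: 20 };
const today: Snapshot = { fields: ["title", "cost"], rowCount: 20 };
console.log(verify(baseline, today, sameShape)); // drifted
```

The ordering matters. `extract` is never called on a drifted snapshot, so a renamed column produces a `drifted` verdict instead of a table of empty prices.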

What this architecture cannot prevent

Honesty about the boundary, because it matters:

The architecture is doing one job: it turns “an LLM might do something destructive” into “a reviewable program does exactly what was reviewed.”

Why this didn't already happen

The reason runtime-AI architectures dominate the current generation of agent tools is not that they're better — it's that they're the obvious shape when you start from “how do we let the LLM use tools?”

If your starting question is “the LLM has tools, the LLM picks one each turn”, you end up with runtime AI by default. The model is in the loop because the framing put it there.

If your starting question is “what's the smallest deterministic program that does this task, and where does the AI need to participate?”, you usually find the answer is once, at the beginning, supervised. Then the program runs without the model, and the model isn't around to pick the wrong tool.

The cost is that you have to define the op space ahead of time. Tap's bet is that for browser automation specifically, 11 ops is enough — the closed v2 union (7 substrate ops + 3 control flow + 1 typed-eval escape) covers the patterns we've shipped against, and a 12th op would be a governance event, not runtime improvisation.

The ask

If you've been running agents with production credentials and an unbounded tool list, the next time you read a thread like the HN one, the question to ask isn't “what prompt would have stopped this?” The question is “why was the model in the loop at runtime in the first place?”

Sometimes the answer is good — the task genuinely requires per-call decisions a fixed program can't make. Most of the time, the answer is that nobody asked.

Try compile-time AI for browser work:

npm install -g @taprun/cli@latest
# capture a Plan once, run forever:
tap capture https://your-site.com your-site/your-task --intent "..."
tap your-site/your-task