The top of Hacker News today, 600+ points and climbing: “An AI agent deleted our production database. The agent's confession is below.”
The reactions split predictably. Half say “better prompts would have prevented this.” Half say “never give an agent write access.” Both miss the actual lever.
The lever is where in the lifecycle the AI is allowed to think. There are exactly two choices, and one of them is the one that just dropped a table.
Every AI-driven automation puts the model in one of two positions:
| | Compile-time AI | Runtime AI |
|---|---|---|
| When the model thinks | Once, while you write the program | On every call, forever |
| What it produces | A program a human can read | An action the system executes |
| Review surface | A diff, before anything runs | Logs, after something happened |
| Op space at execution | Frozen at compile time | Whatever the model decides next |
| Token cost per call | Zero | Linear in tasks |
| Prompt-injection surface | The author's IDE | Every page the agent visits |
Both architectures are real, both ship today, and the difference is invisible from the outside — until something gets dropped.
When an AI agent has runtime authority over your systems, the surface looks innocent on day one:
    # “Help me clean up the staging environment”
    agent run --db production_creds.json --task "clean staging"
The credentials are real. The op space is whatever the agent's tool list contains — usually execute_sql, http_request, shell, with no constraint on what arguments those tools receive. The model decides, on each call, which tool to invoke and with which arguments. The decision is influenced by every token of context that scrolled past it — including, notably, anything injected by a hostile page or a hallucinated screenshot caption.
The HN incident isn't surprising in this architecture. It's the expected failure mode. Give a stochastic decision-maker production credentials and an unbounded action space, and somewhere in the long tail it will choose DROP TABLE for a plausible-looking reason. The next iteration of the model won't fix this; it just shifts the failure to a less common prompt.
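To make the shape of that loop concrete, here is a minimal sketch of a runtime-AI agent. Everything in it is hypothetical: `runAgent`, `scriptedModel`, and the in-memory `tools` map are illustrative names rather than any real framework's API, and the "model" is a scripted stand-in for what would be a stochastic one.

```typescript
// Hypothetical sketch of a runtime-AI loop; no name here comes from a real framework.
type ToolCall = { tool: string; args: string };
type Model = (context: string[]) => ToolCall | null;

// Record of everything that actually ran, so the effect is observable.
const executed: ToolCall[] = [];

// The tool list is the whole op space; arguments are unconstrained strings.
const tools: Record<string, (args: string) => string> = {
  execute_sql: (args) => {
    executed.push({ tool: "execute_sql", args });
    return `ran: ${args}`;
  },
  http_request: (args) => {
    executed.push({ tool: "http_request", args });
    return `fetched: ${args}`;
  },
};

function runAgent(model: Model, task: string, maxTurns = 5): string[] {
  const context = [task];
  for (let turn = 0; turn < maxTurns; turn++) {
    const call = model(context); // the model decides again on every turn
    if (call === null) break;
    const tool = tools[call.tool];
    if (!tool) {
      context.push(`unknown tool: ${call.tool}`);
      continue;
    }
    context.push(tool(call.args)); // tool output re-enters the context
  }
  return context;
}

// Scripted stand-in for the model: it fetches a page, then acts on what it "read".
const scriptedModel: Model = (context) => {
  const last = context[context.length - 1] ?? "";
  if (context.length === 1) {
    return { tool: "http_request", args: "https://example.com/runbook" };
  }
  if (last.startsWith("fetched:")) {
    return { tool: "execute_sql", args: "DROP TABLE users" };
  }
  return null;
};
```

Calling `runAgent(scriptedModel, "clean staging")` executes both calls, including the destructive one. Nothing in the loop inspects `args`; the only check is that the tool name exists.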
Compile-time AI inverts the timing. The model writes a program once, the program is reviewed by a human, and then the program runs forever without the model. The model is not in the loop at runtime; only the program it wrote is.
Concretely, the program has three properties the agent loop doesn't:
A program that intends to write is labeled write, runs only when explicitly authorized, and is excluded from automatic re-runs. A program that intends to read is labeled read; reading cannot become writing later.
None of this requires the model to be smarter, more aligned, or better-prompted. It requires the model to be elsewhere: specifically, not in the runtime path.
Tap's plan format is a closed 11-op union. The full vocabulary is fixed; a plan cannot, at execution time, introduce an op the runtime doesn't know. The plan runtime is mechanically prevented from loading new code paths — a static guard rejects new Worker, new Function, eval, and dynamic .js imports inside the plan-runtime module. The model that wrote the plan cannot influence what runs.
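A closed op vocabulary can be sketched in a few lines. The three op names below are invented for illustration (the real union has 11 ops, which are not listed here), and `parsePlan` and `describeOp` are hypothetical helpers, not Tap's API. The sketch only checks the op kind, not each op's fields.

```typescript
// Hypothetical three-op vocabulary standing in for the real 11-op union.
const OP_KINDS = ["navigate", "extract", "click"] as const;
type OpKind = (typeof OP_KINDS)[number];

type Op =
  | { op: "navigate"; url: string }
  | { op: "extract"; selector: string }
  | { op: "click"; selector: string };

function isOpKind(value: unknown): value is OpKind {
  return (
    typeof value === "string" &&
    (OP_KINDS as readonly string[]).indexOf(value) !== -1
  );
}

// Reject any step whose op is outside the closed vocabulary before anything runs.
// This sketch validates only the op name, not the remaining fields.
function parsePlan(raw: unknown): Op[] {
  if (!Array.isArray(raw)) throw new Error("plan must be an array of ops");
  return raw.map((step, i) => {
    const kind =
      typeof step === "object" && step !== null
        ? (step as { op?: unknown }).op
        : undefined;
    if (!isOpKind(kind)) throw new Error(`step ${i}: unknown op`);
    return step as Op;
  });
}

// Exhaustive dispatch: adding a variant to Op without a case here fails type-checking.
function describeOp(op: Op): string {
  switch (op.op) {
    case "navigate":
      return `go to ${op.url}`;
    case "extract":
      return `read ${op.selector}`;
    case "click":
      return `click ${op.selector}`;
    default: {
      const unreachable: never = op;
      return unreachable;
    }
  }
}
```

A plan containing `{ op: "shell", ... }` is rejected by `parsePlan` at load time, and the `never` assignment in `describeOp` turns a forgotten op into a compile error rather than a runtime surprise.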
Read and write are unrepresentable as the wrong shape. The Plan TypeScript type is a discriminated union: the read variant has act?: never; key?: never; the write variant requires both. A read tap cannot quietly become a write tap; that's a different shape, fails compile, fails lint.
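As a rough illustration of that shape, the sketch below uses the `act` and `key` fields named above; the `kind` discriminant, the `url` field, and the `mayRun` gate are assumptions added for the example, not Tap's actual definitions. Letting reads run unconditionally is also an assumption of this sketch.

```typescript
// `act` and `key` come from the text; `kind`, `url`, and `mayRun` are hypothetical.
type ReadPlan = { kind: "read"; url: string; act?: never; key?: never };
type WritePlan = { kind: "write"; url: string; act: string; key: string };
type Plan = ReadPlan | WritePlan;

const readTap: Plan = { kind: "read", url: "https://example.com/orders" };
const writeTap: Plan = {
  kind: "write",
  url: "https://example.com/orders",
  act: "submit",
  key: "order-123",
};

// A read plan carrying write fields does not type-check.
// @ts-expect-error string is not assignable to the `never`-typed `act` and `key`
const sneaky: Plan = { kind: "read", url: "https://example.com/orders", act: "submit", key: "k" };

// Writes run only when explicitly authorized and never on automatic re-runs.
function mayRun(
  plan: Plan,
  ctx: { authorizedWrite: boolean; automaticRerun: boolean },
): boolean {
  if (plan.kind === "read") return true;
  return ctx.authorizedWrite && !ctx.automaticRerun;
}
```

The `@ts-expect-error` line documents the failure: remove the directive and the compiler rejects `sneaky`, which is the "fails compile" half of the claim.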
And drift detection runs before data flows. tap verify takes a Snapshot and applies the per-tap CEL snapshot_equivalent predicate against the prior baseline. If the predicate returns false, the verdict is drifted and the plan stops returning data, instead of returning wrong data. The verification arm sees what the plan was supposed to read, not what an LLM thinks the page now says.
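The gating logic can be sketched as follows. The `Snapshot` shape, `sameColumns`, `verify`, and `runGuarded` are all hypothetical, and a plain TypeScript function stands in for the CEL `snapshot_equivalent` predicate, so this shows the control flow only, not Tap's implementation.

```typescript
// Hypothetical Snapshot shape; the real one is not described in the text.
type Snapshot = { columns: string[]; rowCount: number };
type Verdict = "ok" | "drifted";
type Equivalent = (baseline: Snapshot, current: Snapshot) => boolean;

// Stand-in for a per-tap snapshot_equivalent predicate (CEL in the text,
// plain TypeScript here). It compares column names and lets rowCount vary.
const sameColumns: Equivalent = (baseline, current) =>
  baseline.columns.length === current.columns.length &&
  baseline.columns.every((name, i) => name === current.columns[i]);

function verify(
  baseline: Snapshot,
  current: Snapshot,
  equivalent: Equivalent,
): Verdict {
  return equivalent(baseline, current) ? "ok" : "drifted";
}

// Gate the data on the verdict: a drifted plan returns nothing rather than
// something wrong, and the read function is never invoked.
function runGuarded<T>(
  verdict: Verdict,
  read: () => T,
): { verdict: Verdict; data: T | null } {
  return verdict === "ok"
    ? { verdict, data: read() }
    : { verdict, data: null };
}
```

With a baseline of `["id", "total"]`, a current snapshot whose columns are `["id", "amount"]` yields `drifted`, and `runGuarded` returns `data: null` without calling `read`.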
Honesty about the boundary, because it matters:
DELETE /admin/users/all. That tap is still reviewable as a diff before it ever runs, but if a reviewer approves it, it runs.
The architecture is doing one job: it's making the difference between “an LLM might do something destructive” and “a reviewable program does exactly what was reviewed.”
The reason runtime-AI architectures dominate the current generation of agent tools is not that they're better — it's that they're the obvious shape when you start from “how do we let the LLM use tools?”
If your starting question is “the LLM has tools, the LLM picks one each turn”, you end up with runtime AI by default. The model is in the loop because the framing put it there.
If your starting question is “what's the smallest deterministic program that does this task, and where does the AI need to participate?”, you usually find the answer is once, at the beginning, supervised. Then the program runs without the model, and the model isn't around to pick the wrong tool.
The cost is that you have to define the op space ahead of time. Tap's bet is that for browser automation specifically, 11 ops is enough — the closed v2 union (7 substrate ops + 3 control flow + 1 typed-eval escape) covers the patterns we've shipped against, and a 12th op would be a governance event, not runtime improvisation.
If you've been running agents with production credentials and an unbounded tool list, the next time you read a thread like the HN one, the question to ask isn't “what prompt would have stopped this?” The question is “why was the model in the loop at runtime in the first place?”
Sometimes the answer is good — the task genuinely requires per-call decisions a fixed program can't make. Most of the time, the answer is that nobody asked.
    npm install -g @taprun/cli@latest
    # capture a Plan once, run forever:
    tap capture https://your-site.com your-site/your-task --intent "..."
    tap your-site/your-task