
How We Turned Tap's Engine Into a 0.8 ms Library (and Measured a 294× Speedup)

April 11, 2026 · Leon Ting · 9 min read · From proprietary binary to embeddable package in one session

Three blog posts from earlier today tell the story of Tap growing a composition layer: why we built tap.pipe() as JavaScript instead of YAML, how Reddit Demand Kit compiled a $0.15 LLM call into a $0 deterministic pipeline, and what we learned building the Chrome extension that powers it all. This post is what happened three hours after the RDK refactor shipped, when one architectural question wouldn't leave me alone.

The question was: why does RDK need to fork a subprocess to run a pipeline?

The thing that kept bugging me

When RDK runs its market-scan pipe, the flow is:

  1. User says rdk compile market-scan --subreddit sysadmin
  2. RDK's compile.ts spawns tap rdk market-scan --json --subreddit sysadmin as a subprocess
  3. The tap binary starts a fresh Deno runtime, loads the .tap.js file, calls runTap(), which invokes handle.pipe(), which schedules the DAG and runs each sub-tap
  4. Output JSON comes back to RDK via stdout

Step 2 is the problem. A subprocess fork on my M4 Mac takes about 80 milliseconds. Starting a Deno runtime adds another 50-100ms for module loading and JIT compilation. That's 130-180ms of pure overhead every single time RDK wants to run any pipeline, even one that's pure in-memory data transformations with no network calls at all.

For a user running rdk compile interactively, 200ms is barely noticeable. For a cron job watching for demand signals every minute, it's 288 wasted seconds per day. For a product that runs a pipeline in a tight loop (say, "score every Reddit post from the last 24 hours against a custom rubric"), it's the difference between "runs in background while I drink coffee" and "runs before I finish reading the next paragraph."
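The 288-second figure is plain arithmetic: about 200 ms of overhead per run, times 1,440 one-minute runs a day. A throwaway helper makes it explicit; the name is our own, not part of Tap or RDK:

```typescript
// Hypothetical helper: total fixed overhead per day for a job that runs on a fixed interval.
function wastedSecondsPerDay(overheadMs: number, intervalSeconds: number): number {
  const runsPerDay = (24 * 60 * 60) / intervalSeconds; // 86,400 seconds in a day
  return (runsPerDay * overheadMs) / 1000;             // convert ms to seconds
}

// One run per minute at ~200 ms of subprocess + runtime startup each.
console.log(wastedSecondsPerDay(200, 60)); // 288
```

At the 0.8 ms in-process cost measured later in this post, the same schedule adds up to about 1.2 seconds a day.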

More importantly, it's structurally wrong. The pipeline engine is pure data. It's 200 lines of Deno TypeScript that does DAG topological sort, $ref binding resolution, parallel scheduling, and a run-scoped cache. It has zero external dependencies. It doesn't need a browser. It doesn't need a daemon. It doesn't need a compiled binary. It's a library function that Tap was forcing to cosplay as a CLI tool.
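The scheduler's source isn't shown in this post, so here is a minimal sketch of the general technique behind "DAG topological sort" plus "parallel scheduling", under our own assumptions: each step lists the ids it depends on, and Kahn's algorithm groups steps into batches whose dependencies are all satisfied by earlier batches, so every step in a batch could run concurrently. The `Step` shape and `scheduleBatches` name are hypothetical, not Tap's API.

```typescript
// Hypothetical step shape: an id plus the ids of the steps whose outputs it reads.
interface Step {
  id: string;
  dependsOn: string[];
}

// Kahn's algorithm, grouped by level: each returned batch contains steps whose
// dependencies all appear in earlier batches, so a batch can run in parallel.
function scheduleBatches(steps: Step[]): string[][] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of steps) {
    if (indegree.has(s.id)) throw new Error(`duplicate step id: ${s.id}`);
    indegree.set(s.id, 0);
    dependents.set(s.id, []);
  }
  for (const s of steps) {
    for (const dep of s.dependsOn) {
      if (!indegree.has(dep)) throw new Error(`${s.id} depends on unknown step ${dep}`);
      indegree.set(s.id, indegree.get(s.id)! + 1);
      dependents.get(dep)!.push(s.id);
    }
  }

  const batches: string[][] = [];
  let ready = steps.filter((s) => indegree.get(s.id) === 0).map((s) => s.id);
  let scheduled = 0;
  while (ready.length > 0) {
    batches.push(ready);
    scheduled += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of dependents.get(id)!) {
        const remaining = indegree.get(child)! - 1;
        indegree.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    ready = next;
  }
  // Steps that never reach indegree 0 sit on a cycle.
  if (scheduled !== steps.length) throw new Error("pipeline contains a cycle");
  return batches;
}

// Two independent fetches share a batch; the merge waits for both.
const batches = scheduleBatches([
  { id: "fetch_a", dependsOn: [] },
  { id: "fetch_b", dependsOn: [] },
  { id: "merged", dependsOn: ["fetch_a", "fetch_b"] },
]);
console.log(JSON.stringify(batches)); // [["fetch_a","fetch_b"],["merged"]]
```

A straight chain like the demo pipe later in this post produces one step per batch; the batching only pays off when steps are independent.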

The extraction

The goal became clear: extract the composition engine into a standalone package that any Deno (or, eventually, Node) product could import directly. Let RDK — and every future product — skip the subprocess entirely and run pipelines inside its own process.

Tap's repo already had most of the right structure. The composition layer was split across four files (executor.ts, page.ts, pipe.ts and sandbox.ts) that had zero imports from tap-core's other modules.

Total: 43 KB of TypeScript. Zero imports from forge.ts, daemon.ts, bridge.ts, or any of the runtime-specific code. The separation already existed latently; it just wasn't declared as a package.

The actual extraction took about 15 minutes:

mkdir -p packages/executor/src packages/executor/test
git mv src/executor.ts packages/executor/src/executor.ts
git mv src/page.ts     packages/executor/src/page.ts
git mv src/pipe.ts     packages/executor/src/pipe.ts
git mv src/sandbox.ts  packages/executor/src/sandbox.ts

The four files kept all their internal cross-imports (executor.ts imports from ./pipe.ts and ./page.ts, etc.) because they're still in the same directory. Nothing had to be rewritten.

The re-export shim trick

The awkward part of the extraction was the nineteen other files in tap-core/src/ that imported from ./executor.ts or ./pipe.ts. Updating every import to point at ../packages/executor/src/... would touch cli.ts, mcp.ts, forge.ts, doctor.ts, the three runtime files, and more. A 19-file blast radius is how you introduce accidental regressions in a tired late-night refactor.

Instead, each moved file got replaced in src/ by a one-line re-export shim:

// src/executor.ts
export * from "../packages/executor/src/executor.ts";

Every existing consumer keeps its import { runTap } from "./executor.ts" path. The shim re-exports everything from the canonical source. Zero import changes across the other nineteen files. Full backwards compatibility, single source of truth — the package is canonical, the shim is a thin pointer.

This is a pattern every extraction refactor should use: move the files, but leave a re-export breadcrumb at the old path. It turns a 20-file refactor into a 4-file refactor.
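One caveat before copying the trick: `export *` re-exports every named export but not a module's default export, so a moved module that has one needs an explicit `export { default } from ...` line in its shim. A small sketch that generates shim text per moved file; the `makeShim` helper and its `hasDefaultExport` flag are hypothetical, and the paths mirror the layout in this post.

```typescript
// Hypothetical generator for the one-line shims left at each old path in src/.
// `export *` forwards every named export but never a module's default export,
// so a module with a default export needs a second, explicit re-export line.
function makeShim(fileName: string, hasDefaultExport: boolean): string {
  const target = `../packages/executor/src/${fileName}`;
  const lines = [`// src/${fileName}`, `export * from "${target}";`];
  if (hasDefaultExport) lines.push(`export { default } from "${target}";`);
  return lines.join("\n") + "\n";
}

// Reproduces the executor.ts shim shown above.
console.log(makeShim("executor.ts", false));
```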

The tests that broke (and what they taught us)

Running the full test suite after the move surfaced exactly six failures, all from the same class of test: ones that read source files by hardcoded path to do structural analysis. For example:

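// src/test/tap_dts_test.ts: Pipe fields drift detector
// ROOT, dts (the .d.ts source text) and extractPipeFields are defined earlier in this test file
const pipeSrc = await Deno.readTextFile(`${ROOT}/src/pipe.ts`);
const dtsFields = extractPipeFields(dts);      // Pipe fields declared in the .d.ts
const srcFields = extractPipeFields(pipeSrc);  // Pipe fields declared in the source
assertEquals(dtsFields, srcFields, "Pipe field mismatch");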

After the move, src/pipe.ts is a 1-line re-export shim containing zero interface definitions. The regex field extractor returned an empty array. The drift check compared the d.ts against an empty array and screamed.

The fix was mechanical — point the test at the new canonical location — but the lesson is more interesting. Tests that read source code by path are implicit architectural assertions. Every such test is quietly stating "I know where this code lives." The extraction broke those assertions. Fixing them required making the new location explicit in each test:

const EXECUTOR_SRC = new URL("../../packages/executor/src", import.meta.url).pathname;

Six tests needed this update. One semantic test — the vocabulary checker that walks every source file looking for forbidden terms like "kernel" or "stdlib" — needed a bigger change: its readSources() helper only scanned tap-core/src/. After the move, it would silently stop covering the executor files. I updated it to scan both directories:

async function readSources() {
  const files = [];
  for await (const entry of Deno.readDir(SRC_DIR)) { ... }
  for await (const entry of Deno.readDir(EXECUTOR_SRC)) { ... }  // NEW
  return files;
}

Without this, semantic constraints like "handle type is Tap, not Page" would have silently stopped checking the files where that type actually lives. Structural tests that move with the code are more robust than structural tests that hardcode paths — but path-hardcoded tests are fine as long as you remember to update them in the same commit as the move.

After the six fixes, 621 of 621 tap-core tests pass. No regressions.
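The readSources() change still leaves one quiet failure mode: if a directory is moved again, or its scan loop is dropped, the checker simply checks fewer files and stays green. A small guard, sketched under our own assumptions (`assertScanCoverage` and the map-of-directories shape are hypothetical, not part of tap-core), makes that case loud:

```typescript
// Hypothetical guard for multi-directory source scans: throw if any directory the
// test is meant to cover contributed zero files, instead of silently checking less.
function assertScanCoverage(expectedDirs: string[], found: Map<string, string[]>): void {
  const empty = expectedDirs.filter((dir) => (found.get(dir) ?? []).length === 0);
  if (empty.length > 0) {
    throw new Error(`source scan found no files in: ${empty.join(", ")}`);
  }
}

// Files grouped by the directory they were read from (illustrative names).
const scanned = new Map<string, string[]>([
  ["src", ["cli.ts", "mcp.ts", "executor.ts"]],
  ["packages/executor/src", ["executor.ts", "page.ts", "pipe.ts", "sandbox.ts"]],
]);
assertScanCoverage(["src", "packages/executor/src"], scanned); // passes: both directories contributed
```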

The smoke test that proves the package is real

An extracted package is only useful if it's actually self-contained. A subtle failure mode would be leaving a sneaky import in one of the four moved files that still reached back into tap-core/src/ via a relative path. If that happened, publishing would succeed but importing from outside the tap-core tree would fail.

I wrote a dedicated boundary test for this:

import { assertEquals } from "jsr:@std/assert";

Deno.test("[smoke/boundary] package imports resolve within packages/executor/", async () => {
  // mod.ts must not reference tap-core/src/ anywhere
  const modContent = await Deno.readTextFile(new URL("../mod.ts", import.meta.url).pathname);
  const escapePaths = modContent.match(/\.\.\/\.\.\/src\/|tap-core\/src\//g);
  assertEquals(escapePaths, null, "mod.ts must not import from tap-core/src/");

  // No source file may import from two levels up (../../), which is outside packages/executor/
  for (const file of ["executor.ts", "pipe.ts", "page.ts", "sandbox.ts"]) {
    const src = await Deno.readTextFile(new URL(`../src/${file}`, import.meta.url).pathname);
    const externalImport = src.match(/from ['"]\.\.\/\.\.\//g);
    assertEquals(externalImport, null, `${file} must not import outside the package`);
  }
});

The test works by regex-searching for the exact shapes of problematic imports. A future contributor adding import { foo } from "../../src/bar.ts" by accident would hit this boundary test immediately, not at deno publish time or, worse, when a downstream consumer actually tried to import the package.
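The source-file regex only matches static `from` clauses that begin with `../../`, so a bare side-effect `import "../../src/bar.ts"` or a dynamic `import("../../src/bar.ts")` would slip past it. A somewhat sturdier variant, sketched here as a pure function with hypothetical names, resolves each relative specifier against the importing file's URL using the standard `URL` constructor and checks that the result stays under the package root. It is still regex-based, so it is a heuristic rather than a parser; for a check backed by Deno's own module graph, `deno info --json` on mod.ts is another option.

```typescript
// Hypothetical helper: list relative import specifiers in `source` that resolve
// outside `packageRoot`. Both `fileUrl` and `packageRoot` are file: URLs.
const SPECIFIER_PATTERNS: RegExp[] = [
  /\bfrom\s*["']([^"']+)["']/g,             // import/export ... from "x"
  /\bimport\s*["']([^"']+)["']/g,           // side-effect import "x"
  /\bimport\s*\(\s*["']([^"']+)["']\s*\)/g, // dynamic import("x")
];

function findEscapingImports(source: string, fileUrl: string, packageRoot: string): string[] {
  const root = packageRoot.endsWith("/") ? packageRoot : packageRoot + "/";
  const escaping: string[] = [];
  for (const pattern of SPECIFIER_PATTERNS) {
    for (const match of source.matchAll(pattern)) {
      const specifier = match[1];
      // Only relative or absolute paths can escape; jsr:, npm: and https: specifiers are skipped.
      if (!specifier.startsWith(".") && !specifier.startsWith("/")) continue;
      const resolved = new URL(specifier, fileUrl).href;
      if (!resolved.startsWith(root)) escaping.push(specifier);
    }
  }
  return escaping;
}

const example = [
  `import * as pipe from "./pipe.ts";`,
  `export * from "../mod.ts";`,
  `import "../../src/side_effect.ts";`,
  `const m = await import("../../../src/bar.ts");`,
].join("\n");

const escapes = findEscapingImports(
  example,
  "file:///repo/tap-core/packages/executor/src/executor.ts",
  "file:///repo/tap-core/packages/executor",
);
console.log(JSON.stringify(escapes)); // ["../../src/side_effect.ts","../../../src/bar.ts"]
```

The trailing-slash normalization on the root matters: without it, a sibling directory such as packages/executor-extra would count as "inside".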

The measurement: 0.8 ms vs 233.7 ms

With the package working standalone, it was time to prove the speedup claim was real. I wrote a dogfood pipeline in RDK — a pure in-memory transform that filters, sorts, and limits a hardcoded array of rows:

export default {
  site: "rdk",
  name: "demo-transform",
  requires: ["tap/filter", "tap/sort", "tap/limit"],
  args: {
    rows:      { type: "array",  required: true },
    min_score: { type: "number", default: 0 },
    limit:     { type: "number", default: 5 },
  },

  async tap(handle, args) {
    return handle.pipe({
      steps: [
        { id: "filtered", run: ["tap", "filter"], args: { rows: "$args.rows", field: "score", gt: "$args.min_score" } },
        { id: "ranked",   run: ["tap", "sort"],   args: { rows: "$filtered.rows", field: "score", order: "desc" } },
        { id: "top",      run: ["tap", "limit"],  args: { rows: "$ranked.rows", n: "$args.limit" } },
      ],
      return: "$top.rows",
    });
  },
};
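The post doesn't spell out how strings like "$args.rows" or "$filtered.rows" get resolved, so here is a minimal sketch under our own assumptions: a string starting with `$` is read as `$<name>.<field>`, where the name is `args` or the id of an earlier step. The runners below are hand-written stand-ins for tap/filter, tap/sort and tap/limit, not Tap's implementations, and the sample rows are made up. Running the chain gives a plain reference answer that either execution path should reproduce.

```typescript
// Hypothetical row and scope shapes for this sketch (not Tap's types).
type Row = Record<string, unknown>;
type Scope = Record<string, Record<string, unknown>>;

// Assumed binding rule: "$name.field" reads scope[name][field]; anything else passes through.
function resolveArgs(args: Record<string, unknown>, scope: Scope): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(args)) {
    if (typeof value === "string" && value.startsWith("$")) {
      const [name, field] = value.slice(1).split(".", 2);
      if (!(name in scope)) throw new Error(`unbound reference ${value}`);
      out[key] = scope[name][field];
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Hand-written stand-ins for tap/filter, tap/sort and tap/limit.
const runners: Record<string, (a: Record<string, unknown>) => Record<string, unknown>> = {
  filter: (a) => ({
    rows: (a.rows as Row[]).filter((r) => (r[a.field as string] as number) > (a.gt as number)),
  }),
  sort: (a) => {
    const dir = a.order === "asc" ? 1 : -1;
    const f = a.field as string;
    return { rows: [...(a.rows as Row[])].sort((x, y) => dir * ((x[f] as number) - (y[f] as number))) };
  },
  limit: (a) => ({ rows: (a.rows as Row[]).slice(0, a.n as number) }),
};

// The demo-transform steps, already in dependency order, so they run sequentially.
const demoSteps = [
  { id: "filtered", run: "filter", args: { rows: "$args.rows", field: "score", gt: "$args.min_score" } },
  { id: "ranked", run: "sort", args: { rows: "$filtered.rows", field: "score", order: "desc" } },
  { id: "top", run: "limit", args: { rows: "$ranked.rows", n: "$args.limit" } },
];

function runDemo(args: Record<string, unknown>): unknown {
  const scope: Scope = { args };
  for (const step of demoSteps) {
    scope[step.id] = runners[step.run](resolveArgs(step.args, scope));
  }
  return scope.top.rows; // mirrors return: "$top.rows"
}

const sample = [
  { title: "a", score: 3 },
  { title: "b", score: -1 },
  { title: "c", score: 9 },
  { title: "d", score: 5 },
];
console.log(JSON.stringify(runDemo({ rows: sample, min_score: 0, limit: 2 })));
// [{"title":"c","score":9},{"title":"d","score":5}]
```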

Then I wrote a test that runs the pipe both ways and measures wall-clock:

Deno.test("[principle/why] in-process pipe execution is faster than subprocess", async () => {
  // In-process via the new package
  const t0 = performance.now();
  const inProcResult = await runPipeInProcess(pipePath, { rows: DEMO_ROWS, limit: 3 }, tapDirs);
  const inProcMs = performance.now() - t0;

  // Subprocess via the existing tap CLI
  const s0 = performance.now();
  await runCompiledPipe("demo-transform", { rows: JSON.stringify(DEMO_ROWS), limit: 3 });
  const subMs = performance.now() - s0;

  console.log(`  in-process:  ${inProcMs.toFixed(1)}ms`);
  console.log(`  subprocess:  ${subMs.toFixed(1)}ms`);
  console.log(`  speedup:     ${(subMs / inProcMs).toFixed(1)}×`);
});

Test output:

  in-process:  0.8ms
  subprocess:  233.7ms
  speedup:     293.9×

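294× faster. Not 5×. Not 20×. Nearly three hundred times, on identical input, producing identical output. (The ratio is computed from the unrounded timings, which is why it comes out slightly higher than 233.7 / 0.8.) Most of the subprocess time is fixed startup cost: roughly 80 ms for the fork, another 50-100 ms for Deno runtime startup, module loading and JIT compilation, plus loading the .tap.js file and piping JSON back over stdout. The in-process path skips all of that and goes straight to scheduling the DAG and running the sub-taps.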
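These are single-run timings, which is enough to show a two-orders-of-magnitude gap but not to pin the exact ratio: one in-process sample under a millisecond is sensitive to JIT warm-up and timer resolution. A common refinement, sketched with hypothetical `median` and `medianMs` helpers (not part of Tap or RDK), is to warm up first and report the median of several runs:

```typescript
// Median of a list of samples (averages the two middle values for an even count).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Hypothetical benchmark helper: warm up, then time `runs` calls and return the median in ms.
async function medianMs(fn: () => unknown, runs = 20, warmup = 3): Promise<number> {
  for (let i = 0; i < warmup; i++) await fn(); // let JIT compilation and module caches settle
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await fn();
    samples.push(performance.now() - t0);
  }
  return median(samples);
}

// Usage with the functions from the test above (not run here); fewer runs for the slow path:
// const inProc = await medianMs(() => runPipeInProcess(pipePath, args, tapDirs));
// const sub = await medianMs(() => runCompiledPipe("demo-transform", cliArgs), 5, 1);
```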

What this actually unlocks

For RDK specifically, a single 200ms saving per invocation isn't life-changing. For the broader architecture, it is. A few things become possible that weren't before:

1. High-frequency pipelines become practical

A product that wants to run a compiled pipeline every few seconds — say, a live dashboard that recomputes demand scores as new posts arrive — can't afford to spend 200ms on subprocess overhead each tick. In-process execution removes that ceiling. The dashboard becomes a tight loop.

2. Serverless / FaaS deployments work

A typical serverless function has a cold-start budget measured in hundreds of milliseconds. Spawning a subprocess inside a function execution is often fatal — you exceed the CPU quota before your own code runs. Embedding the executor as a library means the function is just import + runPipe, and cold starts stay under the serverless platform's limits.

3. The package becomes a distribution channel

Today's tap is a binary you download and install. That means every product wanting to use Tap's composition layer needs to handle binary installation, PATH management, version compatibility, and cross-platform packaging. Those are real friction points for adoption.

The package solves this. A product that embeds @taprun/executor gets the composition layer as a normal dependency. No install step. No PATH concerns. Cross-platform out of the box. And — importantly — the product controls when it updates. If Tap ships a breaking change tomorrow, an embedded product pins to a known-good version and upgrades on its own schedule.

4. The architectural split becomes the product split

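The extraction implicitly draws a line between two things Tap used to conflate: the engine that runs compiled pipelines, which is now the public @taprun/executor package, and the product built around it (forge pipelines, doctor diagnostics, runtime reliability, license validation).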

This is the same split that makes Red Hat work as a business: the kernel is public, the tooling and support and certification are proprietary. A product company's commercial value doesn't live in the runtime — it lives in everything built around the runtime to make it reliable in production.

Before today, Tap was pitching itself as "an interface compiler for AI agents." That's still true, but it undersells what the extracted package enables. A more accurate pitch now: Tap is the compilation layer for AI product workflows, and the engine that runs compiled workflows is a library any product can embed.

Publishing

The package is live on JSR as @taprun/executor@0.1.0 (published 2026-04-11). Before it shipped, deno publish --dry-run inside packages/executor/ surfaced exactly two kinds of problems that had to be fixed:

Slow types. JSR's "no slow types" rule requires explicit type annotations on exported symbols whose types would otherwise have to be inferred. Without them, downstream consumers would have to run type inference across the whole package at typecheck time, which slows down both type checking and their IDE. Two of the package's public constants lacked annotations:

// Before (implicit string type — JSR rejects)
export const SEMANTIC_ROLE_SNIPPET = `...`;

// After (explicit type — JSR accepts)
export const SEMANTIC_ROLE_SNIPPET: string = `...`;

Same fix for SANDBOX_ALLOWED_METHODS: Set<string>. Both were one-line changes that didn't affect any consumer, because the inferred types and the declared types were identical.

Workspace declaration. Deno treats deno.json as a workspace root, and a sub-package's deno.json must be declared as a workspace member. One line in tap-core/deno.json:

"workspace": ["./src", "./packages/executor"]

After both fixes, deno publish --dry-run reports:

Simulating publish of @taprun/executor@0.1.0 with files:
  LICENSE (33.71KB)
  packages/executor/README.md (4.08KB)
  packages/executor/deno.json (203B)
  packages/executor/mod.ts (2.63KB)
  packages/executor/src/executor.ts (23.54KB)
  packages/executor/src/page.ts (7.22KB)
  packages/executor/src/pipe.ts (11.51KB)
  packages/executor/src/sandbox.ts (7.92KB)
  packages/executor/test/smoke_test.ts (7.96KB)
Success: Dry run complete

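Nine files, roughly 99 KB uncompressed, of which the LICENSE accounts for about 34 KB; the package's own code, docs and tests come to about 65 KB. After creating the @taprun scope at jsr.io/new and running the real deno publish, the package went live in under a minute. Any Deno project can now: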

import { runPipe } from "jsr:@taprun/executor";

From Node/npm, thanks to JSR's npm compatibility layer:

npx jsr add @taprun/executor

Then, in your code:

import { runPipe } from "@taprun/executor";

RDK's src/compile.ts — the in-process dogfood from earlier in this post — swapped its ugly local file-URL import (new URL("../../tap-core/packages/executor/mod.ts", import.meta.url).href) for a clean "jsr:@taprun/executor@^0.1.0" string the moment the package went live. The JSR fetch added ~3 seconds of first-run cold-start time (downloading five module files into Deno's cache); every subsequent run was cache-warm and cost zero network.

The pattern I hope you steal

If you're building a product whose core value is a specific algorithm or runtime engine, and you've been treating that engine as an inseparable part of the CLI or server that ships it, consider whether the engine could live as a separate package. A few heuristics for when it's worth doing:

  1. The engine is pure data flow. No OS-specific calls, no network, no persistent state. If it would run inside a Deno Worker with zero permissions, it's a candidate.
  2. Every embedding consumer currently shells out to your CLI. If you have three products that do exec("your-tool some-command"), they're all paying fork overhead on every call. Extract the engine and let them skip the fork.
  3. The engine's value compounds with every new consumer. If adding one more embedding product doesn't cost you anything but nets that product a meaningful speedup, the package is a force multiplier.
  4. The commercial value isn't in the engine itself. You're selling forge pipelines, doctor diagnostics, runtime reliability, license validation, SaaS features. The engine is the enabling technology, not the differentiator. Open-sourcing it grows the market for your actual product.

All four are true for Tap. They're true for a surprising number of dev-tooling products. The ones who see it first win the embedding market for their category.

The four files that became @taprun/executor spent three months as a compiled binary that every embedding product had to shell out to. They're now a 65 KB package that any Deno product can import in one line. The speedup on the first measured benchmark is 294×. The architecture split it implies — engine public, product proprietary — is clearer than anything I'd written about Tap's positioning before today.

One session, four git commits, one measured number. That's what architectural clarity looks like when you stop fighting the latent shape of your codebase and just let it be the package it was always going to be.


Taprun: your agent runs the browser task — you keep the audit trail

Tell your agent a browser task on any site that needs your login — it runs in your real, already-logged-in Chrome and compiles it once into a deterministic, auditable .plan.json program: a versioned, reviewable record of exactly what it did. Every replay after is local, zero tokens, same result every time. Cookies and sessions never leave your machine — by architecture, not policy. Cloud browser SDKs can't match this; they need your session in their database to function. tap verify catches substrate drift before your data goes stale. Works with Claude Code, Cursor, Cline, Windsurf, and any MCP host. 70+ community taps.

curl -fsSL https://taprun.dev/install.sh | sh
