Playwright has 400+ API methods. Puppeteer has 300+. Selenium has its own taxonomy of WebDriver, WebElement, Actions, Options. Every browser automation framework invents its own surface area, and every program written against one framework is locked to that framework forever.
Tap takes a different approach. 8 core operations. That's the entire interface between a tap program and any runtime. A Chrome extension implements them. A Playwright runtime implements them. A macOS desktop runtime implements them. The programs don't know or care which one is running underneath.
Write a Playwright script today. Tomorrow you need it to run inside a Chrome extension (because the site needs real login cookies). Rewrite. Next month you need to automate a native macOS app. Rewrite again.
Each framework carries implicit assumptions:
Your automation logic — "click this button, read that table, type into that field" — is identical across all of them. The code is completely different. The abstraction layer is missing.
Every interface interaction — browser, desktop, mobile — reduces to eight irreducible operations:
| Operation | What it does |
|---|---|
| eval | Execute code in the target context, return result |
| pointer | Move / click / drag at coordinates |
| keyboard | Type text or press key combinations |
| nav | Navigate to a URL |
| wait | Wait for a condition (time, selector, network idle) |
| screenshot | Capture the current visual state |
| run | Execute another tap program |
| capabilities | Report what this runtime supports |
That's it. Not 400 methods. Not a class hierarchy. Eight functions that every runtime implements, and every tap program calls.
The insight is irreducibility. We didn't start with "what features should we support?" and enumerate. We started with "what is the minimum set of operations from which every other operation can be composed?" and reduced.
If 8 operations are all you have, how do you click a button by its text? How do you fill a form? How do you upload a file?
You compose:
```
// click("Submit") is not a core operation.
// It's composed from two core operations:
click(target) = eval(find target → get coordinates) + pointer(x, y, 'click')

// type("hello") into a field:
fill(selector, value) = eval(find element → focus it) + keyboard(value)

// upload a file:
upload(selector, path) = eval(find file input) + runtime-specific file injection
```
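Here is a minimal TypeScript sketch of the click and fill compositions above. It makes three assumptions. First, it uses a reduced, hypothetical Core interface that exposes only the three primitives these built-ins need. Second, it locates targets by CSS selector. The post's click("Submit") matches by visible text, which would only change the query inside the evaluated script. Third, centerOf is a hypothetical helper, not part of Tap.

```typescript
// Hypothetical slice of the core interface: only what click and fill need.
interface Core {
  eval(code: string): Promise<any>;
  pointer(x: number, y: number, action: string): Promise<void>;
  keyboard(text: string): Promise<void>;
}

// Hypothetical helper: builds a page script that returns the element's centre.
// JSON.stringify quotes the selector so it can be spliced into code safely.
function centerOf(selector: string): string {
  return `(() => {
    const el = document.querySelector(${JSON.stringify(selector)});
    if (!el) return null;
    const r = el.getBoundingClientRect();
    return { x: r.left + r.width / 2, y: r.top + r.height / 2 };
  })()`;
}

// click = eval (find target, get coordinates) + pointer (click there)
async function click(core: Core, selector: string): Promise<void> {
  const pt = await core.eval(centerOf(selector));
  if (!pt) throw new Error(`click: no element matches ${selector}`);
  await core.pointer(pt.x, pt.y, "click");
}

// fill = eval (find element, focus it) + keyboard (type the value)
async function fill(core: Core, selector: string, value: string): Promise<void> {
  const found = await core.eval(`(() => {
    const el = document.querySelector(${JSON.stringify(selector)});
    if (!el) return false;
    el.focus();
    return true;
  })()`);
  if (!found) throw new Error(`fill: no element matches ${selector}`);
  await core.keyboard(value);
}
```

Neither function knows which runtime sits behind Core. That independence is the property the composed built-ins rely on.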
Tap ships 17 built-in operations composed from core: click, type, fill, hover, scroll, pressKey, select, upload, dialog, fetch, find, cookies, download, waitFor, waitForNetwork, ssrState, storage. Every one of them works on every runtime because every one of them is built from the same 8 primitives.
A runtime can override a built-in for performance — Chrome's extension runtime uses CDP for file uploads because it's faster than the composed path — but it doesn't have to. The default composition works everywhere.
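The post does not show how Tap chooses between a runtime's override and the default composition. One plausible shape is a lookup that prefers the runtime's own implementation and falls back to the composed default. The overrides field and the resolveBuiltin function below are hypothetical names used only for this sketch.

```typescript
type Builtin = (...args: any[]) => Promise<unknown>;

// Hypothetical: a runtime may ship its own versions of some built-ins.
interface RuntimeWithOverrides {
  overrides?: Partial<Record<string, Builtin>>;
}

// Prefer the runtime's override; otherwise fall back to the default composition.
function resolveBuiltin(
  runtime: RuntimeWithOverrides,
  name: string,
  defaults: Record<string, Builtin>,
): Builtin {
  const override = runtime.overrides?.[name];
  if (override) return override;
  const composed = defaults[name];
  if (!composed) throw new Error(`unknown built-in: ${name}`);
  return composed;
}
```

In this sketch, a Chrome-extension-style runtime that overrides only upload still receives the composed default for every other built-in. A runtime with no overrides at all works unchanged.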
A .tap.js program calls built-in operations. It never calls Playwright methods, Chrome APIs, or AppleScript commands. This makes every tap program runtime-agnostic by construction:
```javascript
// This tap works on Chrome, Playwright, and macOS
// without a single line changed
export default {
  site: "github",
  name: "trending",
  async tap(handle) {
    await handle.nav("https://github.com/trending");
    const repos = await handle.eval(() => {
      return [...document.querySelectorAll("article.Box-row")]
        .map(el => ({
          name: el.querySelector("h2 a")?.textContent?.trim(),
          stars: el.querySelector("span.d-inline-block")?.textContent?.trim(),
        }));
    });
    return { rows: repos };
  },
};
```
Switch runtimes with a flag:
```shell
$ tap github trending                       # Chrome extension (default)
$ tap github trending --runtime playwright  # Headless, CI-friendly
$ tap github trending --runtime chrome      # User's real browser, logged in
```
Same program. Same output. Different runtime underneath. The program doesn't know and doesn't need to know.
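The program passes a function to handle.eval, while Tap's runtime interface takes eval(code: string). This sketch shows one way a handle could bridge the two. It assumes the handle serializes the function's source text and sends it to the runtime as an immediately invoked expression. The handle's real mechanism is not described in the post, and makeHandle and EvalRuntime are hypothetical names.

```typescript
// Hypothetical: the slice of the runtime interface this handle needs.
interface EvalRuntime {
  eval(code: string): Promise<any>;
  nav(url: string): Promise<void>;
}

// Hypothetical handle. Only the function's source text crosses into the
// target context, so the function must not close over local variables.
function makeHandle(runtime: EvalRuntime) {
  return {
    nav(url: string): Promise<void> {
      return runtime.nav(url);
    },
    eval<T>(fn: () => T): Promise<T> {
      // Wrap the source as "(fn)()" so the runtime evaluates the call.
      return runtime.eval(`(${fn.toString()})()`);
    },
  };
}
```

Under this design the tap program sees only nav and eval on the handle. Whatever the runtime does with the code string, whether a page, a DevTools session, or something else, stays hidden from it.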
Playwright is excellent. It's the best browser automation library ever built. But it's a library, not a protocol. The difference matters:
| | Library (Playwright) | Protocol (Tap) |
|---|---|---|
| Browser you control | Yes | Yes |
| User's real browser (cookies, sessions) | No | Yes (Chrome runtime) |
| Native desktop apps | No | Yes (macOS runtime) |
| Mobile apps (future) | No | Yes (protocol is extensible) |
| CI / headless | Yes | Yes (Playwright runtime) |
| Program portability | Locked to Playwright | Any runtime |
A protocol abstracts the runtime. A library is the runtime. When you write against a library, you get that library's capabilities and constraints. When you write against a protocol, you get every present and future runtime that implements it.
Before TCP/IP, every network vendor had its own protocol. IBM had SNA. DEC had DECnet. Novell had IPX/SPX. Applications written for one couldn't talk to another. TCP/IP defined a minimal, composable interface — and within a decade, every other protocol was either dead or tunneled over TCP/IP.
Browser automation in 2026 is pre-TCP/IP networking. Every vendor has its own interface. Programs are locked to vendors. The abstraction layer that should exist — a minimal protocol for interface operations — doesn't.
Tap's 8 core operations are that layer. Not because 8 is a magic number, but because 8 is what you get when you reduce instead of enumerate. You don't need 400 methods. You need the minimum set from which 400 methods can be composed.
Implementing a Tap runtime is implementing 8 functions:
```typescript
// A runtime is this interface. Nothing more.
interface TapRuntime {
  eval(code: string): Promise<any>
  pointer(x: number, y: number, action: string): Promise<void>
  keyboard(text: string, opts?: KeyOpts): Promise<void>
  nav(url: string): Promise<void>
  wait(condition: WaitCondition): Promise<void>
  screenshot(opts?: ScreenshotOpts): Promise<Buffer>
  run(site: string, name: string, args?: object): Promise<Result>
  capabilities(): Promise<Capabilities>
}
```
Implement these 8 methods and you get all 17 built-in operations for free. click, fill, upload, scroll — they all compose from your 8 implementations without any additional code.
Tap's Chrome extension runtime is ~600 lines. The Playwright runtime is ~400 lines. The macOS runtime (JXA + CGEvent + Accessibility API) is ~500 lines. That's how small a runtime is when the protocol is right.
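To make "a runtime is 8 functions" concrete, here is a sketch of a stub runtime that drives no real UI and just records each primitive it receives. Something like this could unit-test tap programs without a browser. The post does not define the supporting types (KeyOpts, WaitCondition, ScreenshotOpts, Result, Capabilities), so the versions below are hypothetical and minimal. Uint8Array stands in for Node's Buffer to keep the example dependency-free.

```typescript
// Hypothetical minimal versions of the supporting types.
type KeyOpts = { modifiers?: string[] };
type WaitCondition = { ms: number } | { selector: string } | { networkIdle: true };
type ScreenshotOpts = { fullPage?: boolean };
type Result = { rows: unknown[] };
type Capabilities = Record<string, boolean>;

interface TapRuntime {
  eval(code: string): Promise<any>;
  pointer(x: number, y: number, action: string): Promise<void>;
  keyboard(text: string, opts?: KeyOpts): Promise<void>;
  nav(url: string): Promise<void>;
  wait(condition: WaitCondition): Promise<void>;
  screenshot(opts?: ScreenshotOpts): Promise<Uint8Array>;
  run(site: string, name: string, args?: object): Promise<Result>;
  capabilities(): Promise<Capabilities>;
}

// A stub runtime: every primitive is implemented by appending to a log.
class RecordingRuntime implements TapRuntime {
  readonly log: string[] = [];
  async eval(_code: string): Promise<any> {
    this.log.push("eval");
    return null; // no page to evaluate against
  }
  async pointer(x: number, y: number, action: string): Promise<void> {
    this.log.push(`pointer ${action} ${x},${y}`);
  }
  async keyboard(text: string, _opts?: KeyOpts): Promise<void> {
    this.log.push(`keyboard ${text}`);
  }
  async nav(url: string): Promise<void> {
    this.log.push(`nav ${url}`);
  }
  async wait(condition: WaitCondition): Promise<void> {
    this.log.push(`wait ${JSON.stringify(condition)}`);
  }
  async screenshot(_opts?: ScreenshotOpts): Promise<Uint8Array> {
    this.log.push("screenshot");
    return new Uint8Array(0); // empty image
  }
  async run(site: string, name: string, _args?: object): Promise<Result> {
    this.log.push(`run ${site}/${name}`);
    return { rows: [] };
  }
  async capabilities(): Promise<Capabilities> {
    return { eval: true, pointer: true, keyboard: true, headless: true };
  }
}
```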
Not every runtime supports everything. The macOS runtime can't do cookies (there's no browser). A headless Playwright runtime can't access a user's logged-in session. A mobile runtime might not support keyboard the same way.
This is what capabilities() is for:
```typescript
// Chrome extension runtime
async capabilities() {
  return {
    eval: true,
    pointer: true,
    keyboard: true,
    cookies: true,   // can read/write browser cookies
    upload: true,    // CDP file injection
    sessions: true,  // user's real login sessions
  }
}

// Playwright runtime
async capabilities() {
  return {
    eval: true,
    pointer: true,
    keyboard: true,
    cookies: true,
    upload: true,
    sessions: false, // no user sessions, fresh context
    headless: true,  // can run without display
  }
}
```
A tap can declare requires: ["sessions"] and Tap will route it to a runtime that has sessions. A CI pipeline can request --runtime playwright knowing it supports headless. The protocol handles the negotiation; the program stays clean.
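The post does not spell out Tap's routing policy. As an assumption, this sketch treats the list of available runtimes as a preference order and picks the first one whose capabilities include every entry in requires. The runtime names and capability snapshots are hypothetical. It also assumes each runtime's async capabilities() has already been called and its result cached.

```typescript
type Capabilities = Record<string, boolean>;

// Hypothetical: a runtime name plus its cached capabilities() result.
interface RuntimeInfo {
  name: string;
  capabilities: Capabilities;
}

// First runtime (in preference order) advertising every required capability.
// A capability that is missing or false counts as unsupported.
function pickRuntime(requires: string[], runtimes: RuntimeInfo[]): RuntimeInfo | undefined {
  return runtimes.find(rt => requires.every(cap => rt.capabilities[cap] === true));
}
```

Returning undefined instead of throwing leaves the caller free to decide what to do when nothing matches, such as reporting the missing capability to the user.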
The deepest design principle behind the interface protocol: the tap API is the protocol, not the implementation.
The Chrome extension is not Tap. Playwright is not Tap. The macOS runtime is not Tap. They are implementations of the protocol. The protocol is the 8 operations, the 17 built-in compositions, and the capability negotiation contract.
This distinction is why Tap programs survive runtime changes. When Chrome ships a breaking API change, the Chrome runtime updates — zero tap programs change. When a new platform appears (Android, iOS, Electron), a new runtime implements 8 methods — and every existing tap program runs on it immediately.
This is what "programs beat prompts" looks like at the architecture level. The program is written against a protocol, not a vendor. The protocol is minimal enough to implement anywhere. The implementation is swappable. The program is permanent.