tap doctor Exists.
Somewhere in your infrastructure, 10% of your scrapers stopped working this week. You'll find out on Thursday when the dashboard looks funny. On Friday you'll figure out which one. On Monday you'll fix it. The week after, a different 10% will break.
This is not a worst case. It's the modal experience for anyone operating more than a handful of scrapers in production:
"About 10–15% of my scrapers break EVERY WEEK due to website changes."
— r/webscraping, 376 upvotes
"Every time a website redesigns or updates their layout, I'm manually fixing selectors and rewriting parts of the workflow. It's eating up hours every month."
— ByteForge, Latenode community
"My selectors got wiped twice in one month. Headless Puppeteer handles around ten portals fine. Push it to fifty and localStorage breaks, IP bans hit, random modals destroy everything."
— r/webscraping, scaling 100+ vendor dashboards
The common shape: sites change on their own clock, your selectors don't, and the gap between "broken" and "you noticed" is measured in days.
Scrapers rarely crash. They return empty arrays. Downstream systems happily consume empty data. Dashboards render. Reports ship. Nobody pages:
"Instead of throwing an error when a page structure changes, they return empty arrays... A scraper that fails silently poisons your data for days or weeks before anyone notices."
— BinaryBits
A crash would be kind. The scraper telling you nothing is the expensive part.
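To make the failure mode concrete, here is a minimal sketch of a scraper going quiet after a redesign. Everything in it is hypothetical: `scrapeTitles`, the class names, and the sample HTML are invented for illustration, and a regex stands in for a real DOM query so the example has no dependencies.

```javascript
// Hypothetical scraper: pulls product titles out of HTML by class name.
// A regex stands in for a real DOM query to keep the sketch dependency-free.
function scrapeTitles(html) {
  const pattern = /<h2 class="product-title">([^<]*)<\/h2>/g;
  return [...html.matchAll(pattern)].map((match) => match[1]);
}

// The page as it looked when the scraper was written.
const before =
  '<h2 class="product-title">Kettle</h2><h2 class="product-title">Toaster</h2>';

// The same page after a redesign renamed the class (invented example).
const after =
  '<h2 class="item-name">Kettle</h2><h2 class="item-name">Toaster</h2>';

console.log(scrapeTitles(before)); // two titles
console.log(scrapeTitles(after)); // an empty array, and no exception
```

Nothing throws on the second call, so a downstream job that only checks for exceptions would treat the empty result as a normal run.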
A tap is a small JavaScript program. Every tap declares what "healthy" output means:
    // built into every tap
    health: {
      min_rows: 5,                              // fewer than 5 rows → fail
      non_empty: ["title"],                     // title must never be blank
      pattern: { price: /^\$\d+(\.\d{2})?$/ }   // prices must look like prices
    }
Now the scraper can't quietly lie. Either the contract is satisfied or the run is flagged.
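One way to picture how a contract like this could be enforced is the sketch below. It is not Taprun's implementation: `checkHealth` and its return shape are assumptions made for illustration, and only the three contract keys shown above (`min_rows`, `non_empty`, `pattern`) are handled.

```javascript
// Hypothetical checker, not Taprun's actual code. Returns a list of
// human-readable violations; an empty list means the contract is satisfied.
function checkHealth(rows, health) {
  const failures = [];

  // min_rows: the run must return at least this many rows.
  const minRows = health.min_rows ?? 0;
  if (rows.length < minRows) {
    failures.push(`min_rows: expected ≥${minRows}, got ${rows.length}`);
  }

  // non_empty: listed fields must not be missing or blank in any row.
  for (const field of health.non_empty ?? []) {
    const blank = rows.filter(
      (row) => String(row[field] ?? "").trim() === ""
    ).length;
    if (blank > 0) {
      failures.push(`non_empty: ${field} is blank in ${blank} row(s)`);
    }
  }

  // pattern: listed fields must match their regex; report the first offender.
  // Assumes non-global regexes, so .test() keeps no state between calls.
  for (const [field, regex] of Object.entries(health.pattern ?? {})) {
    const bad = rows.find((row) => !regex.test(String(row[field] ?? "")));
    if (bad) {
      failures.push(`pattern: ${field} "${bad[field]}" does not match ${regex}`);
    }
  }

  return failures;
}

const health = {
  min_rows: 5,
  non_empty: ["title"],
  pattern: { price: /^\$\d+(\.\d{2})?$/ },
};

console.log(checkHealth([], health));
console.log(checkHealth([{ title: "Kettle", price: "TBD" }], health));
```

Returning a list rather than throwing on the first violation means one run can report every broken rule at once, which suits a report-style tool.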
tap doctor: Check Everything, Get the Diff
    $ tap doctor
    hackernews/hot      ✔ ok     30 rows   (245ms)
    amazon/product      ✘ fail    0 rows   min_rows: expected ≥5, got 0
    github/trending     ✔ ok     25 rows   (1.2s)
    shopify/checkout    ✘ fail    3 rows   pattern: price "TBD" does not match /^\$\d+/
    reddit/hot          ✔ ok     25 rows   (890ms)
Two failures surfaced. One is a site structure change (0 rows). One is a product-data change (price format mismatch). Neither would crash. Neither would fill Sentry. tap doctor catches both because the contract catches both.
Knowing a tap is broken is step one. Step two is fixing it — which requires knowing what moved. Taprun stores a structural fingerprint (DOM shape + captured API endpoints) from the last healthy run. When a run fails the contract, tap doctor --diff emits the delta:
    $ tap doctor amazon product --diff
    - selector .product-title        → 30 matches (expected)
    + selector .product-title        → 0 matches (broken)
    + selector .a-color-price        → 0 matches
    + selector [data-testid=price]   → 30 matches (new)
    → site renamed class "a-color-price" to data-testid="price"
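The idea behind such a diff can be sketched in a few lines, assuming the fingerprint is reduced to a map from selector to match count. That shape is an assumption: the real fingerprint is described as DOM shape plus captured API endpoints, and `diffFingerprints`, the `status` labels, and the healthy count for `.a-color-price` are invented here for illustration.

```javascript
// Hypothetical fingerprint shape: { selector: number of matches on the page }.
// Compares the last healthy run with the failed run and lists what moved.
function diffFingerprints(healthy, current) {
  const selectors = new Set([
    ...Object.keys(healthy),
    ...Object.keys(current),
  ]);
  const changes = [];
  for (const selector of selectors) {
    const was = healthy[selector] ?? 0; // absent selector counts as 0 matches
    const now = current[selector] ?? 0;
    if (was === now) continue; // unchanged selectors are not reported
    let status = "changed";
    if (was > 0 && now === 0) status = "broken";
    else if (was === 0 && now > 0) status = "new";
    changes.push({ selector, was, now, status });
  }
  return changes;
}

// Illustrative counts modelled on the report above.
const lastHealthy = { ".product-title": 30, ".a-color-price": 30 };
const failedRun = {
  ".product-title": 0,
  ".a-color-price": 0,
  "[data-testid=price]": 30,
};

for (const change of diffFingerprints(lastHealthy, failedRun)) {
  console.log(
    `${change.status}: ${change.selector} ${change.was} → ${change.now}`
  );
}
```

A selector that drops to zero next to a new selector with the same match count is the kind of pairing that suggests a rename, although this sketch stops at listing the changes and does not infer one.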
That diff is the fix spec. Paste it into your agent and ask for a patch. The Pro tier does it automatically on a cron schedule.
"The single biggest cost in web scraping is not servers or proxies. It is developer time. If you manage 500 scrapers, you are essentially a full-time firefighter."
— Self-Repairing Scrapers, r/WebDataDiggers
That firefighter role is what tap doctor eliminates. Not by preventing breakage — sites change, that's fine — but by making breakage immediately visible and diagnosable. You replace "manual spot checks" with "a 10-second report."
    # Install
    curl -fsSL https://taprun.dev/install.sh | sh

    # Check health of everything you've forged
    tap doctor

    # Schedule a daily check at 6am
    tap doctor --schedule "0 6 * * *"

    # Emit the structural diff for a broken tap
    tap doctor amazon product --diff

    # Auto-heal (Pro): AI reads the diff and patches
    tap doctor --auto