Your Scrapers Break Every Week. Here's Why tap doctor Exists.

April 18, 2026 · Leon Ting · 6 min read

Somewhere in your infrastructure, 10% of your scrapers stopped working this week. You'll find out on Thursday when the dashboard looks funny. On Friday you'll figure out which one. On Monday you'll fix it. The week after, a different 10% will break.

This is not a worst case. It's the modal experience for anyone operating more than a handful of scrapers in production:

"About 10–15% of my scrapers break EVERY WEEK due to website changes."

— r/webscraping, 376 upvotes

"Every time a website redesigns or updates their layout, I'm manually fixing selectors and rewriting parts of the workflow. It's eating up hours every month."

— ByteForge, Latenode community

"My selectors got wiped twice in one month. Headless Puppeteer handles around ten portals fine. Push it to fifty and localStorage breaks, IP bans hit, random modals destroy everything."

— r/webscraping, scaling 100+ vendor dashboards

The common shape: sites change on their own clock, your selectors don't, and the gap between "broken" and "you noticed" is measured in days.

Why Silent Failure Is the Worst Case

Scrapers rarely crash. They return empty arrays. Downstream systems happily consume empty data. Dashboards render. Reports ship. Nobody pages:

"Instead of throwing an error when a page structure changes, they return empty arrays... A scraper that fails silently poisons your data for days or weeks before anyone notices."

— BinaryBits

A crash would be kind. The scraper telling you nothing is the expensive part.
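To make the failure mode concrete, here is a minimal sketch of a scraper that goes quiet after a redesign. The `extractTitles` function and both HTML snippets are made up for illustration, and a real scraper would use a DOM parser rather than a regex.

```javascript
// Hypothetical extractor: pulls product titles out of raw HTML.
// Illustration only; the function name and markup are assumptions.
function extractTitles(html) {
  const pattern = /class="product-title">([^<]+)</g;
  return [...html.matchAll(pattern)].map((match) => match[1]);
}

// The page as it looked when the scraper was written.
const pageBefore =
  '<h2 class="product-title">Kettle</h2><h2 class="product-title">Toaster</h2>';

// The same page after a redesign renamed the class.
const pageAfter =
  '<h2 class="item-name">Kettle</h2><h2 class="item-name">Toaster</h2>';

console.log(extractTitles(pageBefore)); // two titles
console.log(extractTitles(pageAfter));  // empty array, and no exception
```

Nothing in the second call throws, so nothing upstream has a reason to alert.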

What a Health Contract Looks Like

A tap is a small JavaScript program. Every tap declares what "healthy" output means:

// built into every tap
health: {
  min_rows: 5,                    // fewer than 5 rows → fail
  non_empty: ["title"],            // title must never be blank
  pattern: { price: /^\$\d+(\.\d{2})?$/ }  // prices must look like prices
}

Now the scraper can't quietly lie. Either the contract is satisfied or the run is flagged.
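As a rough sketch of what evaluating such a contract could involve, the function below checks a list of rows against the three fields shown above. `checkHealth` is a hypothetical name, and this is not Taprun's actual implementation, only one plausible reading of `min_rows`, `non_empty`, and `pattern`.

```javascript
// Hypothetical contract checker; not Taprun's real code.
// Returns a list of human-readable failures (empty list means healthy).
function checkHealth(rows, health) {
  const failures = [];

  // min_rows: the run must return at least this many rows.
  if (health.min_rows !== undefined && rows.length < health.min_rows) {
    failures.push(`min_rows: expected ≥${health.min_rows}, got ${rows.length}`);
  }

  // non_empty: the listed fields must be present and non-blank in every row.
  for (const field of health.non_empty ?? []) {
    const blank = rows.filter(
      (row) => row[field] == null || String(row[field]).trim() === ""
    ).length;
    if (blank > 0) failures.push(`non_empty: ${field} blank in ${blank} row(s)`);
  }

  // pattern: each listed field must match its regex (assumed non-global).
  for (const [field, regex] of Object.entries(health.pattern ?? {})) {
    const bad = rows.find((row) => !regex.test(String(row[field] ?? "")));
    if (bad) {
      failures.push(
        `pattern: ${field} ${JSON.stringify(bad[field])} does not match ${regex}`
      );
    }
  }
  return failures;
}

const exampleHealth = {
  min_rows: 2,
  non_empty: ["title"],
  pattern: { price: /^\$\d+(\.\d{2})?$/ },
};

console.log(checkHealth([], exampleHealth));
console.log(
  checkHealth(
    [
      { title: "Kettle", price: "$19.99" },
      { title: "Toaster", price: "TBD" },
    ],
    exampleHealth
  )
);
```

The first call reports a `min_rows` failure and the second a `pattern` failure. Neither case throws on its own, which is why the contract has to be checked explicitly.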

tap doctor — Check Everything, Get the Diff

$ tap doctor
hackernews/hot    ✔ ok     30 rows  (245ms)
amazon/product    ✘ fail   0 rows   min_rows: expected ≥5, got 0
github/trending   ✔ ok     25 rows  (1.2s)
shopify/checkout  ✘ fail   3 rows   pattern: price "TBD" does not match /^\$\d+(\.\d{2})?$/
reddit/hot        ✔ ok     25 rows  (890ms)

Two failures surfaced. One is a site structure change (0 rows). The other is a product-data change (price format mismatch). Neither would crash the scraper, and neither would raise an error in Sentry. tap doctor catches both because the contract catches both.

Structural Diff: What Actually Changed

Knowing a tap is broken is step one. Step two is fixing it — which requires knowing what moved. Taprun stores a structural fingerprint (DOM shape + captured API endpoints) from the last healthy run. When a run fails the contract, tap doctor --diff emits the delta:

$ tap doctor amazon product --diff
- selector .product-title   → 30 matches  (expected)
+ selector .product-title   → 0 matches   (broken)
+ selector .a-color-price   → 0 matches
+ selector [data-testid=price]  → 30 matches  (new)
→ site renamed class "a-color-price" to data-testid="price"

That diff is the fix spec. Paste it into your agent and ask for a patch. The Pro tier does this automatically on a cron schedule.
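For a sense of how a delta like this could be computed, here is a simplified sketch. It assumes a fingerprint is just a map from selector to match count, which is narrower than the DOM shape and API endpoints Taprun stores, and `diffSelectorCounts` is a hypothetical name.

```javascript
// Hypothetical diff over selector match counts; a simplification,
// not Taprun's actual fingerprint format.
function diffSelectorCounts(lastHealthy, current) {
  const selectors = new Set([
    ...Object.keys(lastHealthy),
    ...Object.keys(current),
  ]);
  const changes = [];
  for (const selector of selectors) {
    const before = lastHealthy[selector] ?? 0;
    const after = current[selector] ?? 0;
    if (before === after) continue; // unchanged selectors are not reported

    let status = "changed";
    if (before > 0 && after === 0) status = "broken";
    else if (before === 0 && after > 0) status = "new";
    changes.push({ selector, before, after, status });
  }
  return changes;
}

// Made-up counts for illustration.
const lastHealthyRun = { ".product-title": 30, ".a-color-price": 30 };
const currentRun = { ".product-title": 30, "[data-testid=price]": 30 };

for (const change of diffSelectorCounts(lastHealthyRun, currentRun)) {
  console.log(
    `${change.selector}: ${change.before} → ${change.after} (${change.status})`
  );
}
```

A selector that drops to zero next to a new one with the same count is a strong hint of a rename, which is the kind of inference the summary line in the output above makes.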

The Real Cost You're Replacing

"The single biggest cost in web scraping is not servers or proxies. It is developer time. If you manage 500 scrapers, you are essentially a full-time firefighter."

— Self-Repairing Scrapers, r/WebDataDiggers

That firefighter role is what tap doctor eliminates. Not by preventing breakage — sites change, that's fine — but by making breakage immediately visible and diagnosable. You replace "manual spot checks" with "a 10-second report."

Run It On What You Have Now

# Install
curl -fsSL https://taprun.dev/install.sh | sh

# Check health of everything you've forged
tap doctor

# Schedule a daily check at 6am
tap doctor --schedule "0 6 * * *"

# Emit the structural diff for a broken tap
tap doctor amazon product --diff

# Auto-heal (Pro) — AI reads the diff and patches
tap doctor --auto

Try it now

# Zero-install via npx
npx -y @taprun/cli doctor
