Block the Noise: How We Gave You More Control Over What Your Scraper Sees
If you've ever scraped a modern webpage, you’ve probably run into this:
You send a simple request for a product page. Instead of just getting a clean block of HTML with the title, price, and description, you’re hit with tracking scripts, A/B testing beacons, video autoplay widgets, and a payload bloated with third-party junk you didn’t ask for—and definitely don’t want to parse.
It’s like asking for the ingredients on a box of cereal and getting handed the entire supermarket.
So we built a fix: blocked_domains in Nimble Web API.
The Problem: Pages That Load Everything, All the Time
At Nimble, we obsess over performance. Not just speed, but data relevance, request efficiency, and cost per scrape. And across thousands of real-world use cases, we kept seeing the same pattern:
You ask for one thing. The page gives you everything.
Even if your selectors are clean and your use case is simple, your scraper still ends up downloading:
- Massive image assets from third-party CDNs
- Video players and autoplay widgets
- Ad network scripts, retargeting beacons, analytics payloads
- Chat popups, overlays, and cookie banners
All of these come from external domains that have nothing to do with the data you’re trying to collect.
And the consequences stack up:
- Longer render times
- Bigger payloads
- More memory and CPU usage per request
- Higher bandwidth consumption
- More retry logic and request failures
At scale, that’s real money. Hundreds of megabytes burned. Proxy costs spiking. Render queues backing up. Parsing pipelines overloaded with junk.
The worst part? You didn’t want any of it.
So we built a fix. And now it’s live.
What We Built: blocked_domains
We introduced a new setting under render_options that lets you specify which domains you do not want loading during rendering. Think of it as putting blinders on the browser: load only what matters.
Here’s what that looks like in practice:
    "render_options": {
        "blocked_domains": [
            "doubleclick.net",
            "googletagmanager.com"
        ]
    }
You add this to your Web API request, and Nimble’s rendering engine will skip any network requests to those domains. The result? A page that loads faster, with less garbage and fewer layout shifts.
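As a minimal sketch, the fragment above can be assembled programmatically. The helper name build_payload and the example.com URL are hypothetical and exist only for illustration; the render, render_options, and blocked_domains keys are the ones used in this post, and "render" is passed as the string "true" to match the Yelp example further down. The sketch only builds and prints the request body; sending it (endpoint and authentication) is deliberately left out.

```python
import json


def build_payload(url, blocked_domains):
    """Build a request body that blocks the given domains during rendering.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    # Normalize entries: trim, lowercase, drop empties and duplicates (order kept).
    seen = set()
    cleaned = []
    for domain in blocked_domains:
        d = domain.strip().lower()
        if d and d not in seen:
            seen.add(d)
            cleaned.append(d)
    return {
        "url": url,
        "render": "true",  # blocking applies during rendering, so rendering is on
        "render_options": {"blocked_domains": cleaned},
    }


payload = build_payload(
    "https://example.com/product/123",  # placeholder URL
    ["doubleclick.net", "googletagmanager.com", "DoubleClick.net "],
)
print(json.dumps(payload, indent=2))
```

Normalizing the list up front keeps accidental duplicates and stray whitespace out of the request, which makes it easier to diff block lists between runs.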
Real Use: Scraping Yelp Without the Bloat
Take Yelp as a real-world example.
Each page pulls in dozens of external scripts. A single request can bring in video players, interactive ads, and image-heavy carousels. But if you’re just trying to extract business data and metadata, none of that is useful.
Using blocked_domains, we can cut out Yelp’s CDN to avoid loading dozens of unnecessary assets, including scripts, images, and stylesheets, while still getting clean, structured HTML. The request looks like this:
    payload = {
        "url": "https://www.yelp.com/biz/golden-boy-pizza-san-francisco-5?osq=Pizza",
        "render": "true",
        "parse": "true",
        "render_options": {
            # Yelp's asset CDN; blocking it skips its scripts, images, and stylesheets
            "blocked_domains": [
                "fl.yelpcdn.com"
            ]
        }
    }
Without blocked_domains, we recorded a single-page payload size of 1.07 MB.
With blocked_domains in place, that dropped to 438 KB: over 50% less bloat and bandwidth!
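To make the arithmetic explicit, here is a rough calculation based on the two sizes reported above. It assumes 1 MB = 1000 KB, and the one-million-page volume is a hypothetical figure chosen only to show how the saving scales; it is not a measurement.

```python
# Sizes reported for the Yelp page, in KB (assuming 1 MB = 1000 KB).
before_kb = 1.07 * 1000
after_kb = 438

saved_kb = before_kb - after_kb
reduction = saved_kb / before_kb

print(f"Saved per page: {saved_kb:.0f} KB ({reduction:.0%} smaller)")

# Hypothetical volume, for illustration only.
pages = 1_000_000
saved_gb = saved_kb * pages / 1_000_000  # KB -> GB (decimal units)
print(f"Saved across {pages:,} pages: about {saved_gb:.0f} GB")
```

Real savings vary by page and by which domains are blocked, so treat this as an estimate for sizing rather than a guarantee.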
Why We Think This Matters
Scraping today isn’t just about whether you can get the data. It’s about how clean, consistent, and scalable that data is.
By helping you control what gets rendered, blocked_domains makes your scrapes:
- More deterministic
- Easier to parse
- More cost-efficient (especially at scale)
- Easier to debug when something breaks
We don’t want you writing workarounds. We want you shipping reliable pipelines.
A Few Best Practices
Here’s what we recommend when using this feature:
- Start small. Block known ad/analytics domains first, then expand based on what you see in your HTML.
- Avoid blocking critical CDNs (e.g., if a site loads all its CSS from a first-party domain, blocking that will break everything).
- Pair it with render_type: idle0 for fast exits once the page has “settled” in the DOM.
- Use it with parsing to cleanly extract what you need and nothing more.
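One way to approach the "start small" advice is to tally which external hosts a page actually contacts and review the busiest ones first. The sketch below assumes you already have a list of request URLs observed during a page load (for example, exported from your browser's network panel); the third_party_hosts helper, the sample URLs, and the naive "www." handling are all illustrative assumptions, not part of the Nimble API.

```python
from collections import Counter
from urllib.parse import urlparse


def third_party_hosts(page_url, request_urls):
    """Count requests per host, skipping hosts on the page's own site."""
    first_party = urlparse(page_url).hostname or ""
    # Naive site match: drop a leading "www." and compare by suffix.
    site = first_party[4:] if first_party.startswith("www.") else first_party
    counts = Counter()
    for request_url in request_urls:
        host = urlparse(request_url).hostname
        if not host:
            continue
        if host == site or host.endswith("." + site):
            continue  # first-party request, never a blocking candidate
        counts[host] += 1
    return counts.most_common()


# Sample data, made up for illustration.
observed = [
    "https://www.example.com/product/123",
    "https://static.example.com/app.css",
    "https://www.googletagmanager.com/gtm.js",
    "https://www.googletagmanager.com/gtag/js",
    "https://securepubads.g.doubleclick.net/tag/js/gpt.js",
]
candidates = third_party_hosts("https://www.example.com/product/123", observed)
for host, count in candidates:
    print(count, host)
```

The output is a list of candidates to review, not a ready-made block list: a site's own CDN on a separate domain will also show up here, and blocking it can break the page.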
This feature pairs beautifully with our render_flow, parse, and merge_dynamic capabilities for full control over the rendering process.
Try It Today
If you're already using Nimble Web API, you can add blocked_domains to your existing requests immediately. If not, check the docs and give it a shot.
It’s a small flag—but it makes a big difference when you’re scraping at scale.