Lowe’s Scraping at Scale: Why It Works at 100 URLs and Fails in Production
A Lowe’s scraper that works in a local test often fails in production for reasons that feel mysterious:
- same URL returns different states across requests
- intermittent “Access Denied”
- prices go missing only under load
- your logs show 200 OK but your dataset is full of nulls
The cause is usually not “a bug.” It’s that scaling introduces new failure modes:
- concurrency changes the shape of traffic
- sessions fragment under load
- store context leaks or resets
- defensive systems react to uniform request patterns
This post focuses on operational design, not “try a different selector.”
The scaling mistake: treating Lowe’s like stateless HTTP
Many pipelines scale by doing something like:
- throw 50–500 concurrent requests at PDPs
- rotate IPs aggressively
- retry on failure
On Lowe’s, that can create more inconsistency because:
- you’re constantly creating fresh sessions that never stabilize store context
- you trigger defensive behavior with bursty, uniform traffic
- you get “soft failures” (placeholders) that look like valid responses
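A cheap way to surface those soft failures is to validate parsed records instead of trusting the HTTP status. A minimal sketch, assuming a hypothetical record shape with price and availability fields (illustrative, not Lowe's actual schema):

```python
REQUIRED_FIELDS = ("price", "availability")

def is_soft_failure(status: int, record: dict) -> bool:
    """True when the response succeeded at the HTTP layer (200 OK)
    but the parsed record carries placeholder/empty product data."""
    if status != 200:
        return False  # a hard failure; handled by normal retry logic
    return any(not record.get(field) for field in REQUIRED_FIELDS)

# The classic failure mode: 200 OK, null price
print(is_soft_failure(200, {"price": None, "availability": "IN_STOCK"}))   # True
print(is_soft_failure(200, {"price": 199.0, "availability": "IN_STOCK"}))  # False
print(is_soft_failure(403, {}))                                            # False
```

Counting soft failures separately from hard failures is what makes the "200 OK but dataset full of nulls" problem visible in monitoring.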
The fix: make scale boring with batch execution
Instead of managing thousands of independent requests yourself, design your pipeline around batch jobs that:
- control concurrency intentionally
- keep per-store execution stable
- separate “render” tasks from “fetch” tasks
- produce deterministic outputs per URL
DIY: Batch runner with controlled concurrency (Python asyncio)
This is a simple but effective pattern if you’re building your own orchestrator.
import asyncio
import random

import aiohttp

URLS = [
    "https://www.lowes.com/pd/DEWALT-20V-MAX-XR-Cordless-Drill-2-Tool-Combo-Kit/1000552693",
    # ... thousands more
]

CONCURRENCY = 10

async def fetch(session, url):
    # jitter reduces “perfectly periodic” traffic patterns
    await asyncio.sleep(random.uniform(0.05, 0.35))
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
        text = await resp.text()
        return url, resp.status, text

async def run():
    connector = aiohttp.TCPConnector(limit=CONCURRENCY)
    async with aiohttp.ClientSession(connector=connector, headers={
        "User-Agent": "Mozilla/5.0 ...",
        "Accept-Language": "en-US,en;q=0.9",
    }) as session:
        sem = asyncio.Semaphore(CONCURRENCY)

        async def bound_fetch(url):
            async with sem:
                return await fetch(session, url)

        tasks = [bound_fetch(u) for u in URLS]
        for coro in asyncio.as_completed(tasks):
            url, status, html = await coro
            # write html to disk / parse / etc.
            print(status, url, len(html))

asyncio.run(run())
What this solves:
- prevents accidental request bursts
- adds jitter so traffic isn’t perfectly uniform
- creates consistent load you can tune
What it doesn’t solve by itself:
- store context stability (blog #1)
- dynamic JSON extraction (blog #2)
- retries and completeness guarantees
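Completeness guarantees can be layered on top of a runner like this with a per-URL ledger that records every outcome, so you can prove a run is complete rather than assume it. A minimal sketch (the `Ledger` class and its "ok"/"failed" statuses are illustrative, not from any library):

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    """Per-URL outcome log so a run can prove completeness
    instead of assuming it."""
    outcomes: dict = field(default_factory=dict)

    def record(self, url: str, ok: bool) -> None:
        self.outcomes[url] = "ok" if ok else "failed"

    def pending(self, all_urls) -> list:
        # URLs never attempted, plus URLs that failed validation
        return [u for u in all_urls if self.outcomes.get(u) != "ok"]

urls = ["https://example.com/pd/a", "https://example.com/pd/b", "https://example.com/pd/c"]
ledger = Ledger()
ledger.record(urls[0], ok=True)
ledger.record(urls[1], ok=False)
print(ledger.pending(urls))  # ['https://example.com/pd/b', 'https://example.com/pd/c']
```

The `pending()` list is exactly what the replay logic later in this post should consume: failures and never-attempted URLs only, never the whole job.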
Design pattern: per-store batches (avoid poisoning your dataset)
If store context matters (it does), don’t mix stores inside one big job.
Instead:
- group URLs by store (or ZIP)
- initialize store session once
- scrape PDPs for that store in a controlled batch
DIY: Per-store task queues (conceptual)
store_to_urls = {
    "2333": [...pdp_urls...],
    "1742": [...pdp_urls...],
}

# For each store:
# 1) init store cookies (render flow if needed)
# 2) fetch URLs with stable cookie jar
# 3) parse JSON + validate
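The three-step loop above can be sketched as a small runner. Here `init_store_session` and `fetch_with_cookies` are hypothetical helpers standing in for your own store-context and fetch logic; the point is the structure, one stable session per store and no interleaving:

```python
def run_per_store(store_to_urls, init_store_session, fetch_with_cookies):
    """One stable session per store; never mix stores inside a batch."""
    results = {}
    for store_id, urls in store_to_urls.items():
        cookies = init_store_session(store_id)  # 1) init store context once
        for url in urls:                        # 2) controlled per-store batch
            results[(store_id, url)] = fetch_with_cookies(url, cookies)  # 3) parse/validate downstream
    return results

# Usage with stub helpers standing in for real session/fetch logic:
out = run_per_store(
    {"2333": ["pdp-1"], "1742": ["pdp-2"]},
    init_store_session=lambda store_id: {"store": store_id},
    fetch_with_cookies=lambda url, cookies: {"url": url, "store": cookies["store"]},
)
print(out[("2333", "pdp-1")])  # {'url': 'pdp-1', 'store': '2333'}
```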
This is the difference between “my dataset is noisy” and “my dataset reflects the market.”
Make retries smarter: replay failures, not everything
At scale, failures are inevitable. What matters is how you recover.
DIY best practice:
- log each URL result with status + parser validation
- replay only the ones that failed validation (missing price, missing availability, access denied)
- keep replay bounded (don’t hammer)
DIY: Validation-driven replay list
import asyncio
import random

import aiohttp

URLS = [
    "https://www.lowes.com/pd/DEWALT-20V-MAX-XR-Cordless-Drill-2-Tool-Combo-Kit/1000552693",
    # ... thousands more
]

CONCURRENCY = 10
MAX_REPLAY_ROUNDS = 2  # keep replay bounded; don't hammer

def is_valid(status, html):
    # swap in your real parser checks (price, availability, access denied)
    if status != 200 or "Access Denied" in html:
        return False
    return '"price"' in html

async def fetch(session, url):
    await asyncio.sleep(random.uniform(0.05, 0.35))  # jitter
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
        return url, resp.status, await resp.text()

async def run_round(urls):
    failed = []
    connector = aiohttp.TCPConnector(limit=CONCURRENCY)
    async with aiohttp.ClientSession(connector=connector) as session:
        sem = asyncio.Semaphore(CONCURRENCY)

        async def bound_fetch(url):
            async with sem:
                return await fetch(session, url)

        for coro in asyncio.as_completed([bound_fetch(u) for u in urls]):
            url, status, html = await coro
            if is_valid(status, html):
                pass  # persist the good record
            else:
                failed.append(url)  # queue for replay, with reason logged
    return failed

async def main(urls):
    pending = urls
    for round_no in range(MAX_REPLAY_ROUNDS + 1):
        pending = await run_round(pending)
        if not pending:
            break
        await asyncio.sleep(5 * (round_no + 1))  # back off between rounds
    for url in pending:
        print("UNRECOVERED:", url)  # replay only failures, never everything

asyncio.run(main(URLS))
A More Scalable Way to Run Lowe’s Scraping Jobs
At large volumes, Lowe’s scraping usually fails due to orchestration, not parsing. This is where teams often move to Nimble’s Web API, which treats batch execution as a first-class primitive rather than something you have to build and tune yourself.
Instead of managing thousands of independent requests, Nimble lets you define a Lowe’s scraping job once and executes it with controlled concurrency, stable store context, and built-in recovery.
Example: Defining a Lowe’s batch job with Nimble
import time, json
from nimble_python import Nimble

URLS = ["{PDP_URL_1}", "{PDP_URL_2}"][: int("{N}")]
ZIP = "{STORE_ID or ZIP}"
BATCH_SIZE = 5

nimble = Nimble(api_key="YOUR_NIMBLE_API_KEY")

# 1) Set store context once (ZIP → cookies)
store = nimble.extract(
    url="https://www.lowes.com",
    country="US",
    render=True,
    parse=False,
    browser_actions=[
        {"fill": {"selector": "input[type='tel']", "value": ZIP, "required": False}},
        {"get_cookies": True},
    ],
)

cookies = "; ".join(
    f"{c['name']}={c['value']}"
    for c in (store.data.cookies or [])
    if "lowes" in (c.get("domain") or "").lower()
)

results = []

# 2) Process URLs in small batches (no burst)
for i in range(0, len(URLS), BATCH_SIZE):
    batch = URLS[i:i + BATCH_SIZE]
    task_ids = []
    for url in batch:
        task = nimble.extract.async(
            url=url,
            country="US",
            render=True,  # needed for network capture
            parse=False,
            cookies=cookies,
            network_capture=[{
                "method": "GET",
                "resource_type": ["xhr", "fetch"],
                "wait_for_requests_count": 2,
                "wait_for_requests_count_timeout": 10,
            }],
        )
        task_ids.append(task.task_id)

    # wait for batch to finish
    done = False
    while not done:
        states = [nimble.tasks.status(t).task.state for t in task_ids]
        done = all(s in ("success", "failed") for s in states)
        if not done:
            time.sleep(2)

    # collect + normalize
    for url, tid in zip(batch, task_ids):
        r = nimble.tasks.results(tid)
        data = r.model_dump() if hasattr(r, "model_dump") else r
        # first captured JSON response
        payload = (
            data.get("data", {})
            .get("network_capture", [{}])[0]
            .get("result", [{}])[0]
            .get("response", {})
            .get("body")
        )
        if isinstance(payload, str):
            try:
                payload = json.loads(payload)
            except json.JSONDecodeError:
                payload = {}
        results.append({
            "url": url,
            "price": payload.get("price"),
            "availability": payload.get("availability"),
            "valid": bool(payload.get("price")),
        })

print(json.dumps(results, indent=2))
Why this works better on Lowe’s
- Batch execution avoids bursty traffic patterns that trigger session resets and placeholder data
- Stable session and store context prevents pricing and availability from leaking across locations
- Selective rendering keeps execution efficient without sacrificing correctness
- Built-in network capture ensures pricing, inventory, and fulfillment are extracted from the same sources Lowe’s uses internally
The result is fewer partial responses, fewer retries, and far less custom orchestration code.
Conclusion
Many Lowe’s scraping pipelines fail not because of parsing errors, but because scaling introduces new failure modes. Increased concurrency, fragmented sessions, unstable store context, and silent partial responses all emerge only at volume.
This post explained why batch execution is essential on Lowe’s, how uncontrolled concurrency degrades data quality, and what architectural changes teams make to keep large-scale Lowe’s scraping reliable.
Further Reading
Scaling works only when the fundamentals are in place. These posts cover those foundations:
- Lowe’s Scraping API: How to Reliably Extract Product, Price, and Availability Data. A high-level look at Lowe’s scraping challenges and reliability principles.
- Lowe’s Store Scraping: How to Set Store Location Reliably for Accurate Pricing and Availability. Why store context instability is amplified at scale.
- Lowe’s Scraping Guide: How to Extract Prices, Inventory, and Specs from Embedded JSON and Network Calls. Why dynamic extraction failures become harder to detect as volume grows.