The 2026 Nodriver guide for production scraping

Most scraping libraries that were “undetectable” in 2023 are now fingerprinted at the TCP handshake level before your first request lands. Playwright gets caught on TLS fingerprints. Selenium leaves DOM artifacts even when you patch them. Puppeteer-extra with stealth plugin still leaks via timing signatures on high-value targets like LinkedIn, Zillow, or any site running Cloudflare’s bot score tier.

Nodriver, built by ultrafunkamsterdam and maintained on GitHub, takes a different approach. Instead of patching Chrome, it drives a real, unmodified Chrome binary over the Chrome DevTools Protocol directly, bypassing the WebDriver interface entirely. No navigator.webdriver, no CDP command artifacts in the page’s JS context, no chromedriver process to fingerprint. This matters in 2026 because bot detection has moved well past user-agent checks.

This guide is for operators already running scrapers in production, or developers who need to graduate from basic HTTP scrapers to browser automation for JavaScript-heavy targets. By the end you will have a working Nodriver setup with proxy rotation, session management, and a deployment pattern that scales from a single VPS to a distributed fleet.

what you need

Python 3.11+: Nodriver’s async API relies on asyncio features stabilised in 3.11
Chrome or Chromium 120+: Nodriver drives a browser that is already installed. It searches the system for Chrome or Chromium, or you point it at a specific binary
A Linux VPS or bare metal box: Ubuntu 22.04 LTS works well; avoid Windows for production, the process management is messier
Proxies: rotating residential or ISP proxies. Datacenter proxies will get flagged on most Cloudflare-protected targets. Budget roughly $3-8/GB for residential; ProxyScraping has ISP proxies worth testing for mid-tier targets
nodriver Python package: currently 0.36 on PyPI as of May 2026. Install via pip
Virtual display: Xvfb (the xvfb package on Ubuntu), for running Chrome on servers without a display server
RAM: Chrome is heavy. Budget 300-500 MB per concurrent browser instance. A box running 20 parallel sessions needs at least 12 GB RAM
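
The checklist above can be turned into a quick preflight check before you deploy. This is a minimal sketch using only the standard library; the function name preflight and the list of candidate binary names are assumptions for illustration, not part of nodriver.

```python
import os
import shutil
import sys

# Candidate binary names are an assumption; adjust them for your distro.
CHROME_CANDIDATES = (
    "google-chrome-stable",
    "google-chrome",
    "chromium",
    "chromium-browser",
)


def preflight(min_python=(3, 11)):
    """Report whether this host meets the basic requirements listed above."""
    # First candidate found on PATH, or None if no browser is installed.
    chrome = next((p for p in map(shutil.which, CHROME_CANDIDATES) if p), None)
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "chrome_path": chrome,
        "display_set": bool(os.environ.get("DISPLAY")),
    }
```

Calling print(preflight()) on a fresh VPS shows at a glance which prerequisite is still missing.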

step by step

1. install and verify nodriver

pip install nodriver
# verify
python -c "from importlib.metadata import version; print(version('nodriver'))"

Nodriver does not ship a browser. If you do not specify a browser executable, it searches the system for an installed Chrome or Chromium, so install one on a fresh VPS first. If you want to pin a specific Chrome binary:

import nodriver as uc

# run this inside an async function; step 3 shows a complete script
browser = await uc.start(browser_executable_path="/usr/bin/google-chrome-stable")

if it breaks: If Chrome fails to launch with a sandbox error on Linux, run it as a non-root user (preferred) or add --no-sandbox to the browser args. Chrome will not start as root with the sandbox enabled, and disabling the sandbox removes a layer of isolation between page content and your host, so understand the security implications before you do it in production.

2. set up a virtual display for headless servers

Nodriver does not run Chrome in the traditional --headless mode by default. It launches a normal, headed browser, which needs a display. On a server, Xvfb provides one.

sudo apt-get install -y xvfb
Xvfb :99 -screen 0 1920x1080x24 &
export DISPLAY=:99

For a production setup, wrap this in a systemd unit so the display comes up at boot:

[Unit]
Description=Virtual framebuffer for Chrome scraping

[Service]
ExecStart=/usr/bin/Xvfb :99 -screen 0 1920x1080x24
Restart=always

[Install]
WantedBy=multi-user.target

if it breaks: If you see cannot connect to X server, check $DISPLAY is set and Xvfb is running. ps aux | grep Xvfb confirms it.
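
If you would rather have the scraper process bring up the display itself, the same check can be done from Python. This is a sketch under assumptions: the helper names are made up for illustration, and it only mirrors the Xvfb command shown above.

```python
import os
import subprocess


def needs_xvfb(env=None):
    """True when no X display is configured in the given environment."""
    env = os.environ if env is None else env
    return not env.get("DISPLAY")


def xvfb_command(display=":99", screen="1920x1080x24"):
    """Build the same Xvfb command line used in this step."""
    return ["Xvfb", display, "-screen", "0", screen]


def ensure_display(display=":99"):
    """Start Xvfb if DISPLAY is unset and point this process at it.

    Returns the Popen handle, or None when a display was already set.
    """
    if not needs_xvfb():
        return None
    proc = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display
    return proc
```

Call ensure_display() once at startup, before launching any browser, and give Xvfb a moment to come up. The systemd unit remains the better choice when several scraper processes share one display.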

3. write your first nodriver script

import asyncio
import nodriver as uc

async def scrape(url: str) -> str:
    browser = await uc.start()
    try:
        page = await browser.get(url)
        await page.sleep(2)  # let JS render
        return await page.get_content()
    finally:
        browser.stop()  # always release the Chrome process

if __name__ == "__main__":
    html = asyncio.run(scrape("https://example.com"))
    print(html[:500])

Run it. You should see HTML output. The key thing nodriver does here is launch Chrome with a real user profile context and no WebDriver flags set in the JS environment.

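if it breaks: ModuleNotFoundError means pip installed nodriver into a different environment from the one running the script. A get() call that hangs usually means the page is actively blocking or your network is slow. get() does not take a timeout argument, so bound it by wrapping the call: asyncio.wait_for(browser.get(url), timeout=30).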
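
One way to put an upper bound on navigation time is asyncio.wait_for from the standard library. The helper below is a minimal sketch; get_with_timeout is a name invented for this example, and it assumes only that the browser object has an awaitable get(url), as in the script above.

```python
import asyncio


async def get_with_timeout(browser, url, timeout=30.0):
    """Navigate to url, giving up after `timeout` seconds.

    `browser` is anything with an awaitable get(url).
    Returns the page, or None if navigation did not finish in time.
    """
    try:
        return await asyncio.wait_for(browser.get(url), timeout=timeout)
    except asyncio.TimeoutError:
        return None
```

Returning None rather than raising keeps the calling loop simple: treat None as a failed attempt and move to the next URL or proxy.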

4. add proxy rotation

Nodriver passes Chrome args at launch, so proxy configuration goes there. The cleanest pattern for rotation is to restart a browser instance per domain or per N requests.

import asyncio
import nodriver as uc
import random

PROXIES = [
    # placeholder hosts; no credentials in the URL (see the note below)
    "http://host1:8080",
    "http://host2:8080",
]

async def scrape_with_proxy(url: str) -> str:
    proxy = random.choice(PROXIES)
    browser = await uc.start(
        browser_args=[f"--proxy-server={proxy}"]
    )
    page = await browser.get(url)
    await page.sleep(2)
    content = await page.get_content()
    browser.stop()
    return content

Chrome ignores credentials embedded in the --proxy-server value, so a user:pass@host URL will not authenticate. For authenticated proxies, allowlist your server’s IP with the provider, run a local forwarding proxy that adds the credentials, or answer the auth challenge over CDP (the Fetch domain’s authRequired event).

if it breaks: If pages load but show “your IP is blocked”, the proxy is being detected. Switch from datacenter to residential proxies. If Chrome shows a proxy auth dialog, the proxy is asking for credentials that Chrome was not given. Check how your provider expects you to authenticate (IP allowlist or username and password) and verify the scheme and host format against its docs.
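
The "restart per domain or per N requests" pattern needs something to decide which proxy a URL should use. Below is a minimal sketch of that bookkeeping; ProxyRotator and its parameters are hypothetical names, and the proxy strings are placeholders.

```python
import itertools
from urllib.parse import urlsplit


class ProxyRotator:
    """Hand out proxies round-robin, keeping one proxy per domain for N requests."""

    def __init__(self, proxies, requests_per_proxy=20):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._cycle = itertools.cycle(proxies)
        self._limit = requests_per_proxy
        self._current = {}  # domain -> [proxy, requests served so far]

    def proxy_for(self, url):
        """Return the proxy to use for this URL, rotating when the limit is hit."""
        domain = urlsplit(url).hostname or ""
        entry = self._current.get(domain)
        if entry is None or entry[1] >= self._limit:
            entry = [next(self._cycle), 0]
            self._current[domain] = entry
        entry[1] += 1
        return entry[0]
```

When proxy_for() returns a different proxy from the one the current browser was started with, stop that browser and start a new one with the new --proxy-server value.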

5. handle anti-bot challenges

Nodriver sidesteps most passive fingerprinting checks because you are running a real Chrome binary. But active challenges like Cloudflare Turnstile or hCaptcha still require either:

waiting for the challenge to auto-solve (Cloudflare often does this for clean residential IPs)
using a CAPTCHA solving service like 2captcha or CapMonster

For Cloudflare, the pattern is to wait and poll:

async def wait_for_cf(page, timeout=30):
    for _ in range(timeout):
        title = await page.evaluate("document.title")
        if "Just a moment" not in title:
            return True
        await page.sleep(1)
    return False

page = await browser.get("https://target.com")
solved = await wait_for_cf(page)
if not solved:
    raise RuntimeError("Cloudflare challenge not resolved")

For Turnstile or hCaptcha, the 2captcha API accepts the challenge parameters and returns a token you inject. See 2captcha’s documentation for the current integration spec; the API has been stable since 2022.

if it breaks: If the challenge never resolves with a residential proxy, try a different IP. Cloudflare’s bot score is partly IP reputation-based.
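
The "try a different IP" advice can be sketched as a simple fallback loop. This is an illustration under assumptions: fetch_with_fallback is an invented name, and attempt stands for your own coroutine that launches a browser on the given proxy, waits for the challenge, and returns the HTML or None.

```python
import asyncio


async def fetch_with_fallback(url, proxies, attempt):
    """Try each proxy in order until `attempt` reports success.

    `attempt` is a hypothetical async callable (url, proxy) -> html or None,
    where None means the challenge did not clear on that IP.
    Returns (proxy, html), or (None, None) if every proxy failed.
    """
    for proxy in proxies:
        html = await attempt(url, proxy)
        if html is not None:
            return proxy, html
    return None, None
```

Log which proxy eventually succeeded. Over time that gives you a rough picture of which IPs have a usable reputation for a given target.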

6. extract data and handle navigation

Nodriver exposes CSS selectors and JavaScript evaluation for extraction:

# find an element by CSS selector (find() matches visible text, not selectors)
element = await page.select("h1")
text = element.text

# or run JS directly
price = await page.evaluate(
    "document.querySelector('.price').innerText"
)

# click, then give the new content time to load
button = await page.select("button.load-more")
await button.click()
await page.sleep(1.5)

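For pages with dynamic pagination or infinite scroll, combine click() with sleep() and re-query the DOM. Nodriver has no direct waitForNavigation equivalent, so a short sleep is the practical approach for most production use cases. When you know which element you are waiting for, page.select() accepts a timeout and keeps retrying until the element appears, which is more precise.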

if it breaks: A timeout error from the element lookup means the selector did not match within the wait window. Use the browser’s DevTools locally to verify your selectors before running on the server.
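
Where a fixed sleep is too blunt, you can poll a JavaScript condition instead. The helper below is a sketch, not a nodriver API: wait_for_js is an invented name, and it assumes only that the page object has an awaitable evaluate(expression), as used above.

```python
import asyncio
import time


async def wait_for_js(page, expression, timeout=10.0, interval=0.25):
    """Poll a JavaScript expression until it evaluates to true.

    Returns True as soon as the expression is true, or False once
    `timeout` seconds have passed without that happening.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Compare with `is True` so a non-boolean return value
        # is never mistaken for success.
        if (await page.evaluate(expression)) is True:
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)
```

A typical call would be await wait_for_js(page, "document.querySelector('.price') !== null"), followed by the extraction code once it returns True.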

7. manage sessions and cookies

For sites that require login, persist the Chrome profile directory between sessions:

browser = await uc.start(
    user_data_dir="/var/scrapers/profiles/account_001"
)

Chrome writes cookies, local storage, and cached credentials to this directory. On the next launch, the session resumes. This is more reliable than manually injecting cookies because Chrome’s session storage is opaque and varies by site.

if it breaks: If Chrome refuses to reuse the profile after a hard kill, it has usually left a stale lock behind. Delete the SingletonLock file inside the profile directory and retry. If the session has expired, you need to re-authenticate.
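
Clearing the stale lock can be automated before each launch. This is a minimal sketch; clear_stale_locks is a hypothetical helper, and treating SingletonCookie and SingletonSocket the same way as SingletonLock is an assumption on my part rather than something the steps above require.

```python
from pathlib import Path

# Chrome's single-instance marker entries inside a profile directory.
LOCK_NAMES = ("SingletonLock", "SingletonCookie", "SingletonSocket")


def clear_stale_locks(profile_dir):
    """Remove leftover Singleton* entries and return the names removed.

    Only call this when no Chrome process is using the profile.
    """
    removed = []
    for name in LOCK_NAMES:
        entry = Path(profile_dir) / name
        # is_symlink() catches dangling symlinks, which exists() reports as missing.
        if entry.is_symlink() or entry.exists():
            entry.unlink()
            removed.append(name)
    return removed
```

Run it just before uc.start() for that profile, and never while another worker may still have the same profile open.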

8. wrap everything in a job queue

For anything beyond 5-10 pages, you need a queue. The simplest production pattern is a Redis list with a worker pool:

import asyncio
import nodriver as uc
import redis.asyncio as aioredis

async def worker(queue: aioredis.Redis, proxy: str):
    while True:
        url = await queue.lpop("scrape_queue")
        if not url:
            break  # queue drained
        browser = await uc.start(browser_args=[f"--proxy-server={proxy}"])
        try:
            page = await browser.get(url.decode())
            await page.sleep(2)
            content = await page.get_content()
            await queue.rpush("results", content)
        finally:
            browser.stop()  # do not leak Chrome if a page fails

async def main():
    r = await aioredis.from_url("redis://localhost")
    # placeholder proxies; one worker per proxy
    proxies = ["http://host1:port", "http://host2:port"]
    tasks = [worker(r, proxy) for proxy in proxies]
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())

This is a minimal example. In production you will want error handling, retry logic, and a results schema, but the shape is correct.

if it breaks: Redis ConnectionRefusedError means Redis is not running or you are connecting to the wrong host/port.
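
The retry logic and results schema mentioned above could look something like this. It is a sketch under assumptions: scrape_with_retry and the record fields are invented for illustration, and fetch stands for your own coroutine that opens the page and returns its HTML.

```python
import asyncio
import json


async def scrape_with_retry(url, fetch, max_attempts=3, backoff=1.0):
    """Call fetch(url) up to max_attempts times and return a JSON record.

    `fetch` is a hypothetical async callable that returns HTML or raises.
    The record always carries url, ok, and attempts, plus html or error.
    """
    error = None
    for attempt in range(1, max_attempts + 1):
        try:
            html = await fetch(url)
            return json.dumps(
                {"url": url, "ok": True, "attempts": attempt, "html": html}
            )
        except Exception as exc:  # record the failure, then retry
            error = repr(exc)
            if attempt < max_attempts:
                await asyncio.sleep(backoff * attempt)  # linear backoff
    return json.dumps(
        {"url": url, "ok": False, "attempts": max_attempts, "error": error}
    )
```

Pushing the returned string to the results list means failures land in the same place as successes, so a downstream consumer can requeue or report them.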

common pitfalls

reusing a single browser instance for too many pages. Memory leaks build up. Restart the browser every 50-100 pages or every hour, whichever comes first.

ignoring Chrome process cleanup. If your script crashes, Chrome processes keep running. Add a try/finally block that calls browser.stop(), and run a cron job that kills orphaned Chrome processes: pkill -f "google-chrome" is blunt but effective during debugging.

using datacenter proxies on Cloudflare-protected targets. Cloudflare’s risk scoring in 2026 flags entire datacenter ASNs. For mid-tier targets ISP proxies work; for high-value targets, residential only.

parsing HTML immediately after get(). JavaScript-heavy pages render asynchronously. A 1-2 second sleep after navigation is the practical default. For more precise control, evaluate a JavaScript expression that returns true when your target element exists.

running too many parallel sessions per machine. Chrome is 300-500 MB per instance. 20 sessions on an 8 GB machine will swap. Keep session count at (RAM_GB - 2) / 0.5 as a rough ceiling. For multi-account operations at scale, the patterns at multiaccountops.com/blog/ cover machine allocation strategies that apply directly here.
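
The session ceiling from the last pitfall is easy to encode so that worker counts follow the machine size. A minimal sketch; the function name and parameter names are assumptions, while the 2 GB reserve and 0.5 GB per session come from the formula above.

```python
import math


def max_sessions(ram_gb, reserved_gb=2.0, per_session_gb=0.5):
    """Rough ceiling on concurrent Chrome sessions: (RAM_GB - 2) / 0.5."""
    return max(0, math.floor((ram_gb - reserved_gb) / per_session_gb))
```

For example, max_sessions(12) gives 20 and max_sessions(8) gives 12, which is why 20 sessions on an 8 GB machine ends up swapping.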

scaling this

10x (50-100 pages/hour): A single 8-core, 16 GB VPS handles this. Run 10-15 concurrent browser sessions with the Redis queue pattern above. A single rotating residential proxy pool with 5-10 exit IPs is sufficient. ProxyScraping’s API makes rotation straightforward.

100x (500-1000 pages/hour): You need multiple machines. Introduce a central job coordinator, shared Redis (Redis Cloud free tier covers this volume), and separate your proxy pool by target domain to avoid IP collisions. At this scale, restart Chrome between jobs to keep memory clean, and start tracking per-IP ban rates.

1000x (5000+ pages/hour): You are now running a distributed browser fleet. Each machine runs 15-20 Chrome instances, you have 5-10 machines, and your proxy spend is $200-500/month minimum for residential. At this scale, build a proxy health checker that removes banned IPs from rotation automatically. Consider a dedicated proxy manager like Bright Data’s Proxy Manager (open source on GitHub) to handle rotation at the fleet level. Your bottleneck shifts from scraping to parsing and storage. Pipeline results into a message queue like Kafka or SQS rather than Redis.
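
A proxy health checker does not need to be elaborate to be useful. The sketch below tracks per-proxy ban rates in memory and drops proxies that cross a threshold; the class name, the threshold, and the minimum sample size are all assumptions chosen for illustration.

```python
from collections import defaultdict


class ProxyHealth:
    """Track per-proxy ban rates and drop proxies that exceed a threshold."""

    def __init__(self, proxies, max_ban_rate=0.3, min_samples=10):
        self._active = list(proxies)
        self._stats = defaultdict(lambda: [0, 0])  # proxy -> [requests, bans]
        self._max_ban_rate = max_ban_rate
        self._min_samples = min_samples

    def record(self, proxy, banned):
        """Record one request outcome and retire the proxy if it looks burned."""
        stats = self._stats[proxy]
        stats[0] += 1
        stats[1] += int(banned)
        requests, bans = stats
        too_many_bans = bans / requests > self._max_ban_rate
        if requests >= self._min_samples and too_many_bans:
            if proxy in self._active:
                self._active.remove(proxy)

    def ban_rate(self, proxy):
        requests, bans = self._stats[proxy]
        return bans / requests if requests else 0.0

    def active(self):
        """Proxies still considered healthy, in their original order."""
        return list(self._active)
```

In a fleet you would keep these counters in shared storage rather than in process memory, so every machine sees the same view of which IPs are burned.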

where to go next

Proxyscraping residential proxy review and setup guide: covers proxy pool configuration for browser automation specifically
Fingerprint evasion in 2026: TLS, HTTP/2, and browser entropy. A deeper dive into the detection surface nodriver reduces but does not eliminate
Running parallel browser sessions with asyncio and Redis queues: extends the queue pattern above into a production-grade worker pool

Written by Xavier Fok

disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.
