The 2026 Curl Impersonate guide for production scraping
Most scrapers die on TLS, not on IP. i’ve watched operators burn through residential proxy budgets because their requests were flagged before a single HTML byte was returned. the proxy IP was fine. the problem was the TLS Client Hello, broadcasting “i am a bot running Python requests” to every server that bothered to check.
curl-impersonate fixes that. it is a patched build of curl that mimics the exact TLS handshake and HTTP/2 settings of real browsers, Chrome and Firefox primarily. when you send a request through it, the fingerprint looks indistinguishable from a user who opened Chrome on a MacBook. combined with rotating residential proxies, this is one of the most cost-effective stacks for production scraping in 2026.
this guide is for operators already running scrapers at some scale, not people writing their first requests.get() call. by the end you will have curl-impersonate or its Python wrapper curl_cffi running, tested against a fingerprint checker, and wired into a proxy rotation setup. i will also cover where things break at volume and what to do about it.
what you need
- OS: linux preferred (Ubuntu 22.04+, Debian 12). macOS works. Windows needs WSL2, which adds friction.
- Docker: version 24+ for the prebuilt image path
- Python 3.10+: if you are using curl_cffi (recommended for most teams)
- pip: to install curl_cffi
- Residential or ISP proxies: datacenter proxies will not save you here. plan for at least a few GB of residential bandwidth for testing. pricing from major vendors runs $2-$8/GB in 2026.
- A fingerprint testing endpoint: tls.peet.ws or browserleaks.com/tls work for manual checks
- Time budget: 1-2 hours for initial setup, more if you are integrating into an existing scraping framework
no paid tools are required to follow this guide. curl-impersonate is MIT-licensed open source. curl_cffi is available on PyPI for free.
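to size the residential bandwidth line item above, a back-of-envelope estimate helps. this is a rough sketch only: the 100,000-request job and the 150 KB average response size are made-up inputs, and the function name is illustrative.

```python
def bandwidth_cost_usd(request_count, avg_response_kb, price_per_gb):
    """Estimate proxy spend for one job. Every input is an assumption you supply."""
    total_gb = request_count * avg_response_kb / (1024 * 1024)  # KB -> GB
    return total_gb * price_per_gb


# hypothetical job: 100,000 pages at roughly 150 KB each,
# priced at both ends of the $2-$8/GB range quoted above
low = bandwidth_cost_usd(100_000, 150, 2.0)
high = bandwidth_cost_usd(100_000, 150, 8.0)
print(f"estimated proxy spend: ${low:.2f} to ${high:.2f}")
```

measure your real average response size (including retries and blocked responses) before trusting the number.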
step by step
step 1: understand what you are actually fixing
before touching a terminal, get clear on the problem. every TLS connection opens with a Client Hello that lists the client’s supported cipher suites, extensions, and elliptic curves in a specific order, and that list is characteristic of the TLS library your HTTP client uses. servers hash it into a fingerprint such as JA3. the JA3 hash of a Python requests call looks nothing like Chrome’s. Cloudflare, Akamai, and most in-house bot detection systems check this before they check your IP.
curl-impersonate patches curl to use the same TLS library Chrome uses (BoringSSL) and replaces the default TLS parameters with Chrome’s exact cipher suites, extensions, and elliptic curves. it also copies Chrome’s HTTP/2 SETTINGS frame, which is another reliable fingerprint. you are not spoofing headers. you are making the underlying handshake match.
if it breaks: if you are not sure whether TLS fingerprinting is your actual problem, check your scraper’s JA3 hash at tls.peet.ws first. if it already matches Chrome, curl-impersonate will not help you and the issue is elsewhere (cookies, JS challenges, account state).
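to make the JA3 idea concrete, here is a minimal sketch of how the hash is derived from Client Hello fields, following the public JA3 definition (five comma-separated fields, values joined by dashes, GREASE placeholders skipped, then MD5). the numeric values in the example are invented for illustration and are not any real browser’s handshake.

```python
import hashlib

# GREASE values (RFC 8701) are placeholder IDs browsers inject; JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the pre-hash JA3 string from decimal Client Hello values."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, as a 32-character hex digest."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# invented example values, NOT a real Chrome or Python fingerprint
client_a = (771, [0x0A0A, 4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = (771, [4866, 4865, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

print(ja3_string(*client_a))
print(ja3_hash(*client_a) == ja3_hash(*client_b))  # cipher order differs, so hashes differ
```

because order is part of the string, two clients offering the same ciphers in a different order hash differently. that is also why JA3N exists: it sorts the extension list before hashing so that clients which shuffle extension order still produce a stable value.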
step 2: install curl_cffi in Python
for most production work i reach for curl_cffi rather than the raw curl-impersonate binary. it is a Python binding that wraps a prebuilt curl-impersonate, so you skip compiling from source. the API is close enough to the requests library that migration is fast.
pip install curl_cffi
verify it installed:
from curl_cffi import requests
resp = requests.get("https://tls.peet.ws/api/clean", impersonate="chrome124")
print(resp.json())
you should see a JSON response with TLS details. look at the ja3 field. if it matches Chrome’s known JA3, you are good.
if it breaks: on some linux setups pip will install but the import fails with a missing shared library. run pip install --upgrade curl_cffi and if the problem persists, check that libcurl is not being overridden by a system package. on Ubuntu: sudo apt remove libcurl4 can sometimes clear conflicts, but read the dependency tree first.
step 3: pick your impersonation target
curl_cffi supports several browser targets. as of mid-2026 the main options are:
chrome99, chrome100, chrome101, chrome104, chrome107, chrome110, chrome116, chrome119, chrome124
firefox91esr, firefox99, firefox102, firefox110, firefox117
safari15_3, safari15_5, safari17_0
edge99, edge101
default to chrome124 unless you have a specific reason not to. it is the most commonly seen fingerprint in real traffic, which makes it the least suspicious. if a target site specifically serves Safari users or you are scraping App Store data, safari17_0 makes sense.
if it breaks: some targets are outdated enough that fingerprint detection systems will flag them because no real user is running Chrome 99 in 2026. stay current. when curl_cffi ships new targets with library updates, upgrade.
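one way to avoid pinning a stale target in code is to pick the highest version from whatever list you maintain. the sketch below is plain string parsing: the TARGETS list is copied by hand from a subset of the names above, not queried from curl_cffi, and newest_target is a hypothetical helper name.

```python
import re


def newest_target(targets, browser="chrome"):
    """Return the highest-versioned target name for a browser family, or None."""
    pattern = re.compile(rf"^{re.escape(browser)}(\d+)")
    versioned = []
    for name in targets:
        match = pattern.match(name)
        if match:
            versioned.append((int(match.group(1)), name))
    return max(versioned)[1] if versioned else None


# hand-maintained list (assumption): update it when you upgrade the library
TARGETS = ["chrome99", "chrome110", "chrome124", "safari15_3", "safari17_0", "edge101"]

print(newest_target(TARGETS))            # newest Chrome entry in the list
print(newest_target(TARGETS, "safari"))  # newest Safari entry in the list
```

keeping the choice in one function means a library upgrade becomes a one-line change to the list rather than a search through every request call.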
step 4: wire in proxy rotation
a clean TLS fingerprint gets you past the handshake, but you still need IP diversity. curl_cffi accepts proxies the same way requests does:
from curl_cffi import requests
proxies = {
    # placeholder credentials and host; substitute your provider's endpoint
    "http": "http://user:pass@proxy.example.com:8080",
    "https": "http://user:pass@proxy.example.com:8080",
}
resp = requests.get(
    "https://target-site.com/products",
    impersonate="chrome124",
    proxies=proxies,
    timeout=30,
)
for production, do not hardcode one proxy. build a rotation pool. a minimal version:
import random
from curl_cffi import requests
PROXY_LIST = [
    "http://user:pass@host1:port",
    "http://user:pass@host2:port",
    "http://user:pass@host3:port",
]
def get_proxies():
    p = random.choice(PROXY_LIST)
    return {"http": p, "https": p}
resp = requests.get(
    "https://target-site.com/products",
    impersonate="chrome124",
    proxies=get_proxies(),
    timeout=30,
)
for real volume you will want sticky sessions per domain or per job, not fully random. check your proxy provider’s documentation for session token syntax.
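if your provider does not offer session tokens, stickiness can be approximated on the client side by hashing the job and domain to a fixed pool entry. this is a sketch under assumptions: the proxy URLs are placeholders, sticky_proxies and job_id are hypothetical names, and it only pins which pool entry you use, not which exit IP the provider assigns behind it.

```python
import hashlib
from urllib.parse import urlsplit

# placeholder endpoints (assumption): replace with your real pool
PROXY_LIST = [
    "http://user:pass@host1:8080",
    "http://user:pass@host2:8080",
    "http://user:pass@host3:8080",
]


def sticky_proxies(url, job_id="default", pool=PROXY_LIST):
    """Map (job, domain) to the same proxy every time, instead of random.choice."""
    host = urlsplit(url).hostname or ""
    key = f"{job_id}:{host}".encode("utf-8")
    # a stable hash, so the mapping survives process restarts (unlike built-in hash())
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(pool)
    proxy = pool[index]
    return {"http": proxy, "https": proxy}


first = sticky_proxies("https://target-site.com/products", job_id="catalog")
second = sticky_proxies("https://target-site.com/cart", job_id="catalog")
print(first == second)  # same job and domain, so the same proxy is chosen
```

changing job_id reshuffles the mapping, which gives you a cheap way to move a blocked job onto a different pool entry.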
if it breaks: HTTP proxies work fine but some residential networks require SOCKS5. curl_cffi supports SOCKS5: use socks5://user:pass@host:port in the proxy string.
step 5: set headers that match your impersonation target
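TLS fingerprint and HTTP/2 settings are matched, but if you send headers that no Chrome browser would ever send, you are still flagged. curl_cffi fills in a default header set for the impersonate target, but anything you add or override has to stay consistent with it. if you manage headers yourself, provide a realistic set: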
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    # Chrome 123 and later also advertise zstd
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Cache-Control": "max-age=0",
    "Sec-Ch-Ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": '"macOS"',
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "Sec-Fetch-User": "?1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
}
resp = requests.get(
    "https://target-site.com/products",
    impersonate="chrome124",
    headers=headers,
    proxies=get_proxies(),
    timeout=30,
)
the Sec-Ch-Ua version should match your impersonate target. Chrome 124 gets v="124".
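a small pre-flight check can catch version and platform drift before a request goes out. this is a sketch, not a complete validator: header_mismatches and PLATFORM_UA_TOKENS are hypothetical names, the platform-to-User-Agent mapping is my assumption about typical desktop Chrome strings, and it only understands chromeNNN targets.

```python
import re

# substring each Sec-Ch-Ua-Platform value is expected to pair with in the User-Agent
PLATFORM_UA_TOKENS = {"macOS": "Macintosh", "Windows": "Windows NT", "Linux": "X11; Linux"}


def header_mismatches(headers, target):
    """Return a list of inconsistencies; an empty list means the set is coherent."""
    match = re.fullmatch(r"chrome(\d+)", target)
    if not match:
        raise ValueError("this sketch only understands chromeNNN targets")
    major = match.group(1)
    problems = []

    ua = headers.get("User-Agent", "")
    ua_version = re.search(r"Chrome/(\d+)\.", ua)
    if not ua_version or ua_version.group(1) != major:
        problems.append(f"User-Agent does not advertise Chrome {major}")

    brand = re.search(r'"Google Chrome";v="(\d+)"', headers.get("Sec-Ch-Ua", ""))
    if not brand or brand.group(1) != major:
        problems.append(f"Sec-Ch-Ua does not advertise Chrome {major}")

    platform = headers.get("Sec-Ch-Ua-Platform", "").strip('"')
    token = PLATFORM_UA_TOKENS.get(platform)
    if token is None or token not in ua:
        problems.append(f"Sec-Ch-Ua-Platform {platform!r} does not match the User-Agent")
    return problems


sample = {
    "Sec-Ch-Ua": '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"',
    "Sec-Ch-Ua-Platform": '"macOS"',
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
}
print(header_mismatches(sample, "chrome124"))
```

running it in CI against your header template means a target upgrade that forgets the headers fails loudly instead of quietly raising your block rate.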
if it breaks: some detection systems specifically check header order. curl_cffi generally preserves insertion order. if you are seeing inconsistent results, check whether you are overriding headers that curl_cffi sets internally, which can reorder the final header block.
step 6: handle sessions and cookies
most production targets require session handling. use curl_cffi’s Session object, which persists cookies across requests and reuses the same TLS connection:
from curl_cffi import requests
session = requests.Session()
# first request, might set cookies
r1 = session.get("https://target-site.com/", impersonate="chrome124", headers=headers)
# subsequent request inherits cookies
r2 = session.get("https://target-site.com/products", impersonate="chrome124", headers=headers)
for multi-job scrapers, create a new Session per domain per worker. do not share sessions across workers unless you intentionally want session continuity.
if it breaks: if the site uses JS-rendered cookies (document.cookie set by JS), curl_cffi will not execute the JS. you will need a browser automation layer, Playwright or Puppeteer, to get the initial cookies, then pass them into a curl_cffi session for the heavy lifting. this hybrid approach is common in production. the antidetect browser space has converged on similar patterns, as covered over at antidetectreview.org.
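the hand-off in that hybrid setup is mostly a data-shape problem. Playwright’s context.cookies() returns a list of dicts with name, value, and domain keys; the sketch below filters that list down to a plain name-to-value dict for one host. cookies_for_host is a hypothetical helper and the cookie values are invented.

```python
def cookies_for_host(browser_cookies, host):
    """Pick the cookies whose domain covers `host` and flatten them to {name: value}."""
    jar = {}
    for cookie in browser_cookies:
        domain = cookie["domain"].lstrip(".")  # ".example.com" also covers subdomains
        if host == domain or host.endswith("." + domain):
            jar[cookie["name"]] = cookie["value"]
    return jar


# shape matches what Playwright's context.cookies() returns; values are made up
exported = [
    {"name": "session_id", "value": "abc123", "domain": ".target-site.com", "path": "/"},
    {"name": "challenge", "value": "xyz789", "domain": "www.target-site.com", "path": "/"},
    {"name": "tracker", "value": "nope", "domain": "cdn.elsewhere.net", "path": "/"},
]

jar = cookies_for_host(exported, "www.target-site.com")
print(jar)
```

the resulting dict should be passable through the cookies= argument on a curl_cffi request, which mirrors the requests API. challenge cookies are often bound to the user agent and IP that earned them, so keep the browser’s User-Agent, the impersonate target, and the proxy aligned across both layers.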
step 7: test your fingerprint before scaling
before you send a thousand requests at a target, verify your setup against a neutral fingerprint checker. tls.peet.ws/api/clean returns a JSON blob with your JA3, JA3N, HTTP/2 fingerprint, and header details. run your scraper against it, not just a manual curl, so you test the actual code path.
resp = session.get(
    "https://tls.peet.ws/api/clean",
    impersonate="chrome124",
    headers=headers,
    proxies=get_proxies(),
)
print(resp.json())
check: ja3 should match Chrome 124’s known hash. http2 should show Chrome’s SETTINGS frame. headers should not contain anything obviously non-browser.
if it breaks: if you see akamai_fingerprint or similar fields with unexpected values, it means your HTTP/2 fingerprint is off. this usually happens when proxies strip or modify HTTP/2 frames. test without the proxy first to isolate the variable.
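eyeballing JSON gets old quickly. one approach is to save the checker’s output from a real Chrome visit as a baseline and diff the scraper’s output against it, once without the proxy and once with it. the sketch below is a generic recursive dict diff: diff_fingerprints is a hypothetical helper, and the field names and values in the sample dicts are placeholders rather than the checker’s actual schema.

```python
def diff_fingerprints(baseline, observed, prefix=""):
    """Return {dotted.path: (baseline_value, observed_value)} for every difference."""
    diffs = {}
    for key in sorted(set(baseline) | set(observed)):
        path = f"{prefix}{key}"
        a, b = baseline.get(key), observed.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            diffs.update(diff_fingerprints(a, b, prefix=path + "."))
        elif a != b:
            diffs[path] = (a, b)
    return diffs


# placeholder structures: real checker output will have different keys and values
real_browser = {"tls": {"ja3n_hash": "aaa111"}, "http2": {"fingerprint": "settings-A"}}
via_proxy = {"tls": {"ja3n_hash": "aaa111"}, "http2": {"fingerprint": "settings-B"}}

for path, (expected, got) in diff_fingerprints(real_browser, via_proxy).items():
    print(f"{path}: expected {expected!r}, got {got!r}")
```

recent Chrome versions shuffle TLS extension order per connection, so the raw ja3 value can differ between two genuine Chrome runs. the normalized JA3N value is the steadier field to compare.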
step 8: set up retries and error handling
production scrapers need graceful failure handling. wrap your requests:
import time
from curl_cffi import requests
from curl_cffi.requests.errors import RequestsError
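def fetch(url, session, headers, max_retries=3):
    for attempt in range(max_retries):
        try:
            resp = session.get(
                url,
                impersonate="chrome124",
                headers=headers,
                proxies=get_proxies(),  # fresh proxy on every attempt
                timeout=30,
            )
            if resp.status_code == 200:
                return resp
            if resp.status_code in (403, 429):
                # blocked or rate limited: exponential backoff (1s, 2s, 4s)
                time.sleep(2 ** attempt)
        except RequestsError:
            # network or proxy failure: back off the same way
            time.sleep(2 ** attempt)
    # every attempt failed; the caller decides what to do with None
    return None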
log 403s and 429s separately. a spike in 403s usually means TLS detection tightened. a spike in 429s usually means rate limiting, a different problem with a different fix.
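a per-domain tally is enough to tell the two failure modes apart. this is a minimal in-memory sketch using only the standard library: BlockStats and its bucket names are hypothetical, and a production setup would push these counts to whatever metrics system you already run.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


class BlockStats:
    """Count response outcomes per domain, keeping 403s and 429s in separate buckets."""

    def __init__(self):
        self.by_domain = defaultdict(Counter)

    def record(self, url, status_code):
        host = urlsplit(url).hostname or "unknown"
        if status_code == 403:
            bucket = "blocked_403"
        elif status_code == 429:
            bucket = "rate_limited_429"
        elif 200 <= status_code < 300:
            bucket = "ok"
        else:
            bucket = "other"
        self.by_domain[host][bucket] += 1

    def rate(self, host, bucket):
        """Fraction of recorded responses for `host` that fell into `bucket`."""
        counts = self.by_domain[host]
        total = sum(counts.values())
        return counts[bucket] / total if total else 0.0


stats = BlockStats()
for code in (200, 200, 403, 429, 403, 200, 200, 200):
    stats.record("https://target-site.com/products", code)

print(dict(stats.by_domain["target-site.com"]))
print(stats.rate("target-site.com", "blocked_403"))
```

call record() inside the retry loop for every response, including the ones that are retried, so the rates reflect what the target actually saw.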
common pitfalls
using datacenter proxies with a clean TLS fingerprint. curl-impersonate fixes the fingerprint layer. it does not fix IP reputation. datacenter IPs are still trivially identified by ASN. you need residential or ISP proxies to complete the stack.
not updating the impersonate target. running chrome99 in mid-2026 is suspicious. real Chrome users are on 124 or later. fingerprint detection systems maintain browser version distribution data and flag old versions at elevated rates. update curl_cffi regularly and stay on current browser targets.
sharing sessions across workers. one session = one TLS connection stream. sharing it across concurrent threads or asyncio tasks produces race conditions and corrupted cookie jars. each worker gets its own session.
ignoring the HTTP/2 fingerprint. operators fix the JA3 hash and stop there. Akamai Bot Manager specifically checks the HTTP/2 SETTINGS and WINDOW_UPDATE frames as a separate signal. curl-impersonate handles this, but if you are wrapping curl-impersonate in something that downgrades to HTTP/1.1, you lose that coverage. verify with tls.peet.ws that HTTP/2 is actually being used.
hardcoding headers for the wrong platform. if your Sec-Ch-Ua-Platform says "Windows" but your TLS fingerprint matches a macOS Chrome build, that inconsistency is a detection signal. keep platform signals internally consistent.
scaling this
at 10x (hundreds of requests per minute), the main work is proxy pool management. a single residential proxy endpoint with rotating IPs is usually sufficient. monitor your 4xx rate per domain.
at 100x (thousands per minute), you need concurrent sessions with proper lifecycle management. async curl_cffi via asyncio handles this. each async worker maintains its own Session. proxy pool size matters here: with too few IPs you are cycling the same addresses too fast.
import asyncio
from curl_cffi.requests import AsyncSession
async def fetch(url):
    async with AsyncSession() as session:
        resp = await session.get(url, impersonate="chrome124", headers=headers, proxies=get_proxies())
        return resp
async def main(urls):
    tasks = [fetch(url) for url in urls]
    return await asyncio.gather(*tasks)
at 1000x (tens of thousands per minute), you are probably looking at distributed workers across multiple machines or containers, a proxy provider with a large residential pool (10k+ IPs minimum), and per-domain concurrency limits enforced at the job queue level. at this scale, fingerprint management is table stakes and the bottleneck usually shifts to parser throughput or downstream storage. see the rotating proxies at scale guide for infrastructure patterns at this tier.
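the per-domain concurrency limit can be prototyped inside a single process before it moves to the job queue. the sketch below uses one asyncio.Semaphore per hostname. DomainLimiter and crawl are hypothetical names, the fetch argument stands in for whatever coroutine does the real request, and a distributed setup would need a shared limiter rather than this in-memory one.

```python
import asyncio
from urllib.parse import urlsplit


class DomainLimiter:
    """Cap how many requests are in flight per hostname within one process."""

    def __init__(self, per_domain=5):
        self._per_domain = per_domain
        self._semaphores = {}

    def _semaphore_for(self, url):
        host = urlsplit(url).hostname or ""
        if host not in self._semaphores:
            self._semaphores[host] = asyncio.Semaphore(self._per_domain)
        return self._semaphores[host]

    async def run(self, url, fetch):
        # wait for a free slot on this domain, then perform the request
        async with self._semaphore_for(url):
            return await fetch(url)


async def crawl(urls, fetch, per_domain=5):
    """Fetch all URLs concurrently while respecting the per-domain cap."""
    limiter = DomainLimiter(per_domain)
    return await asyncio.gather(*(limiter.run(url, fetch) for url in urls))


async def fake_fetch(url):
    # stand-in for a real request coroutine (assumption): just simulates latency
    await asyncio.sleep(0.01)
    return url


if __name__ == "__main__":
    pages = [f"https://target-site.com/p/{i}" for i in range(20)]
    results = asyncio.run(crawl(pages, fake_fetch, per_domain=4))
    print(len(results))
```

gather() without a limiter fires every request at once, which is the fastest way to turn a clean fingerprint into a 429 spike.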
also worth noting: at 1000x, even small per-request overhead compounds. curl_cffi’s async mode is well-optimized but profile your actual bottleneck before adding infrastructure. you can also explore multi-account scraping patterns to distribute load further, a topic covered in depth at multiaccountops.com.
where to go next
- TLS fingerprinting explained: JA3, JA3N, and Akamai HTTP/2 fingerprints covers the detection layer in detail, useful if you want to understand why specific sites are harder than others.
- Playwright stealth scraping guide is the natural follow-on if you are hitting JS-rendered cookie walls that curl_cffi alone cannot handle.
Written by Xavier Fok
disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.