Diagnosing IP bans: when it is the proxy vs when it is your fingerprint

You rotate the proxy. the block persists. you swap in a fresh residential IP, run the same request, and you are back at the 403 wall. at this point most people start spiraling: buying more proxies, trying datacenter instead of residential, switching providers mid-run. i’ve done all of that. almost none of it was the right fix, because the block was never about the IP in the first place.

the split between “my IP is flagged” and “my client fingerprint is flagged” sounds obvious until you are staring at a live failure at 2am. the symptoms overlap almost completely: 403s, CAPTCHAs, silent redirects to honeypot pages, or soft failures where the site returns 200 but delivers blank content or bot-detection JSON instead of what you asked for. the stakes are real too. a misdiagnosis wastes hours and money on proxy upgrades that do nothing while the actual leak goes unfixed.

this article is for people who already understand what a proxy is, know the difference between residential and datacenter, and have shipped at least one scraping or automation project to production. i’m going to walk through the diagnostic methodology i actually use, the mechanisms behind each failure mode, and the edge cases that have burned me or people i’ve worked with.

background and prior art

IP reputation as a blocking primitive goes back to early spam filters and was formalized in tools like Spamhaus and later in CDN-level IP scoring. the idea is simple: track bad behavior to a network address, then preemptively block that address elsewhere. for scraping contexts it matters because datacenter IP ranges, Tor exit nodes, and high-density NAT ranges (shared residential proxies) all accumulate reputation over time and are sold and resold across many operators.

browser fingerprinting as a tracking mechanism predates its use as a blocking mechanism. the academic literature here is substantial. the EFF’s Cover Your Tracks project (formerly Panopticlick) has tracked fingerprint uniqueness since 2010 and remains the clearest public demonstration of how much identifying signal is embedded in a standard browser session. the leap to using fingerprints for bot detection came later, popularized by commercial products like PerimeterX (now HUMAN Security), DataDome, and Cloudflare’s Bot Management product. by 2022 these vendors had made fingerprint-based blocking the dominant pattern for high-value targets, which means IP reputation is now often the secondary check, not the primary one.

understanding this historical progression matters because it tells you where the industry’s attention is. IP blocks are table stakes. fingerprint analysis is where the interesting cat-and-mouse is happening now.

the core mechanism

let me separate the two mechanisms cleanly before showing how they interact.

IP reputation blocking works by scoring a network address against one or more signals: presence in known datacenter ASNs (Autonomous System Numbers), appearance in commercial threat-intel feeds, rate of requests per minute across the vendor’s customer network, geographic mismatch with expected user base, and historical abuse reports. when a request hits a protected site, the edge node queries this score and gates access accordingly. the block is typically assigned at the IP level, meaning a clean rotation gives you a clean slate, at least temporarily.

fingerprint-based blocking works differently. here the site is scoring the client, not the network. fingerprints are assembled from dozens of signals: the TLS handshake profile (cipher suites, extension order, supported groups), HTTP/2 frame ordering, the User-Agent header, the Accept and Accept-Language header values and their ordering, whether JavaScript is executing, the results of canvas and WebGL rendering tests, font enumeration, audio context output, navigator properties, timezone and screen geometry. these signals are combined into a hash or a model score that persists regardless of which IP the request comes from.

the practical consequence: if you rotate IPs but keep the same Playwright or Puppeteer configuration, you keep the same fingerprint. the IP changes. the fingerprint does not. the block stays.

the two mechanisms interact in layered detection systems. a typical flow looks something like this:

Request arrives
  -> Edge IP check (ASN, threat feed, rate limit)
      -> if flagged: block or CAPTCHA immediately
      -> if clean: pass to client fingerprint checks
          -> TLS/JA3 fingerprint check
          -> HTTP/2 fingerprint check
          -> JS challenge and execution check
              -> if bot signals: block or soft-fail
              -> if clean: serve content

this matters for diagnosis because where in this chain the block lands tells you which fix applies. an IP block hits before any JavaScript loads. a fingerprint block often shows up after what looks like a successful connection, because the TCP and TLS handshakes complete before the scoring runs.

TLS fingerprinting deserves its own paragraph because it is the most underdiagnosed vector. when your HTTP client opens a TLS connection, it sends a ClientHello message that contains a list of cipher suites and extensions in a specific order. this order is largely determined by the TLS library your client uses, not by you. Python’s [requests](https://requests.readthedocs.io/) library using the system OpenSSL has a different TLS fingerprint than a real Chrome 124 browser. JA3, a method open-sourced by Salesforce and now widely used, reduces the ClientHello to an MD5 hash, which is a 32-character hex string. sites can block on this hash directly. the TLS specifications leave the choice and ordering of cipher suites and extensions to the client, and RFC 9110, which governs HTTP semantics, says nothing about the handshake at all. so every HTTP library inherits whatever its TLS stack does, and those choices are fingerprint-visible.
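a minimal sketch of how a JA3 string and hash are assembled, following the published JA3 format: decimal values joined with dashes, five fields joined with commas, MD5 over the result, GREASE values dropped. the function names are mine, and the field values in the usage note are made up for illustration and do not correspond to any real browser.

```python
import hashlib

# GREASE placeholder values (0x0A0A, 0x1A1A, ... 0xFAFA) are ignored by JA3.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string from parsed ClientHello fields."""
    def join(values):
        # order is preserved on purpose: ordering is part of the fingerprint
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, as a 32-character hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# illustrative values only (not a real browser's ClientHello)
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = ja3_hash(771, [4865, 4866, 4867], [0, 23, 10, 65281, 11], [29, 23, 24], [0])
```

client_a and client_b offer the same ciphers and extensions, but two extensions are swapped, so the hashes differ. that is why the same request sent through two different TLS stacks looks like two different clients.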

HTTP/2 has its own equivalent, usually called the Akamai HTTP/2 fingerprint (sometimes loosely described as “JA3 for HTTP/2”), based on SETTINGS frame values, the WINDOW_UPDATE increment, PRIORITY frames, and pseudo-header ordering. Chrome’s HTTP/2 implementation sends very specific frame sequences. a Python httpx client or a Go net/http client does not replicate these. if your target site is on HTTP/2 and uses this signal, you can be fingerprinted before the site ever looks at your User-Agent or cookies.
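the Akamai-style fingerprint is usually written as four pipe-separated fields: SETTINGS pairs, the WINDOW_UPDATE increment, PRIORITY frames, and pseudo-header order. here is a sketch of assembling that string from already-parsed frame data. the function name and input shapes are hypothetical, and the sample values are illustrative rather than a verified capture of any browser.

```python
def akamai_h2_fingerprint(settings, window_update, priorities, pseudo_header_order):
    """Assemble an Akamai-style HTTP/2 fingerprint string.

    settings: list of (id, value) pairs in the order the client sent them
    window_update: connection-level WINDOW_UPDATE increment, or None if absent
    priorities: list of (stream_id, exclusive, depends_on, weight) tuples
    pseudo_header_order: e.g. ["m", "a", "s", "p"] for
        :method, :authority, :scheme, :path
    """
    s = ";".join(f"{sid}:{value}" for sid, value in settings)
    wu = str(window_update) if window_update else "00"
    if priorities:
        p = ",".join(":".join(str(x) for x in frame) for frame in priorities)
    else:
        p = "0"
    ps = ",".join(pseudo_header_order)
    return "|".join([s, wu, p, ps])


# illustrative inputs for two hypothetical clients
browser_like = akamai_h2_fingerprint(
    [(1, 65536), (2, 0), (4, 6291456), (6, 262144)], 15663105, [], ["m", "a", "s", "p"]
)
library_like = akamai_h2_fingerprint(
    [(3, 100), (4, 65535)], None, [], ["m", "p", "s", "a"]
)
```

the point is less the exact values than the fact that none of them come from your headers. they are set by the HTTP/2 stack, so changing the User-Agent leaves this string untouched.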

worked examples

example 1: the datacenter proxy that was fine, then wasn’t

a team i know was running a product monitoring job against a mid-size e-commerce site. they used a commercial datacenter proxy pool from a provider billing around $1 per GB. it worked for three months. one Monday it stopped, returning 403 on all requests across the entire pool.

first hypothesis: IP bans. they rotated to a new session prefix in the proxy pool. same result. they tried a different provider. same result. at this point the natural read is “they blocked datacenter IPs.” but when they tested with a browser on the same network, they got through fine.

the actual problem: the target site had deployed DataDome over the previous weekend. DataDome’s client-side SDK was now fingerprinting visitor sessions. the datacenter IPs were being flagged, but not purely for IP reputation. the site was also checking for JavaScript execution and getting no response, because the Python-based scraper was not running JS at all. the IP was the coincidental signal. the actual gate was the missing JS execution fingerprint.

fix: they moved to Playwright with a fingerprint-masking library (playwright-extra with the stealth plugin), paired with residential proxies. the datacenter IPs would have likely worked too if they had solved the JS execution problem, but they chose to fix both vectors at once. cost went from roughly $0.80/GB datacenter to around $8-15/GB residential, plus compute overhead for the headless browser.

example 2: residential proxies, persistent soft-fail

different project, residential proxies from a major provider, Playwright running with stealth. the site was returning 200 status codes but the content was a lightweight “verify you’re human” interstitial instead of the actual page. rotating IPs did nothing. rotating user agents did nothing.

this one took a while. the tell was that the page serving the interstitial loaded normally in incognito Chrome through the same residential IP (confirmed by configuring the proxy manually in the browser). the fingerprint from Playwright differed from real Chrome in one specific way: navigator.webdriver was returning true even though stealth was supposed to suppress it. the --disable-blink-features=AutomationControlled flag was not taking effect because of a version mismatch: the browser was Chrome 123, while the installed stealth plugin expected Chrome 120 behavior.

the diagnostic step that cracked it was loading a fingerprint testing tool via the automated browser itself, specifically BrowserLeaks, and comparing the output against a manual session. the webdriver flag leak showed up immediately. this is the most reliable manual diagnostic approach i know: if your automated browser can load a fingerprinting test page, you can see exactly what is being exposed.
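the same comparison can be scripted. a rough sketch, assuming a Playwright page object is already open: evaluate a handful of probes in the automated browser, paste the same expressions into DevTools in a manual session, and diff the two. the probe list and function names are my own choices, not a complete fingerprint.

```python
# JavaScript expressions to evaluate in both browsers (a small, non-exhaustive set)
PROBES = {
    "webdriver": "navigator.webdriver",
    "userAgent": "navigator.userAgent",
    "languages": "navigator.languages",
    "platform": "navigator.platform",
    "hardwareConcurrency": "navigator.hardwareConcurrency",
    "timezone": "Intl.DateTimeFormat().resolvedOptions().timeZone",
    "screen": "[screen.width, screen.height]",
}


def collect_signals(page):
    """Evaluate every probe in an open Playwright page (uses page.evaluate)."""
    return {name: page.evaluate(expr) for name, expr in PROBES.items()}


def diff_signals(manual, automated):
    """Return {signal: (manual_value, automated_value)} for every mismatch."""
    keys = sorted(set(manual) | set(automated))
    return {
        key: (manual.get(key), automated.get(key))
        for key in keys
        if manual.get(key) != automated.get(key)
    }
```

in the case above, the diff would have come back with a single entry for webdriver, (False, True), which is the whole diagnosis in one line.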

fix: pinned the Chrome binary version to match the stealth plugin’s expectations and verified that the webdriver flag suppression worked. the block cleared within one rotation cycle.

example 3: the shared residential IP problem

this one is more about IP reputation than fingerprinting. running a data pipeline using a residential proxy provider’s rotating pool. the target was a travel aggregator with aggressive abuse detection. success rate was 60-70%, which is tolerable on the surface but was generating errors in downstream processing.

the investigation revealed that the residential pool being allocated was a particularly dirty slice. checking the assigned IPs against IPQualityScore’s (IPQS) API showed fraud scores of 85-95 on most of the IPs, indicating heavy prior use for credential stuffing or ad fraud. the site’s WAF was likely subscribing to similar threat intel and pre-scoring incoming IPs.
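a sketch of the pool check, assuming you have already fetched a 0-100 fraud score per IP from whatever reputation API you use. the function name, the threshold of 85, and the sample IPs (taken from documentation ranges) are assumptions for illustration.

```python
from statistics import median


def summarize_pool(scores, threshold=85):
    """Summarize reputation scores for a sample of proxy IPs.

    scores: mapping of IP string -> fraud score (0-100), fetched beforehand
    threshold: score at or above which an IP counts as flagged (assumed value)
    """
    if not scores:
        raise ValueError("no scores to summarize")
    flagged = sorted(ip for ip, score in scores.items() if score >= threshold)
    return {
        "count": len(scores),
        "median_score": median(scores.values()),
        "flagged_share": len(flagged) / len(scores),
        "flagged_ips": flagged,
    }


sample = {
    "203.0.113.10": 92,
    "203.0.113.11": 88,
    "198.51.100.7": 15,
    "198.51.100.8": 95,
}
report = summarize_pool(sample)
```

sampling a few dozen IPs from a pool before committing a job to it is cheap. a flagged share well above half is a strong hint that the pool, not the client, is the problem.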

fix: upgraded to the same provider’s “premium” residential tier, which they describe as lower-concurrency, less-shared IPs. cost went from $7/GB to $15/GB. success rate went to 95%+. the fingerprint was fine all along. the IPs were the actual problem here.

this is why you need to test both hypotheses before spending money.

edge cases and failure modes

1. the flag follows the session token, not the IP

some sites issue a session cookie or a browser fingerprint token on first visit, then bind that token to a reputation score. if your session token is flagged, rotating the IP does nothing because the token is presented in every subsequent request. this is common on sites behind Cloudflare’s bot products, such as Bot Management or Super Bot Fight Mode. the fix is to regenerate the session entirely (new browser context, cleared cookies and storage), not to rotate the IP. this is the most common failure mode i see people misdiagnose.
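a minimal sketch of a session reset with Playwright’s Python API, assuming you already hold a launched browser object. a new context starts with no cookies or storage, so retrying through the same proxy with a fresh context isolates the token from the IP. the function name is hypothetical, and whether a per-context proxy works in your setup is an assumption worth checking.

```python
def fetch_with_fresh_session(browser, url, proxy_server=None):
    """Load url in a brand-new browser context and return the HTTP status.

    browser: a launched Playwright Browser
    proxy_server: optional proxy URL, e.g. "http://host:port" (assumed format)
    """
    kwargs = {"proxy": {"server": proxy_server}} if proxy_server else {}
    # a new context has empty cookies, localStorage and cache
    context = browser.new_context(**kwargs)
    try:
        page = context.new_page()
        response = page.goto(url)
        return response.status if response else None
    finally:
        # always discard the context so the next attempt starts clean
        context.close()
```

if the block clears with a fresh context on the same IP, the token was the flagged object. if it persists, move on to the IP and fingerprint checks.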

2. TLS fingerprint fixed, HTTP/2 fingerprint missed

you’ve done the work to spoof the TLS JA3 fingerprint using a library like curl-impersonate or tls-client (through its Python bindings). you’re mimicking Chrome’s TLS handshake. but if the target is on HTTP/2 and you haven’t replicated Chrome’s HTTP/2 framing, you have solved half the problem. the Akamai fingerprint, or variants used by Cloudflare, will still flag you. check whether the target negotiates HTTP/2 with curl -I --http2 https://target.com before assuming TLS spoofing is sufficient.

3. the honeypot page problem

some sites serve a realistic-looking page to detected bots, one that returns 200, has plausible HTML structure, but is actually bot bait. your scraper happily extracts “data” that is meaningless or intentionally wrong. this is a soft failure that is easy to miss if you’re only monitoring status codes. i’ve seen teams run pipelines for weeks on poisoned data before someone noticed the pricing data looked wrong. the countermeasure is to validate extracted data against known-good reference points, not just check HTTP status.
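one way to sketch that countermeasure: keep a small set of values you trust (checked by hand, or from a source the site cannot poison) and flag any run where the scraped values drift too far from them. the function name, the 25% tolerance, and the SKU keys are hypothetical.

```python
def validate_against_reference(extracted, reference, tolerance=0.25):
    """Return the reference keys whose extracted value is missing or off.

    extracted: mapping of key -> numeric value scraped in this run
    reference: mapping of key -> trusted numeric value
    tolerance: allowed relative drift from the trusted value (assumed 25%)
    """
    suspicious = []
    for key, known in reference.items():
        value = extracted.get(key)
        if value is None or abs(value - known) > tolerance * abs(known):
            suspicious.append(key)
    return suspicious


reference_prices = {"sku-1": 100.0, "sku-2": 40.0}
scraped_prices = {"sku-1": 104.0, "sku-2": 9.99}
bad_keys = validate_against_reference(scraped_prices, reference_prices)
```

here sku-1 is within tolerance and sku-2 is not, so the run gets flagged even though every request returned 200.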

4. rate limits that look like blocks

a 429 is a rate limit, not a ban. a 503 can be a rate limit. even a 403 can be a transient rate limit on some platforms. many operators treat any non-200 as a proxy problem and start rotating. the diagnostic check: back off completely for 30 minutes and retry from the same IP. if it works, you hit a rate limit. if it doesn’t, you have a ban or a fingerprint issue. this distinction also matters for how you structure retry logic. exponential backoff is appropriate for rate limits. IP rotation is appropriate for bans. applying rotation to a rate limit scenario just burns through your pool faster.
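a sketch of both halves of that check: a capped exponential backoff schedule for the rate-limit case, and a tiny classifier for the retry after a cooldown. the names and defaults are assumptions, and the jitter parameter is my addition rather than something the check above requires.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=300.0, attempts=6, jitter=0.0):
    """Exponential backoff schedule in seconds, capped, with optional jitter."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        # jitter is a fraction of the delay; 0.0 keeps the schedule deterministic
        delays.append(delay + random.uniform(0, jitter * delay))
    return delays


def classify_after_cooldown(status_after_cooldown):
    """Interpret a retry from the SAME IP after a full cooldown (e.g. 30 minutes).

    A 2xx here only tells you the gate opened; it does not rule out a soft
    failure, so content checks still apply.
    """
    if 200 <= status_after_cooldown < 300:
        return "rate_limit"
    return "ban_or_fingerprint"
```

the important part is that the retry comes from the same IP. rotating before the cooldown finishes throws away the one piece of evidence that separates the two cases.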

5. the ASN-level block

some sites block entire ASNs, not individual IPs. if your “residential” proxy traffic is actually running through a hosting provider’s infrastructure (some residential providers route through VPS infrastructure for certain regions), the ASN will be classified as hosting, not residential. you can verify this with tools like ipinfo.io by checking the org field against the IP you are assigned. if it returns a cloud hosting company rather than an ISP, you have your explanation. the fix is to use a provider with genuine ISP-originated residential traffic, which is worth paying for if the target is checking ASN classification.
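a rough sketch of that check, assuming you already have the org string for the assigned IP (ipinfo.io returns it in the form "AS<number> <name>"). the keyword list is a crude heuristic i made up for illustration, not an authoritative classification, and it will misfire on some names.

```python
# illustrative substrings that suggest hosting infrastructure (not exhaustive)
HOSTING_HINTS = (
    "amazon", "digitalocean", "ovh", "hetzner", "linode", "vultr",
    "hosting", "cloud", "datacenter", "server",
)


def parse_org(org):
    """Split 'AS16509 Amazon.com, Inc.' into ('AS16509', 'Amazon.com, Inc.')."""
    asn, _, name = org.partition(" ")
    if not (asn.startswith("AS") and asn[2:].isdigit()):
        # no ASN prefix present; treat the whole string as the name
        return None, org
    return asn, name


def looks_like_hosting(org):
    """True if the org name contains any of the hosting hints."""
    _, name = parse_org(org)
    lowered = name.lower()
    return any(hint in lowered for hint in HOSTING_HINTS)
```

run it over a sample of the IPs your “residential” plan hands out. if a meaningful share comes back as hosting, the ASN classification is the likely gate, and no amount of fingerprint work will fix it.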

what we learned in production

the diagnostic framework i use now is a decision tree, not a gut call. when a block appears: first, confirm it persists across at least three different IPs from different subnets. if it clears on the second IP, it was a single-IP ban, likely transient. if it persists, test with a manual browser session on the same proxied IP. if the manual session works and the automated one doesn’t, the IP is clean and the fingerprint is the problem. if both fail, the IP is likely blocked or the ASN is blocked, and you need to address the network layer first, then retest the fingerprint.
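the tree above is small enough to write down as a function. a sketch under stated assumptions: the list starts with the IP where the block was first seen, each entry says whether the automated client got through on that IP, and the return labels are names i chose.

```python
def diagnose(automated_ok_by_ip, manual_ok=None):
    """Walk the diagnostic tree and return the next step or the verdict.

    automated_ok_by_ip: list of booleans, one per IP tried (different subnets),
        starting with the IP where the block was first observed
    manual_ok: result of a manual browser session on the same proxied IP,
        or None if that test has not been run yet
    """
    if any(automated_ok_by_ip):
        # the block cleared on another IP: single-IP ban, likely transient
        return "single_ip_ban"
    if len(automated_ok_by_ip) < 3:
        return "test_more_ips"
    if manual_ok is None:
        return "run_manual_check"
    if manual_ok:
        # manual works, automation fails: the IP is clean, the client is not
        return "fingerprint"
    # both fail: fix the network layer first, then retest the fingerprint
    return "ip_or_asn"
```

the value of writing it down is that it forces the order of tests. nobody gets to buy a new proxy plan until the function returns ip_or_asn.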

the tooling around fingerprint inspection has improved a lot. beyond BrowserLeaks, coveryourtracks.eff.org is useful for understanding how much entropy your automated browser is leaking relative to a real user population. running your automated browser against these tools as part of your QA process, before running it against any target, catches most configuration leaks early. we now treat this as a pre-flight check. the MDN documentation on HTTP headers is also worth reading carefully for understanding which headers are evaluated and what their expected values and ordering look like for real browsers.

one other production note: fingerprinting vendors update their detection models. what worked in Q3 2025 against a specific DataDome version may not work in Q1 2026. this means the fingerprint layer requires ongoing maintenance, not a one-time fix. we schedule quarterly reviews of our stealth configurations and validate them against target sites in a staging environment. the IP layer is more stable, but provider pool quality degrades over time too, particularly for shared residential. budget for periodic pool quality checks as operational overhead.

if you’re working in the airdrop farming or multi-account space, the same principles apply with even higher stakes since account bans are harder to recover from than scraping blocks. the community at airdropfarming.org/blog/ covers the antidetect browser side of fingerprint management in detail and is worth cross-referencing. for a systematic view of which antidetect tools actually isolate fingerprints correctly, antidetectreview.org/blog/ maintains current coverage.

the broader point is that thinking in terms of “fix the proxy” or “fix the fingerprint” as two isolated problems is the wrong mental model. detection systems layer these signals together. an IP with a bad reputation gets scrutinized harder at the fingerprint layer. a fingerprint that matches a known automation tool pattern elevates the IP’s score in the vendor’s model. the diagnostic work is about isolating which layer is the gate, because that determines which fix is correct. getting that right the first time is worth more than any amount of proxy spend.

for related reading on this site, see the proxy provider comparison guide for a breakdown of pool quality across residential providers, the TLS fingerprinting explainer for a deeper technical treatment of JA3 and Akamai signatures, and the DataDome bypass analysis for a case study on one of the most widely deployed bot management platforms. all three are prerequisites for the more advanced evasion work.

Written by Xavier Fok

disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.