Akamai bot manager: scraping strategies that still work

Akamai has long claimed to carry somewhere between 15 and 30 percent of all web traffic, and Bot Manager Premier sits on top of that as the paid enforcement layer for any site that treats bot traffic as a business problem. That means shoe drops, airline fare APIs, hotel inventory systems, and financial data feeds, among others. If you have scraped at any meaningful scale in the last three years, you have almost certainly hit an Akamai-protected endpoint and watched your requests vanish into soft 200s that return empty JSON or hard 403s with a terse challenge page. The protection has gotten materially harder since 2023, when Akamai began rolling out sensor data v2 with obfuscated payload signing. Naive Playwright scripts that worked in 2021 fail silently today.

The stakes are real. A price monitoring operation I ran across Southeast Asian travel OTAs was returning corrupted fare data for two weeks before I figured out that Akamai was serving synthetic responses, not blocking outright. That is the more dangerous failure mode: you think you are scraping, but you are just collecting garbage. This guide is aimed at people who already know the difference between a datacenter proxy and a residential proxy, who have used Playwright or Puppeteer in anger, and who want a clear technical picture of what Akamai is actually doing and where the attack surface genuinely is in 2026.

One hard rule before we go further: everything here applies to targets you are authorized to scrape, or where scraping is clearly legal under applicable law in your jurisdiction. I am not going to tell you to break into systems you are not permitted to access. There is enough legitimate work in price monitoring, academic research, and competitive intelligence to fill a career without that.

background and prior art

Akamai acquired Cyberfend in 2016 for its behavioral biometrics technology and folded it into what became Bot Manager. The original product was mostly IP reputation and simple JavaScript challenges. Between 2019 and 2022, they layered in device fingerprinting, behavioral scoring, and machine learning classifiers trained on billions of real user sessions flowing through their CDN. The key insight Akamai had early, and which competitors have since copied, is that a CDN operator sees enough legitimate human traffic to build ground-truth training data. You cannot fake your way past a classifier that has seen fifty million real Chrome sessions on the same endpoint.

The broader bot management market formalized around this period. Cloudflare Bot Fight Mode and DataDome both grew rapidly after 2020. The academic literature on browser fingerprinting, particularly KU Leuven’s FPDetective work, INRIA’s FPRandom, and the follow-on papers on canvas and WebGL fingerprinting, seeded a lot of the detection techniques now in production use. If you want the theoretical grounding, the W3C WebDriver specification is worth reading carefully because it defines the navigator.webdriver flag that conforming automation frameworks expose, and Akamai’s sensor script checks it by name. That part of the detection surface is not secret; it is written into the spec.

the core mechanism

Akamai Bot Manager works in three distinct layers, and understanding which layer caught you determines your remediation path.

Layer one: network and IP reputation. Every request carries an IP address, an ASN, and a TLS fingerprint. Akamai maintains blocklists of known datacenter ASNs (AWS, GCP, DigitalOcean, Hetzner, and so on) and scores requests from these ranges much more aggressively. Separately, they score based on the TLS Client Hello fingerprint, which is a hash of the cipher suites and extensions your HTTP client advertises. A raw Python requests session using the default SSL stack produces a TLS fingerprint that matches no known browser and matches many known scraping libraries. Akamai does not need to see your JavaScript behavior to reject you at this point. Tools like curl-impersonate and the tls-client Go library exist specifically to spoof this fingerprint. The TLS 1.3 RFC (RFC 8446) documents the Client Hello structure if you want to understand what is being fingerprinted.
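To make "a hash of the cipher suites and extensions" concrete, here is a minimal sketch of JA3, one publicly documented Client Hello fingerprinting scheme. Whether Akamai uses this exact scheme is an assumption on my part; the point is only to show what kind of input goes into such a hash. The sample values in the usage lines are illustrative, not captured from a real browser.

```python
import hashlib

# GREASE values (RFC 8701) are randomized by browsers, so JA3 ignores them.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input: five comma-separated fields, each a
    hyphen-joined list of decimal values in the order they were sent."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, which is the value usually compared against known clients."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Illustrative values only: 771 is the TLS 1.2 version number (0x0303).
example = ja3_string(771, [0x0A0A, 4865, 4866], [0, 10], [29], [0])
print(example)
```

Because the fields preserve the order in which the client advertised them, two clients offering the same cipher suites in a different order hash differently. That is why an HTTP library's default TLS stack stands out even when its headers look like a browser's.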

Layer two: JavaScript sensor data collection. If your IP and TLS fingerprint pass the network layer, Akamai injects a heavily obfuscated JavaScript payload into the page. This payload, historically served from a path like /_bm/... or as an inline script with a randomized variable name, collects somewhere between 150 and 300 individual signals and packages them into a sensor data blob sent back to Akamai’s infrastructure as a POST request. The signals include: canvas rendering differences across GPU and driver versions, WebGL renderer and vendor strings, audio context fingerprinting, installed font enumeration through CSS rendering time, navigator properties (navigator.webdriver, navigator.plugins, navigator.languages), screen resolution and color depth, touch event support, and dozens of timing-based measurements. The blob is signed with a session-specific key to prevent replay.

The output of this layer is two cookies: bm_sz (the sensor data submission receipt) and _abck (the bot manager access token). Without a valid _abck cookie carrying the correct HMAC signature, subsequent requests to protected endpoints will either 403 or return poisoned responses. The cookie has a session TTL and cannot be reused across IPs without triggering the revalidation flow.

Layer three: behavioral scoring over the session. Even with valid cookies, Akamai scores ongoing session behavior. Request cadence, mouse movement entropy (if you are running a real browser), scroll depth, time-on-page, and navigation patterns all feed a session-level risk score. A session that hits 80 product pages in 4 minutes with no mouse movement and perfectly regular 3-second intervals between requests will eventually hit a CAPTCHA wall or a soft block, even with a clean _abck. This is the layer that trips up operators who solve the initial challenge but run automation at machine speed.

The Akamai Bot Manager product documentation describes the high-level capability set. The technical implementation details are not published, but they are largely reconstructable from the sensor script itself, which you can extract and deobfuscate from any protected page.

worked examples

Example 1: retail sneaker site, ~15k SKUs, daily price sync.

A client came to me with a job scraping a major US sneaker retailer protected by Akamai Bot Manager Premier. The site had around 15,000 active SKUs and they needed daily price and inventory data. The first approach, Playwright headless on a residential proxy pool from Brightdata, was failing at roughly 30 percent on the sensor data submission. The _abck cookies we got were invalid: Akamai was rejecting the sensor payload because the headless Chrome fingerprint was being detected despite running Playwright with --disable-blink-features=AutomationControlled.

The fix was two-part. First, we switched from stock Playwright to playwright-extra with the stealth plugin, which patches the most obvious navigator.webdriver leaks and normalizes plugin enumeration. Second, and more importantly, we pre-warmed sessions by running a 45-60 second idle period on the homepage with randomized mouse movement via Playwright’s page.mouse.move() with a Bezier curve path generator before hitting any product endpoints. That idle period let the Akamai sensor script collect enough behavioral signal to generate a valid _abck. Pass rate went from 70 percent to 94 percent. The remaining 6 percent were sessions where the residential proxy IP was on a per-IP blocklist, not a sensor data failure, and we handled those by rotating out flagged IPs after three consecutive sensor rejections.

Proxy cost on Brightdata residential at the time was approximately $8.40 per GB. At roughly 180KB per session including assets, a 15k-SKU daily run moves about 2.7GB and cost around $23 in proxy bandwidth. The browser infrastructure was Browserless.io at $89/month for their mid-tier plan. Total marginal cost: roughly $770/month ($23 across 30 daily runs, plus $89) for a dataset that was being sold at $800/month.
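The per-run bandwidth figure is easy to sanity-check. A minimal sketch, assuming one session per SKU and decimal units (1 GB = 1,000,000 KB); the function name is mine, not part of any tool mentioned here.

```python
def bandwidth_cost_usd(n_requests: int, kb_per_request: float, usd_per_gb: float) -> float:
    """Proxy bandwidth cost for a batch of requests, using decimal units."""
    gigabytes = n_requests * kb_per_request / 1_000_000  # 1 GB = 1,000,000 KB
    return gigabytes * usd_per_gb


# Figures from the sneaker example: 15k SKUs, ~180KB per session, $8.40 per GB.
run_cost = bandwidth_cost_usd(15_000, 180, 8.40)
print(f"{run_cost:.2f}")  # 22.68
```

The same helper works for per-call pricing on API targets: pass n_requests=1 and the average response size.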

Example 2: airline fare API extraction.

This one was harder because the target was a European airline whose mobile app and web frontend both hit the same JSON API, and the API was Akamai-protected at the network layer before the page even loaded. There was no browser to instrument. The trick here was HTTP-layer impersonation using curl-impersonate to spoof a Chrome 124 TLS fingerprint, combined with the correct User-Agent, Accept, and sec-ch-ua headers. Because the API endpoint did not inject a JavaScript challenge (it was a mobile-optimized API path), getting the TLS fingerprint right was sufficient to pass the network layer. The _abck cookie was obtained by running a one-time headless session against the web frontend, then reusing it for API calls on the same IP.

The failure mode here is that mobile API paths sometimes have different Bot Manager configurations than the web frontend. We hit an endpoint that had stricter IP reputation scoring for non-EU IPs. Switching to a residential proxy pool routed through German exit nodes resolved it. Cost: Oxylabs residential EU pool at $10/GB, around 40KB per API call, so about $0.0004 per fare lookup.

Example 3: hotel inventory monitoring across a major OTA.

An aggregator I worked with needed real-time inventory status on about 2,000 hotels across a major OTA. The site was using Akamai in front of a GraphQL API. The GraphQL endpoint accepted POST requests with a JSON body, and Akamai was scoring based on a combination of TLS fingerprint, _abck validity, and request header ordering. Header ordering is worth calling out specifically: both the HTTP/2 pseudo-header order (Chrome sends :method, :authority, :scheme, :path) and the ordering of regular headers like Accept-Encoding, Cache-Control, and Accept are fingerprinted. Python httpx and requests both produce header orderings that do not match any real browser. We used a Go HTTP/2 client with headers explicitly ordered to match Chrome’s canonical order, which we documented by examining Chrome’s network requests in DevTools. Combined with a valid _abck from a pre-warmed session pool, the pass rate on the GraphQL endpoint was consistently above 97 percent.

The session pool management was the operational complexity here: we maintained a pool of 50 pre-warmed sessions, each tied to a specific residential IP, refreshed every 4 hours. A Python worker process managed session health by periodically re-running the Playwright warm-up flow and rotating sessions whose _abck had expired. Infrastructure cost for the browser fleet was around $140/month on a VPS running headless Chrome workers, plus $200/month in proxy spend. See our residential proxy cost breakdown guide for a more detailed analysis of how to model this.
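The bookkeeping side of that pool can be sketched in a few lines. This is a minimal illustration, not the worker we ran: the class and field names are hypothetical, the warm-up itself is left out, and the only facts taken from the setup above are the pool size of 50, the 4-hour refresh, and one session per residential IP.

```python
from dataclasses import dataclass
from typing import Dict, List

REFRESH_AFTER_S = 4 * 60 * 60  # 4-hour refresh interval
TARGET_POOL_SIZE = 50          # sessions kept warm


@dataclass
class PooledSession:
    proxy_url: str           # the sticky exit node this session is pinned to
    cookies: Dict[str, str]  # cookie name -> value captured during warm-up
    warmed_at: float         # unix timestamp of the last successful warm-up

    def is_stale(self, now: float) -> bool:
        return now - self.warmed_at >= REFRESH_AFTER_S


class SessionPool:
    """Tracks one session per proxy exit node and reports what needs re-warming."""

    def __init__(self, target_size: int = TARGET_POOL_SIZE) -> None:
        self.target_size = target_size
        self._by_proxy: Dict[str, PooledSession] = {}

    def put(self, session: PooledSession) -> None:
        # Keyed by proxy, so re-warming on the same IP replaces the old entry.
        self._by_proxy[session.proxy_url] = session

    def healthy(self, now: float) -> List[PooledSession]:
        return [s for s in self._by_proxy.values() if not s.is_stale(now)]

    def stale(self, now: float) -> List[PooledSession]:
        return [s for s in self._by_proxy.values() if s.is_stale(now)]

    def deficit(self, now: float) -> int:
        """How many warm-ups are needed to get back to the target size."""
        return max(0, self.target_size - len(self.healthy(now)))
```

Passing `now` explicitly rather than calling time.time() inside the class keeps the staleness logic easy to unit test.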

edge cases and failure modes

Soft 200 poisoning. Akamai can be configured to serve syntactically valid but semantically wrong responses instead of hard blocks. You get a 200, the JSON parses, and the data is wrong. Always validate a sample of your output against ground truth (manual browser session on the same IP range) before trusting a new scraping setup at scale. If you see suspiciously uniform pricing or inventory always showing “available,” treat it as a red flag.
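A ground-truth check does not need to be elaborate. The sketch below assumes you have a small dict of manually verified prices keyed by SKU; the function names, the 1 percent tolerance, and the distinct-value threshold are all illustrative choices rather than tuned values.

```python
from typing import Dict, Iterable


def mismatch_rate(scraped: Dict[str, float], truth: Dict[str, float],
                  tolerance: float = 0.01) -> float:
    """Fraction of ground-truth items whose scraped price is missing
    or differs by more than `tolerance` (relative)."""
    if not truth:
        raise ValueError("ground-truth sample is empty")
    bad = 0
    for key, true_price in truth.items():
        got = scraped.get(key)
        if got is None or abs(got - true_price) > tolerance * abs(true_price):
            bad += 1
    return bad / len(truth)


def looks_uniform(values: Iterable[float], min_distinct_ratio: float = 0.1) -> bool:
    """True when a sample contains suspiciously few distinct values."""
    values = list(values)
    if len(values) < 2:
        return False
    return len(set(values)) / len(values) < min_distinct_ratio


truth = {"sku-1": 100.0, "sku-2": 60.0, "sku-3": 10.0}
scraped = {"sku-1": 100.0, "sku-2": 50.0}
print(mismatch_rate(scraped, truth))
print(looks_uniform([99.0] * 20))
```

Run the mismatch check on every new setup before scaling up, and keep the uniformity check in the pipeline permanently, since poisoning can be switched on later without any change in status codes.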

_abck invalidation on IP change. The _abck cookie is tied to the session IP at generation time, and Akamai validates IP continuity within a session window. If you generate a cookie on one residential IP and then hit a product endpoint from a different IP in the same proxy pool, you will get a challenge or 403. Route all requests for a given session through the same proxy exit node, or implement sticky sessions with your proxy provider. Brightdata, Oxylabs, and Smartproxy all support sticky session modes.

Obfuscation rotation breaking your sensor data solver. If you are using a third-party service to generate synthetic sensor data blobs (several exist in the gray market, typically priced at $0.002-0.005 per valid _abck), be aware that Akamai rotates the obfuscation of the sensor script periodically, sometimes weekly. When the script rotates, previously valid sensor data formats can fail silently. Your pass rate will drop gradually over 24-72 hours rather than immediately, making it easy to miss. Monitor your _abck generation success rate as a metric, not just downstream response codes.

Behavioral scoring on high-velocity sessions. Even with a valid _abck, running more than about 1 request per 8-12 seconds from a single session will eventually trigger a CAPTCHA or silent block on most high-value Akamai deployments. If you need volume, distribute across session pool width rather than increasing per-session velocity. For anti-detect browser-based session management, antidetectreview.org/blog/ has useful rundowns on tools like Multilogin and AdsPower that are relevant for managing large session pools with distinct fingerprints.
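Pool width follows directly from the per-session interval. A minimal sizing sketch, assuming requests are spread evenly across the active window; the function name and the 12-second default are my own choices, taken from the conservative end of the range above.

```python
import math


def sessions_needed(requests_per_day: int, min_interval_s: float = 12.0,
                    active_hours: float = 24.0) -> int:
    """Smallest number of concurrent sessions that covers the daily volume
    while each session waits at least `min_interval_s` between requests."""
    per_session_capacity = active_hours * 3600 / min_interval_s
    return math.ceil(requests_per_day / per_session_capacity)


print(sessions_needed(15_000))                    # 3
print(sessions_needed(15_000, active_hours=8.0))  # 7
```

Treat the result as a floor. Failed warm-ups, retries, and sessions that get rotated out all eat into capacity, so the real pool needs headroom on top of it.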

Canvas and WebGL fingerprint mismatch. Headless Chrome running on a Linux server without a GPU will render canvas and WebGL differently than a real Windows or Mac Chrome session. Akamai’s sensor script fingerprints the specific GPU renderer string (e.g., ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0, D3D11)). If your server renders with llvmpipe or a generic Mesa software renderer, the fingerprint will be anomalous. Solutions include: running Chrome with --use-gl=swiftshader to get more consistent rendering (though SwiftShader is itself a software renderer and reports as one), using a remote real-device cloud (BrowserStack, LambdaTest), or using a browser profile from a real machine with a real GPU and spoofing the canvas output to match. This is complex to get right. The Chrome DevTools Protocol documentation covers the Emulation domain APIs that let you override some of these values programmatically.
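Before debugging anything else on a new worker image, it is worth checking which renderer string the browser actually reports. The sketch below is a simple self-check on a string you have already read from the page (for example via the WEBGL_debug_renderer_info extension); the marker list is my own assumption and is not exhaustive.

```python
# Substrings that typically indicate CPU-based rendering. Assumed list, not exhaustive.
SOFTWARE_RENDERER_MARKERS = ("llvmpipe", "softpipe", "swiftshader", "software rasterizer")


def is_software_renderer(renderer: str) -> bool:
    """True if the WebGL renderer string looks like a software rasterizer."""
    lowered = renderer.lower()
    return any(marker in lowered for marker in SOFTWARE_RENDERER_MARKERS)


print(is_software_renderer("llvmpipe (LLVM 15.0.7, 256 bits)"))
print(is_software_renderer(
    "ANGLE (Intel, Intel(R) UHD Graphics 620 Direct3D11 vs_5_0 ps_5_0, D3D11)"
))
```

"Mesa" is deliberately not in the marker list, because hardware-accelerated Linux drivers also report Mesa in their renderer strings.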

Rate limits at the account or session level, not just IP level. Some Akamai configurations track behavioral patterns at the account level if you are authenticated. Logging in to a target site’s account and scraping at high velocity from that session will burn the account, sometimes permanently, even if the IP passes all bot checks. For account-level scraping, use conservative rates and ideally use separate accounts per crawler, with each account warming slowly to look like a normal user. See the related discussion in our multi-account operations guide and the resources at multiaccountops.com/blog/ for the account management angle.

what we learned in production

The single most important operational lesson is that Akamai is not a static target. Configurations change when the site’s security team adjusts thresholds, when Akamai pushes a sensor script update, or when a site upgrades from Bot Manager Standard to Bot Manager Premier. What passed last month may not pass this month. Production scraping setups against Akamai-protected targets need active monitoring: track pass rates and _abck generation success rates as first-class metrics, not afterthoughts. Set alerts on any 15-minute window where your session generation success rate drops below 85 percent. That early warning has saved me multiple times from data gaps that would have been invisible until a client asked why their dashboard showed no data.
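The alert rule above fits in a small rolling-window counter. A minimal sketch, assuming you call record() once per session generation attempt; the class name and the min_samples guard are my own additions, there to avoid alerting on a window with only a handful of attempts.

```python
from collections import deque
from typing import Deque, Optional, Tuple
import time


class PassRateMonitor:
    """Rolling success rate over a time window, with a threshold alert."""

    def __init__(self, window_s: float = 900.0, threshold: float = 0.85,
                 min_samples: int = 20) -> None:
        self.window_s = window_s        # 15 minutes
        self.threshold = threshold      # alert below 85 percent
        self.min_samples = min_samples  # assumed guard against tiny samples
        self._events: Deque[Tuple[float, bool]] = deque()

    def _prune(self, now: float) -> None:
        cutoff = now - self.window_s
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def record(self, success: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events.append((now, success))
        self._prune(now)

    def rate(self, now: Optional[float] = None) -> Optional[float]:
        """Success rate in the current window, or None if too few samples."""
        now = time.time() if now is None else now
        self._prune(now)
        if len(self._events) < self.min_samples:
            return None
        return sum(1 for _, ok in self._events if ok) / len(self._events)

    def should_alert(self, now: Optional[float] = None) -> bool:
        current = self.rate(now)
        return current is not None and current < self.threshold
```

Feed it _abck generation outcomes rather than downstream HTTP status codes, since a poisoned 200 would otherwise count as a success.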

The second lesson is about cost structure. The path that looks expensive, buying third-party sensor data blobs from a solver service, is often cheaper in absolute dollar terms than running a self-managed Playwright fleet, but it introduces a critical dependency on a vendor that may disappear, raise prices, or get compromised. I have seen solver services that were generating valid _abck cookies using leaked sensor data from compromised real-user sessions, which is an obvious legal and ethical problem. The self-managed Playwright approach with a real residential proxy pool is more infrastructure work but gives you full visibility into what is being sent and why. For anything running at scale or handling sensitive targets, the control is worth the operational overhead. Our broader browser automation strategy guide covers more of the infrastructure architecture decisions in this space.

One practical code note: when you are managing a session pool in Python, keep your Playwright browser context alive for the duration of the session rather than creating a new context per request. Context creation triggers a fresh sensor data submission; reusing a context with a valid _abck avoids unnecessary revalidation overhead and is faster.

from playwright.async_api import async_playwright
import asyncio

async def warm_session(proxy_url: str) -> dict:
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=["--disable-blink-features=AutomationControlled"]
        )
        context = await browser.new_context(
            proxy={"server": proxy_url},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
        )
        page = await context.new_page()
        await page.goto("https://target.example.com/")
        # Allow sensor script to run and submit
        await asyncio.sleep(45)
        cookies = await context.cookies()
        abck = next((c for c in cookies if c["name"] == "_abck"), None)
        bm_sz = next((c for c in cookies if c["name"] == "bm_sz"), None)
        await browser.close()
        return {"_abck": abck, "bm_sz": bm_sz}

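This is a minimal skeleton: for brevity it captures the cookies and then closes the browser. In production you keep the context alive for the life of the session, as described above, and add the mouse movement simulation, the Bezier path generation, and session health checks before returning.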

The bot management landscape is evolving fast. Akamai acquired Guardicore in 2021 and has been investing heavily in their security portfolio. New signal sources, particularly passive TLS fingerprinting improvements and HTTP/2 fingerprinting, are being rolled out continuously. Staying current requires reading the technical community, monitoring your own pass rates, and being willing to rebuild your approach when the landscape shifts.


Written by Xavier Fok

disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.
