
How to scrape eBay at scale in 2026 with proxies that work

eBay is one of the messiest scraping targets on the internet. it’s not a clean API, the HTML structure changes without notice, and their bot detection has gotten more aggressive since mid-2024. if you’ve tried to pull product prices, seller feedback counts, or completed auction data recently and ended up with a wall of 429s and CAPTCHA challenges, you’re not alone.

this guide is for operators who need eBay data in volume, whether that’s for price intelligence, competitor monitoring, reseller tooling, or marketplace analytics. i’ll walk through the full stack: proxy selection, request architecture, parsing, and what actually breaks at scale. i run pipelines hitting eBay US, UK, and AU; specifics here come from production, not theory.

by the end you’ll have a working Python scraper that rotates residential proxies, handles eBay’s session quirks, parses listing data reliably, and can scale from a few hundred requests per day to tens of thousands.

what you need

  • Python 3.11+ with httpx, beautifulsoup4, lxml, and playwright installed
  • rotating residential proxies, minimum a pool of 10k IPs. datacenter proxies will get blocked within a few hundred requests on eBay in 2026. i use ProxyScraping residential plans which start around $4/GB, or Brightdata’s residential network (around $8.40/GB) as a fallback
  • a scraping-friendly server in the US, preferably AWS us-east-1 or a US VPS, because eBay geo-gates some listing data and prices differ by region
  • storage: Postgres or S3-compatible object storage for raw HTML and parsed output
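  • Python packages: pip install "httpx[http2]" beautifulsoup4 lxml playwright (quote the extras so your shell doesn’t expand the brackets), then run playwright install chromium to download the browser Playwright drives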
  • budget: plan for roughly $15-40/month at moderate scale (10k-50k requests/day), mostly proxy egress costs
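
a rough arithmetic check on the budget line, as a minimal sketch. it assumes about 25 KB transferred per request after compression, which is an illustrative placeholder and not a measurement from this guide; measure your own average response size and plug it in. the function name and parameters are hypothetical.

```python
def monthly_proxy_cost(requests_per_day: int, kb_per_request: float,
                       usd_per_gb: float, days: int = 30) -> float:
    """Estimate monthly proxy spend from traffic volume (decimal GB)."""
    gb_per_month = requests_per_day * kb_per_request * days / 1_000_000  # KB -> GB
    return round(gb_per_month * usd_per_gb, 2)


if __name__ == "__main__":
    # 10k requests/day at an assumed 25 KB each on a $4/GB plan
    print(monthly_proxy_cost(10_000, 25, 4.0))   # 30.0
    # same assumptions at 50k requests/day
    print(monthly_proxy_cost(50_000, 25, 4.0))   # 150.0
```

under these assumptions the low end of the request range fits the budget and the high end does not, so treat the dollar figure as sensitive to page weight and cache hit rate.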

step by step

step 1: set up your proxy rotation layer

don’t put proxy credentials directly in your scraping code. build a thin rotation wrapper so you can swap providers without touching scraper logic.

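import httpx
import random

# placeholder endpoint: substitute your provider's username, password, gateway host, and port
PROXY_LIST = [
    "http://USER:PASS@PROXY_HOST:7777",
    # add more endpoints or load from env
]

def get_proxy():
    return random.choice(PROXY_LIST)

def make_client():
    # proxy= needs httpx 0.26+; older releases used proxies= instead.
    # no explicit Accept-Encoding: httpx advertises only the encodings it can decode,
    # so a brotli response never arrives unless the brotli package is installed.
    return httpx.Client(
        proxy=get_proxy(),
        timeout=20,
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        }
    )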

expected output: you can call make_client() and it returns an httpx client pre-configured with a rotating proxy.

if it breaks: if httpx throws a ProxyError, the proxy endpoint is misconfigured. double-check that your username/password are URL-encoded (special characters like @ or # in passwords will break the URL).
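
a minimal sketch of the URL-encoding fix, using only the standard library. build_proxy_url is a hypothetical helper, and proxy.example.com is a stand-in host, not a real gateway.

```python
from urllib.parse import quote


def build_proxy_url(user: str, password: str, host: str, port: int) -> str:
    """Build a proxy URL with percent-encoded credentials."""
    # safe="" forces every reserved character (@ : / #) to be percent-encoded
    return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


if __name__ == "__main__":
    print(build_proxy_url("user", "p@ss#word", "proxy.example.com", 7777))
    # http://user:p%40ss%23word@proxy.example.com:7777
```

building the URL from separate parts also makes it easier to load the credentials from environment variables instead of hardcoding them.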

step 2: understand eBay’s page types

eBay has three main scraping targets, each with different structure and bot sensitivity:

  • search results (/sch/i.html?_nkw=...): medium difficulty, changes layout occasionally
  • listing pages (/itm/...): high bot sensitivity, contains structured JSON-LD
  • completed/sold listings (/sch/i.html?_nkw=...&LH_Complete=1&LH_Sold=1): highest value for price research, same structure as search

start with search pages. they’re more forgiving and give you listing IDs you can then fetch individually.
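
the URL patterns above can be sketched as a small builder. build_search_url and its sold_only flag are hypothetical names; the query parameters are the ones listed in this step.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.ebay.com/sch/i.html"


def build_search_url(query: str, page: int = 1, sold_only: bool = False) -> str:
    """Return a search URL, optionally restricted to completed and sold listings."""
    params = {"_nkw": query, "_pgn": page}
    if sold_only:
        # the two flags from the completed/sold URL pattern
        params["LH_Complete"] = 1
        params["LH_Sold"] = 1
    return f"{SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("vintage seiko watch"))
    print(build_search_url("vintage seiko watch", page=2, sold_only=True))
```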

expected output: knowing which URLs to target before writing a single request.

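if it breaks: if you see different page structures than documented here, eBay likely ran an A/B test. also check whether your User-Agent is triggering a mobile layout by temporarily swapping in a different desktop UA string. don’t remove the header entirely: httpx then sends its default python-httpx UA, which is an obvious bot signal.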

step 3: fetch a search results page

def fetch_search(query: str, page: int = 1) -> str:
    url = "https://www.ebay.com/sch/i.html"
    params = {
        "_nkw": query,
        "_pgn": page,
        "_ipg": 60,  # 60 results per page
    }
    # a fresh client per call picks a new proxy; the with block closes its connections
    with make_client() as client:
        resp = client.get(url, params=params)
    resp.raise_for_status()
    return resp.text

html = fetch_search("vintage seiko watch")
print(len(html))  # should be 80k-120k characters

expected output: raw HTML string. if you’re getting ~2k characters you’ve hit a CAPTCHA or redirect page.

if it breaks: a 403 with a small response body means eBay flagged the IP. add a time.sleep(random.uniform(1.5, 4)) between requests (with import time at the top of the file) and rotate more aggressively. for JavaScript challenge pages, use residential proxies with a higher trust score, or use Playwright (https://playwright.dev/) to render the page and establish a session cookie.
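
a minimal sketch of a block-page check, so a CAPTCHA page never reaches the parser. the 10,000-character threshold and the marker list are assumptions chosen to sit between the ~2k block pages and 80k+ real pages described above; tune both against your own responses. looks_blocked is a hypothetical helper.

```python
# assumed marker phrases; extend after inspecting real block pages
BLOCK_MARKERS = ("captcha",)


def looks_blocked(status_code: int, body: str, min_length: int = 10_000) -> bool:
    """Heuristic: True when a response is probably a block or challenge page."""
    if status_code in (403, 429):
        return True
    if len(body) < min_length:  # real search pages are far larger than block pages
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


if __name__ == "__main__":
    print(looks_blocked(200, "x" * 2_000))    # True: suspiciously short
    print(looks_blocked(200, "x" * 90_000))   # False
```

calling this before parsing lets you retry on a different proxy instead of recording an empty result set.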

step 4: parse listing data from search results

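eBay embeds structured data in JSON-LD blocks, but on search pages it’s incomplete. the parser below therefore reads everything from the DOM: title, price, and condition from the card elements, and the item ID from the /itm/ link. JSON-LD comes into play on listing pages in step 5.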

from bs4 import BeautifulSoup
import json
import re

def parse_search_results(html: str) -> list[dict]:
    soup = BeautifulSoup(html, "lxml")
    items = []

    for card in soup.select("li.s-item"):
        try:
            title_el = card.select_one(".s-item__title")
            price_el = card.select_one(".s-item__price")
            link_el = card.select_one("a.s-item__link")
            condition_el = card.select_one(".SECONDARY_INFO")

            if not title_el or not link_el:
                continue

            title = title_el.get_text(strip=True)
            if title == "Shop on eBay":  # skip the first phantom card
                continue

            href = link_el["href"]
            item_id_match = re.search(r"/itm/(\d+)", href)

            items.append({
                "item_id": item_id_match.group(1) if item_id_match else None,
                "title": title,
                "price": price_el.get_text(strip=True) if price_el else None,
                "condition": condition_el.get_text(strip=True) if condition_el else None,
                "url": href.split("?")[0],
            })
        except Exception:
            continue

    return items

results = parse_search_results(html)
print(results[:2])

expected output: a list of dicts with item ID, title, price string, condition, and clean URL.

if it breaks: eBay changes CSS class names a few times per year. if s-item stops working, open the page in a browser, inspect a listing card, and update the selectors. this is the most maintenance-heavy part of any eBay scraper.
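
the parser returns price as a raw string, so a normalization pass helps before any analysis. this is a minimal sketch assuming US-style formatting ("$1,299.00") and ranges written like "$10.00 to $25.50"; other locales need different handling. parse_price_range is a hypothetical helper.

```python
import re
from decimal import Decimal

# one numeric amount: digits with optional thousands commas and decimals
_AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price_range(text):
    """Return (low, high) as Decimals, or None when no amount is found."""
    if not text:
        return None
    amounts = [Decimal(m.replace(",", "")) for m in _AMOUNT_RE.findall(text)]
    if not amounts:
        return None
    return min(amounts), max(amounts)


if __name__ == "__main__":
    print(parse_price_range("$1,299.00"))
    print(parse_price_range("$10.00 to $25.50"))
```

a single price comes back with low equal to high, which keeps downstream code on one code path.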

step 5: fetch individual listing pages for full data

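for price history, seller info, item specifics, and description, you need the listing page itself. the function below pulls the JSON-LD Product block from the page. eBay may also embed a larger state blob such as window.__PRELOADED_STATE__ in a script tag; verify that it appears in your own responses before relying on it.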

import json

def fetch_listing(item_id: str) -> dict:
    url = f"https://www.ebay.com/itm/{item_id}"
    with make_client() as client:
        resp = client.get(url)
    resp.raise_for_status()

    html = resp.text
    soup = BeautifulSoup(html, "lxml")

    # extract JSON-LD structured data
    json_ld = None
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            # script.string is None for empty tags, so fall back to ""
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue
        # a JSON-LD block can hold one object or a list of them
        candidates = data if isinstance(data, list) else [data]
        for node in candidates:
            if isinstance(node, dict) and node.get("@type") == "Product":
                json_ld = node
                break
        if json_ld:
            break

    return {
        "item_id": item_id,
        "json_ld": json_ld,
        "raw_html_length": len(html),
    }

expected output: a dict containing the JSON-LD product data and the raw HTML length as a sanity check.

if it breaks: if json_ld is None, eBay may have changed the script tag structure. fall back to parsing the <h1> for title and the [itemprop="price"] element for price.
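
a minimal sketch for flattening the JSON-LD dict into the fields you usually want. it assumes the block follows the schema.org Product shape (name, plus offers with price and priceCurrency); whether a given eBay listing fills all of these is something to confirm on real pages. summarize_product is a hypothetical helper.

```python
def summarize_product(json_ld):
    """Flatten a schema.org Product dict into title, price, and currency."""
    if not json_ld:
        return {"title": None, "price": None, "currency": None}
    offers = json_ld.get("offers") or {}
    if isinstance(offers, list):  # offers may be a single object or a list
        offers = offers[0] if offers else {}
    return {
        "title": json_ld.get("name"),
        "price": offers.get("price"),
        "currency": offers.get("priceCurrency"),
    }


if __name__ == "__main__":
    sample = {
        "@type": "Product",
        "name": "Vintage Seiko 6139",
        "offers": {"@type": "Offer", "price": "249.99", "priceCurrency": "USD"},
    }
    print(summarize_product(sample))
```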

step 6: handle pagination and rate limiting

eBay search results go up to page 100 (6000 results with 60 per page). for large queries you’ll need to either paginate or split by subcategory.

import time
import random

def scrape_query(query: str, max_pages: int = 10, max_retries: int = 3) -> list[dict]:
    all_items = []

    for page in range(1, max_pages + 1):
        html = None
        for attempt in range(max_retries):
            try:
                html = fetch_search(query, page=page)
                break
            except httpx.HTTPStatusError as e:
                if e.response.status_code != 429:
                    raise
                # retry the same page instead of silently skipping it
                print(f"rate limited on page {page}, sleeping 30s")
                time.sleep(30)

        if html is None:
            print(f"page {page} still rate limited after {max_retries} tries, stopping")
            break

        items = parse_search_results(html)
        if not items:
            print(f"no results on page {page}, stopping")
            break

        all_items.extend(items)
        print(f"page {page}: got {len(items)} items, total {len(all_items)}")

        # randomized delay between requests, critical for staying under radar
        time.sleep(random.uniform(2, 5))

    return all_items

expected output: a flat list of all items across pages with progress logging.

if it breaks: consistent 429s even after sleeping suggest your proxy pool is too small or too many of its IPs have been flagged. switch to a fresh proxy pool, or use a provider with sticky sessions so each session keeps one IP for its lifetime while separate sessions spread across the pool.
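
a fixed sleep is the simplest option; a common alternative is exponential backoff with full jitter, sketched below. this swaps in a different technique from the fixed 30-second wait used in this step, and the base and cap values are assumptions, not tuned numbers. backoff_delay is a hypothetical helper.

```python
import random


def backoff_delay(attempt: int, base: float = 5.0, cap: float = 120.0,
                  rng=random) -> float:
    """Full-jitter backoff: a uniform delay between 0 and min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    for attempt in range(5):
        # ceilings grow 5, 10, 20, 40, 80 seconds; the actual delay is random below that
        print(attempt, round(backoff_delay(attempt), 2))
```

the jitter keeps parallel workers from retrying in lockstep after a shared rate limit.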

step 7: store and deduplicate results

import sqlite3
import json

def init_db(db_path: str = "ebay_items.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            item_id TEXT PRIMARY KEY,
            title TEXT,
            price TEXT,
            condition TEXT,
            url TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        )
    """)
    conn.commit()
    return conn

def upsert_items(conn, items: list[dict]):
    conn.executemany("""
        INSERT OR REPLACE INTO items (item_id, title, price, condition, url)
        VALUES (:item_id, :title, :price, :condition, :url)
    """, items)
    conn.commit()

expected output: a local SQLite file with deduplicated item records.

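if it breaks: rows with item_id None won’t raise an error here. SQLite allows NULL in a TEXT PRIMARY KEY column unless you also declare it NOT NULL, so those rows pile up as duplicates instead. filter before upserting with [i for i in items if i["item_id"]].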
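
INSERT OR REPLACE deletes and re-inserts the row, which resets scraped_at on every run. if you want to keep the first-seen timestamp, an ON CONFLICT upsert is one option. a minimal sketch on a simplified table (fewer columns than the schema above), requiring SQLite 3.24 or newer; both function names are hypothetical.

```python
import sqlite3


def init_items_table(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            item_id TEXT PRIMARY KEY NOT NULL,
            title TEXT,
            price TEXT,
            scraped_at TEXT DEFAULT (datetime('now'))
        )
    """)
    conn.commit()


def upsert_keep_first_seen(conn, items):
    """Insert new rows, update title/price on existing ones, keep scraped_at."""
    rows = [i for i in items if i.get("item_id")]  # drop rows without an ID
    conn.executemany("""
        INSERT INTO items (item_id, title, price)
        VALUES (:item_id, :title, :price)
        ON CONFLICT(item_id) DO UPDATE SET
            title = excluded.title,
            price = excluded.price
    """, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_items_table(conn)
    upsert_keep_first_seen(conn, [{"item_id": "1", "title": "watch", "price": "$10.00"}])
    upsert_keep_first_seen(conn, [{"item_id": "1", "title": "watch", "price": "$12.00"}])
    print(conn.execute("SELECT item_id, price FROM items").fetchall())
```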

step 8: validate your output

before scaling up, sanity-check a sample of your data. the most common silent failure is scraping eBay’s “ghost” first card (which always says “Shop on eBay”), or capturing CAPTCHA page HTML that parses as empty results.

conn = init_db()
cursor = conn.execute("SELECT COUNT(*), COUNT(DISTINCT item_id) FROM items")
total, unique = cursor.fetchone()
print(f"total rows: {total}, unique items: {unique}")

# spot check
cursor = conn.execute("SELECT * FROM items LIMIT 5")
for row in cursor:
    print(row)

expected output: counts match and sample rows contain real product data, not placeholder text.

if it breaks: if titles are all “Shop on eBay” or prices are None across the board, your HTML selector is targeting the wrong element. run parse_search_results on a saved HTML file and inspect what the soup is returning.

common pitfalls

using datacenter proxies. datacenter IPs (AWS, DigitalOcean, Hetzner ranges) get flagged by eBay within minutes of sustained use in 2026, so residential proxies are not optional. some operators also try ISP proxies (static residential), which work better than datacenter but worse than rotating residential for high-volume work.

not rotating User-Agent strings. sending the same UA on every request is a fingerprinting signal. maintain a pool of realistic Chrome and Firefox UAs from recent versions and rotate them per-session. WhatIsMyBrowser publishes current UA strings.
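
a minimal sketch of per-session UA rotation. the pool below is illustrative only: the strings follow real Chrome and Firefox formats, but the version numbers go stale, so refresh them from a current source. session_headers is a hypothetical helper.

```python
import random

# illustrative pool; replace with current strings before use
UA_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0",
]


def session_headers(rng=random) -> dict:
    """Pick one UA for the whole session; call once per client, not per request."""
    return {
        "User-Agent": rng.choice(UA_POOL),
        "Accept-Language": "en-US,en;q=0.9",
    }


if __name__ == "__main__":
    print(session_headers()["User-Agent"])
```

picking the UA once per client keeps it stable within a session, since a UA that changes mid-session is itself a signal.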

ignoring eBay’s region-specific endpoints. ebay.com, ebay.co.uk, ebay.com.au, and ebay.de return different prices and listings. multi-region scraping needs proxies in each target country; eBay does server-side geo-detection beyond the domain.
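
a minimal sketch of keeping domain and proxy country in step. the domains are the ones named above; the proxies_by_country mapping, its example URLs, and both function names are hypothetical, since providers expose country targeting in different ways.

```python
import random

EBAY_SITES = {
    "US": "www.ebay.com",
    "UK": "www.ebay.co.uk",
    "AU": "www.ebay.com.au",
    "DE": "www.ebay.de",
}


def search_endpoint(country_code: str) -> str:
    """Return the search URL for a region's eBay domain."""
    return f"https://{EBAY_SITES[country_code.upper()]}/sch/i.html"


def pick_regional_proxy(country_code: str, proxies_by_country: dict, rng=random) -> str:
    """Choose a proxy that exits in the same country as the target site."""
    pool = proxies_by_country.get(country_code.upper(), [])
    if not pool:
        raise LookupError(f"no proxies configured for {country_code}")
    return rng.choice(pool)


if __name__ == "__main__":
    proxies = {"UK": ["http://USER:PASS@uk-gateway.example:7777"]}  # hypothetical
    print(search_endpoint("uk"), pick_regional_proxy("uk", proxies))
```

failing loudly when a region has no proxies is deliberate: falling back to a US exit for ebay.co.uk would return data that looks valid but isn’t what a UK visitor sees.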

scraping too fast. eBay’s bot detection is session-based, not just IP-based. human-looking patterns (2-5 second gaps, occasional pauses, varying start pages) survive much longer than machine-paced crawls.

not handling eBay’s A/B tests. eBay runs constant layout experiments; a parser that worked yesterday can break silently today for 30% of requests. always log raw HTML for failed parses and check selectors periodically.
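
one way to catch silent selector drift is to track how often each field comes back populated. a minimal sketch; the 0.7 threshold is an assumption to tune against your own baseline, and both function names are hypothetical.

```python
def fill_rates(items, fields=("item_id", "title", "price")) -> dict:
    """Fraction of parsed items with a non-empty value for each field."""
    if not items:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for item in items if item.get(field)) / len(items)
        for field in fields
    }


def selectors_look_broken(items, threshold: float = 0.7) -> bool:
    """True when any tracked field falls below the expected fill rate."""
    return any(rate < threshold for rate in fill_rates(items).values())


if __name__ == "__main__":
    batch = [
        {"item_id": "1", "title": "a", "price": "$1.00"},
        {"item_id": "2", "title": "b", "price": None},
    ]
    print(fill_rates(batch), selectors_look_broken(batch))
```

run it per page or per batch and save the raw HTML whenever it returns True, so you have the failing layout on hand when you update selectors.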

scaling this

10x (100k requests/day): the main change is moving from synchronous httpx to async. use httpx.AsyncClient with asyncio and asyncio.Semaphore to cap concurrency at 10-20 simultaneous requests. at this scale SQLite becomes a bottleneck, so switch to Postgres.
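
a minimal sketch of the semaphore pattern. to keep it runnable without network access, the fetch coroutine is passed in; in a real pipeline it would wrap a call on an httpx.AsyncClient. fetch_all and the fake fetcher are hypothetical names.

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency: int = 10):
    """Run fetch(url) for every URL with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:  # blocks while max_concurrency fetches are already running
            return await fetch(url)

    # gather preserves input order in its result list
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    async def fake_fetch(url):
        await asyncio.sleep(0.01)  # stand-in for network latency
        return f"fetched {url}"

    print(asyncio.run(fetch_all(["a", "b", "c"], fake_fetch, max_concurrency=2)))
```

capping concurrency in one place also gives you a single knob to turn down when the 429 rate climbs.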

100x (1M requests/day): you need a job queue (Celery with Redis, or a managed queue like AWS SQS) to distribute work across multiple scraping workers. proxy costs become your dominant expense at this scale, roughly $40-120/day depending on provider and cache hit rate. start caching listing pages that haven’t changed in 24h. if you’re running multi-account operations or managing scraper identities at this scale, multiaccountops.com/blog/ has operational guides on session management that apply directly here.
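
a minimal sketch of the caching idea as an in-memory TTL cache. it simplifies "haven’t changed in 24h" to "fetched less than 24h ago", which is an assumption; a production version would live in Redis or similar so all workers share it. TTLCache is a hypothetical class.

```python
import time


class TTLCache:
    """Keep fetched pages for a fixed time-to-live, keyed by item ID."""

    def __init__(self, ttl_seconds: float = 24 * 3600, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self._clock(), value)


if __name__ == "__main__":
    cache = TTLCache()
    cache.put("1234567890", "<html>...</html>")
    print(cache.get("1234567890") is not None)  # True while the entry is fresh
```

check the cache before calling the fetcher and only spend proxy bandwidth on a miss.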

1000x (10M+ requests/day): at this volume you’re beyond what a single provider’s residential pool handles cleanly. you need geographic distribution (scraping workers in US, EU, APAC each with local proxies), a CDN-layer cache for repeat listing fetches, and real-time monitoring on your 429 rate, parse success rate, and proxy health. eBay also starts recognizing behavioral patterns at this scale even across clean IPs, so you need session warm-up logic that mimics organic browsing before hitting high-value pages. browser automation with Playwright for session cookie generation becomes necessary, not optional.
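
for the monitoring piece, a rolling window over recent status codes is a cheap starting point. a minimal sketch; the window size is an assumption, and StatusWindow is a hypothetical class you would feed from every response.

```python
from collections import deque


class StatusWindow:
    """Track the share of a given status code over the last N responses."""

    def __init__(self, size: int = 500):
        self._statuses = deque(maxlen=size)  # oldest entries fall off automatically

    def record(self, status_code: int) -> None:
        self._statuses.append(status_code)

    def rate(self, status_code: int = 429) -> float:
        if not self._statuses:
            return 0.0
        hits = sum(1 for s in self._statuses if s == status_code)
        return hits / len(self._statuses)


if __name__ == "__main__":
    window = StatusWindow(size=4)
    for code in (200, 200, 429, 200):
        window.record(code)
    print(window.rate())  # 0.25
```

keep one window per proxy pool or region so a rising 429 rate points at the pool that needs rotating.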

eBay’s developer program also offers official APIs for some use cases, including the Browse API. call limits are set per application and can be raised through eBay’s review process, and older APIs such as Finding have been deprecated, so check the current developer documentation for what’s available and which limits apply to you. for structured product data at scale, evaluate whether the API covers your use case before building a full scraper; it’ll be more stable.

Written by Xavier Fok

disclosure: this article may contain affiliate links. if you buy through them we may earn a commission at no extra cost to you. verdicts are independent of payouts. last reviewed by Xavier Fok on 2026-05-19.
