Anti-Scraping Solutions
- Authors

- Name
- JiGu
- @crypto20x
Current anti-scraping defenses, and the tools used to get past them, generally operate at two layers: the browser layer (application layer) and the network layer (TLS/TCP fingerprinting).
Here are the most widely used methods and open-source libraries on GitHub as of 2025:
1. Browser Automation (Solving JS Challenges & Captchas)
Used when sites require full browser rendering (e.g., Cloudflare Turnstile, ReCaptcha, heavily dynamic content).
- DrissionPage (Python) - Currently Recommended
- Why: It combines browser automation (like Selenium/Playwright) with direct HTTP requests (packet sending) in a single library. It drives the browser over the Chrome DevTools Protocol (CDP) directly rather than through WebDriver, which avoids many WebDriver detection vectors.
- GitHub: g1879/DrissionPage
- Undetected Chromedriver / Nodriver (Python)
- Why: Patches the Chromedriver binary to remove "leaks" that reveal automation (like the `cdc_` variable). `nodriver` is the newer successor project.
- GitHub: ultrafunkamsterdam/undetected-chromedriver or ultrafunkamsterdam/nodriver
- Puppeteer Extra Plugin Stealth (Node.js)
- Why: The standard for Node.js scraping. It patches browser properties (like `navigator.webdriver`) to look human.
- GitHub: berstend/puppeteer-extra
- FlareSolverr (Docker/Proxy)
- Why: A proxy server that automatically launches a browser to solve Cloudflare challenges and then returns the cookies to your script. Great for easy integration.
- GitHub: FlareSolverr/FlareSolverr
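As a rough illustration of the FlareSolverr flow described above, the sketch below posts a `request.get` command to its HTTP API and pulls the cookies and User-Agent out of the reply. It assumes FlareSolverr is running locally on its default port (8191); the helper names `build_request`, `extract_session`, and `solve` are my own, not part of FlareSolverr.

```python
import json
import urllib.request

# Assumption: FlareSolverr is running locally on its default port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_request(url, max_timeout_ms=60000):
    """Build the JSON body for a FlareSolverr `request.get` command."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def extract_session(response):
    """Pull the cookies and User-Agent out of a FlareSolverr response dict."""
    if response.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr failed: {response.get('message')}")
    solution = response["solution"]
    cookies = {c["name"]: c["value"] for c in solution["cookies"]}
    return cookies, solution["userAgent"]


def solve(url):
    """Send the command to FlareSolverr and return (cookies, user_agent)."""
    body = json.dumps(build_request(url)).encode("utf-8")
    req = urllib.request.Request(
        FLARESOLVERR_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=90) as resp:
        return extract_session(json.load(resp))
```

When reusing the returned cookies in your own client, send the same User-Agent that FlareSolverr reports, since the clearance cookie is generally tied to it.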
2. High-Performance Requests (TLS Fingerprinting)
Used for fast, large-scale scraping without a browser. Standard libraries like Python's `requests` or Node's `axios` are easily blocked because their TLS handshake fingerprint (JA3/JA4) differs from that of a real Chrome or Safari browser.
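To make the fingerprint idea concrete, here is a minimal sketch of how a JA3 hash is derived: five fields from the TLS ClientHello are joined into a string and MD5-hashed. The field values in the usage note are illustrative, not captures from real clients, and the helper names are my own.

```python
import hashlib


def is_grease(value):
    """GREASE placeholders (0x0a0a, 0x1a1a, ... 0xfafa) are excluded from JA3."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the ClientHello fields in JA3 order: commas between fields, dashes within."""
    parts = [str(version)]
    for field in (ciphers, extensions, curves, point_formats):
        parts.append("-".join(str(v) for v in field if not is_grease(v)))
    return ",".join(parts)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 of the JA3 string, as a hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

Two clients that offer the same cipher suites in a different order already hash differently, for example `ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])` versus the same call with `[4866, 4867, 4865]`. That is why an HTTP library cannot pass as Chrome just by changing its User-Agent header.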
- Curl Impersonate & Curl CFFI (C / Python / Node) - Best for Speed
- Why: It modifies the low-level `curl` library to mimic the exact TLS handshake of Chrome, Firefox, or Safari. `curl_cffi` is the Python binding.
- GitHub: yifeikong/curl_cffi (Python) or lwthiker/curl-impersonate (Core)
- TLS Client (Go / Python)
- Why: A library specifically designed to spoof TLS fingerprints (JA3).
- GitHub: bogdanfinn/tls-client
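For the curl_cffi option above, a request with a browser-like handshake can look roughly like this. This is a sketch, assuming `curl_cffi` is installed and that its `requests`-style API with the `impersonate` argument is available. The `fetch` wrapper and its `client` parameter are hypothetical conveniences of mine, added so the call can be exercised without the network.

```python
def fetch(url, browser="chrome", client=None, **kwargs):
    """GET `url` with a browser-like TLS handshake via curl_cffi.

    `client` is a hypothetical injection point: anything with a
    requests-style `get` method. By default curl_cffi is used.
    """
    if client is None:
        # Imported lazily so this module loads even without curl_cffi installed.
        from curl_cffi import requests as client
    kwargs.setdefault("timeout", 30)
    return client.get(url, impersonate=browser, **kwargs)


if __name__ == "__main__":
    # Placeholder target; substitute the site or API you are scraping.
    response = fetch("https://example.com")
    print(response.status_code)
```

Passing `impersonate="chrome"` asks curl_cffi to use a Chrome profile for the handshake; specific version targets are listed in the project's README.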
Summary Recommendation
| Scenario | Recommended Tool | GitHub Keyword |
|---|---|---|
| Complex Logins / Captchas | DrissionPage (Python) or Nodriver | g1879/DrissionPage |
| High Speed / API Scraping | curl_cffi (Python) | yifeikong/curl_cffi |
| Cloudflare "Under Attack" | FlareSolverr | FlareSolverr/FlareSolverr |
| Node.js Environment | Puppeteer Stealth | berstend/puppeteer-extra |
If you need to simulate a login for a crawler, I recommend using DrissionPage or Puppeteer with Stealth to handle the initial login interactively, saving the cookies, and then passing those cookies to a faster HTTP client (like curl_cffi) for the actual data scraping.
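The cookie hand-off can be sketched as follows. It assumes the browser tool exports cookies as a list of objects with `name` and `value` keys (the shape Puppeteer and CDP-based tools typically return). The file path and the function names are placeholders of mine, and the final step assumes `curl_cffi` is installed.

```python
import json
from pathlib import Path


def save_cookies(browser_cookies, path):
    """Persist cookies exported from the browser session as JSON."""
    Path(path).write_text(json.dumps(browser_cookies), encoding="utf-8")


def load_cookie_dict(path):
    """Load saved cookies as a plain name -> value dict for an HTTP client."""
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    return {c["name"]: c["value"] for c in cookies}


def scrape(url, cookie_path, user_agent):
    """Fetch `url` with the saved login cookies (requires curl_cffi)."""
    from curl_cffi import requests as curl_requests

    return curl_requests.get(
        url,
        cookies=load_cookie_dict(cookie_path),
        # Reuse the browser's User-Agent so the session looks consistent.
        headers={"User-Agent": user_agent},
        impersonate="chrome",
    )
```

Run the browser login once, call `save_cookies` with whatever cookie list your browser tool returns, and then call `scrape` as often as needed until the session expires.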