Project overview
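Scrapling is an adaptive web scraping framework that scales from a single request to a large-scale crawl. Its parser learns from website changes and automatically relocates elements when pages are updated. It can bypass anti-bot systems such as Cloudflare Turnstile out of the box, and it supports concurrent multi-session crawling with proxy rotation.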
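The summary describes adaptive element relocation only at a high level. The following is a minimal, hypothetical sketch of the general idea using only Python's standard library: remember a few features of an element (tag, attributes, text, ancestor path), then, on a changed page, pick the element with the highest similarity score. It is not Scrapling's algorithm. The `Fingerprint` fields, the 0.4/0.4/0.2 weights, the 0.5 threshold, and the names `fingerprints`, `similarity` and `relocate` are all illustrative assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Fingerprint:
    """Features remembered about one element (hypothetical schema)."""
    tag: str
    attrs: dict
    text: str = ""
    path: tuple = ()  # ancestor tag names, outermost first


class _Collector(HTMLParser):
    """Records a Fingerprint for every start tag in a document."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements
        self.elements = []  # every element seen, in document order

    def handle_starttag(self, tag, attrs):
        path = tuple(el.tag for el in self.stack)
        el = Fingerprint(tag, {k: v or "" for k, v in attrs}, "", path)
        self.elements.append(el)
        if tag not in VOID_TAGS:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest matching open element (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Attach text to the innermost open element only.
        if self.stack and data.strip():
            owner = self.stack[-1]
            owner.text = f"{owner.text} {data.strip()}".strip()


def fingerprints(html):
    collector = _Collector()
    collector.feed(html)
    collector.close()
    return collector.elements


def _ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()


def similarity(a, b):
    """Score in [0, 1]; the 0.4/0.4/0.2 weights are arbitrary choices."""
    if a.tag != b.tag:
        return 0.0
    keys = set(a.attrs) | set(b.attrs)
    attr_score = (sum(_ratio(a.attrs.get(k, ""), b.attrs.get(k, "")) for k in keys)
                  / len(keys)) if keys else 1.0
    return (0.4 * attr_score + 0.4 * _ratio(a.text, b.text)
            + 0.2 * _ratio(" ".join(a.path), " ".join(b.path)))


def relocate(saved, html, threshold=0.5):
    """Return the element in `html` most similar to `saved`, or None."""
    scored = [(similarity(saved, el), el) for el in fingerprints(html)]
    if not scored:
        return None
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score >= threshold else None


old_page = '<div class="shop"><p class="product" id="p1">Blue Mug</p></div>'
new_page = ('<div class="store"><section>'
            '<p class="product-card" id="p1">Blue Mug</p></section></div>')

saved = next(el for el in fingerprints(old_page) if el.attrs.get("class") == "product")
print(relocate(saved, new_page).attrs)  # {'class': 'product-card', 'id': 'p1'}
```

Here a plain `.product` selector would no longer match after the redesign, but the stored fingerprint still finds the element because its text and `id` are unchanged. A production implementation would need to persist fingerprints between runs and weigh features more carefully.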
README preview
Effortless Web Scraping for the Modern Web

Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.

Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation, all in a few lines of Python. One library, zero compromises.

Blazing-fast crawls with real-time stats and streaming. Built by Web Scrapers for Web Scrapers and regular users alike, so there's something for everyone.

```python
from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
StealthyFetcher.adaptive = True
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)  # Fetch website under the radar!
products = p.css('.product', auto_save=True)  # Scrape data that survives website design changes!
products = p.css('.product', adaptive=True)  # Later, if the website structure changes, pass `adaptive=True` to find them!
```

Or scale up to full crawls:

```python
from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com/"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            ...
```
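The overview mentions automatic proxy rotation without showing how rotation works. As a rough illustration, here is a minimal, hypothetical round-robin proxy pool in plain Python that temporarily skips proxies after repeated failures. It is not Scrapling's API: the class name `RoundRobinProxies`, its methods, the `max_failures` policy, and the proxy URLs are all assumptions made for this sketch, and how a proxy string would be passed to a Scrapling fetcher or spider is deliberately left out.

```python
import itertools
import threading


class RoundRobinProxies:
    """Hypothetical round-robin proxy pool that skips proxies after repeated failures."""

    def __init__(self, proxies, max_failures=3):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._failures = {p: 0 for p in self._proxies}
        self._max_failures = max_failures
        self._cycle = itertools.cycle(self._proxies)
        self._lock = threading.Lock()  # safe to share between worker threads

    def get_proxy(self):
        """Return the next healthy proxy; raise if every proxy has failed too often."""
        with self._lock:
            for _ in range(len(self._proxies)):
                proxy = next(self._cycle)
                if self._failures[proxy] < self._max_failures:
                    return proxy
            raise RuntimeError("all proxies have exceeded max_failures")

    def report_failure(self, proxy):
        with self._lock:
            self._failures[proxy] += 1

    def report_success(self, proxy):
        with self._lock:
            self._failures[proxy] = 0  # a success makes the proxy healthy again


pool = RoundRobinProxies(
    ["http://proxy-a.example:8080", "http://proxy-b.example:8080", "http://proxy-c.example:8080"],
    max_failures=1,
)
print([pool.get_proxy() for _ in range(4)])  # a, b, c, a
pool.report_failure("http://proxy-b.example:8080")
print([pool.get_proxy() for _ in range(2)])  # b is skipped: c, a
```

Round-robin spreads requests evenly across the pool. Resetting the failure counter on success keeps a proxy that recovers from being excluded permanently; a real crawler might instead use time-based cooldowns.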