Scrapling

Framework

D4Vinci/Scrapling

An adaptive web scraping framework that automatically bypasses anti-bot systems and relocates elements that have moved.


Overview

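Scrapling is an adaptive web scraping framework that handles everything from a single request to a full-scale crawl. Its parser learns from website changes and automatically relocates your elements when pages update. It bypasses anti-bot systems such as Cloudflare Turnstile out of the box, and supports concurrent multi-session crawls with proxy rotation.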
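
This page does not show Scrapling's own proxy-rotation API, so the following is only a generic, library-independent sketch of the idea behind round-robin proxy rotation: each new request takes the next proxy from a fixed list and wraps around at the end. The class `RoundRobinProxies` and the proxy URLs are hypothetical placeholders, not part of Scrapling.

```python
import itertools
from typing import Iterable, Iterator


class RoundRobinProxies:
    """Hypothetical helper (not Scrapling's API): hands out proxies in cyclic order."""

    def __init__(self, proxies: Iterable[str]) -> None:
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy is required")
        self._proxies = proxy_list
        # itertools.cycle repeats the list forever: p1, p2, p3, p1, p2, ...
        self._cycle: Iterator[str] = itertools.cycle(proxy_list)

    def next_proxy(self) -> str:
        """Return the proxy to use for the next request."""
        return next(self._cycle)

    def __len__(self) -> int:
        return len(self._proxies)


if __name__ == "__main__":
    # Placeholder proxy URLs, for illustration only.
    rotator = RoundRobinProxies([
        "http://proxy1.example:8080",
        "http://proxy2.example:8080",
        "http://proxy3.example:8080",
    ])
    for request_no in range(5):
        print(request_no, rotator.next_proxy())
```

Round-robin spreads requests evenly across proxies; a production crawler would usually also retire or cool down proxies that fail, which this sketch deliberately leaves out.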

README preview

Effortless Web Scraping for the Modern Web

Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.

Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation, all in a few lines of Python. One library, zero compromises.

Blazing fast crawls with real-time stats and streaming. Built by Web Scrapers for Web Scrapers and regular users, there's something for everyone.

```python
from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
StealthyFetcher.adaptive = True
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True)  # Fetch website under the radar!
products = p.css('.product', auto_save=True)                                        # Scrape data that survives website design changes!
products = p.css('.product', adaptive=True)                                         # Later, if the website structure changes, pass `adaptive=True` to find them!
```

Or scale up to full crawls:

```python
from scrapling.spiders import Spider, Response

class MySpider(Spider):
  name = "demo"
  start_urls = ["https://example.com/"]

  async def parse(self, response: Response):
      for item in response.css('.product'):
          ...
```
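
The README says the parser relocates elements after a site changes, but this page does not describe how. Purely as an illustration of the general idea, and not Scrapling's actual algorithm, one can store a fingerprint of an element when it is first found (tag, attributes, text) and later pick the element on the changed page whose fingerprint is most similar. All names below (`Fingerprint`, `similarity`, `relocate`) are hypothetical, and the weights and threshold are arbitrary choices.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Fingerprint:
    """Hypothetical snapshot of the features that identify an element."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""


def similarity(saved: Fingerprint, candidate: Fingerprint) -> float:
    """Score in [0, 1] mixing tag equality, attribute agreement and text similarity."""
    tag_score = 1.0 if saved.tag == candidate.tag else 0.0
    keys = set(saved.attrs) | set(candidate.attrs)
    # Fraction of attribute names whose values match in both snapshots.
    attr_score = (
        sum(saved.attrs.get(k) == candidate.attrs.get(k) for k in keys) / len(keys)
        if keys
        else 1.0
    )
    text_score = SequenceMatcher(None, saved.text, candidate.text).ratio()
    # Arbitrary illustrative weights (they sum to 1).
    return 0.3 * tag_score + 0.3 * attr_score + 0.4 * text_score


def relocate(
    saved: Fingerprint, candidates: List[Fingerprint], threshold: float = 0.5
) -> Optional[Fingerprint]:
    """Return the candidate most similar to `saved`, or None if none reaches `threshold`."""
    best = max(candidates, key=lambda c: similarity(saved, c), default=None)
    if best is None or similarity(saved, best) < threshold:
        return None
    return best


if __name__ == "__main__":
    saved = Fingerprint("div", {"class": "product"}, "Blue T-shirt $19")
    # After a redesign the class name changed, but tag and text are unchanged.
    page = [
        Fingerprint("div", {"class": "nav"}, "Home About Contact"),
        Fingerprint("div", {"class": "item-card"}, "Blue T-shirt $19"),
    ]
    print(relocate(saved, page))
```

Here a plain CSS selector on the old class name would find nothing, while the similarity search still picks the `item-card` element because its tag and text match the saved fingerprint.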

FAQ (2)

Troubleshooting

Why is the Scrapling skill page on ClawHub blank or not displaying?

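This was a temporary outage (openclaw/clawhub#2345) and has been resolved. The page https://clawhub.ai/D4Vinci/scrapling-official is working normally again. If you still see a blank page, fall back to the zip file in the agent-skill directory of the GitHub repository: https://github.com/D4Vinci/Scrapling/tree/main/agent-skill.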

Source: Issue #290

Troubleshooting

How do I persist cookies across MCP server sessions?

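The MCP server does not support persistent browser profiles. As a workaround, use the Python API directly and pass a profile directory via `user_data_dir`. Because `async with` requires the async session class, the async form is: `async with AsyncStealthySession(headless=True, user_data_dir='/path/to/profile') as session: page = await session.fetch(url)`. Then hand that script to your AI agent. Since the browser profile lives on disk in that directory, cookies and login state persist across calls.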

Source: Issue #269
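
A minimal synchronous sketch of this workaround, assuming Scrapling is installed with its browser dependencies and that `StealthySession` accepts `user_data_dir` as the answer above describes. `ensure_profile_dir` and `fetch_with_profile` are hypothetical helper names; the point is that every run passes the same directory, so cookies written during one run (for example after logging in) are still there for the next.

```python
from pathlib import Path


def ensure_profile_dir(path: str) -> Path:
    """Create (if needed) and return the persistent browser profile directory.

    Hypothetical helper: it makes sure the same absolute path is reused on
    every run, which is what lets cookies survive between runs.
    """
    profile = Path(path).expanduser().resolve()
    profile.mkdir(parents=True, exist_ok=True)
    return profile


def fetch_with_profile(url: str, profile_dir: str):
    """Fetch `url` in a browser whose cookies and storage live in `profile_dir`.

    Assumes StealthySession accepts `user_data_dir`, per the FAQ answer.
    """
    # Imported lazily so this module loads even where Scrapling is not installed.
    from scrapling.fetchers import StealthySession

    profile = ensure_profile_dir(profile_dir)
    with StealthySession(headless=True, user_data_dir=str(profile)) as session:
        # Cookies set during this session are stored under `profile`
        # and are picked up again the next time this function runs.
        return session.fetch(url)
```

Calling `fetch_with_profile("https://example.com", "~/.scrapling-profile")` on separate runs reuses the same profile directory, so a login performed in one run should still be in effect in the next.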

Similar projects

superpowers

everything-claude-code

flutter

langflow