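Project Overview
LiteParse is a standalone PDF parsing tool that provides spatial text parsing with bounding boxes, a flexible OCR system, and multiple output formats. It runs locally and can be used from Rust, Node.js, Python, and WASM.
README Preview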
# LiteParse

[Docs](https://developers.llamaindex.ai/liteparse/)

> Looking for LiteParse V1? Follow this link to [the old code](https://github.com/run-llama/liteparse/tree/logan/liteparse-v1)

LiteParse is a standalone OSS PDF parsing tool focused exclusively on **fast and light** parsing. It provides high-quality spatial text parsing with bounding boxes, without proprietary LLM features or cloud dependencies. Everything runs locally on your machine.

**Hitting the limits of local parsing?**
For complex documents (dense tables, multi-column layouts, charts, handwritten text, or
scanned PDFs), you'll get significantly better results with [LlamaParse](https://developers.llamaindex.ai/python/cloud/llamaparse/?utm_source=github&utm_medium=liteparse),
our cloud-based document parser built for production document pipelines. LlamaParse handles the
hard stuff so your models see clean, structured data and markdown.

> [Sign up for LlamaParse free](https://cloud.llamaindex.ai?utm_source=github&utm_medium=liteparse)

## Overview

- **Fast Text Parsing**: Spatial text parsing using PDFium
- **Flexible OCR System**:
  - **Built-in**: Tesseract (zero setup, bundled with the library)
  - **HTTP Servers**: Plug in any OCR server (EasyOCR, PaddleOCR, custom)
  - **Standard API**: Simple, well-defined OCR API specification
- **Screenshot Generation**: Generate high-quality page screenshots for LLM agents
- **Multiple Output Formats**: JSON and Text
- **Bounding Boxes**: Precise text positioning information
- **Multi-language**: Use from Rust, Node.js/TypeScript, Python, or the browser (WASM)
- **Multi-platform**: Linux, ma