OpenSource-Hub

liteparse

CLI Tool

run-llama/liteparse

A fast, open-source document parser with support for OCR and text extraction.

Overview

LiteParse is a standalone PDF parsing tool that supports spatial text parsing with bounding boxes, a flexible OCR system, and multiple output formats. It runs locally and can be used from Rust, Node.js, Python, and WASM.
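LiteParse's own layout algorithm is not described on this page, so the following is only a minimal, hypothetical Python sketch of the general idea behind spatial text parsing: given text fragments with bounding boxes, group fragments at roughly the same height into lines and order each line left to right. `TextItem`, `to_lines`, and the `y_tolerance` parameter are illustrative names, not part of LiteParse's API.

```python
from dataclasses import dataclass

# Hypothetical sketch: these names are NOT LiteParse's API or algorithm.


@dataclass
class TextItem:
    """A hypothetical text fragment with a bounding box (origin at top-left)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def to_lines(items: list[TextItem], y_tolerance: float = 2.0) -> list[str]:
    """Group fragments whose top edges lie within y_tolerance of a line's
    first fragment, then join each line's fragments from left to right."""
    lines: list[list[TextItem]] = []
    for item in sorted(items, key=lambda i: (i.y, i.x)):
        # Join the current line if this fragment sits at roughly the same
        # height as that line's first fragment; otherwise start a new line.
        if lines and abs(lines[-1][0].y - item.y) <= y_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    return [" ".join(i.text for i in sorted(line, key=lambda i: i.x)) for line in lines]


if __name__ == "__main__":
    page = [
        TextItem("world", x=60, y=10.5, width=30, height=10),
        TextItem("Hello", x=10, y=10, width=40, height=10),
        TextItem("Second line", x=10, y=30, width=80, height=10),
    ]
    print("\n".join(to_lines(page)))  # prints "Hello world" then "Second line"
```

Anchoring each line on its first fragment keeps the grouping simple; documents with multi-column layouts or skewed scans need more than a fixed vertical tolerance.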

README Preview

# LiteParse

[CI](https://github.com/run-llama/liteparse/actions/workflows/ci.yml) | [crates.io](https://crates.io/crates/liteparse) | [npm](https://www.npmjs.com/package/@llamaindex/liteparse) | [npm (WASM)](https://www.npmjs.com/package/@llamaindex/liteparse-wasm) | [PyPI](https://pypi.org/project/liteparse/) | [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) | [Docs](https://developers.llamaindex.ai/liteparse/)

> Looking for LiteParse V1? Follow this link to [the old code](https://github.com/run-llama/liteparse/tree/logan/liteparse-v1)

LiteParse is a standalone OSS PDF parsing tool focused exclusively on **fast and light** parsing. It provides high-quality spatial text parsing with bounding boxes, without proprietary LLM features or cloud dependencies. Everything runs locally on your machine.

**Hitting the limits of local parsing?**
For complex documents (dense tables, multi-column layouts, charts, handwritten text, or
scanned PDFs), you'll get significantly better results with [LlamaParse](https://developers.llamaindex.ai/python/cloud/llamaparse/?utm_source=github&utm_medium=liteparse),
our cloud-based document parser built for production document pipelines. LlamaParse handles the
hard stuff so your models see clean, structured data and markdown.

> [Sign up for LlamaParse free](https://cloud.llamaindex.ai?utm_source=github&utm_medium=liteparse)

## Overview

- **Fast Text Parsing**: Spatial text parsing using PDFium
- **Flexible OCR System**:
  - **Built-in**: Tesseract (zero setup, bundled with the library)
  - **HTTP Servers**: Plug in any OCR server (EasyOCR, PaddleOCR, custom)
  - **Standard API**: Simple, well-defined OCR API specification
- **Screenshot Generation**: Generate high-quality page screenshots for LLM agents
- **Multiple Output Formats**: JSON and Text
- **Bounding Boxes**: Precise text positioning information
- **Multi-language**: Use from Rust, Node.js/TypeScript, Python, or the browser (WASM)
- **Multi-platform**: Linux, ma