OpenSource-Hub

liteparse

CLI tool

run-llama/liteparse

A fast, open-source document parser with support for OCR and text extraction.

Overview

LiteParse is a standalone PDF parsing tool that supports spatial text parsing with bounding boxes, a flexible OCR system, and multiple output formats. It runs locally and can be used from Rust, Node.js, Python, and WASM.

README preview

# LiteParse

[CI](https://github.com/run-llama/liteparse/actions/workflows/ci.yml)
|
[crates.io](https://crates.io/crates/liteparse)
|
[npm](https://www.npmjs.com/package/@llamaindex/liteparse)
|
[npm (WASM)](https://www.npmjs.com/package/@llamaindex/liteparse-wasm)
|
[PyPI](https://pypi.org/project/liteparse/)
|
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
|
[Docs](https://developers.llamaindex.ai/liteparse/)

> Looking for LiteParse V1? Follow this link to [the old code](https://github.com/run-llama/liteparse/tree/logan/liteparse-v1)

LiteParse is a standalone OSS PDF parsing tool focused exclusively on **fast and light** parsing. It provides high-quality spatial text parsing with bounding boxes, without proprietary LLM features or cloud dependencies. Everything runs locally on your machine.

**Hitting the limits of local parsing?**
For complex documents (dense tables, multi-column layouts, charts, handwritten text, or
scanned PDFs), you'll get significantly better results with [LlamaParse](https://developers.llamaindex.ai/python/cloud/llamaparse/?utm_source=github&utm_medium=liteparse),
our cloud-based document parser built for production document pipelines. LlamaParse handles the
hard stuff so your models see clean, structured data and markdown.

> [Sign up for LlamaParse free](https://cloud.llamaindex.ai?utm_source=github&utm_medium=liteparse)

## Overview

- **Fast Text Parsing**: Spatial text parsing using PDFium
- **Flexible OCR System**:
  - **Built-in**: Tesseract (zero setup, bundled with the library)
  - **HTTP Servers**: Plug in any OCR server (EasyOCR, PaddleOCR, custom)
  - **Standard API**: Simple, well-defined OCR API specification
- **Screenshot Generation**: Generate high-quality page screenshots for LLM agents
- **Multiple Output Formats**: JSON and Text
- **Bounding Boxes**: Precise text positioning information
- **Multi-language**: Use from Rust, Node.js/TypeScript, Python, or the browser (WASM)
- **Multi-platform**: Linux, ma