airllm

Library

lyogavin/airllm

Runs 70B large language models on a single GPU with 4GB of VRAM, without quantization

Overview

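AirLLM optimizes inference memory usage so that large language models, such as 70B models, can run on a single 4GB GPU without quantization, distillation, or pruning. It also supports deploying Llama 3.1 405B on 8GB of VRAM and offers optional block-wise quantization for faster inference.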

README preview

[**Quickstart**](#quickstart) |
[**Configurations**](#configurations) |
[**MacOS**](#macos) |
[**Example notebooks**](#example-python-notebook) |
[**FAQ**](#faq)

**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation and pruning. And you can run **405B Llama3.1** on **8GB vram** now.

## AI Agents Recommendation:

* [Best AI Game Sprite Generator](https://godmodeai.co)

* [Best AI Facial Expression Editor](https://crazyfaceai.com)

## Updates
[2024/08/20] v2.11.0: Support Qwen2.5

[2024/08/18] v2.10.1 Support CPU inference. Support non sharded models. Thanks @NavodPeiris for the great work!

[2024/07/30] Support Llama3.1 **405B** ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support **8bit/4bit quantization**.

[2024/04/20] AirLLM supports Llama3 natively already. Run Llama3 70B on 4GB single GPU.

[2023/12/25] v2.8.2: Support MacOS running 70B large language models.

[2023/12/20] v2.7: Support AirLLMMixtral.

[2023/12/20] v2.6: Added AutoModel, automatically detect model type, no need to provide model class to initialize model.

[2023/12/18] v2.5: added prefetching to overlap the model loading and compute. 10% speed improvement.

[2023/12/03] added support of **ChatGLM**, **QWen**, **Baichuan**, **Mistral**, **InternLM**!

[2023/12/02] added support for safetensors. Now support all top 10 models in open llm leaderboard.

[2023/12/01] airllm 2.0. Support com

FAQ (1)

Troubleshooting

How do I fix 'ValueError: Cannot index mlx array using the given type' when running a llama model with airllm on a Mac?

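This error occurs when airllm is used with the MLX backend on Apple Silicon: the model expects an MLX array, but the input token IDs are supplied as a PyTorch tensor. To fix it, wrap the token IDs in an MLX array before generating. After import mlx.core as mx, call generation_output = model.generate(mx.array(input_tokens['input_ids'])) instead of passing input_tokens['input_ids'].cuda(). This keeps the input compatible with MLX's embedding layer. The issue was reported with airllm v2.9.1, Python 3.12.4, and mlx versions 14.1–16.1 on an M1 Pro running macOS 14.5.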
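The conversion described above can be sketched as a small helper. This is a minimal illustration, not part of airllm: the function name ids_for_mlx and its to_array parameter are hypothetical, introduced here only so the conversion can be exercised on machines without MLX. It assumes the tokenizer output is a nested list, a NumPy array, or a PyTorch tensor.

```python
def ids_for_mlx(input_ids, to_array=None):
    """Wrap tokenizer output in the array type the MLX backend expects.

    input_ids: nested list of ints, or any object with .tolist()
               (NumPy arrays and PyTorch tensors both provide it).
    to_array:  hypothetical hook; defaults to mlx.core.array and can be
               replaced with another callable when MLX is unavailable.
    """
    if to_array is None:
        # Imported lazily: mlx is only installable on Apple Silicon.
        import mlx.core as mx
        to_array = mx.array
    # Normalize to plain Python lists so the source framework does not matter.
    if hasattr(input_ids, "tolist"):
        input_ids = input_ids.tolist()
    return to_array(input_ids)


# Intended use on a Mac, assuming `model` and `input_tokens` already exist
# as in the FAQ answer above:
#   generation_output = model.generate(ids_for_mlx(input_tokens["input_ids"]))
```

Going through plain lists keeps the helper independent of whether the tokenizer returned PyTorch or NumPy data, at the cost of one extra copy of the token IDs, which is negligible for prompt-sized inputs.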

Original issue #167

Similar projects

puppeteer

PaddleOCR

crawl4ai

prisma