
airllm

lyogavin/airllm

Run 70B large language models on a single 4GB GPU, without quantization.
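Comparing one layer with the whole model makes the 4GB figure plausible. The arithmetic below is a rough sketch built on two assumptions. First, the model is executed one decoder layer at a time, which this page does not spell out. Second, the model has a Llama-2-70B-style configuration stored in fp16: 80 layers, hidden size 8192, MLP width 28672, grouped-query attention with 8 key/value heads of dimension 128, and a 32000-token vocabulary. Only weights are counted. Activations and the KV cache are ignored.

```python
# Rough fp16 weight-memory estimate for a Llama-2-70B-style decoder.
# ASSUMPTION: these config values come from the public Llama-2-70B config,
# not from AirLLM; activations and KV cache are not counted.
HIDDEN = 8192          # model width
INTERMEDIATE = 28672   # MLP inner width
N_LAYERS = 80          # decoder layers
KV_DIM = 8 * 128       # 8 key/value heads x 128 head dim (grouped-query attention)
VOCAB = 32000          # tokenizer vocabulary size
BYTES_PER_PARAM = 2    # fp16/bf16, i.e. no quantization


def layer_params() -> int:
    """Weights in one decoder layer (Llama layers have no bias terms)."""
    attn = HIDDEN * HIDDEN * 2 + HIDDEN * KV_DIM * 2  # q/o projections + k/v projections
    mlp = HIDDEN * INTERMEDIATE * 3                   # gate, up and down projections
    norms = HIDDEN * 2                                # two RMSNorm weight vectors
    return attn + mlp + norms


def total_params() -> int:
    """All decoder layers plus untied input embedding, LM head and final norm."""
    return N_LAYERS * layer_params() + VOCAB * HIDDEN * 2 + HIDDEN


GIB = 1024 ** 3
per_layer_gib = layer_params() * BYTES_PER_PARAM / GIB
full_model_gib = total_params() * BYTES_PER_PARAM / GIB
print(f"one layer: {per_layer_gib:.2f} GiB, whole model: {full_model_gib:.1f} GiB")
```

Under these assumptions, one layer needs roughly 1.6 GiB and the full model roughly 128 GiB. A single layer fits comfortably in 4GB, but the whole model does not.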

Project overview

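AirLLM optimizes inference memory usage so that large language models, including 70B models, can run on a single 4GB GPU without quantization, distillation, or pruning. It also supports deploying Llama 3.1 405B on 8GB of VRAM, and it offers block-wise quantization to speed up inference.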
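Layer-by-layer inference is one general way to keep peak weight memory at the size of a single layer instead of the whole model. The sketch below assumes that this is where the saving comes from. It is not AirLLM's code. The toy residual layers, the one-pickle-file-per-layer shard format, and the names `save_layers` and `layered_forward` are all hypothetical, and the sketch is not AirLLM's API or on-disk format.

```python
import os
import pickle
import random
import tempfile
from typing import List, Tuple

# Hypothetical toy layer: a square weight matrix and a bias vector.
Layer = Tuple[List[List[float]], List[float]]


def apply_layer(layer: Layer, x: List[float]) -> List[float]:
    """Residual block: x + relu(W @ x + b)."""
    weight, bias = layer
    out = []
    for row, b, xi in zip(weight, bias, x):
        h = sum(w * v for w, v in zip(row, x)) + b
        out.append(xi + max(h, 0.0))
    return out


def save_layers(layers: List[Layer], directory: str) -> List[str]:
    """Shard the model to disk, one pickle file per layer (hypothetical format)."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(directory, f"layer_{i:03d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(layer, f)
        paths.append(path)
    return paths


def layered_forward(paths: List[str], x: List[float]) -> List[float]:
    """Forward pass that holds only one layer's weights in memory at a time."""
    for path in paths:
        with open(path, "rb") as f:
            layer = pickle.load(f)  # read just this layer from disk
        x = apply_layer(layer, x)
        del layer                   # drop the reference before loading the next layer
    return x


def full_forward(layers: List[Layer], x: List[float]) -> List[float]:
    """Reference pass with every layer already resident in memory."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


# Demo: a 3-layer, width-4 toy model gives the same output either way.
rng = random.Random(0)
DIM, N_LAYERS = 4, 3
toy_layers = [
    ([[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)],
     [rng.uniform(-0.1, 0.1) for _ in range(DIM)])
    for _ in range(N_LAYERS)
]
x0 = [1.0, 0.5, -0.5, 2.0]
with tempfile.TemporaryDirectory() as tmp:
    layered_out = layered_forward(save_layers(toy_layers, tmp), x0)
print(layered_out == full_forward(toy_layers, x0))
```

The trade-off in this sketch is I/O. Every forward pass re-reads all of the weights from disk, so loading speed rather than GPU memory becomes the bottleneck.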

README preview

[**Quickstart**](#quickstart) |
[**Configurations**](#configurations) |
[**MacOS**](#macos) |
[**Example notebooks**](#example-python-notebook) |
[**FAQ**](#faq)

**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU without quantization, distillation, or pruning. You can now also run **Llama3.1 405B** on **8GB of VRAM**.

## AI Agents Recommendation:

* [Best AI Game Sprite Generator](https://godmodeai.co)

* [Best AI Facial Expression Editor](https://crazyfaceai.com)

## Updates
[2024/08/20] v2.11.0: Support for Qwen2.5.

[2024/08/18] v2.10.1: Support for CPU inference and for non-sharded models. Thanks @NavodPeiris for the great work!

[2024/07/30] Support for Llama3.1 **405B** ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support for **8bit/4bit quantization**.

[2024/04/20] AirLLM now supports Llama3 natively. Run Llama3 70B on a single 4GB GPU.

[2023/12/25] v2.8.2: Support for running 70B large language models on MacOS.

[2023/12/20] v2.7: Support for AirLLMMixtral.

[2023/12/20] v2.6: Added AutoModel, which detects the model type automatically, so you no longer need to provide a model class to initialize a model.

[2023/12/18] v2.5: Added prefetching to overlap model loading with computation, for a 10% speed improvement.

[2023/12/03] Added support for **ChatGLM**, **QWen**, **Baichuan**, **Mistral**, and **InternLM**!

[2023/12/02] Added support for safetensors. All of the top 10 models on the Open LLM Leaderboard are now supported.

[2023/12/01] airllm 2.0. Support com