OpenSource-Hub

airllm

Library

lyogavin/airllm

Run 70B large language models on a single 4GB graphics card.

Overview

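AirLLM optimizes inference memory usage so that large language models, such as 70B-parameter models, can run on a single 4GB graphics card without quantization, distillation, or pruning. It can also run Llama 3.1 405B with 8GB of VRAM, and it offers quantization-based acceleration.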
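To give a feel for the scale behind the 4GB claim, here is a rough back-of-envelope sketch. It is not taken from AirLLM's code. It assumes 16-bit weights, a model with 80 transformer layers, parameters spread evenly across layers, and weights held in GPU memory one layer at a time; this preview does not spell out any of those details, and the function names are hypothetical.

```python
# Back-of-envelope memory estimate. All inputs are illustrative assumptions,
# not measurements of AirLLM.

BYTES_PER_PARAM_FP16 = 2  # assumed 16-bit weights


def full_model_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """GB (10^9 bytes) needed to hold every weight at the same time."""
    return n_params * bytes_per_param / 1e9


def per_layer_gb(n_params: float, n_layers: int,
                 bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate GB for one layer, assuming parameters are spread evenly."""
    return full_model_gb(n_params, bytes_per_param) / n_layers


total = full_model_gb(70e9)      # whole 70B model in fp16
layer = per_layer_gb(70e9, 80)   # one of 80 assumed layers

print(f"all weights at once: {total:.0f} GB")  # 140 GB
print(f"one layer at a time: {layer:.2f} GB")  # 1.75 GB
```

Under these assumptions the full set of weights is far larger than a 4GB card, while a single layer fits with room left for activations. Real per-layer sizes vary, and embeddings and the output head are not evenly sized, so treat the numbers as an order-of-magnitude guide only.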

README preview

[**Quickstart**](#quickstart) | [**Configurations**](#configurations) | [**MacOS**](#macos) | [**Example notebooks**](#example-python-notebook) | [**FAQ**](#faq)

**AirLLM** optimizes inference memory usage, allowing 70B large language models to run inference on a single 4GB GPU card without quantization, distillation, or pruning. You can now also run **405B Llama3.1** on **8GB VRAM**.

## AI Agents Recommendation:

* [Best AI Game Sprite Generator](https://godmodeai.co)

* [Best AI Facial Expression Editor](https://crazyfaceai.com)

## Updates
[2024/08/20] v2.11.0: Support Qwen2.5.

[2024/08/18] v2.10.1: Support CPU inference. Support non-sharded models. Thanks @NavodPeiris for the great work!

[2024/07/30] Support Llama3.1 **405B** ([example notebook](https://colab.research.google.com/github/lyogavin/airllm/blob/main/air_llm/examples/run_llama3.1_405B.ipynb)). Support **8bit/4bit quantization**.

[2024/04/20] AirLLM already supports Llama3 natively. Run Llama3 70B on a single 4GB GPU.

[2023/12/25] v2.8.2: Support running 70B large language models on MacOS.

[2023/12/20] v2.7: Support AirLLMMixtral.

[2023/12/20] v2.6: Added AutoModel, which automatically detects the model type, so there is no need to provide a model class to initialize the model.

[2023/12/18] v2.5: Added prefetching to overlap model loading and compute. 10% speed improvement.

[2023/12/03] Added support for **ChatGLM**, **QWen**, **Baichuan**, **Mistral**, **InternLM**!

[2023/12/02] Added support for safetensors. Now supports all top 10 models on the Open LLM Leaderboard.

[2023/12/01] airllm 2.0. Support com