OpenSource-Hub
L

llama.cpp

SHA-256
111.2k stars·AI 生产力·已提供 SHA-256 校验码,下载后可自行核对文件完整性

纯 C/C++ 的高性能大模型推理引擎,支持低比特量化与多种硬件(Apple Silicon、CUDA、Vulkan 等),轻量可嵌入。

智能下载

下载 Download 版本

vb9222 · 383.9 MB

本地运行大语言模型的最轻量引擎,不用装 PyTorch,省内存!

核心功能

  • 纯 C/C++ 实现,零依赖,可直接嵌入到各种应用中
  • 支持 1.5 至 8 比特整数量化,显存占用极低
  • 多后端:Apple Silicon、x86、NVIDIA、AMD、Vulkan、SYCL
  • 兼容数十种模型格式(GGUF),覆盖主流开源大模型
  • 提供命令行推理和 OpenAI 兼容的 API 服务器

避坑指南

  • 模型必须为 GGUF 格式,部分旧版本工具不支持最新 GGUF;2. 量化模型(尤其 2-bit 以下)会损失部分推理质量,需要根据任务平衡速度与效果;3. 首次运行时会从 Hugging Face 下载模型,需保证网络畅通。

适用场景

  • 在个人电脑上运行 7B~70B 参数的大模型,无网络延迟
  • 将 LLM 推理集成到桌面、移动或服务器软件中
  • 批量处理文本生成、翻译、摘要等任务,低成本部署

详细介绍

llama.cpp 是一个纯 C/C++ 实现的大语言模型推理引擎,不需要安装 PyTorch 或 TensorFlow 等重型框架。它原生支持 Apple Silicon、x86(AVX/AVX2/AVX512)、RISC‑V、NVIDIA(CUDA)、AMD(HIP)以及 Vulkan/SYCL 后端。核心亮点是极高效的整数量化(1.5 比特到 8 比特),大幅降低显存占用,同时保持不错的效果。它兼容数十种模型架构(如 LLaMA、Mistral、Qwen、Gemma、DeepSeek 等),并提供命令行工具 `llama-cli` 和兼容 OpenAI 的 API 服务器 `llama-server`。相比 Ollama 或 LM Studio,llama.cpp 更轻量、无后台常驻进程、无固定界面,非常适合开发者将其嵌入自己的应用或脚本中。

标签

llminferencec++quantizationggufapple-silicongpulocal-ai

快速上手

1

下载安装包

点击上方按钮下载对应系统的安装包

2

安装软件

双击下载的安装程序,按提示完成安装

3

从 GitHub Releases 下载适合你系统的预编译包,或通过 brew/nix/winget 安装

4

准备一个 GGUF 格式的模型文件(可从 Hugging Face 直接下载,如 `ggml-org/gemma-3-1b-it-GGUF`)

5

打开终端,运行 `llama-cli -m 模型路径.gguf` 开始对话;或运行 `llama-server -m 模型路径.gguf` 启动 API 服务器

安装指引
  1. 从 GitHub Releases 下载适合你系统的预编译包,或通过 brew/nix/winget 安装
  2. 准备一个 GGUF 格式的模型文件(可从 Hugging Face 直接下载,如 `ggml-org/gemma-3-1b-it-GGUF`)
  3. 打开终端,运行 `llama-cli -m 模型路径.gguf` 开始对话;或运行 `llama-server -m 模型路径.gguf` 启动 API 服务器

最新更新

<details open>

hexagon: add support for TRI op (#22822)

* Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context

* addressed PR review comments for TRI op

* hexagon: clang format

* hex-unary: remove merge conflict markers

* hex-ggml: remove duplicate op cases (merge conflict)

* hex-ggml: fix editor config errors

---------

Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>

</details>

**macOS/iOS:**

- [macOS Apple Silicon (arm64)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-macos-arm64.tar.gz)

- [macOS Apple Silicon (arm64, KleidiAI enabled)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-macos-arm64-kleidiai.tar.gz)

- [macOS Intel (x64)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-macos-x64.tar.gz)

- [iOS XCFramework](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-xcframework.zip)

**Linux:**

- [Ubuntu x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-x64.tar.gz)

- [Ubuntu arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-arm64.tar.gz)

- [Ubuntu s390x (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-s390x.tar.gz)

- [Ubuntu x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-vulkan-x64.tar.gz)

- [Ubuntu arm64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-vulkan-arm64.tar.gz)

- [Ubuntu x64 (ROCm 7.2)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-rocm-7.2-x64.tar.gz)

- [Ubuntu x64 (OpenVINO)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-openvino-2026.0-x64.tar.gz)

- [Ubuntu x64 (SYCL FP32)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-sycl-fp32-x64.tar.gz)

- [Ubuntu x64 (SYCL FP16)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-ubuntu-sycl-fp16-x64.tar.gz)

**Android:**

- [Android arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-android-arm64.tar.gz)

**Windows:**

- [Windows x64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-cpu-x64.zip)

- [Windows arm64 (CPU)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-cpu-arm64.zip)

- [Windows x64 (CUDA 12)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-cuda-12.4-x64.zip) - [CUDA 12.4 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b9222/cudart-llama-bin-win-cuda-12.4-x64.zip)

- [Windows x64 (CUDA 13)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-cuda-13.1-x64.zip) - [CUDA 13.1 DLLs](https://github.com/ggml-org/llama.cpp/releases/download/b9222/cudart-llama-bin-win-cuda-13.1-x64.zip)

- [Windows x64 (Vulkan)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-vulkan-x64.zip)

- [Windows x64 (SYCL)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-sycl-x64.zip)

- [Windows x64 (HIP)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-win-hip-radeon-x64.zip)

**openEuler:**

- [openEuler x86 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-310p-openEuler-x86.tar.gz)

- [openEuler x86 (910b, ACL Graph)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-910b-openEuler-x86-aclgraph.tar.gz)

- [openEuler aarch64 (310p)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-310p-openEuler-aarch64.tar.gz)

- [openEuler aarch64 (910b, ACL Graph)](https://github.com/ggml-org/llama.cpp/releases/download/b9222/llama-b9222-bin-910b-openEuler-aarch64-aclgraph.tar.gz)

文件完整性

已提供 SHA-256 校验码,下载后可自行核对文件完整性

该校验码提取自 GitHub 官方 Release 页面

SHA256 校验码

f96935e7e385e3b2d0189239077c10fe8fd7e95690fea4afec455b1b6c7e3f18

该校验码提取自 GitHub Release 页面,下载后请自行核对文件完整性

本平台所有 SHA-256 校验码均提取自项目在 GitHub 官方 Release 页面发布的文件,未做任何修改。你可以通过 GitHub Releases 页面自行验证。

运维指引

卸载说明

若通过 brew 安装则 `brew uninstall llama.cpp`;通过 nix 安装则 `nix profile remove llama.cpp`;手动下载的包直接删除可执行文件和 `~/.cache/llama.cpp` 缓存目录即可。

无额外依赖

下载后即可直接使用,无需安装其他运行环境

项目信息
开源协议MIT
最后更新2026-05-19T06:14:00Z
GitHub 仓库官方网站

相似推荐