LocalAI

SHA-256

46.2k stars·AI Productivity·SHA-256 checksum verified

LocalAI is the open-source AI engine to run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required. Drop-in API compatibility with OpenAI, Anthropic, and ElevenLabs.

Smart Download

Download Download Version

v4.2.2 · 130.5 MB

Run any AI model on any hardware locally. No GPU needed, drop-in API replacement for OpenAI.

Core Features

No GPU needed: runs on CPU, Apple Silicon, AMD/Intel/Vulkan GPUs, and more
Drop-in API replacement: fully compatible with OpenAI, Anthropic, ElevenLabs - zero code change
36+ backends: llama.cpp, vLLM, transformers, whisper, diffusers, MLX, etc.
Multi-user ready: API key auth, user quotas, role-based access control
Built-in AI agents: tool use, RAG, MCP protocol for autonomous agents

What It Can't Do

•macOS DMG is not signed by Apple; first run requires quarantine attribute removal. Docker GPU acceleration requires proper driver installation and device mapping. Model downloads are large (several GB); ensure stable internet. Some backends (e.g. vLLM) need modern hardware.

Use Cases

Self-hosting LLM chatbots as a drop-in replacement for OpenAI API
Running speech recognition or image generation on edge devices like Raspberry Pi
Building a team AI platform with user permissions and quota management
Local AI development and testing without internet or API costs

Detailed Introduction

LocalAI is a free, open-source AI engine that lets you run large language models, image generators, voice assistants, and more on your own hardware — even without a GPU. It provides a drop-in replacement for the OpenAI API, so you can switch from cloud services to local hosting with zero code changes. With 36+ backends (llama.cpp, vLLM, transformers, whisper, diffusers, MLX, etc.), it supports NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or pure CPU. Built-in multi-user authentication, role-based access, and AI agents with tool use, RAG, and MCP enable enterprise-grade deployments. All data stays on your infrastructure, ensuring complete privacy.

Troubleshooting & FAQ (2)

Troubleshooting

How to fix 'reasoning_effort=none' not working in LocalAI 4.3.4?

This is a known regression in LocalAI versions after 4.0.0 (issue #10072). The parameter reasoning_effort=none should prevent the model from producing reasoning tokens and speed up responses, but a bug in newer versions causes it to be ignored. As a temporary workaround, downgrade to LocalAI v4.0.0 or v3.12.1, where the feature was reported to function correctly with llama-cpp backend models like Qwen3. If downgrading is not possible, you can also try forcing the model to skip reasoning by setting top_p=0 and temperature=0, or using a non-reasoning model for latency-sensitive tasks. For a permanent fix, monitor the GitHub issue #10072 and upgrade once the patch is released. Ensure your model configuration file correctly maps the reasoning_effort option to the backend parameter (e.g., in llama-cpp, it should map to --reasoning-effort none).

GitHub Issue #10072

Troubleshooting

Why are some LocalAI v4.3.2 Docker images missing from Docker Hub?

A CI build failure prevented publishing of several v4.3.2 tags. Affected missing tags: v4.3.2, v4.3.2-gpu-nvidia-cuda-12, v4.3.2-gpu-nvidia-cuda-13, v4.3.2-gpu-vulkan, v4.3.2-gpu-intel. Successfully published tags: v4.3.2-gpu-hipblas, v4.3.2-nvidia-l4t-arm64, v4.3.2-nvidia-l4t-arm64-cuda-13. As a workaround, use the localai/localai:master image.

GitHub Issue #10041

Getting Started

Download installer

Click the button above to download the installer for your system

Linux· 130.5 MB Windows· 138.2 MB macOS· 11.7 MB

Install the software

Install the appropriate package for your distro (dpkg / rpm / AppImage)

macOS: Download the DMG, drag to Applications. First launch may require: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app

Docker (CPU): docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

Docker (NVIDIA GPU): add --gpus all e.g. docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

Install Guide

macOS: Download the DMG, drag to Applications. First launch may require: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app
Docker (CPU): docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
Docker (NVIDIA GPU): add --gpus all e.g. docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

File Integrity

SHA-256 checksum verified

Checksum extracted from GitHub official Release page

SHA256 Checksum

544eb221c2a5ec84467c1eb92851d98348c5e8eec9bf0346bd942e302faad73b

This checksum is extracted from the GitHub Release page. Verify file integrity after download.

All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.

Open Source Transparency

View GitHub Source

Environment Guide

Uninstall Info

macOS: Move LocalAI.app to Trash and empty it. Docker: docker stop local-ai; docker rm local-ai; docker rmi localai/localai:latest and related tags.

No Extra Dependencies

Ready to use after download. No additional runtime required.

Project Info

LicenseMIT

Last Updated2026-06-26 06:55:08

GitHub Repository Official Website

Having issues? Check the FAQ below

2 FAQs

Similar Projects

daily_stock_analysis

An open-source AI stock analysis system for A/H/US markets that generates daily decision dashboards and pushes them to WeChat Work, Feishu, Telegram, Discord, Slack, or email. Deploy via GitHub Actions for free.

ollama

Ollama lets you download, run, and manage large language models locally. One command, multiple platforms, endless possibilities.

llama.cpp

High-performance LLM inference engine in C/C++ with minimal dependencies, supporting quantized models (1.5–8 bit) and diverse hardware (Apple Silicon, CUDA, Vulkan, etc.).