LocalAI
SHA-256LocalAI is the open-source AI engine to run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required. Drop-in API compatibility with OpenAI, Anthropic, and ElevenLabs.
Smart Download
Download Download Version
v4.2.2 · 130.5 MB
Run any AI model on any hardware locally. No GPU needed, drop-in API replacement for OpenAI.
Core Features
- No GPU needed: runs on CPU, Apple Silicon, AMD/Intel/Vulkan GPUs, and more
- Drop-in API replacement: fully compatible with OpenAI, Anthropic, ElevenLabs - zero code change
- 36+ backends: llama.cpp, vLLM, transformers, whisper, diffusers, MLX, etc.
- Multi-user ready: API key auth, user quotas, role-based access control
- Built-in AI agents: tool use, RAG, MCP protocol for autonomous agents
What It Can't Do
- •macOS DMG is not signed by Apple; first run requires quarantine attribute removal. Docker GPU acceleration requires proper driver installation and device mapping. Model downloads are large (several GB); ensure stable internet. Some backends (e.g. vLLM) need modern hardware.
Use Cases
- Self-hosting LLM chatbots as a drop-in replacement for OpenAI API
- Running speech recognition or image generation on edge devices like Raspberry Pi
- Building a team AI platform with user permissions and quota management
- Local AI development and testing without internet or API costs
Detailed Introduction
LocalAI is a free, open-source AI engine that lets you run large language models, image generators, voice assistants, and more on your own hardware — even without a GPU. It provides a drop-in replacement for the OpenAI API, so you can switch from cloud services to local hosting with zero code changes. With 36+ backends (llama.cpp, vLLM, transformers, whisper, diffusers, MLX, etc.), it supports NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or pure CPU. Built-in multi-user authentication, role-based access, and AI agents with tool use, RAG, and MCP enable enterprise-grade deployments. All data stays on your infrastructure, ensuring complete privacy.
Troubleshooting & FAQ (2)
TroubleshootingHow to fix 'reasoning_effort=none' not working in LocalAI 4.3.4?
This is a known regression in LocalAI versions after 4.0.0 (issue #10072). The parameter reasoning_effort=none should prevent the model from producing reasoning tokens and speed up responses, but a bug in newer versions causes it to be ignored. As a temporary workaround, downgrade to LocalAI v4.0.0 or v3.12.1, where the feature was reported to function correctly with llama-cpp backend models like Qwen3. If downgrading is not possible, you can also try forcing the model to skip reasoning by setting top_p=0 and temperature=0, or using a non-reasoning model for latency-sensitive tasks. For a permanent fix, monitor the GitHub issue #10072 and upgrade once the patch is released. Ensure your model configuration file correctly maps the reasoning_effort option to the backend parameter (e.g., in llama-cpp, it should map to --reasoning-effort none).
TroubleshootingWhy are some LocalAI v4.3.2 Docker images missing from Docker Hub?
A CI build failure prevented publishing of several v4.3.2 tags. Affected missing tags: v4.3.2, v4.3.2-gpu-nvidia-cuda-12, v4.3.2-gpu-nvidia-cuda-13, v4.3.2-gpu-vulkan, v4.3.2-gpu-intel. Successfully published tags: v4.3.2-gpu-hipblas, v4.3.2-nvidia-l4t-arm64, v4.3.2-nvidia-l4t-arm64-cuda-13. As a workaround, use the localai/localai:master image.
Tags
Getting Started
Download installer
Click the button above to download the installer for your system
Install the software
Install the appropriate package for your distro (dpkg / rpm / AppImage)
macOS: Download the DMG, drag to Applications. First launch may require: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app
Docker (CPU): docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
Docker (NVIDIA GPU): add --gpus all e.g. docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
- macOS: Download the DMG, drag to Applications. First launch may require: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app
- Docker (CPU): docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
- Docker (NVIDIA GPU): add --gpus all e.g. docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
SHA-256 checksum verified
Checksum extracted from GitHub official Release page
SHA256 Checksum
544eb221c2a5ec84467c1eb92851d98348c5e8eec9bf0346bd942e302faad73bThis checksum is extracted from the GitHub Release page. Verify file integrity after download.
All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.
Open Source Transparency
View GitHub SourceUninstall Info
macOS: Move LocalAI.app to Trash and empty it. Docker: docker stop local-ai; docker rm local-ai; docker rmi localai/localai:latest and related tags.
No Extra Dependencies
Ready to use after download. No additional runtime required.
Having issues? Check the FAQ below
2 FAQs
Similar Projects
daily_stock_analysis
An open-source AI stock analysis system for A/H/US markets that generates daily decision dashboards and pushes them to WeChat Work, Feishu, Telegram, Discord, Slack, or email. Deploy via GitHub Actions for free.
ollama
Ollama lets you download, run, and manage large language models locally. One command, multiple platforms, endless possibilities.
llama.cpp
High-performance LLM inference engine in C/C++ with minimal dependencies, supporting quantized models (1.5–8 bit) and diverse hardware (Apple Silicon, CUDA, Vulkan, etc.).