omlx
Local LLM inference server optimized for Mac with continuous batching and a tiered hot/cold KV cache. Manage everything from the menu bar.
v0.3.8 · 616.1 MB
Run local LLMs on Mac effortlessly from the menu bar, with smart caching and multi-model support.
Core Features
- Full control from menu bar: load/unload/pin models
- Hot/cold tiered KV cache: hot blocks in RAM, cold on SSD, survives restarts
- Continuous batching for concurrent requests without blocking
- Serve multiple model types together: text, vision, embeddings, rerankers
- Per-model settings: alias, TTL, pinning, overrides
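The continuous batching idea in the list above can be illustrated with a toy scheduler. This is a minimal sketch, not oMLX's implementation: the function name `run_continuous_batching`, the `max_batch` parameter, and the "one token per request per step" model are all assumptions made for illustration.

```python
from collections import deque


def run_continuous_batching(requests, max_batch):
    """Simulate decode steps for a toy continuous-batching scheduler.

    `requests` maps a request id to the number of tokens it still needs.
    Returns a dict of request id -> step number at which it finished.
    (Hypothetical model: every active request gets one token per step.)
    """
    waiting = deque(requests.items())  # requests in arrival order
    active = {}                        # request id -> tokens remaining
    finished_at = {}
    step = 0
    while waiting or active:
        # Fill free slots right away instead of waiting for the whole
        # batch to drain; this is what makes the batching "continuous".
        while waiting and len(active) < max_batch:
            rid, remaining = waiting.popleft()
            if remaining <= 0:
                finished_at[rid] = step  # nothing to generate
                continue
            active[rid] = remaining
        if not active:
            break
        step += 1
        # One decode step produces one token for every active request.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                del active[rid]
                finished_at[rid] = step
    return finished_at
```

With `{"a": 1, "b": 3, "c": 2}` and `max_batch=2`, request `c` takes over the slot freed by `a` after the first step, so it does not have to wait for `b` to finish.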
What It Can't Do
- Only works on Apple Silicon Macs (M1–M4). Requires macOS 15.0+ (Sequoia). Does not support NVIDIA GPUs or Intel Macs. The first run requires downloading models, so ensure sufficient disk space.
Use Cases
- Run open-source LLMs locally on a MacBook for privacy
- Boost coding workflows with Claude Code, Cursor, or Copilot
- Offline AI inference with low latency requirements
oMLX is a locally run LLM serving engine built specifically for Apple Silicon Macs. It lets you load and manage multiple AI models (text, vision, embedding) through a menu bar interface or a web dashboard. Its hot/cold tiered KV cache keeps frequently used context in RAM and offloads less active data to SSD, where it survives server restarts. Combined with continuous batching, this lets the server handle concurrent requests efficiently. The app auto-detects models from a folder, supports per-model settings (TTL, pinning, alias), and adapts to tools like Claude Code. It has no cloud dependency and runs fully offline.
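To make the hot/cold tiering concrete, here is a minimal sketch of a two-tier block store: a small least-recently-used map in memory, with evicted blocks written to a directory on disk. It only illustrates the general technique. The class name `TieredCache`, the `.bin` file layout, and the use of plain `bytes` as block data are assumptions, and none of this reflects how oMLX actually stores KV blocks.

```python
from collections import OrderedDict
from pathlib import Path


class TieredCache:
    """Toy hot/cold cache: hot blocks in RAM, evicted blocks on disk.

    Block ids are assumed to be filesystem-safe strings, and block data
    is assumed to be bytes (both are illustrative simplifications).
    """

    def __init__(self, cold_dir, hot_capacity):
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # block id -> bytes, least recent first

    def _cold_path(self, block_id):
        return self.cold_dir / f"{block_id}.bin"

    def put(self, block_id, data):
        self.hot[block_id] = data
        self.hot.move_to_end(block_id)  # mark as most recently used
        # Spill the least recently used blocks to disk when RAM is full.
        while len(self.hot) > self.hot_capacity:
            old_id, old_data = self.hot.popitem(last=False)
            self._cold_path(old_id).write_bytes(old_data)

    def get(self, block_id):
        if block_id in self.hot:
            self.hot.move_to_end(block_id)
            return self.hot[block_id]
        path = self._cold_path(block_id)
        if path.exists():
            data = path.read_bytes()
            self.put(block_id, data)  # promote back to the hot tier
            return data
        return None  # unknown block
```

Because cold blocks live in ordinary files, a new `TieredCache` pointed at the same directory can still read them, which mirrors the "survives restarts" property. In this sketch, blocks that were only ever in the hot tier are lost on restart; a real system would need an explicit flush.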
Getting Started
- Download the .dmg from Releases and drag it to Applications
- Launch oMLX, follow the Welcome screen to set up model directory and start server
- Download your first model and start chatting (default: localhost:8000)
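Once the server has been started, a quick way to confirm that something is listening on the default address is a plain TCP connection test. This is a sketch under assumptions: the helper name `server_is_listening` is hypothetical, and it only checks that the port accepts connections, not that the process behind it is oMLX or which API it exposes.

```python
import socket


def server_is_listening(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    state = "up" if server_is_listening() else "not reachable"
    print(f"localhost:8000 is {state}")
```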
SHA-256 checksum verified
Checksum extracted from GitHub official Release page
SHA256 Checksum
803d999247af13bc778ce623db6ef539266a82e35ccd984a80a40b0dc2a45114
This checksum is extracted from the GitHub Release page. Verify file integrity after download.
All SHA-256 checksums on this platform are extracted from the project's official GitHub Release page, without any modification. You can independently verify them on the GitHub Releases page.
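One way to verify the download against the checksum listed above is to hash the file locally with Python's standard `hashlib` module. The function names and the example file name `oMLX.dmg` are assumptions for illustration; substitute the actual path of the file you downloaded.

```python
import hashlib

# Checksum as listed on this page for v0.3.8.
EXPECTED = "803d999247af13bc778ce623db6ef539266a82e35ccd984a80a40b0dc2a45114"


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large downloads need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.strip().lower()


# Example usage (hypothetical file name):
# print(verify("oMLX.dmg"))
```

A mismatch means the file differs from the one the checksum was computed for, so it should be downloaded again rather than opened.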
Open Source Transparency
View GitHub Source
Uninstall Info
Delete oMLX.app from Applications, then remove ~/.omlx folder to erase models and settings.
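The two removal steps above can also be scripted. The sketch below defaults to a dry run that only reports what exists, since deleting `~/.omlx` erases downloaded models and settings. The function names are hypothetical, and the `/Applications` location is an assumption based on the install instructions.

```python
import shutil
from pathlib import Path


def uninstall_targets(home=None, applications=Path("/Applications")):
    """Return the two paths named in the uninstall instructions."""
    home = Path(home) if home is not None else Path.home()
    return [Path(applications) / "oMLX.app", home / ".omlx"]


def uninstall(targets, dry_run=True):
    """Remove existing targets; with dry_run=True only report them."""
    found = []
    for target in targets:
        if not target.exists():
            continue
        if not dry_run:
            shutil.rmtree(target)  # both targets are directories
        found.append(target)
    return found


if __name__ == "__main__":
    for path in uninstall(uninstall_targets(), dry_run=True):
        print(f"would remove: {path}")
```

Passing `dry_run=False` performs the deletion and cannot be undone.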
No Extra Dependencies
Ready to use after download. No additional runtime required.
Similar Projects
LocalAI
LocalAI is the open-source AI engine to run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required. Drop-in API compatibility with OpenAI, Anthropic, and ElevenLabs.
daily_stock_analysis
An open-source AI stock analysis system for A/H/US markets that generates daily decision dashboards and pushes them to WeChat Work, Feishu, Telegram, Discord, Slack, or email. Deploy via GitHub Actions for free.
ollama
Ollama lets you download, run, and manage large language models locally. One command, multiple platforms, endless possibilities.