OpenSource-Hub

oMLX

13.8k stars · AI Productivity · SHA-256 checksum verified

Local LLM inference server optimized for Mac, with continuous batching and a tiered hot/cold KV cache. Manage everything from the menu bar.

Run local LLMs on Mac effortlessly from the menu bar, with smart caching and multi-model support.

Core Features

  • Full control from the menu bar: load, unload, and pin models
  • Hot/cold tiered KV cache: hot blocks in RAM, cold on SSD, survives restarts
  • Continuous batching for concurrent requests without blocking
  • Serve multiple model types together: text, vision, embeddings, rerankers
  • Per-model settings: alias, TTL, pinning, overrides
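
To make the continuous batching idea above concrete, here is a minimal Python sketch. It is not oMLX's scheduler: the `Request` class, the `max_batch` parameter, and the "one token per step" model are hypothetical simplifications. It only shows the core idea that a waiting request joins the running batch as soon as a slot frees up, rather than waiting for the whole batch to finish.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request that needs `remaining` more decode steps."""
    name: str
    remaining: int


def continuous_batching(requests, max_batch=2):
    """Simulate decode steps; return (finish order, total steps).

    Assumption: every active request produces one token per step.
    """
    waiting = deque(requests)
    active = []
    finished = []
    steps = 0
    while waiting or active:
        # Fill free slots right away instead of waiting for the batch to drain.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        # One decode step for every active request.
        for req in active:
            req.remaining -= 1
        steps += 1
        # Retire finished requests so their slots are free on the next step.
        finished.extend(r.name for r in active if r.remaining <= 0)
        active = [r for r in active if r.remaining > 0]
    return finished, steps


if __name__ == "__main__":
    order, steps = continuous_batching(
        [Request("long", 5), Request("short", 1), Request("late", 2)]
    )
    print(order, steps)
```

In this toy run the "late" request takes the slot freed by "short" after the first step, so all three requests complete in 5 steps. A scheduler that waited for each fixed batch to finish would need 7.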

What It Can't Do

  • Runs only on Apple Silicon Macs (M1–M4); Intel Macs and NVIDIA GPUs are not supported.
  • Requires macOS 15.0 (Sequoia) or later.
  • Models must be downloaded on first run, so make sure you have enough free disk space.

Use Cases

  • Run open-source LLMs locally on a MacBook for privacy
  • Boost coding workflows with Claude Code, Cursor, or Copilot
  • Offline AI inference with low latency requirements

oMLX is a local LLM serving engine built specifically for Apple Silicon Macs. It lets you load and manage multiple AI models (text, vision, embedding) through a menu bar interface or a web dashboard. Its hot/cold tiered KV cache keeps frequently used context in RAM and offloads less active data to SSD, where it survives server restarts. Combined with continuous batching, this lets the server handle concurrent requests efficiently. The app auto-detects models in a folder, supports per-model settings (TTL, pinning, alias), and adapts to tools like Claude Code. It has no cloud dependency and works fully offline.
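
The hot/cold tiering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not oMLX's cache: the `TieredCache` class, the `hot_capacity` limit, the least-recently-used eviction policy, and the pickle file layout are all hypothetical. It shows recently used blocks staying in RAM, older blocks being demoted to disk, and demoted blocks remaining readable by a fresh process.

```python
import pickle
import tempfile
from collections import OrderedDict
from pathlib import Path


class TieredCache:
    """Toy two-tier cache: LRU 'hot' tier in RAM, 'cold' tier as files on disk."""

    def __init__(self, cold_dir, hot_capacity=2):
        self.cold_dir = Path(cold_dir)
        self.cold_dir.mkdir(parents=True, exist_ok=True)
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # key -> block, least recently used first

    def _cold_path(self, key):
        # Assumption: keys are short strings that are safe as file names.
        return self.cold_dir / f"{key}.pkl"

    def put(self, key, block):
        self.hot[key] = block
        self.hot.move_to_end(key)
        # Demote least recently used blocks to disk when the RAM tier is full.
        while len(self.hot) > self.hot_capacity:
            old_key, old_block = self.hot.popitem(last=False)
            self._cold_path(old_key).write_bytes(pickle.dumps(old_block))

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self._cold_path(key)
        if path.exists():
            block = pickle.loads(path.read_bytes())
            self.put(key, block)  # promote back to the hot tier
            return block
        return None


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    cache = TieredCache(cache_dir, hot_capacity=2)
    for key in ("a", "b", "c"):
        cache.put(key, [key])
    print(list(cache.hot))                   # keys still held in RAM
    print(TieredCache(cache_dir).get("a"))   # a new instance reads the cold tier
```

Only blocks that were already demoted to disk survive a restart in this sketch; a real server would also need to flush the hot tier on shutdown.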

Tags

LLM, Mac, Apple Silicon, inference engine, KV cache, local AI, menu bar management, multi-model

Getting Started

  1. Download the .dmg installer from Releases
  2. Open the downloaded .dmg file, then drag the app to Applications
  3. Launch oMLX, then follow the Welcome screen to set up the model directory and start the server
  4. Download your first model and start chatting (default: localhost:8000)
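
After the server is started, a quick way to confirm that something is listening on the default address is a plain TCP check. The Python sketch below assumes only the default localhost:8000 stated above; the `is_listening` helper is hypothetical, and a successful connection shows that a TCP listener exists on that port, not that it is oMLX.

```python
import socket


def is_listening(host="localhost", port=8000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False


if __name__ == "__main__":
    state = "reachable" if is_listening() else "not reachable"
    print(f"localhost:8000 is {state}")
```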

File Integrity

SHA-256 checksum (verified)

803d999247af13bc778ce623db6ef539266a82e35ccd984a80a40b0dc2a45114

This checksum, like all SHA-256 checksums on this platform, is taken unmodified from the project's official GitHub Release page. Verify the file against it after download; you can also confirm the value independently on the GitHub Releases page.
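
One way to run that verification is with Python's standard `hashlib` module. In the sketch below, `EXPECTED` is the checksum listed above, while the file name `oMLX.dmg` is a placeholder assumption; substitute the actual name of the file you downloaded.

```python
import hashlib
import sys

# Checksum as listed above.
EXPECTED = "803d999247af13bc778ce623db6ef539266a82e35ccd984a80a40b0dc2a45114"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large installers are not read into RAM at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # "oMLX.dmg" is a placeholder file name, not taken from the release page.
    target = sys.argv[1] if len(sys.argv) > 1 else "oMLX.dmg"
    try:
        print("OK" if matches(target) else "MISMATCH: do not install this file")
    except FileNotFoundError:
        print(f"File not found: {target}")
```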

Environment Guide

Uninstall Info

Delete oMLX.app from Applications, then remove ~/.omlx folder to erase models and settings.
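
The same two steps can be scripted. The Python sketch below assumes the app sits in /Applications and uses the ~/.omlx folder named above; the helper names are hypothetical. It defaults to a dry run that only reports what would be removed, because deleting ~/.omlx erases downloaded models and settings.

```python
import shutil
from pathlib import Path


def uninstall_targets(home=None, applications="/Applications"):
    """Return the two locations named in the uninstall instructions."""
    home = Path(home) if home else Path.home()
    return [Path(applications) / "oMLX.app", home / ".omlx"]


def uninstall(targets, dry_run=True):
    """Remove each existing target; with dry_run=True only report them."""
    affected = []
    for target in targets:
        if not target.exists():
            continue
        if not dry_run:
            shutil.rmtree(target)  # both targets are directories
        affected.append(target)
    return affected


if __name__ == "__main__":
    for path in uninstall(uninstall_targets()):
        print(f"would remove: {path}")
```

Call `uninstall(uninstall_targets(), dry_run=False)` only after reviewing the dry-run output.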

No Extra Dependencies

Ready to use after download. No additional runtime required.

Project Info
License: Apache 2.0
Last Updated: 2026-06-26 04:45:32
GitHub Repository · Official Website
