OpenSource-Hub

turbovec

Library

RyanCodrai/turbovec

An efficient vector index based on TurboQuant, with a Rust core and Python bindings

Overview

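Built on Google's TurboQuant algorithm, turbovec delivers high compression and fast search, and supports filtered queries and stable IDs. It outperforms FAISS on many benchmarks. It is written in Rust and provides Python bindings for easy integration.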
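The compression claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not part of turbovec; `index_size_gb` is a hypothetical helper, and the dimension of 768 is an assumption chosen because it reproduces the README's figures of roughly 31 GB for float32 and 4 GB at 4 bits per dimension. It counts only the raw vector payload and ignores any per-vector metadata an actual index might store.

```python
def index_size_gb(num_vectors: int, dim: int, bits_per_dim: float) -> float:
    """Raw payload size in decimal gigabytes (hypothetical helper, not a turbovec API)."""
    return num_vectors * dim * bits_per_dim / 8 / 1e9


# 10 million vectors as in the README; dim=768 is an assumption, not stated there.
n, dim = 10_000_000, 768

float32_gb = index_size_gb(n, dim, 32)   # uncompressed float32 storage
quantized_gb = index_size_gb(n, dim, 4)  # 4 bits per dimension

print(f"float32: {float32_gb:.1f} GB")                # float32: 30.7 GB
print(f"4-bit:   {quantized_gb:.1f} GB")              # 4-bit:   3.8 GB
print(f"ratio:   {float32_gb / quantized_gb:.0f}x")   # ratio:   8x
```

Under these assumptions the payload shrinks by a factor of 8, which is simply the ratio of 32 bits to 4 bits per dimension. Other dimensions scale both numbers linearly and leave the ratio unchanged.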

README preview

**A 10 million document corpus takes 31 GB of RAM as float32. turbovec fits it in 4 GB, and searches it faster than FAISS.**

turbovec is a Rust vector index with Python bindings, built on Google Research's [**TurboQuant**](https://arxiv.org/abs/2504.19874) algorithm: a data-oblivious quantizer that matches the Shannon lower bound on distortion, with no codebook training and no separate train phase.

- **Online ingest.** Add vectors and they're indexed: no train step, no parameter tuning, no rebuilds as the corpus grows.
- **Faster than FAISS.** Hand-written NEON (ARM) and AVX-512BW (x86) kernels beat FAISS IndexPQFastScan by 12–20% on ARM and match-or-beat it on x86.
- **Filter at search time.** Pass an id allowlist (or a slot bitmask) to `search()` and the kernel honours it directly. You always get up to `k` results from the allowed set: no over-fetching, no recall hit on selective filters.
- **Pure local.** No managed service, no data leaving your machine or VPC. Pair with any open-source embedding model for a fully air-gapped RAG stack.

Building RAG where privacy, memory, or latency matters? **You're in the right place.**

## Python

```bash
pip install turbovec
```

```python
from turbovec import TurboQuantIndex

# `vectors`, `more_vectors`, and `query` are placeholders for your own embeddings.
index = TurboQuantIndex(dim=1536, bit_width=4)
index.add(vectors)
index.add(more_vectors)

scores, indices = index.search(query, k=10)

index.write("my_index.tq")
loaded = TurboQuantIndex.load("my_index.tq")
```

Need stable ids that survive deletes? Use `IdMapIndex`:

```python
import numpy as np
from turbovec import IdMapIndex

# `vectors` and `query` are placeholders; this example pairs `vectors` with three ids.
index = IdMapIndex(dim=1536, bit_width=4)
index.add_with_ids(vectors, np.array([1001, 1002, 1003], dtype=np.uint64))

scores, ids = index.search(query, k=10)   # ids are your uint64 external ids
index.remove(1002)                         # O(1) by id

index.write("my_index.tvim")
loaded = IdMapIndex.load("my_index.tvim")
```

### Hybrid retr