myscale/MyScaleDB

ClickHouse with a vector-search engine grafted on

A fork of ClickHouse that lets you run vector search, full-text search, and SQL analytics in the same query without learning new APIs.

Velocity (7d): +1.3 ★/day · Trend: steady

What it does

MyScaleDB is a fork of ClickHouse that adds vector indexes and full-text search to an already-fast columnar OLAP database. You create tables with Array(Float32) columns for embeddings, build a SCANN vector index via ALTER TABLE, then query with ordinary SQL: filtered vector search, joins, and hybrid text/vector search all in one query.
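To make that workflow concrete, here is a rough sketch of the three steps (create, index, query) as SQL strings assembled in Python. The table name `docs`, its columns, the index name, and the 4-dimensional embedding are all made up for illustration. The `ADD VECTOR INDEX ... TYPE SCANN` clause and the `distance()` function reflect how the project's documentation appears to spell them, so treat the exact syntax as an assumption and check it against the MyScaleDB docs before running anything.

```python
# Sketch only: table, column, and index names are hypothetical, and the
# vector-index syntax is an assumption to verify against the MyScaleDB docs.

def vector_literal(vec):
    """Render a sequence of floats as a SQL array literal, e.g. [0.1, 0.2]."""
    return "[" + ", ".join(f"{x:.6f}" for x in vec) + "]"

# Step 1: an ordinary MergeTree table with an Array(Float32) embedding column.
CREATE_TABLE = """
CREATE TABLE docs (
    id UInt64,
    category String,
    body String,
    embedding Array(Float32),
    CONSTRAINT embedding_len CHECK length(embedding) = 4
) ENGINE = MergeTree ORDER BY id
"""

# Step 2: build the vector index with ALTER TABLE (assumed syntax).
ADD_INDEX = "ALTER TABLE docs ADD VECTOR INDEX embedding_idx embedding TYPE SCANN"

# Step 3: a filtered vector search is a plain SELECT with WHERE and ORDER BY.
def filtered_search_sql(query_vec, k=5):
    """Build a top-k search restricted to one (hard-coded) category."""
    return (
        f"SELECT id, body, distance(embedding, {vector_literal(query_vec)}) AS dist "
        "FROM docs "
        "WHERE category = 'news' "
        f"ORDER BY dist LIMIT {int(k)}"
    )

if __name__ == "__main__":
    print(CREATE_TABLE.strip())
    print(ADD_INDEX)
    print(filtered_search_sql([0.1, 0.2, 0.3, 0.4], k=3))
```

The printed statements could be pasted into clickhouse-client. Real application code would pass the vector and filter values through the driver's parameter binding rather than string formatting.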

The interesting bit

The project bets that pre-filtering on structured metadata before vector search is where accuracy and speed live or die, and ClickHouse's columnar storage happens to be very good at that. Rather than bolting vectors onto a row-store transactional database, they started with an analytical engine built for aggressive data skipping and SIMD scans.
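Why the order of filtering matters can be shown with a toy brute-force search. This is not how MyScaleDB or its SCANN index works internally. It is a minimal sketch, with invented rows and a hypothetical `lang` metadata field, of the general difference between filtering before ranking and filtering after.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prefilter_search(rows, query, predicate, k):
    """Filter on metadata first, then rank only the surviving rows."""
    candidates = [r for r in rows if predicate(r)]
    return sorted(candidates, key=lambda r: l2(r["vec"], query))[:k]

def postfilter_search(rows, query, predicate, k):
    """Rank everything, keep the top k, then filter; may return fewer than k."""
    top = sorted(rows, key=lambda r: l2(r["vec"], query))[:k]
    return [r for r in top if predicate(r)]

# Invented data: the nearest vector overall fails the metadata filter.
rows = [
    {"id": 1, "lang": "en", "vec": [0.0, 0.1]},
    {"id": 2, "lang": "de", "vec": [0.0, 0.0]},
    {"id": 3, "lang": "de", "vec": [0.2, 0.0]},
    {"id": 4, "lang": "en", "vec": [0.9, 0.9]},
    {"id": 5, "lang": "en", "vec": [1.0, 1.0]},
]
query = [0.0, 0.0]

def is_english(row):
    return row["lang"] == "en"

pre = prefilter_search(rows, query, is_english, k=2)
post = postfilter_search(rows, query, is_english, k=2)

print([r["id"] for r in pre])   # two English rows, nearest first
print([r["id"] for r in post])  # only one row survives the late filter
```

Post-filtering asked for two results and delivered one, because a non-matching row took a slot in the top k. Pre-filtering avoids that, at the cost of needing a fast way to evaluate the filter, which is the part a columnar engine is good at.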

Key highlights

  • Pure SQL interface: no custom SDK to learn; clickhouse-client works out of the box
  • Supports filtered search, SQL-vector joins, and hybrid text+vector search
  • Self-host via Docker or build from source on Ubuntu 22.04 with LLVM 15
  • Claims millisecond latency on billion-scale vectors (cloud tier)
  • Several improvements have been contributed back upstream to ClickHouse

Caveats

  • The open-source build uses the SCANN index; the MSTG disk-based algorithm for billion-scale data is cloud-only
  • Default Docker config locks access to localhost unless you override network settings or compose your own users.d XML
  • Build environment is pinned to Ubuntu 22.04 and LLVM 15—no promises about other toolchains

Verdict

Worth a look if you're already in the ClickHouse ecosystem and tired of maintaining a separate vector database. Skip it if you need a managed service with zero ops overhead; the self-hosted path is very much DIY.
