Semantic search that never phones home
A Swift package for running text embeddings and vector search entirely on Apple devices, because not every document belongs on someone else's server.

What it does
SimilaritySearchKit lets you embed text and search by meaning on iOS and macOS without network calls. You initialize a SimilarityIndex with an embedding model and distance metric, feed it strings, then query for semantically similar results. It handles the model inference, vector storage, and similarity scoring locally using CoreML.
The interesting bit
The library ships with pre-converted CoreML versions of Hugging Face models (DistilBERT and MiniLM variants) plus Apple’s built-in NaturalLanguage embedding. The whole pipeline (embeddings, metrics, text splitting, tokenization, even vector storage) is protocol-driven, so you can swap in custom implementations without touching the core search logic.
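To make the protocol-driven idea concrete, here is a minimal, self-contained sketch of an index that depends only on an embedder protocol and a metric protocol. Every name in it (ToyEmbedder, ToyMetric, LetterCountEmbedder, DotProductMetric, ToyIndex) is hypothetical and invented for illustration. None of it is SimilaritySearchKit's actual API, and the letter-counting "embedding" is a stand-in for real model inference.

```swift
import Foundation

// Hypothetical protocols for this sketch; not the package's actual declarations.
protocol ToyEmbedder {
    func embed(_ text: String) -> [Double]
}

protocol ToyMetric {
    /// Higher score means "more similar".
    func score(_ a: [Double], _ b: [Double]) -> Double
}

// Stand-in embedder: counts the letters a-z. A real model would run CoreML inference here.
struct LetterCountEmbedder: ToyEmbedder {
    func embed(_ text: String) -> [Double] {
        var counts = [Double](repeating: 0, count: 26)
        for scalar in text.lowercased().unicodeScalars {
            let value = scalar.value
            if value >= 97 && value <= 122 {   // "a"..."z"
                counts[Int(value - 97)] += 1
            }
        }
        return counts
    }
}

struct DotProductMetric: ToyMetric {
    func score(_ a: [Double], _ b: [Double]) -> Double {
        zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
    }
}

// The index only knows the protocols, so either piece can be replaced independently.
struct ToyIndex<E: ToyEmbedder, M: ToyMetric> {
    let embedder: E
    let metric: M
    private var items: [(text: String, vector: [Double])] = []

    init(embedder: E, metric: M) {
        self.embedder = embedder
        self.metric = metric
    }

    mutating func add(_ text: String) {
        items.append((text: text, vector: embedder.embed(text)))
    }

    // Brute-force search: score every stored vector, then keep the best k.
    func search(_ query: String, top k: Int = 3) -> [String] {
        let queryVector = embedder.embed(query)
        return items
            .map { (text: $0.text, score: metric.score(queryVector, $0.vector)) }
            .sorted { $0.score > $1.score }
            .prefix(k)
            .map { $0.text }
    }
}

// Usage: build an index, add strings, query it.
var index = ToyIndex(embedder: LetterCountEmbedder(), metric: DotProductMetric())
index.add("zzz")
index.add("apple")
let hits = index.search("papa", top: 1)
```

Swapping in a different embedder or metric means writing one new conforming type; ToyIndex itself does not change. That is the property the package's protocol-based design is aiming for, whatever its real signatures look like.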
Key highlights
- Four built-in embedding models ranging from 46 MB to 86 MB, including quantized options for Q&A and general similarity
- Three distance metrics: dot product, cosine similarity, Euclidean distance
- Disk-backed indexing for datasets too large for memory, with JSON-based storage swappable via VectorStoreProtocol
- Example projects covering basic search, PDF semantic search, and a full “chat with your files” macOS app
- Bring-your-own-model support through EmbeddingsProtocol and DistanceMetricProtocol
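For readers who want a refresher on the three distance metrics listed above, here is a plain-Swift sketch of each. The free functions dotProduct, cosineSimilarity, and euclideanDistance are hypothetical names chosen for this example. The package exposes its metrics through its own types, which are not reproduced here.

```swift
import Foundation

// Hypothetical helper functions for illustration; not the package's API.

// Dot product: sum of element-wise products. Larger means more similar.
func dotProduct(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same length")
    return zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
}

// Cosine similarity: dot product divided by the product of the vector lengths.
// Ranges from -1 to 1; returns 0 here if either vector is all zeros.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let normA = dotProduct(a, a).squareRoot()
    let normB = dotProduct(b, b).squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dotProduct(a, b) / (normA * normB)
}

// Euclidean distance: straight-line distance. Smaller means more similar.
func euclideanDistance(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same length")
    let sumOfSquares = zip(a, b).reduce(0.0) { $0 + ($1.0 - $1.1) * ($1.0 - $1.1) }
    return sumOfSquares.squareRoot()
}
```

Note the ranking direction: dot product and cosine similarity sort best-first in descending order, while Euclidean distance sorts ascending. Cosine similarity ignores vector length, which is often what you want when comparing texts of different sizes.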
Caveats
- Requires iOS 16.0+ or macOS 13.0+ for the examples; exact base requirements for the package itself aren’t specified
- Several features are still listed as future work: no HNSW/Annoy approximate indexing yet, no query filters by metadata, no Metal acceleration for distance calculations
- The README notes “all around performance improvements” are still pending
Verdict
Worth a look if you’re building privacy-sensitive or offline-first NLP features in Swift and want to avoid the complexity of self-hosting embedding services. Less compelling if you need production-grade approximate nearest-neighbor search at massive scale today: this is still brute-force or basic disk-backed indexing.