Is semtools open source?

Yes — run-llama/semtools is open source, released under the MIT license.

What language is semtools written in?

run-llama/semtools is primarily written in Rust.

How popular is semtools?

run-llama/semtools has 1.8k stars on GitHub.

Where can I find semtools?

run-llama/semtools is on GitHub at https://github.com/run-llama/semtools.

run-llama/semtools

grep's overachieving cousin now speaks embeddings

A Rust CLI that pipes PDFs and DOCX files into semantic search without leaving your terminal.

★1.8k stars Rust RAG · Search Agents Data Tooling

What it does

SemTools is a Rust CLI that parses documents (PDF, DOCX, PPTX) into markdown via LlamaParse, then runs local semantic search over them using static multilingual embeddings. It also bundles an AI agent (ask) that can search and read documents to answer questions, plus workspace management for caching embeddings across large collections.

The interesting bit

The design is aggressively Unix-native: everything speaks stdin/stdout, so you can chain semtools parse | xargs search | grep like any other shell filter. The search itself runs locally with model2vec embeddings and cosine similarity, so the retrieval step makes no API calls and adds no network latency. The workspace subcommand adds an IVF_PQ index that auto-updates when files change; it is the boring-but-valuable part that keeps repeated searches over big corpora from becoming painful.
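
To make the retrieval step concrete, here is a minimal sketch of cosine-distance ranking over precomputed line embeddings. It is not taken from the semtools source: the function names, the max_distance parameter, and the idea of passing embeddings in as plain vectors are assumptions for illustration, and the embedding step itself (model2vec in the real tool) is left out.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if the lengths differ or either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical helper (not a semtools API): score every line embedding
/// against the query and keep lines whose cosine distance
/// (1 - similarity) is at most `max_distance`, closest first.
/// Returns (line_index, distance) pairs.
fn rank_lines(query: &[f32], lines: &[Vec<f32>], max_distance: f32) -> Vec<(usize, f32)> {
    let mut hits: Vec<(usize, f32)> = lines
        .iter()
        .enumerate()
        .map(|(i, v)| (i, 1.0 - cosine_similarity(query, v)))
        .filter(|&(_, d)| d <= max_distance)
        .collect();
    // Sort ascending by distance so the best match comes first.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits
}
```

A brute-force scan like this is linear in the number of lines, which is why an approximate index becomes attractive once a corpus is searched repeatedly.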

Key highlights

Local semantic search with per-line context matching and configurable distance thresholds
Document parsing through LlamaParse API (cloud-backed, with caching and concurrent requests)
ask subcommand runs an agent loop with search/read tools, defaulting to OpenAI but accepting any OpenAI-compatible API
Workspace mode caches embeddings in ~/.semtools/workspaces/ with automatic re-embedding on file changes
Installs via npm or cargo; npm falls back to local Rust build if no prebuilt binary exists
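
The re-embedding behavior in workspace mode can be pictured as a staleness check in front of a cache. The sketch below is an assumption about the general technique, not the semtools implementation: Fingerprint, EmbeddingCache, and the choice of file size plus modification time as the change signal are all hypothetical, and the cache lives in memory rather than under ~/.semtools/workspaces/.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

/// Hypothetical fingerprint: cheap metadata that changes when the file does.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fingerprint {
    size_bytes: u64,
    modified: SystemTime,
}

/// Hypothetical in-memory cache mapping a file path to its fingerprint
/// and the embeddings computed for that version of the file.
#[derive(Default)]
struct EmbeddingCache {
    entries: HashMap<String, (Fingerprint, Vec<Vec<f32>>)>,
}

impl EmbeddingCache {
    /// True when the path is unknown or its fingerprint differs from the cached one.
    fn is_stale(&self, path: &str, current: &Fingerprint) -> bool {
        match self.entries.get(path) {
            Some((cached, _)) => cached != current,
            None => true,
        }
    }

    /// Return cached embeddings, calling `embed` only when the file is stale.
    fn get_or_embed<F>(&mut self, path: &str, current: Fingerprint, embed: F) -> &[Vec<f32>]
    where
        F: FnOnce() -> Vec<Vec<f32>>,
    {
        if self.is_stale(path, &current) {
            self.entries.insert(path.to_string(), (current, embed()));
        }
        &self.entries[path].1
    }
}
```

In practice the fingerprint would come from std::fs::metadata, and a content hash could replace size and modification time if timestamps are unreliable.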

Caveats

parse requires a LlamaParse API key (free tier available), and ask requires an API key for OpenAI or another OpenAI-compatible endpoint; only search and workspace are fully local
The README notes “more parsing backends (something local-only would be great!)” as explicit future work, so offline parsing isn’t here yet
The default embedding model has 128M parameters: fast, but not the most nuanced for specialized domains

Verdict

Worth a look if you live in the terminal and want semantic search without spinning up a vector database. Skip it if you need fully offline document parsing or heavy-duty embedding models; the cloud dependencies are real.
