Rust framework runs LLM tools at the edge, not in Virginia
YoMo wants your AI agent's weather-checking code closer to the user than the data center.

What it does
YoMo is a Rust-based server for hosting LLM function-calling tools. You write a TypeScript handler (say, fetching weather), run yomo run, and the framework registers it as an OpenAI-compatible tool that any LLM can invoke. The server handles routing, TLS 1.3 on every packet, and can proxy to local or remote models via a YAML config.
The interesting bit The pitch is geo-distribution: running inference and tools near users instead of centralizing in one region. The README shows a diagram of edge nodes talking to each other, though the actual mechanics of multi-node deployment and synchronization are left as hand-waving. The underlying transport is QUIC, which explains the low-latency ambition.
Key highlights
- OpenAI-compatible chat completions endpoint (
/v1/chat/completions) with streaming support - TypeScript-first tool definitions with exported
description,Argumenttypes, andhandler - TLS 1.3 by default; Bearer token auth for HTTP APIs
- Local model support via Ollama (example shows
qwen3.5andgemma-4-31B-it) - Single-binary Rust CLI:
yomo serve,yomo run,yomo init
Caveats
- The “serverless” claim is aspirational: you run the binary yourself; there’s no hosted platform evident in the README
- Geo-distributed architecture is described conceptually, but no docs explain how to actually deploy a mesh of nodes
- Development section has typos (“Devleopment”) and the getting-started example uses a non-existent model (
gemma-4-31B-it)
Verdict Worth a look if you’re self-hosting agent tools and want a lightweight Rust alternative to heavier frameworks. Skip it if you need managed multi-region deployment or mature documentation today.