A self-hosted AI workspace that bolts chat, agents, email triage, calendars, and deep research onto your own hardware.
Inference · Serving
newcomers · velocity + momentumA deliberately narrow inference engine that treats your SSD as first-class KV cache real estate.
Karpathy's minimal LLM training harness turns a $43K 2019 training run into a sub-$100 afternoon project.
A local proxy that turns sixteen scattered LLM free tiers into one OpenAI-compatible endpoint with automatic failover.
Because Google's free web chat doesn't have an official API, so someone built the unofficial one by reverse-engineering its private protocol.
The pi project bundles a CLI coding agent with an unusually paranoid approach to supply-chain security.
A massive Mixture-of-Experts model that trains cheap and runs lean by keeping most of its weights asleep.
OpenFang ships autonomous "Hands"—pre-built agents that research, monitor, and publish on schedules, not chat prompts.
An open-source gateway for splitting AI subscriptions across teams without breaking native tools.
oMLX brings vLLM-style continuous batching and tiered KV caching to Apple Silicon, controlled from a native Swift menubar app.
Ollama wraps llama.cpp in a one-line installer and a model registry so you can run open weights without reading a dozen READMEs.
OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.
A Rust-based sandbox service that swaps container speed for real kernel isolation while keeping the same Python SDK.
Local proxy that auto-falls back to free models when your paid quota dies mid-session.
A Rust vector index that squeezes 31 GB of float32 embeddings into 4 GB without a training phase, then outruns FAISS on the query.
A Gradio-based web UI that crams every community trick for image generation into one browser tab.
A Go proxy that exposes Gemini CLI, Claude Code, Codex, and Grok through standard OpenAI-compatible APIs—no API keys required, just your existing OAuth logins.
VoxCPM2 generates speech directly from text using continuous diffusion, no discrete audio tokens required.
An LLM-centric vision encoder that squeezes documents into surprisingly few tokens, then lets the language model do the actual reading.
A self-hosted proxy that turns ChatGPT's web-only image generation into an OpenAI-compatible API with account rotation and web UI.
