A self-hosted AI workspace that bolts chat, agents, email triage, calendars, and deep research onto your own hardware.
Inference · Serving
newcomers · velocity + momentum
A Rust vector index that squeezes 31 GB of float32 embeddings into 4 GB without a training phase, then outruns FAISS at query time.
VoxCPM2 generates speech directly from text using continuous diffusion, with no discrete audio tokens required.
A local proxy that turns sixteen scattered LLM free tiers into one OpenAI-compatible endpoint with automatic failover.
whichllm ranks local models by real benchmark scores, not parameter count, and tells you which ones actually fit your hardware.
The pi project bundles a CLI coding agent with an unusually paranoid approach to supply-chain security.
An open-source gateway for splitting AI subscriptions across teams without breaking native tools.
OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.
A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.
A Go-based gateway that cross-converts between OpenAI, Claude, and Gemini formats so you don't have to pick sides in the API format wars.
Cosmos 3 tries to unify video generation, robot action prediction, and physical reasoning inside a single 16B–64B Mixture-of-Transformers architecture.
OpenMed packages clinical entity extraction and HIPAA-grade de-identification into models small enough for Apple Silicon and impatient DevOps teams.
Local proxy that auto-falls back to free models when your paid quota dies mid-session.
A Go proxy that exposes Gemini CLI, Claude Code, Codex, and Grok through standard OpenAI-compatible APIs—no API keys required, just your existing OAuth logins.
AirLLM slices giant transformers into layer shards so they fit in consumer VRAM without quantization or distillation.
A visual programming interface for image, video, 3D, and audio generation that treats model pipelines as composable graphs.
Self-hosted chat UI that unifies OpenAI, Anthropic, Google, AWS, and two dozen other providers under one roof.
LiteLLM is the adapter layer that stops your codebase from fracturing across a dozen provider SDKs.
A Chinese speech toolkit that bundles ASR, diarization, emotion detection, and streaming into one MIT-licensed package.
Ollama wraps llama.cpp in a one-line installer and a model registry so you can run open weights without reading a dozen READMEs.



