A deliberately narrow inference engine that treats your SSD as first-class KV cache real estate.
Language Models
heavyweights · velocity + momentum
An open-source guess at how Anthropic might be doing silent, looped reasoning inside a single forward pass.
ZeroClaw is an autonomous agent runtime you host yourself, with pluggable LLMs, 30+ channels, and enough security knobs to make a paranoid sysadmin nod approvingly.
CodeWhale wraps DeepSeek V4 in a formal hierarchy of rules so the model knows which instruction wins when everything conflicts.
Karpathy's minimal LLM training harness turns a $43K 2019 training run into a sub-$100 afternoon project.
A desktop app that turns Andrej Karpathy's LLM wiki pattern into a persistent, self-organizing knowledge base with graph analysis and a two-step ingest pipeline.
A massive Mixture-of-Experts model that trains cheap and runs lean by keeping most of its weights asleep.
DeepSeek-R1 proves you can teach LLMs to reason without spoon-feeding them curated examples first.
It takes a village of agents to buy a stock—analysts, debaters, risk managers, and a portfolio manager who actually says no.
Ollama wraps llama.cpp in a one-line installer and a model registry so you can run open weights without reading a dozen READMEs.
LangExtract turns wall-of-text documents into structured, verifiable data by making the LLM show its work.
Heretic automatically strips safety alignment from transformer models without retraining, using optimization to find the least destructive way to make them stop refusing.
A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.
A step-by-step PyTorch walkthrough that trains a small-but-real LLM on ordinary laptops, no external libraries allowed.
A self-hosted proxy that turns ChatGPT's web-only image generation into an OpenAI-compatible API with account rotation and a web UI.
An autoregressive foundation model that quantizes market data into discrete tokens and predicts the next "words" in a financial time series.
apfel exposes Apple’s on-device FoundationModels as a UNIX CLI and OpenAI-compatible server—no API keys, no cloud, no downloads.
A research framework that uses a multimodal LLM to plan video edits semantically, then hands off to a diffusion transformer to actually draw the frames.
Every line of a modern GPT, annotated like you're five, engineered like you're not.
MiniMind strips away framework magic so you actually see how transformers work, end to end.


