Language Models

underdogs · picking up speed

PrismML-Eng/Bonsai-demo

+58%/wk · +169 ★/day · ↗ accelerating

A demo repo for running extremely quantized language models locally, without needing a research cluster.

★ 2.1k · Shell · Inference & Serving

openJiuwen-ai/jiuwenswarm

+32%/wk · +80 ★/day · ↗ accelerating

It breaks complex tasks down and distributes them across a team of specialized LLM agents that refine their own skills as they work.

★ 1.8k · Python · Agents

ximeiorg/Xime

+13%/wk · +13 ★/day · ↗ accelerating

Xime is a deliberately minimal, Rime-based Android input method that serves as its author's personal testbed for on-device AI experiments in predictive text and speech recognition.

★ 702 · Kotlin · Language Models

AtomicBot-ai/Atomic-Chat

+9.2%/wk · +15 ★/day · ↗ accelerating

It turns your local machine into an OpenAI-compatible inference endpoint, so agents and IDEs can run against offline models without reconfiguration.

★ 1.2k · TypeScript · Inference & Serving

MarkPDFdown/markpdfdown

+8.4%/wk · +23 ★/day · ↗ accelerating

Uses multimodal LLMs to transcribe PDFs into Markdown, preserving complex layouts that traditional extractors mangle.

★ 1.9k · Python · Data Tooling

ForceInjection/AI-fundamentals

+8.2%/wk · +23 ★/day · ↗ accelerating

Curated technical deep-dives covering everything from NVLink signal integrity to Kubernetes GPU scheduling and Huawei NPU porting.

★ 2k · HTML · Learning

verl-project/verl-omni

+10%/wk · +9.6 ★/day · ↗ accelerating

It was split off from `verl` to give diffusion, video, and omni-modality models an RL post-training framework that doesn't treat them like chatbots.

★ 661 · Python · ML Frameworks

Bytez-com/docs

+5.6%/wk · +18 ★/day · ↗ accelerating

Bytez wraps 175,000+ AI models behind a single endpoint so you don't have to host them yourself.

★ 2.3k · TypeScript · Inference & Serving

sapientinc/HRM-Text

+5.3%/wk · +13 ★/day · ↗ accelerating

HRM-Text claims to cut pretraining compute by 130–600× and data requirements by 150–900×, and ships a full 1B-parameter framework with FSDP2, FlashAttention 3, and a hierarchical recurrent architecture.

★ 1.7k · Python · Language Models

kyegomez/OpenMythos

+5.2%/wk · +109 ★/day · ↗ accelerating

OpenMythos is an independent attempt to reconstruct Anthropic’s rumored Claude Mythos architecture as a trainable Recurrent-Depth Transformer with switchable attention and sparse MoE layers.

★ 14.8k · Python · Language Models

huggingface/speech-to-speech

+4.9%/wk · +45 ★/day · ↗ accelerating

A modular speech-to-speech pipeline that exposes an OpenAI Realtime-compatible WebSocket API so you can run voice agents on local or open-source models instead of proprietary cloud services.

★ 6.5k · Python · Agents

Arthur-Ficial/apfel

+3.5%/wk · +31 ★/day · ↗ accelerating

Apfel surfaces Apple’s built-in FoundationModels as a pipe-friendly UNIX tool and an OpenAI-compatible local server; no API keys required.

★ 6.2k · Swift · Inference & Serving

voocel/ainovel-cli

+7.6%/wk · +17 ★/day · ↗ accelerating

This Go CLI turns a single sentence into a full novel by having Architect, Writer, and Editor LLM agents plan, draft, and review inside a long-loop state machine, with no human hand-holding required.

★ 1.5k · Go · Agents

andrewyng/aisuite

+2.9%/wk · +64 ★/day · ↗ accelerating

A thin Python wrapper that lets you swap GPT-4o for Claude or Gemini by changing only the model string, not the rest of your code.

★ 15.4k · Python · Language Models

fla-org/flash-linear-attention

+2.6%/wk · +20 ★/day · ↗ accelerating

It corrals the latest subquadratic sequence-model research into hardware-efficient, training-ready PyTorch layers verified across NVIDIA, AMD, and Intel GPUs.

★ 5.4k · Python · ML Frameworks

OpenDCAI/DataFlex

+9.4%/wk · +23 ★/day · ↗ accelerating

DataFlex stops LLM training loops from wasting compute on static data mixes by dynamically selecting, mixing, and reweighting samples inside LLaMA-Factory.

★ 1.7k · Python · Language Models

qualcomm/GenieX

+1.8%/wk · +22 ★/day · ↗ accelerating

NexaSDK is a local inference engine that squeezes frontier LLMs and vision models onto Qualcomm silicon through NPU, GPU, and CPU backends.

★ 8.3k · Rust · Inference & Serving

bigscience-workshop/petals

+1.8%/wk · +27 ★/day · ↗ accelerating

Petals lets you run and fine-tune models like Llama 3.1 405B from a desktop by distributing layers across a public swarm of consumer GPUs.

★ 10.4k · Python · Inference & Serving

FareedKhan-dev/train-llm-from-scratch

+3.1%/wk · +39 ★/day · ↗ accelerating

A PyTorch implementation of "Attention Is All You Need" that scales from 13M to multi-billion parameter models.

★ 8.7k · Python · Language Models

jingyaogong/minimind-o

+3.5%/wk · +11 ★/day · ↗ accelerating

MiniMind-O packs listen-see-speak capabilities into a 0.1B-parameter model that you can train from scratch on a single desktop GPU.

★ 2.2k · Python · Language Models
