A PyTorch implementation of "Attention Is All You Need" that scales from 13M-parameter to multi-billion-parameter models.
Language Models
AirLLM slices giant transformers into layer shards so they fit in consumer VRAM without quantization or distillation.
Twenty-five bite-sized projects showing how to wire up LLMs, RAG, and agents into things that actually do work.
It takes a village of agents to buy a stock—analysts, debaters, risk managers, and a portfolio manager who actually says no.
A dependency-free C/C++ inference engine that squeezes large language models onto laptops, phones, and browsers through aggressive quantization and hand-rolled kernels.
Eagle is less a single model than NVIDIA's internal R&D pipeline for multimodal AI, now open-sourced with three generations of VLMs and a grounding specialist.
OpenAI's Whisper replaces the usual Rube Goldberg pipeline of speech-processing tools with a single Transformer trained to do it all.
LiteLLM is the adapter layer that stops your codebase from fracturing across a dozen provider SDKs.
AgentScope 2.0 bets that modern LLMs need less hand-holding, not more orchestration.
TileRT squeezes millisecond-level latency out of hundred-billion-parameter models by decomposing operators into tile-level tasks and overlapping compute, I/O, and communication across 8 GPUs.
An AI companion platform that remembers, feels, and stares at your screen—now with a Steam release and a 1000-year SSL certificate.
Read Frog overlays AI translations, explanations, and text-to-speech onto any webpage so you can learn while you browse.
A notebook-based workout plan for PyTorch fluency, from linear regression up to building LLM components from scratch.
AgentScope Java wraps ReAct agents in the kind of runtime controls, sandboxes, and observability that enterprise deployments actually need.
Curated tutorials, tool reviews, and monetization playbooks for coding with AI—written by one prolific developer and open to all.
A purpose-built inference and fine-tuning stack that treats Apple M-series chips as first-class citizens instead of afterthoughts.
A living literature review that tracks whether researchers are using language models to attack, defend, or just benchmark each other.
Complete, reproducible pipelines from raw data to deployment-ready Nemotron models, with a modular CLI that lets you remix stages like LEGO bricks.
A C++ inference engine built to run Gemma, Llama, and friends on everything from Raspberry Pi to Pixel Watch—because the cloud is sometimes just too far away.
Someone finally collected all those "top projects" Medium posts into one giant table.