Is ai-performance-engineering open source?

Yes — cfregly/ai-performance-engineering is open source, released under the Apache-2.0 license.

What language is ai-performance-engineering written in?

cfregly/ai-performance-engineering is primarily written in Python.

How popular is ai-performance-engineering?

cfregly/ai-performance-engineering has 1.6k stars on GitHub.

Where can I find ai-performance-engineering?

cfregly/ai-performance-engineering is on GitHub at https://github.com/cfregly/ai-performance-engineering.

cfregly/ai-performance-engineering

A field manual for turning GPU cycles into actual goodput

Houses the code, checklists, and profiling recipes for an O’Reilly guide that treats AI performance as a full-stack systems problem rather than a benchmarking contest.

★ 1.6k stars | Python | Inference · Serving | ML Frameworks | LLMOps · Eval

What it does

This is the companion repository for Chris Fregly’s O’Reilly book AI Systems Performance Engineering. It gathers code samples, tooling references, and a 200-plus-item performance checklist that spans the full stack, from CUDA kernels and PyTorch compilers to Kubernetes NUMA pinning and inference serving. The focus is modern NVIDIA hardware, with an empirical, profile-first approach to training and inference bottlenecks.
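To make "profile-first" concrete, here is a minimal sketch that times each stage of a loop before anything is optimized. It is not taken from the repository: `StageTimer`, `load_batch`, and `compute` are hypothetical names, and the sleep is a stand-in for a slow input pipeline. It uses only the Python standard library. On a real GPU workload, kernels launch asynchronously, so wall-clock timers like this would need explicit synchronization or a dedicated profiler to give trustworthy numbers.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add the elapsed time even if the stage raised.
            self.totals[name] += time.perf_counter() - start

    def slowest(self):
        """Return the name of the stage with the largest accumulated time."""
        return max(self.totals, key=self.totals.get)


def load_batch():
    time.sleep(0.02)  # stand-in for a slow input pipeline (assumption)
    return list(range(1000))


def compute(batch):
    return sum(x * x for x in batch)  # stand-in for the model step


timer = StageTimer()
for _ in range(3):
    with timer.stage("data_loading"):
        batch = load_batch()
    with timer.stage("compute"):
        result = compute(batch)

# Report stages from slowest to fastest.
for name, seconds in sorted(timer.totals.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds:.4f}s")
print("slowest stage:", timer.slowest())
```

In this toy run the input pipeline dominates, which is the kind of finding that would redirect effort away from tuning the compute step.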

The interesting bit

The book ignores peak-FLOP bragging rights in favor of “goodput” and stall-point diagnosis. It treats GPU occupancy, memory coalescing, and disaggregated prefill/decode routing as systems engineering problems, not black-magic incantations. There is also a refreshing obsession with the unglamorous parts: NCCL topology awareness, power management, and checklists that prevent teams from re-introducing regressions.
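One way to read "goodput" is as the share of wall-clock time an accelerator spends on work that actually advances the job. The sketch below assumes that reading; it is not the book's definition, and the `Interval` record, the `goodput_fraction` function, and the sample timeline are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One slice of wall-clock time on an accelerator (hypothetical record)."""
    label: str
    seconds: float
    useful: bool  # assumption: the caller decides what counts as productive


def goodput_fraction(intervals):
    """Fraction of total wall-clock time marked as useful work."""
    total = sum(i.seconds for i in intervals)
    if total == 0:
        return 0.0
    useful = sum(i.seconds for i in intervals if i.useful)
    return useful / total


# Made-up timeline for illustration only.
timeline = [
    Interval("training steps", 60.0, True),
    Interval("waiting on input pipeline", 25.0, False),
    Interval("checkpoint restore after failure", 15.0, False),
]

fraction = goodput_fraction(timeline)
print(f"goodput: {fraction:.0%} of wall-clock time")
```

A device can report itself as busy during all three intervals, which is why a utilization counter alone says little about how much useful work was completed.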

Key highlights

Twenty chapters covering hardware (Grace/Blackwell), OS and container tuning, NCCL networking, GPUDirect storage, CUDA kernel optimization, and PyTorch distributed profiling.
Inference coverage includes vLLM, SGLang, TensorRT-LLM, and NVIDIA Dynamo, with detailed sections on KV-cache movement and speculative decoding.
A 200-plus-item performance checklist covering reproducibility, driver tuning, memory layouts, and thermal management.
Authored by a performance engineer with Netflix, Databricks, and AWS experience.
Supports an active meetup series and YouTube channel with recent sessions on RL-based kernel tuning and low-precision numerics.

Caveats

The README functions as a book prospectus and event hub; the actual code lives in a code/ directory that isn’t rendered inline, so it is unclear how much of it runs out of the box.
Content is tightly coupled to the O’Reilly book release, meaning the repo is best treated as a curated companion rather than a standalone open-source framework.

Verdict

Grab it if you need to justify cluster spend or debug why your “100% GPU utilization” still yields terrible throughput. If you want a single pip-installable optimizer, look elsewhere—this is a curriculum, not a library.
