cli99/llm-analysis
A Python tool that estimates training and inference latency and memory consumption for transformer-based language models given GPU, data type, and parallelism configurations.

The project automates performance estimation for large language models: given a model, hardware, data type, and parallelism configuration, it computes FLOPs, memory usage, and latency. It provides both a Python API (the `LLMAnalysis` class) and command-line entry points for quick calculations. The estimates account for the parallelism scheme, activation recomputation, data type, and inference-time assumptions, so ML engineers can plan resource requirements for LLM training and serving.
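
To make the kind of estimate described above concrete, here is a minimal back-of-envelope sketch. It does not use the `llm-analysis` API and is not the project's actual method. It relies on common rules of thumb (roughly 12·h² parameters per transformer layer, roughly 6 FLOPs per parameter per training token), and every name in it (`ModelSpec`, `approx_param_count`, `weight_memory_bytes`, `training_flops`, `training_latency_seconds`) is hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    """Hypothetical container for the few model dimensions this sketch needs."""
    num_layers: int
    hidden_dim: int
    vocab_size: int


def approx_param_count(m: ModelSpec) -> int:
    # Rule of thumb: each layer holds ~4*h^2 attention weights and ~8*h^2 MLP
    # weights (assuming a 4x MLP expansion); the embedding table adds vocab * h.
    # Biases and layer norms are ignored.
    return 12 * m.num_layers * m.hidden_dim ** 2 + m.vocab_size * m.hidden_dim


def weight_memory_bytes(params: int, bytes_per_param: int, tp_size: int = 1) -> float:
    # Assumes weights are sharded evenly across tp_size tensor-parallel ranks.
    return params * bytes_per_param / tp_size


def training_flops(params: int, num_tokens: int) -> int:
    # Rule of thumb: ~2 FLOPs per parameter per token in the forward pass and
    # ~4 in the backward pass, with no activation recomputation.
    return 6 * params * num_tokens


def training_latency_seconds(
    flops: float, num_gpus: int, peak_flops_per_gpu: float, flops_efficiency: float
) -> float:
    # flops_efficiency is the assumed fraction of peak throughput achieved (0..1].
    return flops / (num_gpus * peak_flops_per_gpu * flops_efficiency)


if __name__ == "__main__":
    # Illustrative inputs only: a GPT-3-sized model, 300B tokens, 1024 GPUs at an
    # assumed 312 TFLOP/s peak each and an assumed 50% efficiency.
    spec = ModelSpec(num_layers=96, hidden_dim=12288, vocab_size=50257)
    params = approx_param_count(spec)
    flops = training_flops(params, num_tokens=300_000_000_000)
    seconds = training_latency_seconds(flops, 1024, 312e12, 0.5)
    print(f"params: {params / 1e9:.1f}B")
    print(f"fp16 weights per GPU (tp=8): {weight_memory_bytes(params, 2, 8) / 1e9:.1f} GB")
    print(f"estimated training time: {seconds / 86400:.1f} days")
```

A tool of this kind goes well beyond such a sketch, since it also has to account for activation memory, optimizer state, communication cost, and the other configuration options mentioned above. The sketch only shows how model size, data type, parallelism degree, and hardware throughput combine into a first-order estimate.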