Om-Alve/smolGPT
A minimal PyTorch implementation for training a small GPT language model from scratch with modern architecture features.

This repository provides a from-scratch implementation of a GPT model in pure PyTorch, without additional abstraction layers. It includes modern architectural components: Flash Attention, RMSNorm, SwiGLU activation, and optional rotary position embeddings (RoPE). It covers the full training pipeline, including mixed-precision training, gradient accumulation, and learning-rate warmup scheduling, and ships with a built-in TinyStories dataset processor integrated with a SentencePiece tokenizer.
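To make the warmup scheduling mentioned above concrete, here is a minimal sketch of a learning-rate schedule with linear warmup. It is not taken from the repository: the function name `lr_at_step`, all default values, and the cosine decay after warmup are assumptions chosen for illustration (cosine decay is a common pairing with warmup in GPT-style training, but the repository may do something different).

```python
import math


def lr_at_step(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, total_steps=1000):
    """Return the learning rate for a 0-indexed optimizer step.

    Hypothetical helper: the name, defaults, and the cosine decay phase
    are illustrative assumptions, not the repository's actual settings.
    """
    if step < warmup_steps:
        # Linear warmup: ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the end of the schedule, hold at the floor value.
        return min_lr
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (max_lr - min_lr)


if __name__ == "__main__":
    # Print the schedule at a few steps to see the ramp and the decay.
    for s in (0, 50, 99, 100, 550, 999, 1000):
        print(s, lr_at_step(s))
```

In a PyTorch training loop, a function like this would typically be called once per optimizer step, with the result written into each entry of `optimizer.param_groups` under the `"lr"` key before calling `optimizer.step()`. When gradient accumulation is used, the step counter would normally count optimizer updates rather than micro-batches.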