kyegomez/Gemini
Open-source PyTorch implementation of Gemini, Google's multimodal foundation model, supporting text, image, audio, and video inputs.

This repository is an open-source implementation of Google's Gemini model. Text, audio, image, and video inputs are converted into tokens and processed directly by a single transformer, and specialized decoders use conditional decoding to generate text or images. Key features include multi-query/grouped-query attention, Flash Attention, RoPE (rotary positional embeddings), ALiBi (attention with linear biases), and KV-cache support.
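To illustrate the first feature, here is a minimal, dependency-free sketch of grouped-query attention (GQA). It assumes per-head inputs given as plain Python lists and causal masking. The function names (`attend`, `grouped_query_attention`) are hypothetical and are not this repository's API, which is written in PyTorch and may differ in layout and details. The key idea is that several query heads share one key/value head. This shrinks the KV cache by a factor of `n_q // n_kv`.

```python
import math
from typing import List

Matrix = List[List[float]]  # one head's tensor: [seq_len][head_dim]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Matrix, k: Matrix, v: Matrix, causal: bool = True) -> Matrix:
    """Scaled dot-product attention for a single head (hypothetical helper)."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Causal mask: position i may only attend to positions 0..i.
        limit = i + 1 if causal else len(k)
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in range(limit)]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))])
    return out


def grouped_query_attention(
    qs: List[Matrix], ks: List[Matrix], vs: List[Matrix], causal: bool = True
) -> List[Matrix]:
    """qs holds n_q query heads; ks/vs hold n_kv key/value heads (n_q % n_kv == 0).

    Each key/value head is shared by a group of n_q // n_kv consecutive query heads.
    n_kv == 1 gives multi-query attention; n_kv == n_q gives standard multi-head attention.
    """
    n_q, n_kv = len(qs), len(ks)
    if n_kv == 0 or len(vs) != n_kv or n_q % n_kv != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group = n_q // n_kv
    # Query head h reads from key/value head h // group.
    return [attend(qs[h], ks[h // group], vs[h // group], causal) for h in range(n_q)]
```

With 4 query heads and 2 key/value heads, query heads 0 and 1 read from key/value head 0, and heads 2 and 3 read from key/value head 1. Only the 2 key/value heads would need to be stored in a KV cache during generation.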