Is GigaAM open source?

Yes, salute-developers/GigaAM is open source and released under the MIT license.

What language is GigaAM written in?

salute-developers/GigaAM is primarily written in Python.

How popular is GigaAM?

salute-developers/GigaAM has 700 stars on GitHub.

Where can I find GigaAM?

salute-developers/GigaAM is on GitHub at https://github.com/salute-developers/GigaAM.

salute-developers/GigaAM

Russian speech recognition that claims to out-whisper Whisper

GigaAM exists because Russian call centers, music, and atypical speech deserve a dedicated open-source foundation model instead of hand-me-down multilingual checkpoints.

★ 700 stars · Python · Image, Video, Audio · Inference, Serving

What it does

GigaAM is a family of Conformer-based acoustic models (220–240M parameters) pre-trained on Russian speech data. The latest version, GigaAM-v3, was trained on 700,000 hours of audio and fine-tuned for ASR using CTC and RNN-T decoders, plus an emotion recognition head. It handles short clips natively and long-form audio via external voice-activity detection, with end-to-end variants that add punctuation and text normalization.

The interesting bit

The project is unusually specific about its turf: it targets Russian call centers, music, atypical speech, and voice messages, claiming a 30% relative WER improvement on those domains and a 70:30 side-by-side win against Whisper-large-v3 judged by an independent LLM. That specificity is rare in a field that usually smooths everything into a single multilingual leaderboard.
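
To make the "30% relative WER improvement" figure concrete, here is a minimal sketch of how word error rate and a relative improvement are usually computed. The function names and the example numbers are illustrative assumptions, not taken from the GigaAM repository or its benchmarks.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer undercuts baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers: a baseline at 20% WER and a new model at 14% WER.
    print(round(relative_wer_improvement(0.20, 0.14), 2))
```

Note that a relative improvement of 30% is not the same as 30 percentage points: in the hypothetical case above the absolute WER drops by only 6 points.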

Key highlights

v3 scales pre-training to 700,000 hours of Russian speech and adds end-to-end CTC/RNN-T models with punctuation and normalization.
Claims a 70:30 average win over Whisper-large-v3 in LLM-as-a-Judge side-by-side comparisons for end-to-end ASR.
Includes an emotion recognition model (GigaAM-Emo) that claims a 15% Macro F1 improvement over existing models.
Ships with ONNX export, TensorRT, and Triton Inference Server support for production deployment.
Runs fully offline for standard inference; long-form mode requires a Hugging Face token for pyannote.audio voice segmentation.
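
For readers unfamiliar with the Macro F1 metric cited for GigaAM-Emo, the sketch below shows one common way to compute it: a per-class F1 score averaged with equal weight per class. The label names and predictions are hypothetical examples, not GigaAM-Emo outputs, and the source does not say whether the 15% figure is relative or absolute.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels and predictions for four utterances.
    truth = ["angry", "neutral", "neutral", "positive"]
    preds = ["angry", "neutral", "positive", "positive"]
    print(round(macro_f1(truth, preds), 3))
```

Because every class counts equally, Macro F1 penalizes a model that does well on frequent emotions but poorly on rare ones.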

Caveats

Native transcription is capped at 25 seconds; long-form audio requires installing extra dependencies and obtaining a Hugging Face token for pyannote.audio segmentation.
The 30% domain improvement comes from new internal datasets, so your mileage will vary unless your audio matches those specific domains.
Emotion recognition and long-form ASR rely on separate model loads or external tools rather than a single monolithic checkpoint.
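
The 25-second cap is why long-form mode needs a segmentation step. As a rough illustration of the idea, the sketch below packs speech segments (such as those a voice-activity detector might return) into chunks that stay under the cap. It is an assumption-laden toy, not the repository's implementation, which delegates segmentation to pyannote.audio; the function name and the input format are hypothetical.

```python
MAX_CHUNK_SECONDS = 25.0  # native transcription cap stated in the caveats above


def pack_segments(segments, max_len=MAX_CHUNK_SECONDS):
    """Merge consecutive (start, end) speech segments into chunks <= max_len.

    Assumes segments are sorted by start time and do not overlap. Silence
    between merged segments stays inside the chunk. A single segment longer
    than max_len is cut into fixed-size pieces.
    """
    chunks = []
    current = None  # (start, end) of the chunk being built
    for start, end in segments:
        if end <= start:
            raise ValueError(f"invalid segment: ({start}, {end})")
        # Cut an oversized segment into max_len pieces first.
        while end - start > max_len:
            if current is not None:
                chunks.append(current)
                current = None
            chunks.append((start, start + max_len))
            start += max_len
        if current is None:
            current = (start, end)
        elif end - current[0] <= max_len:
            current = (current[0], end)  # extend the open chunk
        else:
            chunks.append(current)
            current = (start, end)
    if current is not None:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical detector output, in seconds.
    speech = [(0.0, 10.0), (11.0, 20.0), (22.0, 40.0), (45.0, 100.0)]
    for chunk in pack_segments(speech):
        print(chunk)
```

Each resulting chunk could then be passed to a short-clip transcription call and the texts joined in order. Cutting an oversized segment at a fixed offset can split a word, which is one reason real pipelines prefer to cut at detected pauses.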

Verdict

Worth a look if you are building Russian voice products and need an open, production-ready ASR pipeline with emotion recognition in the same repository. Skip it if your workloads are primarily non-Russian or if you need a single model that handles arbitrary-length audio out of the box.
