Is lhotse open source?

Yes — lhotse-speech/lhotse is open source, released under the Apache-2.0 license.

What language is lhotse written in?

lhotse-speech/lhotse is primarily written in Python.

How popular is lhotse?

lhotse-speech/lhotse has 1.1k stars on GitHub.

Where can I find lhotse?

lhotse-speech/lhotse is on GitHub at https://github.com/lhotse-speech/lhotse.

← all repositories

lhotse-speech/lhotse

Kaldi's Python cousin wrangles speech, video, and text

A data-prep library that treats audio snippets like editable clips, now stretching into multimodal territory.

★1.1k stars Python Data Tooling

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does Lhotse is a Python toolkit for preparing multimodal training data—speech, audio, video, image, and text—for machine learning pipelines. It provides standard recipes for common corpora, represents metadata in human-readable JSON/YAML manifests, and feeds PyTorch through task-specific Dataset classes. The core abstraction is the “cut”: a slice of audio or video that you can mix, truncate, pad, and augment on-the-fly without pre-baking everything to disk.

The interesting bit The cut abstraction lets you manipulate training samples as lazy, composable objects rather than concrete files. Feature extraction and augmentation can run pre-computed (with optional lilcom compression) or on-demand, and the library supports “feature-space cut mixing”—blending already-computed features rather than raw waveforms. For storage, Lhotse Shar offers a WebDataset-like sequential format optimized for streaming I/O.

Key highlights

Born from the Kaldi speech-processing lineage, paired with the k2 finite-state automata library
Dataset blending and on-the-fly bucketing for multi-corpus training
Built-in deduplication and randomization for distributed multi-node setups
Colab tutorials covering workflows, WebDataset integration, and image/video loading
Extensible backend system for audio (torchaudio, soundfile, torchcodec), I/O, and resampling

Caveats

The README warns that forcing MSCIOBackend for all URLs “may break functionality”
torchaudio is now optional; disabling it strips “many functionalities” though basics remain
Python 3.7+ supported, but some newer backends (e.g., torchcodec) require recent PyTorch versions

Verdict Worth a look if you’re building speech or multimodal models and want Kaldi’s corpus recipes without the C++ plumbing. Less compelling if your data pipeline is already humming in pure PyTorch or another framework.

Frequently asked

What is lhotse-speech/lhotse?: A data-prep library that treats audio snippets like editable clips, now stretching into multimodal territory.
Is lhotse open source?: Yes — lhotse-speech/lhotse is open source, released under the Apache-2.0 license.
What language is lhotse written in?: lhotse-speech/lhotse is primarily written in Python.
How popular is lhotse?: lhotse-speech/lhotse has 1.1k stars on GitHub.
Where can I find lhotse?: lhotse-speech/lhotse is on GitHub at https://github.com/lhotse-speech/lhotse.