k2-fsa/k2

Finite-state machines that learn backward, too

k2 wires classic FSA/FST algorithms into PyTorch autograd so you can train speech recognizers with CTC or MMI objectives and backpropagate through lattice rescoring in a single training graph.

What it does

k2 implements Finite State Automaton and Transducer algorithms in C++/CUDA, then exposes them to PyTorch. The target use case is speech recognition: decoding, CTC training, LF-MMI training, lattice rescoring, and confidence estimation. All of these are differentiable and composable in a single training graph.

The interesting bit

Instead of making every micro-operation differentiable (the PyTorch/TensorFlow way), k2 computes derivatives top-down by tracking which input arcs contributed to each output arc. That sparse, arc-level bookkeeping gets wrapped into a PyTorch Function for backward passes. The authors claim it’s more efficient and has better roundoff properties than bottom-up autograd.
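To make the arc-level bookkeeping concrete, here is a minimal plain-Python sketch, not k2's actual code. It assumes each output arc records the index of the one input arc it came from in a list called `arc_map`, with `-1` meaning "no source arc"; the function names `forward_gather` and `backward_scatter` are hypothetical. The forward step copies scores along the map, and the backward step adds each output gradient back onto its source arc, which is the kind of logic a custom PyTorch Function would run in its backward pass.

```python
# Illustrative sketch only: names and the -1 convention are assumptions,
# not k2's API.

def forward_gather(in_scores, arc_map):
    """Output arc i takes the score of input arc arc_map[i] (0.0 if no source)."""
    return [in_scores[j] if j >= 0 else 0.0 for j in arc_map]


def backward_scatter(out_grads, arc_map, num_in_arcs):
    """Accumulate each output arc's gradient onto the input arc it came from."""
    in_grads = [0.0] * num_in_arcs
    for grad, j in zip(out_grads, arc_map):
        if j >= 0:  # arcs without a source contribute nothing
            in_grads[j] += grad
    return in_grads


in_scores = [0.5, -1.0, 2.0]
arc_map = [2, 0, 2, -1]  # output arcs 0 and 2 both come from input arc 2

out_scores = forward_gather(in_scores, arc_map)
in_grads = backward_scatter([1.0, 1.0, 1.0, 1.0], arc_map, len(in_scores))

print(out_scores)  # [2.0, 0.5, 2.0, 0.0]
print(in_grads)    # [1.0, 0.0, 2.0]
```

Input arc 1 receives zero gradient because no output arc refers to it, while input arc 2 receives the sum of two output gradients. Only the map itself has to be stored between the forward and backward steps.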

Key highlights

  • Core data structure is a templated Ragged tensor—think TensorFlow’s RaggedTensor, but arrived at independently and used very differently.
  • Algorithms are written as C++11 lambdas that operate directly on data pointers; on the CUDA backend the kernels are instantiated from the same templated code, and cub supplies primitives such as exclusive prefix sum.
  • Heavy lifting is “embarrassingly parallelizable”; the authors say most code looks like normal C++.
  • Python bindings via pybind11; PyTorch integration is done.
  • Active recipes and Colab notebooks live in the separate icefall repo.
  • A v2.0-pre branch exists, aimed at production readiness.
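The ragged layout and the exclusive prefix sum mentioned in the highlights can be illustrated with a small plain-Python sketch. It is an assumption-laden toy, not k2's templated C++ class: the class name `Ragged`, the helper `row_splits_from_sizes`, and the method names are chosen here for illustration. The idea is that rows of different lengths are stored as one flat list of values plus an offsets array, and the offsets are an exclusive prefix sum of the row lengths with the total appended.

```python
from itertools import accumulate


def row_splits_from_sizes(sizes):
    """Exclusive prefix sum of row sizes, with the grand total appended."""
    return [0] + list(accumulate(sizes))


class Ragged:
    """Toy ragged array: flat values plus row offsets (illustrative only)."""

    def __init__(self, rows):
        self.values = [v for row in rows for v in row]
        self.row_splits = row_splits_from_sizes([len(row) for row in rows])

    def row(self, i):
        """Return row i as a slice of the flat values list."""
        return self.values[self.row_splits[i]:self.row_splits[i + 1]]

    def row_ids(self):
        """For each flat element, the index of the row it belongs to."""
        ids = []
        for i in range(len(self.row_splits) - 1):
            ids.extend([i] * (self.row_splits[i + 1] - self.row_splits[i]))
        return ids


r = Ragged([[1, 2], [], [3, 4, 5]])
print(r.values)      # [1, 2, 3, 4, 5]
print(r.row_splits)  # [0, 2, 2, 5]
print(r.row_ids())   # [0, 0, 2, 2, 2]
```

Because every element can find its row from `row_ids` and every row can find its elements from `row_splits`, per-element and per-row work can each be done independently, which is what makes this layout a natural fit for one-thread-per-element GPU kernels.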

Caveats

  • The README admits the Ragged-based algorithms are hard to understand without reading the code directly; the parallel structure looks nothing like CPU-native FST implementations.
  • No claims of Word Error Rate improvement over existing ASR tech—the pitch is generality and extensibility, not raw accuracy.

Verdict

Worth a look if you’re building or researching speech recognition pipelines and need to backprop through decoding graphs. Probably overkill if you’re just fine-tuning a standard CTC model with off-the-shelf tools.
