wenet-e2e/wespeaker
A research- and production-oriented toolkit for speaker embedding extraction, verification, recognition, and diarization using neural network models.

WeSpeaker provides tools for speaker embedding learning, with applications to verification, recognition, and diarization. It implements several neural network architectures, including ECAPA-TDNN, CAM++, and ResNet, and makes use of self-supervised learning models such as WavLM and DINO for feature extraction. The toolkit supports both online (on-the-fly) feature extraction and loading of pre-extracted features, and it provides pretrained models for Chinese and English speaker tasks.
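To illustrate how speaker embeddings are typically used for verification, here is a minimal sketch of cosine-similarity scoring between two embedding vectors. It is not WeSpeaker's own code: the function names `cosine_similarity` and `same_speaker` and the `threshold` value of 0.5 are assumptions made for illustration, and in practice the embeddings would come from a trained model and the threshold would be tuned on held-out data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(
    enroll: Sequence[float],
    test: Sequence[float],
    threshold: float = 0.5,  # hypothetical value; tune on a development set
) -> bool:
    """Accept the trial if the cosine score reaches the threshold."""
    return cosine_similarity(enroll, test) >= threshold
```

A trial is accepted when the score between the enrollment and test embeddings reaches the threshold. Raising the threshold lowers false accepts at the cost of more false rejects.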