PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition
Published in ICASSP, 2026
Automatic speech recognition remains weak on dysarthric speech because of limited training data and high speaker variability. We propose PhoenixDSR, a phoneme-mediated framework that separates acoustic variability from linguistic decoding. A Wav2Vec2-CTC model trained on healthy speech yields stable phoneme hypotheses, while a weighted confusion matrix captures both global and speaker-specific dysarthric substitution patterns. A lightweight LLM decoder then performs multi-task phoneme-to-text repair. PhoenixDSR achieves strong, data-efficient, and robust results on the CDSD corpus.
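The phoneme-mediated repair idea can be illustrated with a toy sketch. All names, phonemes, and probabilities below are our own illustrative assumptions, not values from the paper: a confusion matrix of dysarthric substitution patterns is inverted with Bayes' rule, combined with a contextual prior, to recover the intended phoneme before text decoding.

```python
phones = ["p", "b", "t", "d"]
idx = {p: i for i, p in enumerate(phones)}

# Hypothetical confusion matrix (illustrative only):
# confusion[i][j] = P(model emits phones[j] | speaker intended phones[i]).
# Dysarthric speech often devoices stops, so an intended "b" is frequently
# emitted as "p", and an intended "d" as "t".
confusion = [
    [0.90, 0.10, 0.00, 0.00],  # intended "p"
    [0.40, 0.60, 0.00, 0.00],  # intended "b"
    [0.00, 0.00, 0.85, 0.15],  # intended "t"
    [0.00, 0.00, 0.45, 0.55],  # intended "d"
]

def intended_posterior(emitted, prior):
    """Bayes inversion: P(intended i | emitted) ∝ prior[i] * confusion[i][j]."""
    j = idx[emitted]
    scores = [prior[i] * confusion[i][j] for i in range(len(phones))]
    z = sum(scores)
    return [s / z for s in scores]

# With a contextual prior that favors "b" (standing in for the role a
# language model would play), an emitted "p" is reinterpreted as an
# intended "b", i.e. the devoicing error is repaired.
prior = [0.2, 0.6, 0.1, 0.1]
post = intended_posterior("p", prior)
best = phones[max(range(len(phones)), key=lambda i: post[i])]
print(best)  # → b
```

In PhoenixDSR this repair role is played by the LLM decoder rather than a fixed prior; the sketch only shows why a confusion matrix over phonemes is a useful intermediate representation.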
Recommended citation: Wu, Y., Xu, Y., Wang, J., Zhao, X., Jiang, J., & Luo, Z. (2026). PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Download Paper
