Publications

Conference Papers


PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition

Published in ICASSP 2026, 2026

Automatic speech recognition performs poorly on dysarthric speech due to data scarcity and high speaker variability. We propose PhoenixDSR, a phoneme-mediated framework that decouples acoustic variability from linguistic decoding. A Wav2Vec2-CTC model trained on healthy speech yields stable phoneme predictions, while a weighted confusion matrix captures both global and speaker-specific dysarthric error patterns. A lightweight LLM decoder then performs multi-task phoneme-to-text repair. PhoenixDSR achieves accurate, data-efficient, and speaker-robust recognition on the CDSD dataset.

Recommended citation: Wu, Y., Xu, Y., Wang, J., Zhao, X., Jiang, J., & Luo, Z. (2026). PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Download Paper

Re-Sonance: A Dysarthric Asynchronous Real-Time Speech Conversion System Based on a Three-Stage Cascaded ASR-LLM-TTS Architecture

Published in NCMMSC 2025, 2025

Individuals with dysarthria face major difficulties in professional speaking scenarios that require real-time communication, and existing augmentative and alternative communication (AAC) systems often suffer from high latency and unnatural synthesized speech. We introduce Re-Sonance, an LLM-enhanced, speech-driven AAC system that integrates Whisper ASR, a Qwen LLM, and CosyVoice TTS for real-time use. Evaluations on Mandarin dysarthric speech show improved intelligibility and naturalness while preserving semantics for mild to moderate dysarthria, highlighting the promise of LLM-based AAC systems.

Recommended citation: Wu, Y., Xu, Y., Wang, J., Zhao, X., Jiang, J., & Luo, Z. (2025). Re-Sonance: A Dysarthric Asynchronous Real-Time Speech Conversion System Based on a Three-Stage Cascaded ASR-LLM-TTS Architecture. Proceedings of the 2025 National Conference on Man–Machine Speech Communication (NCMMSC).
Download Paper | Download Slides