SSDM: Scalable Speech Dysfluency Modeling

Lian, Jiachen; Zhou, Xuanru; Ezzes, Zoe; Vonk, Jet; Morin, Brittany; Baquirin, David; Miller, Zachary; Tempini, Maria L.; Anumanchipalli, Gopala

Part of Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track

Authors

Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Baquirin, Zachary Miller, Maria Luisa Gorno Tempini, Gopala Anumanchipalli

Abstract

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions (Lian et al., 2023; Lian and Anumanchipalli, 2024) suffer from poor scalability. Second, there is no large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at https://berkeley-speech-group.github.io/SSDM/.
