SPIN2026: No bad apple!

P38, Session 2 (Tuesday 13 January 2026, 14:10-16:40)
Evaluating phonetic distance metrics as predictors of speech intelligibility in noise

Maya Hola, Paul Iverson
University College London, United Kingdom

Measuring phonetic distance between speakers is essential for understanding speech intelligibility in noise, yet methods for quantifying these differences vary widely. Recent advances in AI-based speech representations offer new ways to capture talker-listener similarity, but their effectiveness relative to traditional acoustic metrics remains unclear. We compared four types of phonetic distance metric (a legacy acoustic measure, ACCDIST; automated edit distances; embedding-based measures from self-supervised models; and novel logit-based measures) to determine which best predicts intelligibility within a homogeneous accent community. Forty speakers of Standard Southern British English (modern RP) participated as both talkers and listeners in a sentence recognition task at four signal-to-noise ratios. Talker typicality, measured as the average distance from a talker to all other talkers, strongly predicted intelligibility, and individual talker-listener distance provided additional explanatory power. Embedding-based metrics from an English phoneme recognizer best captured the phonetic variability relevant to intelligibility, with logit-based measures offering a computationally efficient alternative. These findings demonstrate that fine-grained phonetic variation within a single accent group systematically affects comprehension in adverse conditions, extending accent-distance theory to the idiolectal level and providing practical guidance for selecting distance metrics in speech intelligibility research.
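As an illustration only, the talker typicality measure described above (the average distance from a talker to all other talkers) can be sketched as follows. The function name `talker_typicality`, the list-of-lists distance matrix, and the toy numbers are assumptions made for this example, not the authors' implementation or data; any of the four metric types could in principle supply the pairwise distances. Under this definition, a lower score would mean a more typical talker.

```python
def talker_typicality(dist):
    """Mean distance from each talker to all other talkers.

    dist: square list of lists, where dist[i][j] is the phonetic
    distance between talker i and talker j. The diagonal (a talker's
    distance to themselves) is excluded from the average.
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two talkers")
    return [
        sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical pairwise distances for three talkers (made-up numbers).
toy = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
scores = talker_typicality(toy)
print(scores)  # one score per talker; lower = closer to the group
```

In this toy matrix the second talker has the smallest average distance and so would count as the most typical of the three.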

Last modified 2025-11-21 16:50:42