SPIN2026: No bad apple!

P60: Session 2 (Tuesday 13 January 2026, 14:10-16:40)
Evaluating the intelligibility of conversational speech for the CHiME-9 ECHI challenge

Robert Sutherland
School of Computer Science, University of Sheffield, Sheffield, United Kingdom

Stefan Goetze
South Westphalia University of Applied Sciences, Iserlohn, Germany

Jon P Barker
School of Computer Science, University of Sheffield, Sheffield, United Kingdom

Improving the performance of assistive hearing devices, such as hearing aids, is crucial for supporting communication in noisy everyday environments. In this pursuit, a large dataset has been collected for an open challenge focusing on speech enhancement in conversations, “CHiME-9: Enhancing Conversations to address Hearing Impairment (ECHI)”. The speech enhancement systems submitted to this challenge will be assessed using listening tests in March 2026. While there are standardised methods for speech quality assessment, research on assessing conversational speech intelligibility is more limited. This work highlights key challenges in designing such tests and proposes a novel evaluation scheme to assess speech intelligibility in conversations.

The CHiME-9 ECHI Challenge focuses on four-party conversations in noisy environments. The task is to remove background noise while preserving the speech of the three conversation partners, using audio recorded with hearing aids or Project Aria glasses. This dataset consists of 30 hours of audio from 49 sessions, with 194 unique speakers. Challenge participants will submit audio for the evaluation set, which consists of nine sessions.

To evaluate the effectiveness of these systems, intelligibility in the context of conversations must be defined, and a natural task must then be designed, with an emphasis on ecological validity. Preliminary pilot tests have identified several key challenges that make this kind of evaluation difficult. Segments must be chosen so that they are semantically coherent and complete, and easy to parse, remember and repeat. Listeners must be provided with context so they can ‘tune into’ the sample in a one-shot listening paradigm. There must also be a consistent cue for the target, so that listeners know which speaker to focus on.

Consideration of these factors has led to the development of a new listening test methodology, which provides listeners with ecologically valid cues to encourage them to attend to a specific target speaker. To do this, listeners are given a clean speech sample of the target speaker, the prior context of the conversation, cues indicating when the target speaker is talking, and training samples for each target speaker.
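As a concrete illustration of how one trial in such a methodology might be packaged, the sketch below defines a hypothetical Python data structure grouping the enrolment sample, prior context, target cues and test segment, together with a naive word-correct scoring function. All names here (ListeningTrial, enrolment_clip, word_correct_rate, and so on) are illustrative assumptions rather than the official ECHI tooling, and the actual scoring procedure for the challenge may differ.

    from dataclasses import dataclass, field
    from pathlib import Path


    @dataclass
    class ListeningTrial:
        """One trial in a conversational intelligibility test (illustrative only)."""
        target_speaker_id: str            # which of the three conversation partners to attend to
        enrolment_clip: Path              # clean speech sample of the target speaker
        context_clip: Path                # prior conversational context played before the test segment
        test_segment: Path                # processed segment that the listener repeats or transcribes
        target_cue_times: list[tuple[float, float]] = field(default_factory=list)
        # (start, end) times in seconds at which the target speaker is talking,
        # used to cue the listener during playback
        reference_transcript: str = ""    # reference words for intelligibility scoring


    def word_correct_rate(reference: str, response: str) -> float:
        """Fraction of reference words that appear in the listener's response
        (order-insensitive); a deliberately simple stand-in for a real scoring scheme."""
        ref_words = reference.lower().split()
        resp_words = set(response.lower().split())
        if not ref_words:
            return 0.0
        return sum(w in resp_words for w in ref_words) / len(ref_words)

Keeping the cue times and enrolment clip explicit in each trial reflects the design goal above: every listener receives a consistent, ecologically valid indication of which speaker to attend to before and during playback.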

All tools developed for this project will be open-sourced, and the audio signals and intelligibility labels will be released to the research community after the conclusion of the CHiME-9 ECHI Challenge.

Acknowledgements: This work was supported by the UKRI AI Centre for Doctoral Training in Speech and Language Technologies (SLT) and their Applications funded by UK Research and Innovation [grant number EP/S023062/1]. This work was also supported by WS Audiology.
