P67
Session 1 (Monday 12 January 2026, 15:00-17:30)
Development of an ecologically-valid speech intelligibility test using virtual acoustics
Speech intelligibility (SI) testing plays a central role in audiology and hearing research, indicating how well speech can be understood in quiet or in noise. Traditional SI tests rely on isolated sentences or single words presented in a highly controlled but unnatural environment. While this approach ensures repeatability, it lacks ecological validity and often fails to reflect listening abilities in everyday life. Using virtual acoustics and text-to-speech (TTS) synthesis, we designed and evaluated an SI test with conversational content in a virtual acoustic everyday environment. The goal was to create a test set in a representative everyday situation that is semantically coherent and acoustically authentic.
The virtual test environment simulated a busy cafeteria. The background noise was a recording of a real cafeteria made with a spatial microphone array. Synthetic speech material generated by two different TTS providers (Google TTS and Acapela) was embedded into the ambient noise recordings and virtually presented from the frontal direction. The complete acoustic scene was presented in the anechoic chamber of the University of Applied Sciences Lübeck using a spherical 65-loudspeaker array with 7th-order Ambisonics rendering. The ambient cafeteria noise was presented at a level of 65 dB SPL. The target stimuli consisted of TTS-generated dialogues containing both carrier phrases and keywords to be recognized, presented at fixed signal-to-noise ratios (SNRs) of −11 dB and −8 dB, corresponding to the SNR30% and SNR70% conditions, respectively. Twenty-four normal-hearing participants completed multiple test runs under both voice conditions. Speech intelligibility was determined by keyword recognition, and the results were used to derive psychometric functions for each condition. In addition to the SI measurements, the participants rated the voices’ naturalness, the scenes’ realism, and the perceived difficulty of each measurement on six-step Likert scales.
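As an illustration of how speech can be embedded at a fixed SNR, the following minimal Python sketch scales a speech signal against a noise signal. It assumes that SNR is defined as the broadband RMS level difference between speech and noise; the abstract does not state which definition or weighting was used. The function names are hypothetical and this is not the study's processing chain.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def level_difference_db(speech, noise):
    """Broadband SNR in dB, assumed here to be the RMS level difference."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


def scale_speech_to_snr(speech, noise, snr_db):
    """Return a copy of `speech` scaled so that its level relative to
    `noise` equals `snr_db` (assumption: broadband RMS definition)."""
    gain = (rms(noise) / rms(speech)) * 10.0 ** (snr_db / 20.0)
    return [gain * v for v in speech]


def speech_level_db_spl(noise_level_db_spl, snr_db):
    """Speech presentation level implied by a noise level and an SNR."""
    return noise_level_db_spl + snr_db


# Toy signals standing in for real recordings (illustrative only).
noise = [math.sin(0.05 * n) for n in range(4000)]
speech = [0.3 * math.sin(0.21 * n) for n in range(4000)]

scaled = scale_speech_to_snr(speech, noise, -11.0)
achieved_snr = level_difference_db(scaled, noise)
```

Under this assumed definition, noise at 65 dB SPL and an SNR of −11 dB imply a speech level of 54 dB SPL, and an SNR of −8 dB implies 57 dB SPL.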
The results showed that psychometric functions were successfully inferred from the data using both SNR conditions. Participants rated the SNR70% condition as significantly more realistic. While no significant differences in perceived difficulty were found between voice providers, Google voices were rated as more natural and exhibited a smaller test-retest bias than Acapela voices. Acapela voices, however, yielded slightly steeper psychometric function slopes, suggesting greater measurement sensitivity near the speech reception threshold.
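To make the role of the slope concrete, the sketch below shows how a two-parameter logistic psychometric function is fixed by two (SNR, proportion correct) points. It assumes a logistic shape, which the abstract does not specify, and it uses only the nominal design targets (30% at −11 dB, 70% at −8 dB) as input, not the measured data. All function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic_from_two_points(snr_a, p_a, snr_b, p_b):
    """Return (srt, slope) of the logistic passing through two points.

    srt   : SNR in dB at which the function reaches 50% correct
    slope : slope at the SRT, in proportion correct per dB
    """
    k = (logit(p_b) - logit(p_a)) / (snr_b - snr_a)  # log-odds per dB
    srt = snr_a - logit(p_a) / k
    return srt, k / 4.0  # a logistic's slope at its midpoint is k/4


def intelligibility(snr, srt, slope):
    """Logistic psychometric function parameterized by SRT and slope."""
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr - srt)))


# Nominal design targets from the abstract, not measured results.
srt, slope = logistic_from_two_points(-11.0, 0.30, -8.0, 0.70)
```

Because the two targets are symmetric around 50%, the midpoint of this illustrative function lies halfway between them, at −9.5 dB. A steeper slope means that a given change in SNR produces a larger change in keyword score, which is why steeper functions indicate greater sensitivity near the speech reception threshold.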
In conclusion, the study demonstrates that synthetic speech embedded in a simulated cafeteria environment enables reliable and ecologically valid SI measurements.
This study was funded by the German ministry of research, technology and space affairs.