SPIN2026: No bad apple!

P75 Session 1 (Monday 12 January 2026, 15:00-17:30)
Non-intrusive prediction of human speech recognition using mistuned binaural processing and posterior phoneme probabilities

Simon Weihe 
University of Oldenburg, Dep. Medical Physics and Acoustics, Oldenburg, Germany

Jan Rennies-Hochmuth
Fraunhofer IDMT, Hearing, Speech and Audio Technology, Oldenburg, Germany
University of Oldenburg, Dep. Medical Physics and Acoustics, Oldenburg, Germany

Thomas Brand
University of Oldenburg, Dep. Medical Physics and Acoustics, Oldenburg, Germany

Real-time prediction of speech intelligibility with a non-intrusive binaural model would be a convenient tool for hearing-aid research and applications. A front-end based on Equalisation-Cancellation (EC) processing is one way to model binaural release from masking. To be non-intrusive and real-time capable, the front-end must compute a binaurally enhanced speech signal that can then be analysed further for speech intelligibility prediction. It must additionally model the inaccuracies of human binaural processing. To our knowledge, none of the EC front-ends published so far fulfils all of these requirements.

Human inaccuracies were previously modelled with Monte Carlo simulations, which, however, do not produce a defined signal and are not suitable for real-time processing. We therefore suggest replacing them with a deterministic mistuning of the interaural equalisation parameters.
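As a rough illustration of the idea only, and not the model presented on this poster, the following minimal sketch shows a broadband EC step in which the equalisation delay and gain are deliberately offset by fixed amounts. The function name `ec_cancel`, the parameters `delay_error` and `gain_error_db`, the circular shift, and the toy masker are all assumptions made for this example; a realistic front-end would work per frequency band and on short time frames.

```python
import numpy as np


def ec_cancel(left, right, delay_samples, gain, delay_error=0, gain_error_db=0.0):
    """Equalise the right channel to the left one, then subtract (cancel).

    delay_samples, gain: interaural delay and level ratio of the component
        to be cancelled, i.e. right = gain * (left delayed by delay_samples).
    delay_error, gain_error_db: fixed, deterministic mistuning of those
        parameters (hypothetical names; stands in for random processing errors).
    """
    used_delay = delay_samples + delay_error
    used_gain = gain * 10.0 ** (gain_error_db / 20.0)
    # Undo the assumed interaural delay; a circular shift keeps the sketch short.
    equalised = np.roll(right, -used_delay) / used_gain
    # Cancellation: whatever matches the assumed parameters is removed.
    return left - equalised


# Toy masker: the right ear receives a delayed, attenuated copy of the left ear.
rng = np.random.default_rng(0)
masker = rng.standard_normal(1024)
left = masker
right = 0.5 * np.roll(masker, 3)

perfect = ec_cancel(left, right, delay_samples=3, gain=0.5)
mistuned = ec_cancel(left, right, delay_samples=3, gain=0.5,
                     delay_error=1, gain_error_db=1.0)

print("residual power, perfect tuning:", np.mean(perfect ** 2))
print("residual power, mistuned:      ", np.mean(mistuned ** 2))
```

With perfect tuning the toy masker is removed completely, which would overestimate human performance. The fixed offsets leave a masker residual, yet the output is one well-defined signal that is identical on every run, which is what a subsequent back-end analysis requires.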

This approach was evaluated with the standard and modified Speech Intelligibility Index (SII) and with the Mean Temporal Distance (MTD) as back-ends, and the model predictions were compared with previous studies and our own measurements. Only the MTD back-end fulfils the requirement of non-intrusiveness and produces reliable predictions of speech reception thresholds (SRTs) in reverberant rooms.

Last modified 2025-11-21 16:50:42