P62
Session 2 (Tuesday 13 January 2026, 14:10-16:40)
Disentangling f0 and spectral envelope contributions to speech-in-noise comprehension via selective spectrotemporal modulation filtering
The spectro-temporal modulation (STM) patterns present in speech carry information relevant for intelligibility, such as formants, pitch contour and syllable boundaries. The STM content of a signal can be represented and manipulated in the Modulation Power Spectrum (MPS) domain. Elliott & Theunissen (2009, PLoS Comput Biol, doi:10.1371/journal.pcbi.1000302) showed that certain regions of the MPS are critical for intelligibility. They found that speech-in-noise comprehension was significantly impaired when removing temporal modulations below 12 Hz or spectral modulations below 4 cycles/kHz. However, a limitation of this approach is that the MPS representation jointly encodes the spectral envelope and harmonic structure, which are thus both affected by STM filtering. When comprehension drops, it is therefore impossible to determine if the effect was purely due to the degradation of envelope information or due to the removal of fundamental frequency (f0) cues.
We address this limitation by applying the same filtering approach as Elliott & Theunissen, but separately processing the f0 contour and the spectral envelope. This makes it possible to independently transform these two components by including or excluding the f0 in the signal or by selectively reintroducing the f0 after filtering the spectral envelope. This approach thereby grants more specificity in the manipulations one can apply to a speech signal and should allow the disentanglement of the relative contributions of the acoustic components for comprehension.
Using the PyWorld package, our processing pipeline decomposes a speech signal into its f0 and spectral envelope, calculates the MPS from the latter, applies low-pass filtering in the spectral or temporal modulation domain, and then resynthesizes a new speech signal from the filtered MPS and the unchanged or filtered f0. Using the same corpus of 100 sentences as in the original study, we aim to test how filtering restricted to the spectral envelope of the signal affects speech-in-noise comprehension. Stimuli will be randomly drawn, and their presentation balanced across filtering conditions and signal-to-noise ratios.
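The envelope-filtering step of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact implementation used in the study: it assumes the filter is applied to a log-magnitude envelope of shape (frames, frequency bins), uses a hard cutoff mask in the 2-D Fourier domain (the abstract does not specify the filter shape), and ignores edge effects from the FFT's implicit periodicity. The function name `lowpass_modulation` and all parameter names are hypothetical. In a full pipeline, the envelope would presumably come from a PyWorld analysis call such as `pyworld.wav2world(x, fs)` and be passed, after filtering and exponentiation, to `pyworld.synthesize(f0, sp, ap, fs)` together with the untouched f0.

```python
import numpy as np


def lowpass_modulation(log_env, frame_period_s, freq_step_khz,
                       max_temporal_hz=None, max_spectral_cyc_per_khz=None):
    """Low-pass filter a log spectral envelope in the modulation domain.

    log_env: 2-D array of shape (n_frames, n_freq_bins).
    frame_period_s: time between consecutive frames, in seconds.
    freq_step_khz: spacing between frequency bins, in kHz.
    Cutoffs left as None are not applied.
    """
    log_env = np.asarray(log_env, dtype=float)
    n_frames, n_bins = log_env.shape

    # Modulation axes: temporal (Hz) along frames,
    # spectral (cycles/kHz) along frequency bins.
    wt = np.fft.fftfreq(n_frames, d=frame_period_s)
    wf = np.fft.fftfreq(n_bins, d=freq_step_khz)

    # Hard mask (an assumption of this sketch): keep modulations whose
    # absolute rate is at or below each cutoff.
    keep = np.ones((n_frames, n_bins), dtype=bool)
    if max_temporal_hz is not None:
        keep &= (np.abs(wt) <= max_temporal_hz)[:, None]
    if max_spectral_cyc_per_khz is not None:
        keep &= (np.abs(wf) <= max_spectral_cyc_per_khz)[None, :]

    # The mask is symmetric in both axes, so the inverse transform is
    # real up to rounding error; the residual imaginary part is dropped.
    spectrum = np.fft.fft2(log_env)
    return np.fft.ifft2(spectrum * keep).real


# Toy envelope (not speech): a 2 Hz and a 30 Hz temporal modulation,
# identical across 64 frequency bins, sampled every 5 ms for 1 s.
frame_period_s = 0.005
freq_step_khz = 0.125
t = np.arange(200) * frame_period_s
slow = np.cos(2 * np.pi * 2.0 * t)
fast = np.cos(2 * np.pi * 30.0 * t)
toy_env = np.tile((slow + fast)[:, None], (1, 64))

# Keep only temporal modulations up to 12 Hz.
smoothed = lowpass_modulation(toy_env, frame_period_s, freq_step_khz,
                              max_temporal_hz=12.0)
```

In the toy example the 30 Hz component is removed while the 2 Hz component passes. Because the f0 track never enters this function, it can be kept unchanged or processed separately before resynthesis, which is the separation the approach relies on.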
In this context, we expect speech comprehension to be higher than in Elliott & Theunissen’s study, given the preserved pitch contour (which aids auditory unmasking and stream segregation). In particular, the minimal spectral-modulation cutoff for speech-in-noise comprehension should be lower than estimated in the original study and provide a better estimate of the required resolution of the spectral envelope itself, without interference from a parallel processing pathway for pitch.