P44 | Session 2 (Tuesday 13 January 2026, 14:10-16:40)
AI-based real-time speaker separation: An alternative approach to beamforming
Research Question: The so-called cocktail party problem describes the limited ability of hearing aid and cochlear implant users to follow a target speaker in noisy multi-talker environments. Classical approaches such as beamforming improve speech intelligibility by using multiple microphones to extract and enhance the signal of a single speaker. The goal of this study was to develop an AI-based real-time speaker separation method that achieves good results using only a single microphone.
Methods: A web-based system for blind source separation (BSS) was implemented that processes continuous audio in 0.5-second windows at a sampling rate of 16 kHz. Separation is performed by neural models (TDANet, TIGER) served through a FastAPI/OpenVINO backend. The models, pre-trained on English data, were optimized for real-time inference and applied to German audio. A WebAudio frontend with AudioWorklets handles streaming, playback, and visualization in real time, entirely within the browser and without any special hardware requirements.
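The windowed processing described above can be sketched in a few lines. The following is an illustrative Python sketch only, not code from the study: the actual TDANet/TIGER forward pass through OpenVINO is replaced here by a hypothetical placeholder function, and the function names are assumptions.

```python
import numpy as np

SAMPLE_RATE = 16_000                                  # Hz, as in the study
WINDOW_SECONDS = 0.5
WINDOW_SAMPLES = int(SAMPLE_RATE * WINDOW_SECONDS)    # 8000 samples per window

def separate_window(window: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Placeholder for the neural separation step (hypothetical).

    A real implementation would run the OpenVINO-compiled TDANet or
    TIGER model here and return one waveform per estimated speaker.
    """
    return window * 0.5, window * 0.5                 # dummy two-speaker output

def stream_windows(samples: np.ndarray):
    """Cut a continuous mono stream into non-overlapping 0.5 s windows
    and yield the separated sources for each window."""
    for start in range(0, len(samples) - WINDOW_SAMPLES + 1, WINDOW_SAMPLES):
        window = samples[start:start + WINDOW_SAMPLES]
        yield separate_window(window)

# Example: 2 seconds of audio yields 4 windows of 8000 samples each.
audio = np.zeros(2 * SAMPLE_RATE, dtype=np.float32)
outputs = list(stream_windows(audio))
print(len(outputs))  # 4
```

In the deployed system the windows would arrive over a WebSocket from the browser frontend rather than from a pre-loaded array, but the chunking logic is the same.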
Results: The system achieved a latency below 300 ms on standard laptops (without GPU) and operated continuously without dropouts. Subjective reports from test participants indicated a clear separation of speaker voices and improved intelligibility in overlapping speech. All signal processing ran stably over the WebSocket stream, and visual feedback (waveform/STFT) enabled immediate assessment of separation performance in real time.
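The STFT view used for visual feedback can be illustrated with a minimal magnitude spectrogram. This is a generic NumPy sketch with assumed frame and hop sizes; the study computes its visualization client-side in the browser, not with this code.

```python
import numpy as np

def stft_magnitude(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Magnitude STFT for a simple spectrogram display.

    Returns an array of shape (num_frames, frame_len // 2 + 1).
    Frame and hop sizes are illustrative choices, not from the study.
    """
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

# One 0.5 s window at 16 kHz -> 8000 samples -> 30 frames of 257 bins.
spec = stft_magnitude(np.random.randn(8000))
print(spec.shape)  # (30, 257)
```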
Conclusions: The combination of efficient model optimization and a web-based architecture enables AI-based real-time speaker separation using only a single microphone. This concept shows potential for future intelligent hearing aids and cochlear implants that could perform AI-based separation directly on the user's device.