Using recurrent neural networks to improve the perception of speech in non-stationary noise by people with cochlear implants

Abstract:

Speech-in-noise perception is a major problem for users of cochlear implants (CIs), especially with non-stationary background noise such as competing talkers or traffic. Algorithms that facilitate speech perception by attenuating background noise have produced benefits but have relied on a priori information about the target speaker and/or the background noise. We developed a recurrent neural network (RNN) algorithm for enhancing speech in non-stationary noise and evaluated its benefits for speech perception using objective measures, experiments with normal-hearing (NH) subjects listening to CI simulations, and experiments with CI users. The RNN was trained on a data set that included speech from many talkers mixed with a set of real-world multi-talker or traffic noise recordings. Its performance was evaluated using speech from a novel talker mixed with novel noise recordings of the same class, either babble or traffic noise, at signal-to-noise ratios (SNRs) that also differed from those used for training. Objective measures indicated benefits of a recurrent architecture over a simpler feed-forward architecture and predicted higher speech intelligibility than for the unprocessed speech in noise. The experimental results showed significant improvements in speech intelligibility for speech in babble for both the CI simulations and the CI subjects; speech reception thresholds (SRTs) improved by 1.4 to 3.4 dB. There was no significant improvement for traffic noise. CI subjects rated stimuli processed using the RNN algorithm as significantly better than unprocessed stimuli in terms of speech distortions, noise intrusiveness, and overall quality for both babble and traffic noise, with larger improvements for babble. These results extend previous findings and indicate benefits in speech-in-noise performance for CI listeners in mostly unseen acoustic conditions when using a speaker-independent algorithm that was optimized for non-stationary noise.
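The abstract states that training and test material were created by mixing speech with noise recordings, with test SNRs that differed from the training SNRs, but it does not describe the mixing procedure. The sketch below shows one common way to do this, assuming RMS-based scaling of a noise segment to reach a target SNR. The function names (`rms`, `mix_at_snr`, `snr_db`), the synthetic signals, and the 16 kHz sample rate are hypothetical illustrations, not the authors' pipeline.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a signal given as a list of floats."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal (RMS-based)."""
    return 20 * math.log10(rms(speech) / rms(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Hypothetical helper: scale `noise` so the mixture has the requested SNR.

    Assumption: SNR is defined from whole-signal RMS levels; the original
    study may have used a different level definition.
    Returns (mixture, scaled_noise).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the speech length
    target_noise_rms = rms(speech) / (10 ** (target_snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 16000  # assumed sample rate
    # Synthetic stand-ins for a speech token and a longer noise recording.
    speech = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    noise_recording = [rng.gauss(0.0, 1.0) for _ in range(3 * fs)]
    # Draw a random segment of the recording, as is often done when building mixtures.
    start = rng.randrange(len(noise_recording) - len(speech))
    segment = noise_recording[start:start + len(speech)]
    mixture, scaled = mix_at_snr(speech, segment, target_snr_db=5.0)
    print(round(snr_db(speech, scaled), 6))  # 5.0
```

Drawing a random segment from a longer recording for each mixture is one way to vary the noise across training examples; evaluating on held-out recordings and SNR values not in the training grid would mirror the "novel noise" and "different SNR" conditions described above.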