Deep attentive end-to-end continuous breath sensing from speech

Abstract:

Modelling the breath signal is of high interest to both healthcare professionals and computer scientists, as a source of diagnosis-related information or as a means of curating higher-quality datasets in speech analysis research. Forming a gold standard for the breath signal is, however, not straightforward: it requires specialised equipment and a budget for human annotation, and even then it reflects laboratory recording settings that are not reproducible in the wild. Herein, we explore deep-learning-based methodologies as an automatic way to predict a continuous-time breath signal solely by analysing spontaneous speech. We address two task formulations, continuous-valued signal prediction and inhalation event prediction, both of which are of great use in various healthcare and Automatic Speech Recognition applications, and showcase results that outperform current baselines. Most importantly, we also perform an initial exploration into explaining which parts of the input audio signal are important with respect to the prediction.