Towards interpretable speech biomarkers: exploring MFCCs

In this section we first discuss findings for the MFCC2 feature and then turn to observations about higher-order MFCCs.

The acoustic spectra in Fig. 2 show that, especially in the FTD cohort, differences between cases and controls are small at lower frequencies but become noticeable above roughly 4 kHz. Because MFCC2 can be interpreted as a low-to-high energy ratio, the metric appears to exploit this spectral difference to detect the presence of disease.
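The low-to-high energy ratio interpretation can be made concrete with a small sketch. It computes MFCC-style coefficients as the DCT-II of log band energies and inspects the coefficient at index 1. It assumes 1-based numbering in which MFCC1 is the zeroth (overall-level) coefficient, so MFCC2 corresponds to index 1. Toolkits differ on this convention and on DCT normalization. The eight band energies are made-up illustrative values, not data from Fig. 2, and the function and variable names are hypothetical.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II: c[k] = sum_n x[n] * cos(pi * k * (n + 0.5) / N)."""
    n_bands = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (n + 0.5) / n_bands) for n in range(n_bands))
        for k in range(n_bands)
    ]


def cepstral_coefficients(band_energies, eps=1e-12):
    """MFCC-style coefficients: DCT-II of the log of (mel) band energies."""
    return dct_ii([math.log(e + eps) for e in band_energies])


N_BANDS = 8
# Weight the index-1 coefficient ("MFCC2" under the assumed 1-based numbering) gives each band.
mfcc2_weights = [math.cos(math.pi * (n + 0.5) / N_BANDS) for n in range(N_BANDS)]

# Hypothetical band energies, lowest band first, falling off with frequency.
control = [1.0, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
# Same spectrum with the top three bands boosted fourfold (extra high-frequency energy).
case = control[:5] + [e * 4 for e in control[5:]]

mfcc2_control = cepstral_coefficients(control)[1]
mfcc2_case = cepstral_coefficients(case)[1]

print([round(w, 3) for w in mfcc2_weights])
print(round(mfcc2_control, 3), round(mfcc2_case, 3))
```

The index-1 basis function is positive over the lower half of the bands and negative over the upper half. As a result, the coefficient behaves like a weighted log ratio of low-band to high-band energy: it is zero for a flat spectrum, and boosting only the upper bands lowers it. An orthonormal DCT, as used by many MFCC implementations, rescales the coefficient but does not change its sign.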

A well-established literature links differences in the low-to-high energy ratio to voice distortion. Breathy voice can be characterized in the low-frequency range by an increased amplitude of the first harmonic, as the glottal waveform becomes more rounded due to non-simultaneous closure along the length of the vocal cords²⁴. More relevant to MFCC2, high-frequency energy also increases due to the presence of turbulent airflow…
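The first-harmonic cue is commonly quantified as H1-H2, the level of the first harmonic relative to the second in dB. The sketch below estimates it from a synthetic two-harmonic tone using a single-frequency DFT. The tone, its parameters, and the function names are hypothetical illustrations, not a real glottal model. Real measurements also need an f0 estimate, windowing and, usually, formant correction, none of which is shown here.

```python
import math


def harmonic_amplitude(signal, sample_rate, freq_hz):
    """Amplitude of a sinusoidal component at freq_hz, via a single-frequency DFT."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / sample_rate) for n, x in enumerate(signal))
    im = -sum(x * math.sin(2 * math.pi * freq_hz * n / sample_rate) for n, x in enumerate(signal))
    # Scale by 2/N so a sinusoid of amplitude A returns A when the window spans whole periods.
    return 2.0 * math.hypot(re, im) / len(signal)


def h1_minus_h2_db(signal, sample_rate, f0_hz):
    """Level of the first harmonic relative to the second, in dB."""
    h1 = harmonic_amplitude(signal, sample_rate, f0_hz)
    h2 = harmonic_amplitude(signal, sample_rate, 2 * f0_hz)
    return 20 * math.log10(h1 / h2)


def two_harmonic_tone(a1, a2, f0_hz=100.0, sample_rate=8000, duration_s=0.1):
    """Toy 'voice': a fundamental plus its second harmonic."""
    n_samples = round(sample_rate * duration_s)
    return [
        a1 * math.sin(2 * math.pi * f0_hz * t / sample_rate)
        + a2 * math.sin(2 * math.pi * 2 * f0_hz * t / sample_rate)
        for t in range(n_samples)
    ]


SR, F0 = 8000, 100.0
modal = two_harmonic_tone(a1=1.0, a2=1.0, f0_hz=F0, sample_rate=SR)
breathier = two_harmonic_tone(a1=2.0, a2=1.0, f0_hz=F0, sample_rate=SR)

print(round(h1_minus_h2_db(modal, SR, F0), 2), round(h1_minus_h2_db(breathier, SR, F0), 2))
```

Doubling the amplitude of the fundamental while leaving the second harmonic unchanged raises H1-H2 by 20·log10(2), about 6 dB. This is the direction associated with a more rounded, breathier glottal pulse. The 0.1 s window spans a whole number of periods of both components, so the single-bin DFT recovers each amplitude exactly. With arbitrary window lengths, spectral leakage would bias the estimate.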
