ASR Lecture Log 9

Lecture 9 - Neural Networks for Acoustic Modelling 2

Following on from the hybrid HMM/NN approach of the previous lecture, which showed how a neural network can replace a GMM in an HMM system, in this lecture we looked at how this approach was used to develop accurate acoustic models for TIMIT phone recognition and for Switchboard conversational speech recognition.
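In a hybrid system the network's softmax outputs estimate state posteriors P(s | x_t), whereas the HMM decoder expects likelihoods p(x_t | s). A standard bridge is Bayes' rule: dividing each posterior by the state prior P(s) gives a likelihood scaled by p(x_t), which is the same for every state and so does not change which state sequence scores best. The minimal sketch below does this in the log domain. The function names are hypothetical, and estimating the priors from state occupancy counts in the training alignment is an assumption (a common choice, not something specified in this log).

```python
import math


def log_priors_from_counts(state_counts):
    """Estimate log P(s) from how often each state occurs in the training alignment."""
    total = sum(state_counts)
    return [math.log(c / total) for c in state_counts]


def log_scaled_likelihoods(log_posteriors, log_priors):
    """Turn per-frame log posteriors log P(s|x_t) into log scaled likelihoods.

    log p(x_t|s) - log p(x_t) = log P(s|x_t) - log P(s), so in the log domain
    the division by the prior becomes a subtraction, applied state by state.
    """
    return [
        [lp - lpr for lp, lpr in zip(frame, log_priors)]
        for frame in log_posteriors
    ]


if __name__ == "__main__":
    priors = log_priors_from_counts([50, 30, 20])    # toy state occupancy counts
    frame = [math.log(p) for p in (0.6, 0.3, 0.1)]   # network output for one frame
    print(log_scaled_likelihoods([frame], priors))
```

The decoder then uses these scores in place of GMM log-likelihoods; frequent states are penalised relative to rare ones, which counteracts the bias the network inherits from the class frequencies in its training targets.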

First we recapped the hybrid HMM/NN approach introduced in the previous lecture, and then looked at a typical deep neural network architecture that could be used for TIMIT phone recognition.
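A network of this kind usually takes as input a window of consecutive acoustic feature frames centred on the current frame, passes it through several fully connected hidden layers, and ends in a softmax layer with one output unit per HMM state. The minimal sketch below uses only the Python standard library and toy sizes. The helper names, the edge padding at utterance boundaries, the sigmoid hidden units and the random initialisation are illustrative assumptions, not the configuration presented in the lecture.

```python
import math
import random


def splice(frames, context):
    """Stack each frame with `context` neighbouring frames on each side.

    Positions beyond the utterance edges reuse the first/last frame
    (edge padding is an assumption; other schemes exist).
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(zs):
    m = max(zs)  # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def affine(x, layer):
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x, hidden_layers, output_layer):
    """Sigmoid hidden layers followed by a softmax over HMM states."""
    h = x
    for layer in hidden_layers:
        h = [sigmoid(z) for z in affine(h, layer)]
    return softmax(affine(h, output_layer))


if __name__ == "__main__":
    rng = random.Random(0)
    feat_dim, context, width, depth, num_states = 4, 2, 8, 3, 6  # toy sizes
    utterance = [[rng.random() for _ in range(feat_dim)] for _ in range(10)]
    inputs = splice(utterance, context)  # 10 vectors of 4 * 5 = 20 values
    hidden = [init_layer(len(inputs[0]), width, rng)]
    hidden += [init_layer(width, width, rng) for _ in range(depth - 1)]
    output = init_layer(width, num_states, rng)
    posteriors = forward(inputs[0], hidden, output)
    print(len(posteriors), round(sum(posteriors), 6))
```

Depth and width in the experiments discussed below correspond to the number of entries in `hidden` and the value of `width` in this sketch.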

After a digression to explain the idea of pretraining, we discussed the paper by Mohamed et al. (2012), which carried out a careful set of experiments on TIMIT, varying the depth and width of the hidden layers and comparing MFCCs with mel-scale filter bank (FBANK) acoustic features. These experiments indicated that wider hidden layers improved accuracy, as did increasing depth up to about 6 hidden layers. FBANK features (whose components are correlated) gave somewhat higher accuracy than MFCCs.
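The remark about correlated components follows from how the two feature types are related: MFCCs are obtained by applying a discrete cosine transform (DCT) to each frame of log mel filterbank energies and keeping only the first few coefficients, which largely decorrelates them (convenient for diagonal-covariance GMMs). FBANK features simply skip this step. A minimal sketch, assuming an orthonormal DCT-II and 13 retained coefficients (a common choice, not necessarily the setting used in the paper); the function names are hypothetical.

```python
import math


def dct_ii(values, num_coeffs):
    """Orthonormal DCT-II of a vector, keeping only the first `num_coeffs` outputs."""
    n = len(values)
    if not 1 <= num_coeffs <= n:
        raise ValueError("num_coeffs must be between 1 and len(values)")
    coeffs = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        coeffs.append(scale * s)
    return coeffs


def mfcc_from_fbank(log_fbank, num_ceps=13):
    """MFCC-style cepstra from one frame of log mel filterbank (FBANK) energies."""
    return dct_ii(log_fbank, num_ceps)


if __name__ == "__main__":
    # Toy 8-channel frame; real systems typically use a few tens of mel filters.
    log_fbank = [math.log(e) for e in (1.0, 2.5, 4.0, 3.0, 1.5, 0.8, 0.5, 0.3)]
    print([round(c, 3) for c in mfcc_from_fbank(log_fbank, num_ceps=4)])
```

Because a DNN does not require decorrelated inputs, and truncating the DCT discards some spectral detail, feeding it FBANK features directly is a commonly offered explanation for their advantage.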

Hidden layer representations can be visualised using t-SNE, which maps the high-dimensional hidden-layer vectors (whose dimension is the number of units in the layer) down to 2 or 3 dimensions so that they can be plotted. These visualisations showed that the representations learned from FBANK features had slightly more structure than those learned from MFCCs.
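For reference, the standard t-SNE formulation is summarised below, with x_i the hidden-layer activation vector for frame i, y_i its 2-D or 3-D map point, and N the number of frames. Each bandwidth sigma_i is set so that the perplexity 2^{H(P_i)} of the conditional distribution P_i matches a user-chosen value, and the map points are found by gradient descent on C. The notation is ours, not the lecture's.

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert \mathbf{x}_i - \mathbf{x}_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert \mathbf{y}_i - \mathbf{y}_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert \mathbf{y}_k - \mathbf{y}_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the objective mainly preserves local neighbourhoods, "more structure" in such a plot means that frames with the same label tend to form tighter, better separated clusters; distances between far-apart clusters are not very meaningful.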

Finally we discussed a DNN acoustic model for Switchboard. The main difference in this model is that it uses context-dependent HMMs, so the neural network output layer has one unit for each tied (state-clustered) context-dependent HMM state. This can result in very wide output layers (over 9000 units in the experiments discussed).
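To see what a wide output layer means for model size, the sketch below counts weights and biases for a fully connected DNN. The configuration (11 spliced frames of 39-dimensional features giving 429 inputs, 5 hidden layers of 2048 units) is a hypothetical example rather than the exact setup from the lecture; 183 outputs corresponds to, for example, 3 states for each of 61 TIMIT phones, and 9000 stands in for the "over 9000" tied states.

```python
def layer_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def dnn_params(input_dim, hidden_width, num_hidden, num_outputs):
    """Return (total parameters, parameters in the softmax output layer)."""
    if num_hidden < 1:
        raise ValueError("need at least one hidden layer")
    total = layer_params(input_dim, hidden_width)
    total += (num_hidden - 1) * layer_params(hidden_width, hidden_width)
    out = layer_params(hidden_width, num_outputs)
    return total + out, out


if __name__ == "__main__":
    # Hypothetical configuration: 429 inputs, 5 hidden layers of 2048 units.
    for outputs in (183, 9000):
        total, out = dnn_params(429, 2048, 5, outputs)
        print(f"{outputs:5d} outputs: {total:,} params, {out / total:.0%} in output layer")
```

With these numbers the 9000-state output layer holds roughly half of all parameters (about 18.4M of 36.1M), compared with about 2% for 183 outputs, so the output layer becomes a significant part of the training and decoding cost.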

Both the TIMIT and Switchboard experiments relied on first training a context-dependent HMM/GMM system. The context-dependent states defined by this system, together with the frame-to-state alignment it produced, were used to generate the target label sequences required to train the neural network.
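Turning an HMM/GMM forced alignment into network training targets amounts to expanding state segments into one label per frame, which then serves as the target for a frame-level cross-entropy loss. The sketch below assumes a hypothetical alignment format of (state_id, start_frame, end_frame) tuples with exclusive end frames; real toolkits store alignments in their own formats. The helper names are illustrative.

```python
import math
from collections import Counter


def alignment_to_targets(segments, num_frames):
    """Expand (state_id, start, end) segments (end exclusive) into per-frame labels."""
    targets = [None] * num_frames
    for state, start, end in segments:
        if not 0 <= start < end <= num_frames:
            raise ValueError(f"bad segment {(state, start, end)}")
        for t in range(start, end):
            if targets[t] is not None:
                raise ValueError(f"frame {t} aligned twice")
            targets[t] = state
    missing = [t for t, s in enumerate(targets) if s is None]
    if missing:
        raise ValueError(f"frames {missing} have no state label")
    return targets


def state_counts(targets, num_states):
    """Occupancy count per state, e.g. for estimating the state priors P(s)."""
    counts = Counter(targets)
    return [counts.get(s, 0) for s in range(num_states)]


def frame_cross_entropy(posteriors, target_state):
    """Cross-entropy for one frame with a one-hot target: -log P(target | x_t)."""
    return -math.log(posteriors[target_state])


if __name__ == "__main__":
    segments = [(4, 0, 3), (7, 3, 5), (2, 5, 6)]  # toy alignment of 6 frames
    targets = alignment_to_targets(segments, 6)
    print(targets, state_counts(targets, 8))
```

Since every training frame receives exactly one label, the quality of the DNN targets is bounded by the quality of the HMM/GMM alignment, which is why a well-trained HMM/GMM system is built first.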

Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh