ASR 2018-19

Lecture 10 - Neural Network Acoustic Models 4: LSTM acoustic models; Sequence discriminative training

Speech recognition is about modelling sequences: given a sequence of acoustic frames, what should the corresponding sequence of symbols be? HMMs are a surprisingly strong sequence model, and have been at the heart of speech recognition since they were introduced in the 1970s. They were introduced earlier in the course, along with powerful algorithms such as Viterbi and EM. HMMs are trained to a maximum likelihood criterion, which is different from the classification criterion we are primarily interested in: building the best model of speech is merely a means to the end of classifying the speech as the correct sequence of symbols.

One of the powerful aspects of the neural network methods introduced in the three previous lectures is that they enable discriminative training using softmax and the cross-entropy error function. However, their sequence modelling is limited. Using a context window over the input, and in particular using a TDNN architecture, enables the local matching score to take account of a wide receptive field of acoustic context. However, these systems still use a frame-level loss function, in contrast to the sequence-level loss function used when training HMMs.
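To make the frame-level criterion concrete, here is a minimal sketch in plain Python of context-window splicing and the per-frame cross-entropy loss. The function names (`splice`, `softmax`, `frame_cross_entropy`) and the edge-padding choice are assumptions made for illustration, not part of the lecture material. The point to notice is that the loss is a sum of independent per-frame terms: nothing in it depends on the sequence as a whole.

```python
import math

def splice(frames, context):
    """Stack each frame with `context` neighbours on either side.

    Frames beyond the utterance edges are filled by repeating the first or
    last frame (an assumed padding scheme; other choices are possible).
    """
    T = len(frames)
    spliced = []
    for t in range(T):
        window = []
        for k in range(-context, context + 1):
            idx = min(max(t + k, 0), T - 1)  # clamp index at the edges
            window.extend(frames[idx])
        spliced.append(window)
    return spliced

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frame_cross_entropy(logits_per_frame, targets):
    """Mean of -log p(target_t | window_t) over frames.

    Each frame contributes its own term; the frames are scored independently.
    """
    loss = 0.0
    for logits, target in zip(logits_per_frame, targets):
        loss -= math.log(softmax(logits)[target])
    return loss / len(targets)
```

In a hybrid system the spliced windows would be the network input and `logits_per_frame` the network outputs over HMM states, with `targets` taken from a forced alignment.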

Probably the best paper on LSTM acoustic models is the one by Graves et al, Hybrid speech recognition with deep bidirectional LSTM. Sequence discriminative training for HMM/GMMs is covered in Young's handbook article. Sequence training for DNNs is well covered by Veselý et al, Sequence-discriminative training of deep neural networks.

In this lecture we explored two ways of better modelling sequences:

RNNs and LSTM acoustic models

Sequence discriminative training
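As an illustration of the first of these, the following is a minimal sketch of a single-layer, unidirectional LSTM forward pass in plain Python, using the standard gate equations without peephole connections. The parameter layout (a dictionary with keys such as `"Wi"` and `"bi"`, each weight matrix acting on the concatenation of the input and the previous hidden state) is an assumption chosen for readability, not the formulation used in the lecture or in any particular toolkit.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    v = x + h_prev  # concatenate input and previous hidden state
    i = [sigmoid(z) for z in affine(params["Wi"], params["bi"], v)]    # input gate
    f = [sigmoid(z) for z in affine(params["Wf"], params["bf"], v)]    # forget gate
    o = [sigmoid(z) for z in affine(params["Wo"], params["bo"], v)]    # output gate
    g = [math.tanh(z) for z in affine(params["Wg"], params["bg"], v)]  # candidate
    # The cell state carries information across time steps.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def lstm_forward(frames, params, hidden_size):
    """Run the LSTM over a sequence of frames, starting from zero states."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

Unlike a fixed context window, the hidden state at each frame here depends on all preceding frames. A bidirectional model would run a second LSTM over the reversed sequence and combine the two outputs at each frame.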


Copyright (c) University of Edinburgh 2015-2019
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
This page maintained by Steve Renals.
Last updated: 2019/04/23 16:53:29UTC

