Lecture 14 - Sequence discriminative training
This lecture discussed sequence discriminative training for both GMM- and NN-based systems. In sequence discriminative training the objective function is discriminative (adjust the model to increase the probability of the correct sequence and decrease the probability of competing sequences) and operates at the sequence level (discriminating between sequences rather than between frames).
- Maximum likelihood training adjusts the parameters to maximise the likelihood of the correct sequence; discriminative training instead maximises a ratio of the probability of the correct sequence to the probability of competing sequences.
- One such sequence discriminative approach is maximum mutual information (MMI) estimation, whose objective is expressed as a numerator part (clamped to the reference word sequence) divided by a denominator part (free, summing over all possible word sequences).
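A standard formulation of these two objectives (notation assumed here: acoustic observations O_u for utterance u, composite HMM M_w for word sequence w, reference w_u, model parameters lambda, language model P(w)):

```latex
% ML: maximise the log-likelihood of the observations given the
% composite HMM for the reference word sequence
\mathcal{F}_{\mathrm{ML}}(\lambda) = \sum_u \log p_\lambda(O_u \mid M_{w_u})

% MMI: numerator clamped to the reference, denominator free over all
% word sequences w', weighted by the language model P(w')
\mathcal{F}_{\mathrm{MMI}}(\lambda) = \sum_u \log
  \frac{p_\lambda(O_u \mid M_{w_u})\, P(w_u)}
       {\sum_{w'} p_\lambda(O_u \mid M_{w'})\, P(w')}
```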
- To make MMI training practical, the denominator sum over all competing word sequences is approximated using a lattice of likely hypotheses.
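The denominator statistics come from a forward-backward pass over the lattice arcs. A minimal sketch, assuming a toy hand-built lattice (the node numbering, word labels, and scores below are all hypothetical) with arcs carrying combined acoustic + language model log scores:

```python
import math
from collections import defaultdict

# Hypothetical toy lattice: nodes 0..3; each arc is
# (start_node, end_node, word, combined log score), topologically sorted.
arcs = [
    (0, 1, "a", math.log(0.6)),
    (0, 1, "b", math.log(0.4)),
    (1, 2, "c", math.log(0.7)),
    (1, 2, "d", math.log(0.3)),
    (2, 3, "e", math.log(1.0)),
]

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def arc_posteriors(arcs, start=0, end=3):
    # Forward: alpha[n] = log total score of all paths from start to n
    alpha = defaultdict(lambda: float("-inf"))
    alpha[start] = 0.0
    for s, e, _, w in arcs:
        alpha[e] = logsumexp([alpha[e], alpha[s] + w])
    # Backward: beta[n] = log total score of all paths from n to end
    beta = defaultdict(lambda: float("-inf"))
    beta[end] = 0.0
    for s, e, _, w in reversed(arcs):
        beta[s] = logsumexp([beta[s], w + beta[e]])
    total = alpha[end]
    # Arc posterior = share of total path mass passing through that arc
    return [math.exp(alpha[s] + w + beta[e] - total) for s, e, _, w in arcs]

posteriors = arc_posteriors(arcs)
```

These arc posteriors are the denominator occupancies used to accumulate the MMI statistics; the numerator statistics come from the same computation on the lattice clamped to the reference.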
- An alternative objective function, closer to the evaluation metric we actually care about, is the minimum phone error (MPE) criterion, which explicitly weights hypotheses by a phone accuracy term.
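In a standard formulation (notation assumed as above, with acoustic scale kappa), the MPE criterion is the expected phone accuracy of the hypotheses under the model:

```latex
% A(w, w_u) is the raw phone accuracy of hypothesis w against the
% reference w_u; maximising F_MPE minimises the expected phone error
\mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_u
  \frac{\sum_{w} p_\lambda(O_u \mid M_{w})^{\kappa}\, P(w)\, A(w, w_u)}
       {\sum_{w'} p_\lambda(O_u \mid M_{w'})^{\kappa}\, P(w')}
```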
- NN acoustic models are discriminative, but at the frame level (trained with a cross-entropy objective). Sequence discriminative training can also be applied to NN acoustic models: use a CE-trained model to generate the alignments and lattices for sequence training and to initialise the weights, then train by back-propagation with a sequence training objective function (e.g. MMI).
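For MMI, the error signal back-propagated into the network is the difference between numerator (clamped) and denominator (free) state occupancies. A toy sketch, with assumed shapes (T frames, S states) and made-up occupancies standing in for real forward-backward output:

```python
import numpy as np

T, S = 4, 3
rng = np.random.default_rng(0)

# Numerator occupancies: one-hot per frame, from the forced alignment
# (this alignment is invented for illustration)
align = np.array([0, 1, 1, 2])
gamma_num = np.eye(S)[align]

# Denominator occupancies: soft posteriors over competing states,
# as would come from lattice forward-backward (rows sum to 1)
logits = rng.normal(size=(T, S))
gamma_den = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# MMI gradient w.r.t. the NN's per-frame log acoustic scores:
# the difference of occupancies, back-propagated through the network
error_signal = gamma_num - gamma_den
```

Because both occupancy matrices sum to one per frame, the error signal sums to zero per frame: the objective pushes probability mass from competing states towards the reference states.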
- Results on Switchboard compare ML- and discriminatively-trained GMM systems, and framewise- and sequence-trained NN systems: sequence discriminative training gives a 10-15% relative reduction in word error rate.
- Lattice-free MMI (LF-MMI) can be used for NN sequence training. It avoids the need to pre-compute lattices for the denominator, and removes the requirement to first train with a frame-based CE loss function before sequence training.
- LF-MMI applies forward-backward computations directly to the denominator graph, using WFST computations; several approximations are applied for efficiency. It is currently the state-of-the-art training approach for NN acoustic models.
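The key point is that the denominator is evaluated by a full-sum forward pass over a compact graph rather than over a pre-computed lattice. A minimal sketch, assuming a toy denominator graph represented as a dense row-stochastic transition matrix and per-frame NN state probabilities (all values below are randomly generated for illustration):

```python
import numpy as np

S, T = 3, 5  # states in the toy denominator graph, frames
rng = np.random.default_rng(1)

log_init = np.log(np.full(S, 1.0 / S))               # uniform start distribution
log_trans = np.log(rng.dirichlet(np.ones(S), size=S))  # row-stochastic transitions
log_obs = np.log(rng.dirichlet(np.ones(S), size=T))    # per-frame state log-probs

def logsumexp(x, axis):
    # Numerically stable log-sum-exp along the given axis
    m = x.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

# Forward algorithm in log space: sums over ALL state sequences,
# which is the full-sum denominator computation LF-MMI relies on
log_alpha = log_init + log_obs[0]
for t in range(1, T):
    log_alpha = logsumexp(log_alpha[:, None] + log_trans, axis=0) + log_obs[t]
den_loglik = logsumexp(log_alpha, axis=0)
```

In the real implementation the graph is a phone-level n-gram LM compiled with WFST operations and the recursion runs on GPU over minibatches; this sketch only shows the underlying full-sum recursion.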
Copyright (c) University of Edinburgh 2015-2018
The ASR course material is licensed under the
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
This page maintained by Steve Renals.
Last updated: 2018/04/30 21:13:34UTC