Automatic Speech Recognition (ASR) 2019-20: Lectures
Lectures will take place on Mondays and Thursdays at 15:10, starting Monday 13 January. Note change of venue: Monday lectures are now in LG.11, David Hume Tower. Thursday lectures are in F.21, 7 George Square. Future lecture topics are subject to change.
-
Monday 13 January 2020.
Introduction to Speech Recognition
Slides
Reading:
J&M: chapter 7, section 9.1; R&H review chapter (sec 1).
-
Thursday 16 January 2020.
HMM acoustic modelling 1: HMMs and GMMs
Slides (updated 18 Jan);
errata.
Reading:
J&M: Secs 6.1-6.5, 9.2, 9.4;
R&H review chapter (sec 2.1, 2.2);
Rabiner & Juang (1986) Tutorial.
-
Monday 20 January 2020.
HMM acoustic modelling 2: HMM algorithms
Slides (updated 21 Jan);
errata.
Reading:
J&M: Sec 9.7;
G&Y review (sections 1, 2.1, 2.2);
(J&M: Secs 9.5, 9.6, 9.8 for introduction to decoding).
-
Thursday 23 January 2020.
Speech Signal Analysis 1
Slides
Reading:
J&M: Sec 9.3; Paul Taylor (2009), Text-to-Speech Synthesis: Ch 10 and Ch 12.
SparkNG: MATLAB real-time/interactive tools for speech science research and education.
-
Monday 27 January 2020.
Speech Signal Analysis 2
Reading:
Hermansky (1990), Perceptual linear predictive (PLP) analysis of speech.
-
Thursday 30 January 2020.
HMM acoustic modelling 3: Context-dependent phone modelling
Slides
Reading:
J&M: Sec 10.3;
R&H review chapter (sec 2.3); Young (2008).
-
Monday 3 February 2020.
Neural network acoustic models 1: Introduction
Slides (updated 3 Feb)
Reading:
Jurafsky and Martin (draft 3rd edition), chapter 7 (secs 7.1 - 7.4)
Background Reading:
M Nielsen (2014), Neural networks and deep learning - chapter 1 (introduction), chapter 2 (back-propagation algorithm), chapter 3 (the parts on cross-entropy and softmax).
-
Thursday 6 February 2020.
Neural network acoustic models 2: Hybrid HMM/DNN systems
Slides
Background Reading:
Morgan and Bourlard (May 1995). Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Mag., 12(3):24-42
Mohamed et al (2012). Understanding how deep belief networks perform acoustic modelling, ICASSP-2012.
-
Monday 10 February 2020.
Large vocabulary ASR
Slides (updated 10 Feb)
-
Thursday 13 February 2020.
ASR with WFSTs
Slides
Reading:
Mohri et al (2008), Speech recognition with weighted finite-state transducers, in Springer Handbook of Speech Processing (sections 1 and 2)
-
Monday 17 - Friday 21 February 2020.
NO LECTURES OR LABS - FLEXIBLE LEARNING WEEK.
-
Monday 24 February 2020.
Neural network acoustic models 3: Context-dependent DNNs, TDNNs and LSTMs
Slides
Reading:
Maas et al (2017), Building DNN acoustic models for large vocabulary speech recognition, Computer Speech and Language, 41:195-213.
Background Reading:
Peddinti et al (2015), A time delay neural network architecture for efficient modeling of long temporal contexts, Interspeech-2015.
Graves et al (2013), Hybrid speech recognition with deep bidirectional LSTM, ASRU-2013.
-
Thursday 27 February 2020.
Speaker Adaptation
Slides
Reading:
G&Y review, sec. 5
Woodland (2001), Speaker adaptation for continuous density HMMs: A review, ISCA Workshop on Adaptation Methods for Speech Recognition
Swietojanski et al (2016), Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation, IEEE Trans Audio Speech and Language Proc., 24(8):1450-1463.
-
Monday 2 March 2020
Sequence discriminative training
Slides
Reading:
Sec 27.3.1 of Young (2008), HMMs and Related Speech Recognition Technologies.
Background reading:
Vesely et al (2013), Sequence-discriminative training of deep neural networks, Interspeech-2013.
D Povey et al (2016), Purely sequence-trained neural networks for ASR based on lattice-free MMI, Interspeech-2016. (ppt slides)
-
Thursday 5 March 2020.
Multilingual and low-resource speech recognition
Slides
Reading:
Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
Huang et al (2013). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, ICASSP-2013.
-
Monday 9 March 2020.
End-to-end systems 1: CTC
Slides (updated 12 March)
Reading:
A Hannun et al (2014), Deep Speech: Scaling up end-to-end speech recognition, arXiv:1412.5567.
A Hannun (2017), Sequence Modeling with CTC, Distill.
Background Reading:
Y Miao et al (2015), EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, ASRU-2015.
A Maas et al (2015). Lexicon-free conversational speech recognition with neural networks, NAACL HLT 2015.
-
Thursday 12 March 2020.
End-to-end systems 2: Encoder-decoder models
Slides (updated 26 Apr)
Reading:
W Chan et al (2015), Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP.
R Prabhavalkar et al (2017), A Comparison of Sequence-to-Sequence Models for Speech Recognition, Interspeech.
Background Reading:
C-C Chiu et al (2018), State-of-the-art speech recognition with sequence-to-sequence models, ICASSP.
S Watanabe et al (2017), Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, IEEE Journal of Selected Topics in Signal Processing, 11:1240-1252.
-
Monday 16 March 2020.
Speaker verification and diarization
Slides
Reading:
J Hansen and T Hasan (2015), Speaker Recognition by Machines and Humans: A tutorial review, IEEE Signal Processing Magazine, 32(6): 74-99.
D Snyder et al (2018), X-Vectors: Robust DNN Embeddings for Speaker Recognition, ICASSP
D Garcia-Romero et al (2017), Speaker diarization using deep neural network embeddings, ICASSP.
Background Reading:
MW Mak and JT Chien (2016), Tutorial on Machine Learning for Speaker Recognition, Interspeech.
N Dehak et al (2011), Front-End Factor Analysis for Speaker Verification, IEEE Trans Audio, Speech, and Language Processing, 19(4):788-798.
E Variani et al (2014), Deep neural networks for small footprint text-dependent speaker verification, ICASSP.
G Sell et al (2018), Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge, Interspeech.
K Church et al (2017), Speaker diarization: A perspective on challenges and opportunities from theory to practice, ICASSP.
-
Date to be confirmed.
Revision lecture – questions and answers
Reading
Textbook (essential)
- J&M: Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing, Pearson Education (2nd edition).
Review and Tutorial Articles
- G&Y: MJF Gales and SJ Young (2007). The Application of Hidden Markov Models in Speech Recognition, Foundations and Trends in Signal Processing, 1 (3), 195-304.
- S Young (1996). A review of large-vocabulary continuous-speech recognition, IEEE Signal Processing Magazine 13 (5), 45-57.
- R&H: S Renals and T Hain (2010). Speech Recognition, in Computational Linguistics and Natural Language Processing Handbook, A Clark, C Fox and S Lappin (eds.), Blackwells, chapter 12, 299-332.
- G Hinton et al (2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, 29(6):82-97.
- S Young (2008). HMMs and Related Speech Recognition Technologies, in Springer Handbook of Speech Processing, J Benesty, MM Sondhi and Y Huang (eds), chapter 27, 539-557.
Other supplementary materials
- In case you need more introductory articles on speech signal analysis (Lectures 2 and 3):
Daniel P.W. Ellis, "An introduction to signal processing for speech", Chapter 22 in The Handbook of Phonetic Sciences, 2nd ed., ed. Hardcastle, Laver, and Gibbon, pp. 757-780, Blackwell, 2008.
- Speech.zone by Prof Simon King at the University of Edinburgh.
Copyright (c) University of Edinburgh 2015-2020
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (licence.txt).
This page is maintained by Peter Bell.
Last updated: 2020/04/26 20:33:31 UTC