Automatic Speech Recognition (ASR): 2016/17

[course descriptor]





Review and Tutorial Articles

Lectures 2016/17

Lecture No. | Date | Week | Lecturer | Topic and slides | Reading
1 | Mon 16 January | Wk1 | Renals | Introduction to Speech Recognition (slides; lecture recording; revision log) | J&M: chapter 7, chapter 9 (9.1 - 9.3); R&H review chapter
2 | Thu 19 January | Wk1 | Shimodaira | Speech Signal Analysis 1 (slides; additional notes; lecture recording; revision log) | J&M: Sec 9.3
3 | Mon 23 January | Wk2 | Shimodaira | Speech Signal Analysis 2 (lecture recording) | Hermansky (1990), PLP analysis of speech
4 | Thu 26 January | Wk2 | Shimodaira | Acoustic modelling: HMMs and GMMs 1 (slides; additional notes; lecture recording; revision log) | J&M: Secs 6.1-6.5, 9.2, 9.4; G&Y review; R&H review chapter; Rabiner & Juang (1986) Tutorial
5 | Mon 30 January | Wk3 | Shimodaira | Acoustic modelling: HMMs and GMMs 2 (lecture recording) |
6 | Thu 2 February | Wk3 | Shimodaira | Acoustic modelling: Context-dependent phone modelling (slides; lecture recording; revision log) | Young (2008)
- | Mon 6 February | Wk4 | | NO LECTURE |
7 | Thu 9 February | Wk4 | Renals | Introduction to neural networks (slides; lecture recording; revision log) | M Nielsen (2014), Neural networks and deep learning
8 | Mon 13 February | Wk5 | Renals | Neural network acoustic models 1 (slides; lecture recording; revision log) | Morgan and Bourlard (1995), Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach
9 | Thu 16 February | Wk5 | Renals | Neural network acoustic models 2 (slides; lecture recording; revision log) | Hinton et al (2012), Deep neural networks for acoustic modeling in speech recognition; Vesely et al (2013), Sequence-discriminative training of deep neural networks
- | Mon 20 February | | | No Lectures or Labs - Flexible Learning Week |
- | Thu 23 February | | | No Lectures or Labs - Flexible Learning Week |
10 | Mon 27 February | Wk6 | Renals | Speaker adaptation (slides; lecture recording - including NN adaptation material covered in first half of next lecture; revision log) | G&Y review, sec. 5; Woodland (2001), Speaker adaptation for continuous density HMMs: A review; Swietojanski et al (2016), Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
11 | Thu 2 March | Wk6 | Renals | Lexicon and pronunciations (slides; lecture recording; revision log) |
12 | Mon 6 March | Wk7 | Renals | Language models (slides; lecture recording - including LM material covered in first part of next lecture; revision log) | J&M, Ch 4; Manning & Schütze, Ch 6; Bengio et al (2006), Neural probabilistic language models (Secs 6.1, 6.2, 6.3, 6.7, 6.8); Mikolov et al (2011), Extensions of recurrent neural network language model; Jozefowicz et al (2016), Exploring the Limits of Language Modeling, arXiv:1602.02410.
13 | Thu 9 March | Wk7 | Renals | WFSTs (slides; lecture recording; revision log) | Mohri et al (2008), Speech recognition with weighted finite-state transducers
14 | Mon 13 March | Wk8 | Peter Bell | ASR and alignment systems for multi-genre media data (guest lecture; slides; lecture recording) | Bell and Renals (2015), A system for automatic alignment of broadcast media captions using weighted finite-state transducers
15 | Thu 16 March | Wk8 | Renals | Sequence discriminative training; robust speech recognition (slides; lecture recording - including additional material presented at start of lecture 18; revision log) | Vesely et al (2013), Sequence-discriminative training of deep neural networks; Seltzer et al (2013), An Investigation of Deep Neural Networks for Noise Robust Speech Recognition
16 | Mon 20 March | Wk9 | Renals | Multilingual speech recognition (slides; lecture recording; revision log) | Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
17 | Thu 23 March | Wk9 | Renals | Current progress in acoustic modelling (slides; lecture recording; revision log) | IBM blog post; Saon et al (2017), English Conversational Telephone Speech Recognition by Humans and Machines, arXiv:1703.02136.
18 | Mon 27 March | Wk10 | Renals | End-to-end systems (slides; lecture recording; revision log) | Lu et al (2015), A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition, Interspeech-2015.
19 | Thu 30 March | Wk10 | Renals | Extra lecture: NN generative models and WaveNet (slides; lecture recording; revision log) | DeepMind blog post on WaveNet; van den Oord et al (2016), WaveNet: A Generative Model for Raw Audio, arXiv:1609.03499.
REVISION | Thu 27 April, 14:00. Room S1, 7 George Square | | | Revision lecture - answers to any questions |


Closer to the exam we are very happy to arrange a revision lecture at a time convenient to everyone. The point of this lecture will be to answer and discuss any questions about the course.


The coursework will concern continuous speech recognition and will use Kaldi. It will build on the labs.
Released: Monday 13 February 2017. [Assignment], [Q and A]
Deadline: Wednesday 8 March 2017, 16:00 [Checklist for submission]
Feedback: Wednesday 22 March 2017

Please make sure you have read and understood

Software tools



Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh