Automatic Speech Recognition (ASR): 2015/16

[course descriptor]

Review and Tutorial Articles

Syllabus 2015/16

Lecture No. | Date | Week | Lecturer | Topic and slides | Reading
Lec1 | Mon 11 January | Wk1 | Renals | Introduction to Speech Recognition (slides) | J&M: chapter 7, chapter 9 (9.1 - 9.3); R&H review chapter
Lec2 | Thu 14 January | Wk1 | Shimodaira | Speech Signal Analysis 1 (slides) | J&M: Sec 9.3; Taylor, chapters 10, 12
Lec3 | Mon 18 January | Wk2 | Shimodaira | Speech Signal Analysis 2 | Hermansky (1990), PLP analysis of speech
- | Mon 18 January | Wk2 | Renals | Extra lecture (17:00; FH-3.D01): Introduction to Neural Networks 1 (notes) | Michael Nielsen, Neural Networks and Deep Learning, 2015
Lec4 | Thu 21 January | Wk2 | Shimodaira | Acoustic modelling basics: HMMs and GMMs 1 (slides-6up, slides) | J&M: Secs 6.1-6.5, 9.2, 9.4; G&Y review; R&H review chapter; Rabiner & Juang (1986) Tutorial
Lec5 | Mon 25 January | Wk3 | Shimodaira | Acoustic modelling basics: HMMs and GMMs 2 | -
Lec6 | Thu 28 January | Wk3 | Renals | Context-dependent phone modelling 1 (slides) | Young (2008); Lee (1990), Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition
Lec7 | Mon 1 February | Wk4 | Shimodaira | Introduction to Assignment 1 (slides) | Assignment 1: continuous speech recognition
Lec8 | Thu 4 February | Wk4 | Renals | Context-dependent phone modelling with HMMs 2 | Young & Woodland (1994), State clustering in hidden Markov model-based continuous speech recognition; Young et al (1994), Tree-based state tying for high accuracy acoustic modelling
- | Fri 5 February | Wk4 | Renals | Extra lecture (15:00; Seminar Room 2, Chrystal Macmillan Building): Introduction to Neural Networks 2 (slides) | Michael Nielsen, Neural Networks and Deep Learning, 2015
Lec9 | Mon 8 February | Wk5 | Renals | Lexicon and language model (slides) | J&M, Ch 4; Manning & Schutze, Ch 6
Lec10 | Thu 11 February | Wk5 | Renals | Search and decoding (slides) | Aubert (2002), An overview of decoding techniques for large vocabulary continuous speech recognition
- | Mon 15 February | ILW | - | No Lectures or Labs - Innovative Learning Week | -
- | Thu 18 February | ILW | - | No Lectures or Labs - Innovative Learning Week | -
Lec11 | Mon 22 February | Wk6 | Renals | Neural networks for acoustic modelling (slides) | Morgan & Bourlard (1995), Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach; Nielsen, Neural Networks and Deep Learning
Lec12 | Thu 25 February | Wk6 | Renals | Deep neural network acoustic models (slides) | Hinton et al (2012), Deep neural networks for acoustic modeling in speech recognition
Lec13 | Mon 29 February | Wk7 | Renals | Recurrent network acoustic models (slides) | H Sak et al (2014), LSTM recurrent neural network architectures for large scale acoustic modeling, Interspeech-2014
Lec14 | Thu 3 March | Wk7 | Renals | Speaker adaptation (slides) | G&Y review, sec. 5; Woodland (2001), Speaker adaptation for continuous density HMMs: A review
Lec15 | Mon 7 March | Wk8 | Renals | Speaker adaptation (cont.); Neural network language models (part 1) (slides) | Bengio et al (2006), Neural probabilistic language models (Secs 6.1, 6.2, 6.3, 6.7, 6.8); Mikolov et al (2011), Extensions of recurrent neural network language model
Lec16 | Thu 10 March | Wk8 | Renals | Neural network language models part 2 | Chen et al (2015), Recurrent neural network language model training with noise contrastive estimation for speech recognition, ICASSP-2015; Jozefowicz et al (2016), Exploring the Limits of Language Modeling, arXiv:1602.02410
- | Thu 10 March | Wk8 | Renals | Tutorial (17:00; FH-3.D01) | -
Lec17 | Mon 14 March | Wk9 | Peter Bell | Guest lecture (slides) | -
- | Mon 14 March | Wk9 | Renals | Tutorial (17:00; FH-3.D01) | -
Lec18 | Thu 17 March | Wk9 | Renals | Sequence training and "HMM-Free" recognition (slides) | Lu et al (2015), A Study of the Recurrent Neural Network Encoder-Decoder for Large Vocabulary Speech Recognition, Interspeech-2015
REVISION | Wed 20 April at 14:00 | - | - | Revision lecture - answers to any questions | -


Closer to the exam we are very happy to arrange a revision lecture at a time convenient to everyone. The point of this lecture will be to answer and discuss any questions about the course.


There are two pieces of coursework.

  1. Assignment 1: continuous speech recognition - monophone and triphone models. The coursework involves training and testing a continuous speech recognition system using the HTK software. We'll use the WSJCAM0 database (British English recordings of speakers reading Wall Street Journal sentences).
    Released: Monday 1 February 2016
    Deadline: Wednesday 24 February 2016, 16:00
    Feedback: Wednesday 9 March 2016
    Report templates: Q and A on assignment
    Q and A on HTK
  2. Assignment 2: literature review. Choose one of the following topics.

    Released: Thursday 25 February 2016
    Deadline: Wednesday 16 March 2016, 16:00
    Feedback: Wednesday 30 March 2016
    Report templates:

School of Informatics coursework policies:

Software tools



Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail:
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh