
Automatic Speech Recognition (ASR) 2018-19: Lectures

Lectures will take place on Mondays and Thursdays at 15:10 in the MacLaren Stuart Room, Old College (room G.159), starting on Monday 14 January.

  1. Monday 14 January 2019. Introduction to Speech Recognition (Steve)
    Slides; Reading: J&M: chapter 7, section 9.1; R&H review chapter (sec 1).
  2. Thursday 17 January 2019. Speech Signal Analysis 1 (Hiroshi)
    Slides; revision log
    Reading: J&M: Sec 9.3; Paul Taylor (2009), Text-to-Speech Synthesis: Ch 10 and Ch 12.
    SparkNG MATLAB realtime/interactive tools for speech science research and education
  3. Monday 21 January 2019. Speech Signal Analysis 2 (Hiroshi)
    Reading: Hermansky (1990), PLP analysis of speech.
  4. Thursday 24 January 2019. HMM acoustic modelling 1: HMMs and GMMs 1 (Hiroshi)
    Slides; revision log
    Reading: J&M: Secs 6.1-6.5, 9.2, 9.4; R&H review chapter (sec 2.1, 2.2); Rabiner & Juang (1986) Tutorial.
  5. Monday 28 January 2019. HMM acoustic modelling 2: HMMs and GMMs 2 (Hiroshi)
    Reading: J&M: Sec 9.7; G&Y review (sections 1, 2.1, 2.2); (J&M: Secs 9.5, 9.6, 9.8 for an introduction to decoding).
  6. Thursday 31 January 2019. HMM acoustic modelling 3: Context-dependent phone modelling (Hiroshi)
    Slides (updated on 31 Jan); errata; revision log
    Reading: J&M: Sec 10.3; R&H review chapter (sec 2.3); Young (2008).
  7. Monday 4 February 2019. Neural network acoustic models 1: Introduction (Steve)
    Slides; revision log
    Reading: Jurafsky and Martin (draft 3rd edition), chapter 7 (secs 7.1 - 7.4)
    Background Reading: M Nielsen (2014), Neural networks and deep learning - chapter 1 (introduction), chapter 2 (back-propagation algorithm), chapter 3 (the parts on cross-entropy and softmax).
  8. Thursday 7 February 2019. Neural network acoustic models 2: Hybrid HMM/DNN systems (Steve)
    Slides; revision log
    Background Reading: Morgan and Bourlard (May 1995). Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Mag., 12(3):24-42
    Mohamed et al (2012). Understanding how deep belief networks perform acoustic modelling, ICASSP-2012.
  9. Monday 11 February 2019. Neural network acoustic models 3: Context-dependent DNNs and TDNNs (Steve)
    Slides; revision log
    Reading: Maas et al (2017), Building DNN acoustic models for large vocabulary speech recognition Computer Speech and Language, 41:195-213.
    Peddinti et al (2015). A time delay neural network architecture for efficient modeling of long temporal contexts, Interspeech-2015
    Background Reading: Hinton et al (2012), Deep neural networks for acoustic modeling in speech recognition, IEEE Signal Processing Mag., 29(6):82-97.
  10. Thursday 14 February 2019. Neural network acoustic models 4: LSTM acoustic models; Sequence discriminative training (Steve)
    Slides; revision log
    Reading: Saon et al (2017), English Conversational Telephone Speech Recognition by Humans and Machines, Interspeech-2017.
    Sec 27.3.1 of Young (2008), HMMs and Related Speech Recognition Technologies.
    K Vesely et al (2013), Sequence-discriminative training of deep neural networks, Interspeech-2013.
    Background Reading: Graves et al (2013), Hybrid speech recognition with deep bidirectional LSTM, ASRU-2013.
    Monday 18 - Friday 22 February 2019.
  11. Monday 25 February 2019. Decoding, Alignment, and WFSTs (Steve)
    Slides; revision log.
    Reading: Mohri et al (2008), Speech recognition with weighted finite-state transducers, in Springer Handbook of Speech Processing (sections 1 and 2)
    Bell and Renals (2015), A system for automatic alignment of broadcast media captions using weighted finite-state transducers, ASRU-2015.
  12. Thursday 28 February 2019. Lattice-free MMI (guest lecturer: Peter Bell)
    Slides; revision log.
    Background Reading: H Hadian et al (2018), Flat-start single-stage discriminatively trained HMM-based models for ASR, IEEE TASLP.
    D Povey et al (2016), Purely sequence-trained neural networks for ASR based on lattice-free MMI, Interspeech-2016. (ppt slides)
  13. Monday 4 March 2019. Speaker Adaptation (Steve)
    Slides; revision log.
    Reading: G&Y review, sec. 5
    Woodland (2001), Speaker adaptation for continuous density HMMs: A review, ISCA Workshop on Adaptation Methods for Speech Recognition
    Swietojanski et al (2016), Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation, IEEE Trans Audio Speech and Language Proc., 24(8):1450-1463.
  14. Thursday 7 March 2019. Multilingual and low-resource speech recognition (Steve)
    Slides; revision log.
    Reading: Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
    Huang et al (2013). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, ICASSP-2013.
  15. Monday 11 March 2019. End-to-end systems 1: CTC (Steve)
    Slides; revision log.
    Reading: A Hannun et al (2014), Deep Speech: Scaling up end-to-end speech recognition, arXiv:1412.5567.
    A Hannun (2017), Sequence Modeling with CTC, Distill.
    Background Reading: Y Miao et al (2015), EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, ASRU-2015.
    A Maas et al (2015). Lexicon-free conversational speech recognition with neural networks, NAACL HLT 2015.
  16. Thursday 14 March 2019. End-to-end systems 2: Sequence-to-sequence models (Steve)
    Slides; revision log.
    Reading: W Chan et al (2015), Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP.
    R Prabhavalkar et al (2017), A Comparison of Sequence-to-Sequence Models for Speech Recognition, Interspeech.
    Background Reading: C-C Chiu et al (2018), State-of-the-art speech recognition with sequence-to-sequence models, ICASSP.
    S Watanabe et al (2017), Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, IEEE JSTSP, 11:1240-1252.
  17. Monday 18 March 2019. Speaker verification
    Slides; revision log.
    Reading: J Hansen and T Hasan (2015), Speaker Recognition by Machines and Humans: A tutorial review, IEEE Signal Processing Magazine, 32(6): 74-99.
    D Snyder et al (2018), X-Vectors: Robust DNN Embeddings for Speaker Recognition, ICASSP
    Background Reading: MW Mak and JT Chien (2016), Tutorial on Machine Learning for Speaker Recognition, Interspeech.
    N Dehak et al (2011), Front-End Factor Analysis for Speaker Verification, IEEE Trans Audio, Speech, and Language Processing, 19(4):788-798.
    E Variani et al (2014), Deep neural networks for small footprint text-dependent speaker verification, ICASSP.
  18. Thursday 21 March 2019. Speaker diarization
    Slides; revision log.
    Reading: D Garcia-Romero et al (2017), Speaker diarization using deep neural network embeddings, ICASSP.
    Background Reading: G Sell et al (2018), Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge, Interspeech.
    K Church et al (2017), Speaker diarization: A perspective on challenges and opportunities from theory to practice, ICASSP.
  19. Monday 1 April 2019. Discussion session on using speech recognition models and algorithms in practice.
  20. Date to be confirmed. Revision lecture - questions and answers


Textbook (essential)

Review and Tutorial Articles

Other supplementary materials

Copyright (c) University of Edinburgh 2015-2019
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
This page maintained by Steve Renals.
Last updated: 2019/04/26 17:27:18UTC
