
Automatic Speech Recognition (ASR) 2021-22: Lectures

Lectures take place on Mondays at 13:10 and Thursdays at 15:10, starting Monday 17 January 2022. Future lecture topics are subject to change.

  1. Monday 17 January 2022. Introduction to Speech Recognition
    Slides
    Reading: J&M: chapter 7, section 9.1; R&H review chapter (sec 1).
     
  2. Thursday 20 January 2022. Speech Signal Analysis 1
    Slides (updated 24 January)
    Reading: O'Shaughnessy (2000), Speech Communications: Human and Machine, chapter 2; J&M: Sec 9.3; Paul Taylor (2009), Text-to-Speech Synthesis: Ch 10 and Ch 12.
    SparkNG MATLAB realtime/interactive tools for speech science research and education
     
  3. Monday 24 January 2022. Speech Signal Analysis 2
    Slides (updated 31 January)
    Reading: O'Shaughnessy (2000), Speech Communications: Human and Machine, chapters 3-4.
     
  4. Thursday 27 January 2022. Hidden Markov Models
    Slides (updated 31 January)
    Reading: Rabiner & Juang (1986) Tutorial; J&M: Secs 6.1-6.5, 9.2, 9.4; R&H review chapter (sec 2.1, 2.2).
     
  5. Monday 31 January 2022. Training HMMs
    Slides
    Reading: J&M: Sec 9.7; G&Y review (sections 1, 2.1, 2.2); J&M: Secs 9.5, 9.6, 9.8 (for an introduction to decoding).
     
  6. Thursday 3 February 2022. Gaussian mixture models
    Slides
    Reading: R&H review chapter (sec 2.2)
     
  7. Monday 7 February 2022. HMM acoustic modelling 3: Context-dependent phone modelling
    Slides
    Reading: J&M: Sec 10.3; R&H review chapter (sec 2.3); Young (2008).
     
  8. Thursday 10 February 2022. Large vocabulary ASR
    Slides
    Reading: Ortmanns & Ney
     
  9. Monday 14 February 2022. ASR with WFSTs
    Slides
    Reading: Mohri et al (2008), Speech recognition with weighted finite-state transducers, in Springer Handbook of Speech Processing (sections 1 and 2)
     
  10. Thursday 17 February 2022. Neural network acoustic models 1: Introduction
    Slides
    Reading: Jurafsky and Martin (draft 3rd edition), chapter 7 (secs 7.1 - 7.4)
    Background Reading: M Nielsen (2014), Neural networks and deep learning - chapter 1 (introduction), chapter 2 (back-propagation algorithm), chapter 3 (the parts on cross-entropy and softmax).
     
    Monday 21 - Friday 25 February 2022.
    NO LECTURES OR LABS - FLEXIBLE LEARNING WEEK.
     
  11. Monday 28 February 2022. Neural network acoustic models 2: Hybrid HMM/DNN systems
    Slides
    Background Reading: Morgan and Bourlard (May 1995). Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Mag., 12(3):24-42
    Mohamed et al (2012). Understanding how deep belief networks perform acoustic modelling, ICASSP-2012.
     
  12. Thursday 3 March 2022. Neural Networks for Acoustic Modelling 3: DNN architectures
    Slides
    Reading: Maas et al (2017), Building DNN acoustic models for large vocabulary speech recognition, Computer Speech and Language, 41:195-213.
    Background reading: Peddinti et al (2015). A time delay neural network architecture for efficient modeling of long temporal contexts, Interspeech-2015
    Graves et al (2013), Hybrid speech recognition with deep bidirectional LSTM, ASRU-2013.
     
  13. Monday 7 March 2022. Speaker Adaptation
    Slides
    Reading: G&Y review, sec. 5
    Woodland (2001), Speaker adaptation for continuous density HMMs: A review, ISCA Workshop on Adaptation Methods for Speech Recognition
    Bell et al (2021), Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview, IEEE Open Journal of Signal Processing, Vol 2:33-36.
     
  14. Thursday 10 March 2022. Discriminative training
    Slides
    Reading: Sec 27.3.1 of Young (2008), HMMs and Related Speech Recognition Technologies.
     
  15. Monday 14 March 2022. Multilingual and low-resource speech recognition
    Slides (updated 14 March)
    Reading: Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
    Huang et al (2013). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, ICASSP-2013.
     
  16. Thursday 17 March 2022. End-to-end systems 1: CTC
    Slides
    Reading: A Hannun et al (2014), Deep Speech: Scaling up end-to-end speech recognition, arXiv:1412.5567.
    A Hannun (2017), Sequence Modeling with CTC, Distill.
    Background Reading: Y Miao et al (2015), EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, ASRU-2015.
    A Maas et al (2015). Lexicon-free conversational speech recognition with neural networks, NAACL HLT 2015.
     
  17. Monday 21 March 2022. End-to-end systems 2: Encoder-decoder models
    Slides
    Reading: W Chan et al (2015), Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP.
    R Prabhavalkar et al (2017), A Comparison of Sequence-to-Sequence Models for Speech Recognition, Interspeech.
    Background Reading: C-C Chiu et al (2018), State-of-the-art sequence recognition with sequence-to-sequence models, ICASSP.
    S Watanabe et al (2017), Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, IEEE STSP, 11:1240-1252.
     
  18. Thursday 24 March 2022. Guest lecture: Unsupervised raw waveform modelling
    Slides
    Background Reading: A van den Oord et al (2018), Representation learning with contrastive predictive coding.
    S Schneider et al (2019), wav2vec: Unsupervised pre-training for speech recognition, Interspeech.
     
  19. Date to be confirmed. Revision lecture – questions and answers

     

Reading

Textbook (essential)

Review and Tutorial Articles

Other supplementary materials


Copyright (c) University of Edinburgh 2015-2022
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
licence.txt
This page maintained by Peter Bell.
Last updated: 2022/03/24 12:40:29UTC


