
Automatic Speech Recognition (ASR) 2022-23: Lectures

Lectures take place on Mondays and Thursdays at 14:10, starting Monday 16 January. Monday lectures are held in the SCS Newhaven Lecture Theatre at 13-15 South College St, and Thursday lectures are held in the HRB Lecture Theatre in the Hugh Robson Building on George Square. Future lecture topics are subject to change.

Lecture live streaming is available via Media Hopper Replay for students not able to attend in person – the link can be found on Learn under “Course Materials”.

  1. Monday 16 January 2023. Introduction to Speech Recognition
    Slides
    Reading: J&M: chapter 7, section 9.1; R&H review chapter (sec 1).
     
  2. Thursday 19 January 2023. Speech Signal Analysis 1
    Slides (updated 11 May; errata)
    Reading: O'Shaughnessy (2000), Speech Communications: Human and Machine, chapter 2; J&M: Sec 9.3; Paul Taylor (2009), Text-to-Speech Synthesis: Ch 10 and Ch 12.
    SparkNG MATLAB realtime/interactive tools for speech science research and education
     
  3. Monday 23 January 2023. Speech Signal Analysis 2
    Slides
    Reading: O'Shaughnessy (2000), Speech Communications: Human and Machine, chapters 3-4
     
  4. Thursday 26 January 2023. Introduction to Hidden Markov Models
    Slides (updated 27 Jan)
    Reading: Rabiner & Juang (1986) Tutorial; J&M: Secs 6.1-6.5, 9.2, 9.4; R&H review chapter (sec 2.1, 2.2).
     
  5. Monday 30 January 2023. HMM algorithms
    Slides (updated 30 Jan) and introduction to the labs
    Reading: J&M: Sec 9.7, G&Y review (sections 1, 2.1, 2.2); (J&M: Secs 9.5, 9.6, 9.8 for introduction to decoding).
     
  6. Thursday 2 February 2023. Gaussian mixture models
    Slides
    Reading: R&H review chapter (sec 2.2)
     
  7. Monday 6 February 2023. HMM acoustic modelling 3: Context-dependent phone modelling
    Slides
    Reading: J&M: Sec 10.3; R&H review chapter (sec 2.3); Young (2008).
     
  8. Thursday 9 February 2023. Large vocabulary ASR
    Slides (updated 10 Feb)
    Reading: Ortmanns & Ney
     
  9. Monday 13 February 2023. ASR with WFSTs
    Slides
    Reading: Mohri et al (2008), Speech recognition with weighted finite-state transducers, in Springer Handbook of Speech Processing (sections 1 and 2)
     
  10. Thursday 16 February 2023. Neural network acoustic models 1: Introduction
    Slides (updated 12 Mar; errata)
    Reading: Jurafsky and Martin (draft 3rd edition), chapter 7 (secs 7.1 - 7.4)
    Background Reading: M Nielsen (2014), Neural networks and deep learning - chapter 1 (introduction), chapter 2 (back-propagation algorithm), chapter 3 (the parts on cross-entropy and softmax).
     
    Monday 20 - Friday 24 February 2023.
    NO LECTURES OR LABS - FLEXIBLE LEARNING WEEK.
     
  11. Monday 27 February 2023. Neural network acoustic models 2: Hybrid HMM/DNN systems
    Slides (updated 1 March)
    Background Reading: Morgan and Bourlard (May 1995). Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Mag., 12(3):24-42
    Mohamed et al (2012). Understanding how deep belief networks perform acoustic modelling, ICASSP-2012.
     
  12. Thursday 2 March 2023. Neural Networks for Acoustic Modelling 3: DNN architectures
    Slides
    Reading: Maas et al (2017), Building DNN acoustic models for large vocabulary speech recognition, Computer Speech and Language, 41:195-213.
    Background reading: Peddinti et al (2015). A time delay neural network architecture for efficient modeling of long temporal contexts, Interspeech-2015
    Graves et al (2013), Hybrid speech recognition with deep bidirectional LSTM, ASRU-2013.
     
  13. Monday 6 March 2023. Speaker Adaptation
    Slides
    Reading: G&Y review, sec. 5
    Woodland (2001), Speaker adaptation for continuous density HMMs: A review, ISCA Workshop on Adaptation Methods for Speech Recognition
    Bell et al (2021), Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview, IEEE Open Journal of Signal Processing, Vol 2:33-36.
     
  14. Thursday 9 March 2023. Discriminative training
    Slides
    Reading: Sec 27.3.1 of Young (2008), HMMs and Related Speech Recognition Technologies.
     
  15. Monday 13 March 2023. Multilingual and low-resource speech recognition
    Slides
    Background reading: Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
    Huang et al (2013). Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, ICASSP-2013.
     
  16. Thursday 16 March 2023. End-to-end systems 1: CTC
    Slides
    Reading: A Hannun et al (2014), Deep Speech: Scaling up end-to-end speech recognition, arXiv:1412.5567.
    A Hannun (2017), Sequence Modeling with CTC, Distill.
    Background Reading: Y Miao et al (2015), EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding, ASRU-2015.
    A Maas et al (2015). Lexicon-free conversational speech recognition with neural networks, NAACL HLT 2015.
     
  17. Monday 20 March 2023. End-to-end systems 2: Encoder-decoder models
    Slides (updated 20 Mar; errata)
    Reading: W Chan et al (2015), Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, ICASSP.
    R Prabhavalkar et al (2017), A Comparison of Sequence-to-Sequence Models for Speech Recognition, Interspeech.
    Background Reading: C-C Chiu et al (2018), State-of-the-art sequence recognition with sequence-to-sequence models, ICASSP.
    S Watanabe et al (2017), Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, IEEE STSP, 11:1240-1252.
     
  18. Thursday 23 March 2023. Guest lecture: Unsupervised raw waveform modelling
    Slides (minor updates on 23 March)
    Background Reading: A van den Oord et al (2018), Representation learning with contrastive predictive coding.
    S Schneider et al (2019), wav2vec: Unsupervised pre-training for speech recognition, Interspeech.
     
  19. Date to be confirmed. Revision lecture – questions and answers

     

Reading

Textbook (essential)

Review and Tutorial Articles

Other supplementary materials


Copyright (c) University of Edinburgh 2015-2023
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
licence.txt
This page maintained by Peter Bell.
Last updated: 2023/05/11 11:39:49UTC

