
Automatic Speech Recognition (ASR) 2017-18: Lectures

Lectures will take place on Mondays and Thursdays at 15:10-16:00 in David Hume Tower, room LG.09, starting on Monday 15 January.

  1. Monday 15 January 2018. Introduction to Speech Recognition (Steve)
    Slides; lecture recording; revision log
    Reading: J&M: chapter 7, section 9.1; R&H review chapter (sec 1).
     
  2. Thursday 18 January 2018. Speech Signal Analysis 1 (Hiroshi)
    Slides; lecture recording
    SparkNG: MATLAB real-time speech tools and voice production tools
    Reading: J&M: Sec 9.3
     
  3. Monday 22 January 2018. Speech Signal Analysis 2 (Hiroshi)
    Lecture recording
    Reading: Hermansky (1990), PLP analysis of speech.
     
  4. Thursday 25 January 2018. Acoustic modelling: HMMs and GMMs 1 (Hiroshi)
    Slides; lecture recording
    Reading: J&M: Secs 6.1-6.5, 9.2, 9.4; R&H review chapter (secs 2.1, 2.2); Rabiner & Juang (1986) tutorial.
     
  5. Monday 29 January 2018. Acoustic modelling: HMMs and GMMs 2 (Hiroshi)
    Lecture recording
    Reading: G&Y review (sections 1, 2.1, 2.2).
     
  6. Thursday 1 February 2018. Acoustic modelling: Context-dependent phone modelling (Hiroshi)
    Slides; lecture recording
    Reading: R&H review chapter (sec 2.3); Young (2008).
     
  7. Monday 5 February 2018. Introduction to neural networks (Steve)
    Slides; lecture recording (apologies for the poor audio quality); revision log
    Background Reading: M Nielsen (2014), Neural networks and deep learning - chapter 1 (introduction), chapter 2 (back-propagation algorithm), chapter 3 (the parts on cross-entropy and softmax).
     
  8. Thursday 8 February 2018. Neural network acoustic models 1 (Steve)
    Slides; lecture recording; revision log
    Reading: Morgan and Bourlard (1995), Continuous speech recognition: Introduction to the hybrid HMM/connectionist approach.
     
  9. Monday 12 February 2018. Neural network acoustic models 2 (Steve)
    Slides; lecture recording; revision log
    Reading: Hinton et al (2012), Deep neural networks for acoustic modeling in speech recognition.
     
  10. Thursday 15 February 2018. Lexicon and language model (Steve)
    Slides; lecture recording; revision log.
    Reading: J&M: Chapter 4
     
    Monday 19 - Friday 23 February 2018.
    NO LECTURES OR LABS - FLEXIBLE LEARNING WEEK.
     
    Monday 26 February 2018 -- Monday 19 March 2018.
    LECTURES CANCELLED OWING TO THE UCU STRIKE AND WEATHER.
     
  11. Thursday 22 March 2018. Speaker Adaptation (Steve)
    Slides; lecture recording; revision log.
    Reading: G&Y review, sec. 5
    Woodland (2001), Speaker adaptation for continuous density HMMs: A review
    Swietojanski et al (2016), Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation
     
  12. Monday 26 March 2018. Decoding, Alignment, and WFSTs (Steve)
    Slides; lecture recording; revision log.
    Reading: Mohri et al (2008), Speech recognition with weighted finite-state transducers, in Springer Handbook of Speech Processing (sections 1 and 2)
    Bell and Renals (2015), A system for automatic alignment of broadcast media captions using weighted finite-state transducers, ASRU.
     
  13. Thursday 29 March 2018. Multilingual speech recognition (Steve)
    Slides; lecture recording; revision log.
    Reading: Besacier et al (2014), Automatic speech recognition for under-resourced languages: A survey, Speech Communication, 56:85-100.
    Huang et al (2013), Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers, ICASSP.
     
  14. Monday 2 April 2018. Sequence discriminative training (Steve)
    Slides; lecture recording; revision log.
    Reading: Sec 27.3.1 of Young (2008), HMMs and Related Speech Recognition Technologies.
    K Vesely et al (2013), Sequence-discriminative training of deep neural networks, Interspeech-2013.
     
  15. Thursday 5 April 2018. CTC and end-to-end systems (Steve)
    Slides; lecture recording; revision log.
    Reading: A Hannun et al (2014), Deep Speech: Scaling up end-to-end speech recognition, arXiv:1412.5567.
    A Hannun (2017), Sequence Modeling with CTC, Distill.
     
  16. Friday 4 May 2018. Revision lecture - questions and answers (Steve)
    lecture recording (audio only, no slides).
     

Reading

Textbook

Review and Tutorial Articles


Copyright (c) University of Edinburgh 2015-2018
The ASR course material is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License
This page maintained by Steve Renals.
Last updated: 2018/05/05 07:29:50UTC



Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh