Automatic Speech Recognition (ASR): 2012/13
Lecturer
News
- Please fill out the course survey
- Peter Bell will be our guest speaker on the 'TED Talks' ASR system on Thursday 21 March
- No lecture on Thursday 14 March
- Lab sessions at 5pm on Mondays in AT-5.04
- Dr Peter Bell will give the lectures on Monday 28 January and Thursday 31 January.
- New locations: LT5, Bristo Square (7) (Mondays); LR1, Minto House (Thursdays)
- No lecture on Monday 21 January
- The first lecture will be on Monday, 14th January 2013, 15:00-15:50 in Forrest Hill room C.27
- Speech Processing is a recommended prerequisite. However, if you have not taken Speech Processing but have taken Introductory Applied Machine Learning (IAML) or Probabilistic Modelling and Reasoning (PMR), you can still take this course, although you will be expected to self-study some material from Speech Processing. Machine Learning and Pattern Recognition (MLPR) and Machine Translation (MT) are related second-semester courses, as is the PPLS course Speech Synthesis.
- This year's course web pages are under construction. In the meantime you can look at last year's course web page, but the syllabus will change this year to take account of new developments in the field.
Syllabus
- Introduction to Speech Recognition (slides) [Lecture 1: 14 January 2013]
- Speech signal analysis (slides) [Lectures 2, 3: 17, 24 January 2013]
- Acoustic modelling basics: HMMs and GMMs (slides) [Lectures 4, 5: 28, 31 January 2013]
- Context-dependent phone modelling with HMMs (slides) [Lectures 6, 7: 4, 7 February 2013]
- Lexicon and language model (slides) [Lecture 8: 11 February 2013]
- Search and decoding (slides) [Lecture 9: 14 February 2013]
- Speaker adaptation (slides) [Lectures 10, 11: 25, 28 February 2013]
- Robustness to the acoustic environment (slides) [Lecture 12: 4 March 2013]
- Discriminative training of GMM-based systems (slides) [Lecture 13: 11 March 2013]
- (Deep) neural networks (slides) [Lectures 14, 15: 11, 18 March 2013]
- Case study: transcribing TED talks (slides) [Lecture 16: 21 March 2013]
Readings: Useful texts
Schedule
- Lecture - Mondays, 15:10-16:00: Lecture Theatre 5, Bristo Square (7)
- Lecture - Thursdays, 15:10-16:00: LR1, Minto House, Chambers Street
- Lab - Mondays, 17:10-18:00: AT-4.05
Closer to the exam, I am happy to arrange a revision lecture at a time convenient to everyone. Its purpose will be to answer and discuss any questions about the course.
Coursework
The coursework involves training and testing a continuous speech recognition system using the HTK software. It uses the WSJCAM0 database, which contains British English recordings of speakers reading sentences from the Wall Street Journal. The coursework comes in two parts:
- Coursework part 1: continuous speech recognition - monophone and triphone models
- Coursework part 2: continuous speech recognition - speaker adaptation
- The submission deadline is Wednesday 27 March at 16:00
- Report templates
- Feedback on the coursework will be available by 5 April
- There will be some lab sessions related to the coursework