Automatic Speech Recognition (ASR): 2013/14

[course descriptor]

Lecturers

News

The first lecture will be on Monday, 13th January 2014, 15:10-16:00 in Forrest Hill room D.02
Speech Processing is a recommended prerequisite. If you have not taken it but have taken Introductory Applied Machine Learning (IAML) or Machine Learning and Pattern Recognition (MLPR), you can still take this course, although you will be expected to self-study some material from Speech Processing. Machine Translation (MT) is a somewhat related second-semester course.
This year's course web pages are under construction. You can look at last year's course web page, but the syllabus will change this year to take account of new developments in the field.

Reading

Textbook

J&M: Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing, Pearson Education (2nd edition). (Errata (for 1st and 2nd printings of 2nd Edition)) [chapters 4, 6, 9, 10]
Taylor: P Taylor (2009), Text-to-Speech Synthesis, Cambridge University Press. Good coverage of speech signal processing.
C. Manning and H. Schutze (1999). Foundations of Statistical Natural Language Processing, MIT Press. Useful for language modelling.

Review and Tutorial Articles

R&H: S Renals and T Hain (2010). Speech Recognition, in Computational Linguistics and Natural Language Processing Handbook, A Clark, C Fox and S Lappin (eds.), Blackwells, chapter 12, 299-332.
G&Y: MJF Gales and SJ Young (2007). The Application of Hidden Markov Models in Speech Recognition, Foundations and Trends in Signal Processing, 1 (3), 195-304.
S Young (1996). A review of large-vocabulary continuous-speech recognition, IEEE Signal Processing Magazine 13 (5), 45-57.
J-L Gauvain and L Lamel (2000). Large-vocabulary continuous speech recognition: advances and applications, Proceedings of the IEEE, 88 (8), 1181-1200.
PC Woodland (2002). The development of the HTK Broadcast News transcription system: An overview, Speech Communication, 37(1-2), 47-67.
S Young (2008). HMMs and Related Speech Recognition Technologies, in Springer Handbook of Speech Processing, J Benesty, MM Sondhi and Y Huang (eds), chapter 27, 539-557.

Syllabus 2013/14

Lecture No.	Date	Week	Lecturer	Topic and slides	Reading
1	Mon 13 January	1	Renals	Introduction to Speech Recognition (slides)	J&M: chapter 7, chapter 9 (9.1 - 9.3) R&H review chapter
2	Thu 16 January	1	Shimodaira	Speech Signal Analysis 1 (slides)	J&M: Sec 9.3 Taylor, chapters 10, 12
3	Mon 20 January	2	Shimodaira	Speech Signal Analysis 2	Hermansky (1990), PLP analysis of speech
4	Thu 23 January	2	Shimodaira	Acoustic modelling basics: HMMs and GMMs 1 (slides-4up,slides)	J&M: Secs 6.1-6.5, 9.2, 9.4 G&Y review R&H review chapter Rabiner & Juang (1986) Tutorial
5	Mon 27 January	3	Shimodaira	Acoustic modelling basics: HMMs and GMMs 2
6	Thu 30 January	3	Renals	Context-dependent phone modelling with HMMs 1 (slides)	Young (2008) Lee (1990) Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition
7	Mon 3 February	4	Renals	Context-dependent phone modelling with HMMs 2	Young & Woodland (1994) State clustering in hidden Markov model-based continuous speech recognition Young et al (1994) Tree-based state tying for high accuracy acoustic modelling
	Thu 6 February	4	Shimodaira	Introduction to Assignment 1	Assignment 1: continuous speech recognition
	Thu 6 February	4		Lab session (17:00)
8	Mon 10 February	5	Renals	Lexicon and language model (slides)	J&M, Ch 4 Manning & Schutze, Ch 6
	Mon 10 February	5		Lab session (17:00)
9	Thu 13 February	5	Shimodaira	Search and decoding (slides)	Aubert (2002) An overview of decoding techniques for large vocabulary continuous speech recognition
	Thu 13 February	5		Lab session (17:00)
	Mon 17 February	6		No Lecture - Innovative Learning Week
	Thu 20 February	6		No Lecture - Innovative Learning Week
10	Mon 24 February	7	Renals	Intro to neural networks (slides)	Multi-layer neural networks Morgan & Bourlard (1995), Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach
	Mon 24 February	7		Lab session (17:30)
	Wed 26 February	7		Assignment 1 Deadline (16:00)
11	Thu 27 February	7	Renals	(Deep) neural network acoustic models (slides)	Hinton et al (2012), Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
	Mon 3 March	8	Renals	Introduction to Assignment 2	Assignment 2: literature review
12	Thu 6 March	8	Renals	Neural network language models (slides)	Bengio et al (2006), Neural probabilistic language models (Secs 6.1, 6.2, 6.3, 6.7, 6.8) Mikolov et al (2011), Extensions of recurrent neural network language model
13	Mon 10 March	9	Renals	Speaker adaptation 1 (slides)	G&Y review, sec. 5 Woodland (2001), Speaker adaptation for continuous density HMMs: A review
14	Thu 13 March	9	Renals	Speaker adaptation 2
15	Mon 17 March	10	Renals	Discriminative training of GMM-based systems (slides)	Young (2008), sec 27.3.1
	Wed 19 March	10		Assignment 2 Deadline (16:00)
16	Thu 20 March	10		Case study: transcribing TED talks (slides)

Schedule

Lecture - Mondays, 15:10-16:00: Forrest Hill room D.02
Lecture - Thursdays, 15:10-16:00: Forrest Hill room D.02
Lab sessions - 6/10/13/24 February, 17:00

Closer to the exam we are very happy to arrange a revision lecture at a time convenient to everyone. The point of this lecture will be to answer and discuss any questions about the course.

Coursework

There are two pieces of coursework.

Assignment 1: continuous speech recognition - monophone and triphone models. The coursework involves training and testing a continuous speech recognition system using the HTK software, on the WSJCAM0 database (British English recordings of speakers reading Wall Street Journal sentences).
Released: Monday 3 February 2014
Deadline: Wednesday 26 February 2014, 16:00
Feedback: Wednesday 12 March 2014
Report templates:
- Latex: asr_latex.zip
- Word: asr_word.zip
Q and A
Assignment 2: literature review. The key papers are:
- Deep neural network acoustic models
  - Morgan and Bourlard (1995), Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Mag., 12(3):24-42.
  - Hinton et al (2012), Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, IEEE Signal Processing Mag., 29(6):82-97.
- How can human speech recognition inform automatic speech recognition?
  - Hermansky (1998), Should Recognizers Have Ears?, Speech Communication, 25, 3-27.
  - Kitaoka, Enami, and Nakagawa (2014), Effect of acoustic and linguistic contexts on human and machine speech recognition, Computer Speech and Language, 28, 769-787.
Released: Monday 3 March 2014
Deadline: Wednesday 19 March 2014, 16:00
Feedback: Wednesday 2 April 2014
Report templates:
- Latex: asr_latex.zip
- Word: asr_word.zip
