References for ASR
- Daniel Jurafsky and James H. Martin (2008). Speech and Language Processing, Pearson Education (2nd edition). (Errata) [chapters 6, 9, 10]
Wikipedia coverage of most ASR topics is very poor. However, the following entries on some basic pattern recognition and density estimation topics are OK:
Review and Tutorial Articles
- S Renals and T Hain (2010). Speech Recognition, to appear in Computational Linguistics and Natural Language Processing Handbook, A Clark, C Fox and S Lappin (eds.), Blackwells.
- MJF Gales and SJ Young (2007). The Application of Hidden Markov Models in Speech Recognition, Foundations and Trends in Signal Processing, 1 (3), 195-304.
- S Young (1996). A review of large-vocabulary continuous-speech recognition, IEEE Signal Processing Magazine 13 (5), 45--57.
- J-L Gauvain and L Lamel (2000). Large-vocabulary continuous speech recognition: advances and applications, Proceedings of the IEEE, 88 (8), 1181-1200.
- PC Woodland (2002). The development of the HTK Broadcast News transcription system: An overview, Speech Communication, 37(1--2), 47-67.
- S Young (2008). HMMs and Related Speech Recognition Technologies, in Springer Handbook of Speech Processing, J Benesty, MM Sondhi and Y Huang (eds), chapter 27, 539--557.
- L Rabiner and B Juang (1986). An introduction to hidden Markov models, IEEE ASSP Magazine, 3 (1), 4--16.
- JR Bellegarda and D Nahamoo (1990). Tied mixture continuous parameter modeling for speech recognition, IEEE Trans ASSP, 38 (12), 2033-2045.
- XD Huang (1992). Phoneme classification using semicontinuous hidden Markov models, IEEE Trans Signal Processing, 40 (5), 1062-1067.
- SJ Young and PC Woodland (1994). State clustering in hidden Markov model-based continuous speech recognition, Computer Speech and Language, 8 (4), 369-383.
Context-dependent phone models
- R. Schwartz, Y. Chow, O. Kimball, S. Roucos, M. Krasner and J. Makhoul (1985). Context-dependent modeling for acoustic-phonetic recognition of continuous speech, Proc IEEE ICASSP-85, 1205-1208.
- K-F Lee (1990). Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition, IEEE Trans on ASSP, 38(4), 599-609.
- LR Bahl, PV de Souza, PS Gopalakrishnan, D Nahamoo and MA Picheny (1991). Context Dependent Modeling of Phones in Continuous Speech Using Decision Trees, Proc DARPA Speech and Natural Language Processing Workshop, 264-270.
- S Young, J Odell and P Woodland (1994). Tree-based state tying for high accuracy acoustic modelling, Proc HLT Workshop, 307-312.
- P Woodland (2001). Speaker adaptation for continuous density HMMs: A review, Proceedings of the ISCA workshop on adaptation methods for speech recognition, 11-19.
- M Gales and P Woodland (1996). Mean and variance adaptation within the MLLR framework, Computer Speech and Language, 10:249-264.
- M Gales (1998). Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech and Language, 12:75-98.
- M Gales (2000). Cluster adaptive training of hidden Markov models, IEEE Trans Speech and Audio Processing, 8:417-428.
- R Kuhn, JC Junqua, P Nguyen and N Niedzielski (2000). Rapid speaker adaptation in eigenvoice space, IEEE Trans Speech and Audio Processing, 8:695-707.
- G Garau, S Renals and T Hain (2005). Applying vocal tract length normalization to meeting recordings, Proc Interspeech'05.
Large Vocabulary Systems
- A Nadas (1983). A decision theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood, IEEE Trans ASSP, 31(4):814-817.
- L Bahl, P Brown, P de Souza and R Mercer (1986). Maximum mutual information estimation of hidden Markov model parameters for speech recognition, Proc IEEE ICASSP '86.
- Y Normandin and SD Morgera (1992). An improved MMIE training algorithm for speaker-independent, small vocabulary, continuous speech recognition, Proc IEEE ICASSP '92.
- PC Woodland and D Povey (2002). Large scale discriminative training of hidden Markov models for speech recognition, Computer Speech and Language, 16(1):25-47.
- D. Povey (2003). Discriminative Training for Large Vocabulary Speech Recognition, PhD thesis, University of Cambridge.
- J Droppo and A Acero (2008). Environmental Robustness, in Springer Handbook of Speech Processing, J Benesty, MM Sondhi and Y Huang (eds), chapter 33, 653--680.
(Deep) neural networks
- N Morgan and H Bourlard (May 1995). Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach, IEEE Signal Processing Magazine, 12(3), 24-42.
- N Morgan et al (Sep 2005). Pushing the envelope - aside, IEEE Signal Processing Magazine, 22(5), 81-88.
- F Grezl and P Fousek (2008). Optimizing bottleneck features for LVCSR, Proc ICASSP-2008.
- G Hinton et al (Nov 2012). Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, 29(6), 82--97.
Transcribing TED data
Introductory texts (now getting rather old)
Statistical Methods in Computational Linguistics
by Mark Gawron @ San Diego State Univ.
SRILM Manual Pages
Good-Turing Smoothing Without Tears
by William A. Gale @ ATT Bell Lab, 1994.
A Survey of Smoothing Techniques for ME Models
by Stanley F. Chen, Ronald Rosenfeld,
IEEE Trans SAP, Vol.8, No.1, January 2000.
A study of smoothing methods for language models applied to information retrieval
by Chengxiang Zhai and John Lafferty @ CMU, ACM Trans on Information
Systems, Vol. 22, Issue 2, pp.179-214, April 2004.
Decoding / Search
Speaker Recognition by Sadaoki Furui
Survey of the State of the Art in Human Language Technology (1996)
NIST Speaker Recognition Evaluation Chronicles
by Mark Przybocki and Alvin Martin, 2004.
Automatic Speaker Recognition - Recent progress, Current
applications, and Future trends
by D. A. Reynolds and L. P. Heck, at AAAS 2000
Speaker Recognition, a tutorial
by J.P. Campbell, Proceedings of the IEEE, vol.85, No.9,
pp.1437-1462, September 1997.
Odyssey - Speaker and Language Characterization Interest Group
Audio Signal Processing
Prosody modeling for automatic speech recognition and understanding
by E Shriberg and A Stolcke,
Mathematical Foundations of Speech and Language Processing, 2004.
Audio-visual Automatic Speech Recognition: An Overview
by G. Potamianos, C. Neti, J. Luettin, and I. Matthews,
In: Issues in Visual and Audio-Visual Speech Processing, G. Bailly,
E. Vatikiotis-Bateson, and P. Perrier (Eds.),
MIT Press (In Press), 2004.
Unless explicitly stated otherwise, all material is copyright ©
The University of Edinburgh