I will first describe the nature of the problem in some detail, specifically how one converts a noisy continuous representation arriving at 300,000 bits/second (a speech waveform) into a clean discrete representation at 120 bits/second (a string of words). Next, I will describe state-of-the-art recognition technology, including how techniques such as Hidden Markov Models (HMMs) and n-gram language models work in a speech recognition context. This part of the talk will cover the learning algorithms used to train such systems and the decoding strategies used to perform recognition. I will also give an overview of where we stand today in terms of performance, and explain why it is possible to have a dictation system on your PC but still not have a system that can recognise simple instructions over a phone line.
The final, and perhaps most important, part will focus on where we go from here: what problems still need to be addressed and the received wisdom on how to handle them. I will describe the many criticisms aimed at HMMs and what people are proposing as new solutions. I will conclude with an overview of how I think many of the types of work being done in Informatics can contribute to specific parts of the speech recognition problem.