Paul Taylor

ICCS, Informatics, and CSTR, Linguistics, University of Edinburgh

Speech Recognition

I aim to give a general audience an in-depth insight into the issues confronting speech recognition today.

I will first describe the nature of the problem in some detail, specifically how one converts a noisy continuous representation arriving at 300,000 bits/second (a speech waveform) into a clean discrete representation at 120 bits/second (a string of words). Next, I will describe state-of-the-art recognition technology, explaining how techniques such as Hidden Markov Models (HMMs) and n-gram language models work in a speech recognition context. This part of the talk will cover the learning algorithms used to train such systems and the decoding strategies used to perform recognition. I will also give an overview of where we stand today in terms of performance, and explain why it is possible to have a dictation system on your PC but still not have a system that can recognise simple instructions over a phone line.
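
To give a feel for the scale of the conversion mentioned above, here is a small back-of-the-envelope sketch in Python. Only the two bit rates (300,000 and 120 bits/second) come from the abstract. The sample rate, sample width, and speaking rate are illustrative assumptions, chosen to show one plausible way of arriving at figures of that order; they are not the speaker's own breakdown.

```python
# Figures quoted in the abstract.
WAVEFORM_BITS_PER_SECOND = 300_000   # speech waveform
WORD_BITS_PER_SECOND = 120           # string of words


def reduction_factor(input_rate, output_rate):
    """Return how many times smaller the output bit rate is than the input."""
    return input_rate / output_rate


# Ratio implied directly by the two quoted figures.
quoted_reduction = reduction_factor(WAVEFORM_BITS_PER_SECOND, WORD_BITS_PER_SECOND)

# ASSUMPTION (not from the abstract): a 16 kHz, 16-bit recording, which gives
# a waveform rate of the same order of magnitude as the quoted figure.
ASSUMED_SAMPLE_RATE_HZ = 16_000
ASSUMED_BITS_PER_SAMPLE = 16
assumed_waveform_rate = ASSUMED_SAMPLE_RATE_HZ * ASSUMED_BITS_PER_SAMPLE

# ASSUMPTION (not from the abstract): about 3 spoken words per second.
# Under that assumption, the quoted word rate allows this many bits per word.
ASSUMED_WORDS_PER_SECOND = 3
implied_bits_per_word = WORD_BITS_PER_SECOND / ASSUMED_WORDS_PER_SECOND

if __name__ == "__main__":
    print("Reduction implied by quoted figures:", quoted_reduction)
    print("Waveform rate under assumed format (bits/s):", assumed_waveform_rate)
    print("Bits per word under assumed speaking rate:", implied_bits_per_word)
```

Taken at face value, the quoted figures mean a recogniser has to discard all but roughly one part in 2,500 of the incoming data while keeping the part that identifies the words.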

The final, and perhaps most important, part will focus on where we go from here: what problems still need to be addressed, and the received wisdom on how to handle them. I will describe the many criticisms aimed at HMMs and what people are proposing as new solutions. I will conclude with an overview of how I think many of the types of work being done in Informatics can contribute to specific parts of the speech recognition problem.


Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk