- Abstract:
- In spoken dialogue systems, it is important for the system to know how likely a speech recognition hypothesis is to be correct, so it can reject misrecognized user turns, or, in cases where many errors have occurred, change its interaction strategy or switch the caller to a human attendant. We have identified prosodic features which predict more accurately when a recognition hypothesis contains errors than the acoustic confidence scores traditionally used in automatic speech recognition in spoken dialogue systems. We describe statistical comparisons of features of correctly and incorrectly recognized turns in the TOOT train information corpus and the W99 conference registration corpus, which reveal significant prosodic differences between the two sets of turns. We then present machine learning results showing that the use of prosodic features, alone and in combination with other automatically available features, can predict more accurately whether or not a user turn was correctly recognized, when compared to the use of acoustic confidence scores alone.
- Links To Paper
- No links available
- Bibtex format
- @Article{EDI-INF-RR-1106,
- author = {Julia Hirschberg and Diane Litman and Marc Swerts},
- title = {Prosodic and Other Cues to Speech Recognition Failures},
- journal = {Speech Communication},
- publisher = {Elsevier},
- year = 2004,
- month = {Jun},
- volume = {43},
- pages = {155-175},
- doi = {10.1016/j.specom.2004.01.006},
- }