Informatics Report Series
Title: Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Authors: Christopher Callison-Burch; David Talbot; Miles Osborne
Date: 2004
Publication Title: Proceedings of ACL 2004 (Meeting of the Association for Computational Linguistics)
Publication Type: Conference Paper
Publication Status: Published
Page Nos: 175-182
- Abstract:
The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training.
Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
- Copyright:
- 2006 by The University of Edinburgh. All Rights Reserved
- Bibtex format
- @InProceedings{EDI-INF-RR-0720,
    author = {Christopher Callison-Burch and David Talbot and Miles Osborne},
    title = {Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora},
    booktitle = {Proceedings of ACL 2004 (Meeting of the Association for Computational Linguistics)},
    year = 2004,
    pages = {175-182},
  }