Title: Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Authors: Christopher Callison-Burch; David Talbot; Miles Osborne
Date: 2004
Publication Title: Proceedings of ACL 2004 (Meeting of the Association for Computational Linguistics)
Publication Type: Conference Paper
Publication Status: Published
Page Nos: 175-182
The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
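
The abstract reports its main result as a relative reduction in alignment error rate (AER) but does not define the metric. As a minimal sketch, assuming the widely used definition of AER over "sure" and "possible" reference links (this record does not state which variant the paper uses), the computation could look like the following. The function names and the toy link sets are hypothetical and chosen only for illustration; the numbers are not taken from the paper.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).

    hypothesis: links proposed by the model, as (source_idx, target_idx) pairs
    sure:       reference links the annotators marked as certain (S)
    possible:   reference links marked as acceptable (P); S is treated as a subset of P
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # enforce S being a subset of P
    if not a and not s:
        return 0.0  # nothing proposed and nothing required: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, e.g. 0.20 -> 0.15 is a 25% reduction."""
    return (baseline - improved) / baseline


# Toy example with made-up links (not data from the paper).
hyp = {(0, 0), (1, 1), (2, 3)}
sure_links = {(0, 0), (1, 1)}
possible_links = {(2, 2)}

aer = alignment_error_rate(hyp, sure_links, possible_links)
print(f"AER = {aer:.3f}")
print(f"relative reduction = {relative_reduction(0.20, 0.15):.2%}")
```

Under this definition, a proposed link that matches neither a sure nor a possible reference link raises the error, while a missed possible link does not, so a figure such as the 38% quoted above is read as a change relative to the baseline AER rather than an absolute difference.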
