Informatics Report Series


Report EDI-INF-RR-0720


Title: Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora
Authors: Christopher Callison-Burch ; David Talbot ; Miles Osborne
Date: 2004
Publication Title: Proceedings of ACL 2004 (Meeting of the Association for Computational Linguistics)
Publication Type: Conference Paper
Publication Status: Published
Page Nos: 175-182
Abstract:
The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
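As a rough illustration of the idea in the abstract, the sketch below estimates an IBM Model 1 style lexical translation table from both kinds of data. It is not the authors' implementation: the function name `train_model1_mixed`, the `weight` parameter, and the toy German-English sentences are assumptions made here, and the NULL word and the higher IBM models are left out. Word-aligned pairs contribute observed link counts, while sentence-aligned pairs contribute expected counts from the usual EM posterior.

```python
from collections import defaultdict


def train_model1_mixed(sentence_pairs, aligned_pairs, iterations=10, weight=0.5):
    """Estimate t(f | e) from sentence-aligned and word-aligned data.

    sentence_pairs: list of (f_tokens, e_tokens)
    aligned_pairs:  list of (f_tokens, e_tokens, links), where links is a
                    set of (i, j) index pairs linking f_tokens[i] to e_tokens[j]
    weight:         hypothetical mixing weight given to the word-aligned counts
    """
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")

    # Uniform start: every co-occurring pair gets the same initial score.
    t = defaultdict(lambda: 1.0)

    for _ in range(iterations):
        count = defaultdict(float)  # fractional counts for (f, e)
        total = defaultdict(float)  # normaliser for each e

        # E-step on sentence-aligned data: alignments are hidden, so each
        # f spreads one unit of (down-weighted) mass over the e words.
        for f_sent, e_sent in sentence_pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = (1.0 - weight) * t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # Word-aligned data: alignments are observed, so each link adds a
        # fixed (weighted) count that does not depend on the current t.
        for f_sent, e_sent, links in aligned_pairs:
            for i, j in links:
                count[(f_sent[i], e_sent[j])] += weight
                total[e_sent[j]] += weight

        # M-step: renormalise so that t(. | e) sums to one for each e.
        t = {pair: c / total[pair[1]] for pair, c in count.items()}

    return t


# Toy data, invented for this sketch.
sentence_pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]
aligned_pairs = [
    (["ein", "buch"], ["a", "book"], {(0, 0), (1, 1)}),
]

t = train_model1_mixed(sentence_pairs, aligned_pairs)
```

In this sketch, moving `weight` toward 1 lets the hand-aligned links dominate the estimate, while moving it toward 0 approaches ordinary Model 1 training on sentence-aligned data alone.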
Copyright:
2006 by The University of Edinburgh. All Rights Reserved
Bibtex format
@InProceedings{EDI-INF-RR-0720,
author = { Christopher Callison-Burch and David Talbot and Miles Osborne },
title = {Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora},
booktitle = {Proceedings of ACL 2004 (Meeting of the Association for Computational Linguistics)},
year = 2004,
pages = {175--182},
}



Unless explicitly stated otherwise, all material is copyright The University of Edinburgh