Informatics Report Series


Report   

EDI-INF-RR-1020


Title: Randomised Language Modelling for Statistical Machine Translation
Authors: David Talbot ; Miles Osborne
Date: 2007
Publication Title: ACL 07
Publisher: Association for Computational Linguistics
Publication Type: Conference Paper
Publication Status: Published
Abstract:
A Bloom filter (BF) is a randomized data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it produces false positives with some constant probability. Here we explore the use of BFs for language modelling in statistical machine translation. We investigate how a BF containing n-grams extracted from a large corpus can complement a standard n-gram LM within an SMT system and consider (i) how to include approximate frequency information efficiently and (ii) how to reduce the effective error rate by first checking for lower-order subsequences in candidate n-grams. Our solutions in both cases retain the one-sided error guarantees of the standard BF while taking advantage of the particular characteristics of natural language statistics to reduce the space requirements.
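The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names BloomFilter, build_filter and quantised_level, the salted SHA-256 hashing, the power-of-two count quantisation and the filter size are all assumptions made for illustration. Each n-gram is inserted once per quantisation level, so a query walks the levels until the first miss, and a higher-order n-gram is probed only if its (n-1)-gram prefix and suffix are present. Errors stay one-sided: a stored n-gram is never reported at a lower level than the one it was inserted with.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Salted SHA-256 digests stand in for k independent hash functions
        # (an illustrative choice, not taken from the paper).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True for every added item; may also be True for others (false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(tokens, max_order):
    """Yield every n-gram of order 1..max_order as a tuple of tokens."""
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def encode(gram, level):
    """Key under which an n-gram is stored for one quantisation level."""
    return f"{level}|{' '.join(gram)}"


def build_filter(sentences, max_order, num_bits, num_hashes):
    """Insert each n-gram once per level 1..bit_length(count)."""
    counts = Counter()
    for tokens in sentences:
        counts.update(ngrams(tokens, max_order))
    bf = BloomFilter(num_bits, num_hashes)
    for gram, count in counts.items():
        # count.bit_length() == 1 + floor(log2(count)) for count >= 1
        # (hypothetical quantisation scheme).
        for level in range(1, count.bit_length() + 1):
            bf.add(encode(gram, level))
    return bf


def quantised_level(bf, gram, max_level=32):
    """Return the highest consecutive level found for gram (0 if absent)."""
    if len(gram) > 1:
        # Lower-order check: if a sub-sequence is missing, the n-gram
        # cannot have been inserted, so skip probing it.
        for sub in (gram[:-1], gram[1:]):
            if encode(sub, 1) not in bf:
                return 0
    level = 0
    while level < max_level and encode(gram, level + 1) in bf:
        level += 1
    return level


corpus = [["the", "cat", "sat"]] * 4 + [["the", "dog", "sat"]]
bf = build_filter(corpus, max_order=3, num_bits=1 << 16, num_hashes=4)
print(quantised_level(bf, ("the", "cat")))
print(quantised_level(bf, ("the", "purple")))
```

A returned level L corresponds to a count of at least 2^(L-1) under this quantisation, subject to false positives. In practice the number of bits and hash functions would be chosen from the expected number of inserted keys and the target false-positive rate.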
Bibtex format
@InProceedings{EDI-INF-RR-1020,
author = { David Talbot and Miles Osborne },
title = {Randomised Language Modelling for Statistical Machine Translation},
booktitle = {ACL 07},
publisher = {Association for Computational Linguistics},
year = 2007,
url = {http://acl.ldc.upenn.edu/P/P07/P07-1065.pdf},
}


Unless explicitly stated otherwise, all material is copyright The University of Edinburgh