Informatics Report Series




Title: Randomised Language Modelling for Statistical Machine Translation
Authors: David Talbot; Miles Osborne
Date: 2007
Publication Title: ACL 07
Publisher: Association for Computational Linguistics
Publication Type: Conference Paper
Publication Status: Published
A Bloom filter (BF) is a randomized data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it produces false positives with some constant probability. Here we explore the use of BFs for language modelling in statistical machine translation. We investigate how a BF containing n-grams extracted from a large corpus can complement a standard n-gram LM within an SMT system and consider (i) how to include approximate frequency information efficiently and (ii) how to reduce the effective error rate by first checking for lower-order subsequences in candidate n-grams. Our solutions in both cases retain the one-sided error guarantees of the standard BF while taking advantage of the particular characteristics of natural language statistics to reduce the space requirements.
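The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names, the salted SHA-256 hashing, the integer quantisation base, and the tab-separated key format are all assumptions made for illustration. Approximate frequency is stored by inserting each n-gram together with every level up to its logarithmically quantised count (one possible scheme, assumed here), and a query checks all lower-order subsequences before the full n-gram. Both steps only ever add items to the filter, so errors remain one-sided: an inserted n-gram is never missed, and a stored count can be overestimated but not underestimated.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def quantise(count, base=2):
    """Return 1 + floor(log_base(count)) for count >= 1, using integer arithmetic."""
    level = 1
    while count >= base:
        count //= base
        level += 1
    return level


def add_ngram(bf, tokens, count):
    """Insert an n-gram, its contiguous sub-n-grams, and its quantised count levels."""
    n = len(tokens)
    for order in range(1, n + 1):
        for i in range(n - order + 1):
            bf.add(" ".join(tokens[i:i + order]))
    key = " ".join(tokens)
    for level in range(1, quantise(count) + 1):
        bf.add(f"{key}\t{level}")  # hypothetical key format: n-gram, tab, level


def contains_ngram(bf, tokens):
    """Check shorter subsequences first; reject as soon as one is missing."""
    n = len(tokens)
    for order in range(1, n + 1):
        for i in range(n - order + 1):
            if " ".join(tokens[i:i + order]) not in bf:
                return False
    return True


def quantised_count(bf, tokens, max_level=32):
    """Probe count levels 1, 2, ... until the first miss; 0 means 'not found'."""
    if not contains_ngram(bf, tokens):
        return 0
    key = " ".join(tokens)
    level = 0
    while level < max_level and f"{key}\t{level + 1}" in bf:
        level += 1
    return level


bf = BloomFilter(num_bits=1 << 16, num_hashes=4)
add_ngram(bf, ["the", "cat", "sat"], count=5)
print(contains_ngram(bf, ["the", "cat", "sat"]))   # True: inserted n-grams are never missed
print(quantised_count(bf, ["the", "cat", "sat"]))  # at least quantise(5) == 3
```

Checking lower-order subsequences first lowers the effective error rate because a false positive on the full n-gram now also requires every shorter subsequence to pass. Quantising counts on a logarithmic scale keeps the number of extra insertions per n-gram small, which suits the skewed frequency distribution of natural language.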
Bibtex format
author = { David Talbot and Miles Osborne },
title = {Randomised Language Modelling for Statistical Machine Translation},
booktitle = {ACL 07},
publisher = {Association for Computational Linguistics},
year = 2007,
url = {},

Unless explicitly stated otherwise, all material is copyright The University of Edinburgh