Informatics Report Series


Report EDI-INF-RR-0311


Title: Using the Web to Obtain Frequencies for Unseen Bigrams.
Authors: Frank Keller; Mirella Lapata
Date: Sep 2003
Publication Title: Computational Linguistics
Publisher: MIT Press
Publication Type: Journal Article
Publication Status: Published
Volume No: 29(3)
Page Nos: 459-484
DOI: 10.1162/089120103322711604
ISBN/ISSN: 0891-2017
Abstract:
This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between web frequencies and corpus frequencies; (b) a reliable correlation between web frequencies and plausibility judgments; (c) a reliable correlation between web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of web frequencies in a pseudo-disambiguation task.
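Illustrative sketch (not taken from the paper): the Python below shows, under stated assumptions, how two of the evaluations named in the abstract could be set up once counts have been retrieved. It assumes counts are log-transformed with a small offset and compared with Pearson's r, and that pseudo-disambiguation simply prefers the alternative with the higher count; the abstract specifies none of these details. The function names and the toy counts are hypothetical and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def log_count(count, offset=0.5):
    """Log-transform a count; the offset (an assumption) keeps zero counts finite."""
    return math.log10(count + offset)


def correlate_counts(web_counts, corpus_counts):
    """Correlate log web counts with log corpus counts over shared bigrams."""
    shared = sorted(set(web_counts) & set(corpus_counts))
    web = [log_count(web_counts[b]) for b in shared]
    corpus = [log_count(corpus_counts[b]) for b in shared]
    return pearson(web, corpus)


def pseudo_disambiguate(counts, head, candidate, confounder):
    """Pick whichever of candidate/confounder has the higher count with head.

    Ties are resolved in favour of the candidate (an arbitrary choice).
    """
    c_candidate = counts.get((head, candidate), 0)
    c_confounder = counts.get((head, confounder), 0)
    return candidate if c_candidate >= c_confounder else confounder


# Made-up counts, for illustration only (NOT data from the paper).
toy_web = {("hungry", "dog"): 12000, ("hungry", "table"): 15, ("green", "idea"): 340}
toy_corpus = {("hungry", "dog"): 24, ("hungry", "table"): 0, ("green", "idea"): 1}

r = correlate_counts(toy_web, toy_corpus)
choice = pseudo_disambiguate(toy_web, "hungry", "dog", "table")
```

Here `r` measures how closely the toy web counts track the toy corpus counts, and `choice` is the noun that the higher count selects for the adjective. Retrieving real counts would require querying a search engine, which this sketch deliberately leaves out.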
Bibtex format
@Article{EDI-INF-RR-0311,
  author    = {Frank Keller and Mirella Lapata},
  title     = {Using the Web to Obtain Frequencies for Unseen Bigrams},
  journal   = {Computational Linguistics},
  publisher = {MIT Press},
  year      = 2003,
  month     = sep,
  volume    = {29},
  number    = {3},
  pages     = {459--484},
  doi       = {10.1162/089120103322711604},
  url       = {http://homepages.inf.ed.ac.uk/keller/papers/cl03.pdf}
}



Please mail <reports@inf.ed.ac.uk> with any changes or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh