- Abstract:
This paper shows that the web can be employed to obtain frequencies for bigrams that are unseen in a given corpus. We describe a method for retrieving counts for adjective-noun, noun-noun, and verb-object bigrams from the web by querying a search engine. We evaluate this method by demonstrating: (a) a high correlation between web frequencies and corpus frequencies; (b) a reliable correlation between web frequencies and plausibility judgments; (c) a reliable correlation between web frequencies and frequencies recreated using class-based smoothing; (d) a good performance of web frequencies in a pseudo-disambiguation task.
- Bibtex format
@Article{EDI-INF-RR-0311,
  author    = {Frank Keller and Mirella Lapata},
  title     = {Using the Web to Obtain Frequencies for Unseen Bigrams},
  journal   = {Computational Linguistics},
  publisher = {MIT Press},
  year      = 2003,
  month     = sep,
  volume    = {29},
  number    = {3},
  pages     = {459--484},
  doi       = {10.1162/089120103322711604},
  url       = {http://homepages.inf.ed.ac.uk/keller/papers/cl03.pdf},
}