Informatics Report Series


Report   

EDI-INF-RR-1203


Title: Interpolating between Types and Tokens by Estimating Power-Law Generators
Authors: Sharon Goldwater; T.L. Griffiths; Mark Johnson
Date: 2006
Publication Title: Advances in Neural Information Processing Systems 18
Publication Type: Conference Paper
Publication Status: Published
Abstract:
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
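As a rough illustration of the kind of stochastic process the abstract names, the sketch below simulates the standard two-parameter Chinese restaurant process, one common way of describing the Pitman-Yor process. It is not taken from the paper: the function name `pitman_yor_crp` and the parameter names `discount` and `concentration` are assumptions chosen for this example. With n customers seated at K tables, the next customer joins table k with probability proportional to (n_k - discount) and opens a new table with probability proportional to (concentration + discount * K).

```python
import random


def pitman_yor_crp(num_tokens, discount, concentration, seed=0):
    """Simulate table sizes under a two-parameter Chinese restaurant process.

    Illustrative sketch only; names and defaults are assumptions,
    not the paper's notation or code.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k

    for n in range(num_tokens):  # n customers are already seated
        if not table_sizes:
            table_sizes.append(1)  # the first customer always opens a table
            continue

        # Unnormalised weight of opening a new table.
        new_table_weight = concentration + discount * len(table_sizes)
        # The weights sum to n + concentration.
        r = rng.random() * (n + concentration)

        if r < new_table_weight:
            table_sizes.append(1)
            continue

        r -= new_table_weight
        for k, size in enumerate(table_sizes):
            r -= size - discount  # weight of joining existing table k
            if r < 0:
                table_sizes[k] += 1
                break
        else:
            table_sizes[-1] += 1  # guard against floating-point round-off

    return table_sizes


if __name__ == "__main__":
    sizes = sorted(pitman_yor_crp(5000, discount=0.8, concentration=1.0), reverse=True)
    print("tables:", len(sizes), "largest:", sizes[:5])
```

Under this reading, customers play the role of tokens and table sizes play the role of frequencies. A discount above zero typically yields many small tables and a few very large ones, which is the heavy-tailed pattern the abstract refers to; how tables map onto word types in the actual model is described in the paper itself.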
Bibtex format
@InProceedings{EDI-INF-RR-1203,
author = { Sharon Goldwater and T.L. Griffiths and Mark Johnson },
title = {Interpolating between Types and Tokens by Estimating Power-Law Generators},
booktitle = {Advances in Neural Information Processing Systems 18},
year = 2006,
url = {http://books.nips.cc/papers/files/nips18/NIPS2005_0333.pdf},
}


Unless explicitly stated otherwise, all material is copyright The University of Edinburgh