Title: Interpolating between Types and Tokens by Estimating Power-Law Generators
Authors: Sharon Goldwater ; T.L. Griffiths ; Mark Johnson
Date: 2006
Publication Title: Advances in Neural Information Processing Systems 18
Publication Type: Conference Paper
Publication Status: Published
Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
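As a rough illustration of the kind of adaptor the abstract refers to, the following minimal sketch simulates the standard Chinese-restaurant-style seating scheme for the Pitman-Yor process. It is not the authors' implementation. The function name `pitman_yor_table_sizes` and the parameter names `a` (discount) and `b` (concentration) are assumptions chosen for this example. The sketch tracks only table sizes; in a two-stage model each table would additionally carry a label drawn from the underlying generator, which is omitted here.

```python
import random


def pitman_yor_table_sizes(n_customers, a, b, seed=0):
    """Seat customers one at a time and return the list of table sizes.

    Hypothetical helper for illustration only.
    a: discount, 0 <= a < 1.  b: concentration, b > -a.
    With n customers already seated at k tables, the next customer
      - opens a new table with probability (k * a + b) / (n + b),
      - joins table i with probability (size_i - a) / (n + b).
    """
    if not (0.0 <= a < 1.0) or b <= -a:
        raise ValueError("need 0 <= a < 1 and b > -a")
    rng = random.Random(seed)
    sizes = []
    for n in range(n_customers):
        if n == 0:
            sizes.append(1)  # the first customer always opens a table
            continue
        k = len(sizes)
        r = rng.random() * (n + b)
        new_table_weight = k * a + b
        if r < new_table_weight:
            sizes.append(1)
            continue
        r -= new_table_weight
        for i, size in enumerate(sizes):
            r -= size - a
            if r < 0:
                sizes[i] += 1
                break
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


if __name__ == "__main__":
    for a in (0.0, 0.5, 0.8):
        sizes = sorted(pitman_yor_table_sizes(5000, a, 1.0), reverse=True)
        print(f"a={a}: {len(sizes)} tables, largest sizes {sizes[:5]}")
```

Under these assumptions, a larger discount `a` should yield many more tables with a long tail of small ones, which is the heavy-tailed pattern the abstract associates with word frequencies; setting `a = 0` reduces the scheme to the ordinary Chinese restaurant process.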
Bibtex format
author = { Sharon Goldwater and T.L. Griffiths and Mark Johnson },
title = {Interpolating between Types and Tokens by Estimating Power-Law Generators},
booktitle = {Advances in Neural Information Processing Systems 18},
year = 2006,
url = {},

