Informatics Report Series
Title: Interpolating between Types and Tokens by Estimating Power-Law Generators
Authors: Sharon Goldwater; T.L. Griffiths; Mark Johnson
Date: 2006
Publication Title: Advances in Neural Information Processing Systems 18
Publication Type: Conference Paper
Publication Status: Published
- Abstract:
- Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that generically produce power-laws, augmenting standard generative models with an adaptor that produces the appropriate pattern of token frequencies. We show that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
- Bibtex format
@InProceedings{EDI-INF-RR-1203,
  author = {Sharon Goldwater and T.L. Griffiths and Mark Johnson},
  title = {Interpolating between Types and Tokens by Estimating Power-Law Generators},
  booktitle = {Advances in Neural Information Processing Systems 18},
  year = 2006,
  url = {http://books.nips.cc/papers/files/nips18/NIPS2005_0333.pdf},
}