Report EDI-INF-RR-1201

Informatics Report Series

Title: A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging

Authors: Sharon Goldwater; T.L. Griffiths

Date: 2007

Publication Title: Proceedings of ACL 2007

Publisher: ACL

Publication Type: Conference Paper

Publication Status: Published

Page Nos: 744-751

Abstract: Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
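The contrast the abstract draws between estimating one parameter set and integrating over all of them can be illustrated on a single multinomial distribution. The sketch below is not the paper's trigram HMM tagger; it only assumes a symmetric Dirichlet prior with concentration `alpha` over one next-tag distribution, for which the integral has the closed form (n_k + alpha) / (N + K * alpha). The function names, the toy tag set, and the counts are hypothetical and chosen for illustration only.

```python
from collections import Counter


def mle_prob(counts, outcome):
    """Maximum-likelihood estimate: the relative frequency of the outcome."""
    total = sum(counts.values())
    return counts[outcome] / total


def bayes_predictive_prob(counts, outcome, num_outcomes, alpha):
    """Posterior predictive probability of the outcome under a symmetric
    Dirichlet(alpha) prior, with the multinomial parameters integrated out."""
    total = sum(counts.values())
    return (counts[outcome] + alpha) / (total + num_outcomes * alpha)


# Hypothetical toy data: counts of the tag observed after a determiner.
counts = Counter({"NOUN": 8, "ADJ": 2})
tags = ["NOUN", "ADJ", "VERB", "ADV"]

print("MLE:", {t: mle_prob(counts, t) for t in tags})
for alpha in (1.0, 0.1):
    probs = {t: bayes_predictive_prob(counts, t, len(tags), alpha) for t in tags}
    print("alpha =", alpha, probs)
```

In this toy setting the MLE assigns zero probability to unseen tags, while the integrated estimate keeps some mass on them. A smaller `alpha` keeps the estimate closer to the observed counts, which is one way to read the abstract's remark about priors that favor sparse distributions.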

Link to paper: http://acl.ldc.upenn.edu/P/P07/P07-1094.pdf

Bibtex format
@InProceedings{EDI-INF-RR-1201,
  author = {Sharon Goldwater and T.L. Griffiths},
  title = {A Fully Bayesian Approach to Unsupervised Part-of-Speech Tagging},
  booktitle = {Proceedings of ACL 2007},
  publisher = {ACL},
  year = 2007,
  pages = {744-751},
  url = {http://acl.ldc.upenn.edu/P/P07/P07-1094.pdf},
}

Please mail <reports@inf.ed.ac.uk> with any changes or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh