Parsing, the task of assigning a syntactic structure to an utterance, is central to many natural language processing applications. Parsing has been the subject of intensive research over the past few years, resulting in probabilistic models that achieve both broad coverage and high accuracy. However, most existing parsing models have been developed for English and trained on a single corpus, the Penn Treebank. This raises the question of whether these models generalize to other languages and to annotation schemes that differ from the Penn Treebank markup.
We address this question by proposing a probabilistic parsing model
trained on Negra, a syntactically annotated corpus for German. German
has a number of syntactic properties that set it apart from English, and
the Negra annotation scheme differs in important respects from the Penn
Treebank markup. We observe that existing lexicalized parsing models
using head-head dependencies, while successful for English, fail to
outperform an unlexicalized baseline model for German. Learning curves
show that this effect is not due to a lack of training data. We propose
an alternative model that uses sister-head dependencies instead of
head-head dependencies. This model outperforms the baseline, achieving
a labeled precision and recall of around 74%.
We use this result to argue that sister-head dependencies are more
appropriate for parsing languages with a relatively free word order
(such as German) and annotation schemes with very flat structures
(such as Negra).