- Abstract:
Large-scale linguistic annotation is currently employed for a wide range of purposes, including comparing communication under different conditions, testing psycholinguistic hypotheses, and training natural language engines. Current software support for linguistic annotation is poor, with much of it written for one-off tasks using special purpose data representations and handling routines. This impedes research because developing special purpose software is slow, and also makes it difficult to use existing annotations in analyses or applications for which they were not originally intended. XML, a text mark-up language which admits the possible annotations and allows reference to external files containing, for instance, speech and graphics, can be used as the basis of a representational format for linguistic annotation. XML is already a standard outside the linguistics community, and therefore is well-supported with basic processing software. It allows more formal and explicit representation of a wider range of possible annotation structures than formats currently in use. However, it can also be used for completely unstructured data or for data with an implicit structure which the annotators have yet to discover. Together with XSL, an emerging standard for XML transduction which makes it easier to display XML texts, adopting XML will enable faster tool development and more flexible data re-use.
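As a rough illustration of the kind of representation the abstract describes, the sketch below parses a small XML annotation file in which dialogue moves point at word elements by id and the root element names an external speech file. The element names, attribute names, and file name are all hypothetical, chosen only for this example; they are not taken from the paper or from any particular annotation scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document: every element name, attribute name,
# and the audio file name are illustrative assumptions.
DOC = """
<dialogue audio="dialogue01.wav">
  <words>
    <w id="w1" start="0.00" end="0.31">go</w>
    <w id="w2" start="0.31" end="0.58">left</w>
    <w id="w3" start="0.90" end="1.20">okay</w>
  </words>
  <moves>
    <move type="instruct" speaker="A" words="w1 w2"/>
    <move type="acknowledge" speaker="B" words="w3"/>
  </moves>
</dialogue>
"""


def moves_with_text(xml_text):
    """Resolve each move's word references into text and a time span."""
    root = ET.fromstring(xml_text)
    # Index the word elements by id so that moves can refer to them.
    words = {w.get("id"): w for w in root.iter("w")}
    resolved = []
    for move in root.iter("move"):
        covered = [words[word_id] for word_id in move.get("words").split()]
        text = " ".join(w.text for w in covered)
        start = float(covered[0].get("start"))
        end = float(covered[-1].get("end"))
        resolved.append((move.get("type"), move.get("speaker"), text, start, end))
    # The audio attribute is only a pointer to an external file; nothing is loaded.
    return root.get("audio"), resolved


audio, moves = moves_with_text(DOC)
for move_type, speaker, text, start, end in moves:
    print(f"{audio} [{start:.2f}-{end:.2f}] {speaker} {move_type}: {text}")
```

Because the moves refer to words by id rather than wrapping them, a second annotation layer over the same words could be added without changing the existing elements, which is one way to read the abstract's point about re-using annotations.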
- Bibtex format
@InBook{EDI-INF-RR-0474,
  author = {Jean Carletta and David McKelvie and Amy Isard and Andreas Mengel and Marion Klein and Morten Baun M{\o}ller},
  title = {A generic approach to software support for linguistic annotation using XML},
  booktitle = {Corpus Linguistics},
  publisher = {Continuum International},
  year = 2004,
  pages = {449--459},
  url = {http://homepages.inf.ed.ac.uk/jeanc/readings-in-corpling.final.webformat.pdf},
}