Report EDI-INF-RR-0474

Informatics Report Series

Title: A generic approach to software support for linguistic annotation using XML

Authors: Jean Carletta ; David McKelvie ; Amy Isard ; Andreas Mengel ; Marion Klein ; Morten Baun Møller

Date: 2004

Publication Title: Corpus Linguistics

Publisher: Continuum International

Publication Type: Book Chapter

Publication Status: Published

Page Nos: 449-459

ISBN/ISSN: 0826460135

Abstract: Large-scale linguistic annotation is currently employed for a wide range of purposes, including comparing communication under different conditions, testing psycholinguistic hypotheses, and training natural language engines. Current software support for linguistic annotation is poor, with much of it written for one-off tasks using special-purpose data representations and handling routines. This impedes research because developing special-purpose software is slow, and it also makes it difficult to use existing annotations in analyses or applications for which they were not originally intended. XML, a text mark-up language which admits the possible annotations and allows reference to external files containing, for instance, speech and graphics, can be used as the basis of a representational format for linguistic annotation. XML is already a standard outside the linguistics community, and is therefore well supported by basic processing software. It allows more formal and explicit representation of a wider range of possible annotation structures than formats currently in use. However, it can also be used for completely unstructured data or for data with an implicit structure which the annotators have yet to discover. Together with XSL, an emerging standard for XML transduction which makes it easier to display XML texts, adopting XML will enable faster tool development and more flexible data re-use.
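
As a rough illustration of the kind of representation the abstract describes, the following is a minimal sketch, not the authors' format. The element names (`dialogue`, `move`, `w`), the attribute names, and the audio file name are all invented for this example; only Python's standard `xml.etree.ElementTree` module is used. It shows an annotation that refers to an external speech file and is read with generic XML software rather than a special-purpose parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: every element and attribute name here is an
# assumption made for illustration, not taken from the chapter.
ANNOTATION = """
<dialogue audio="dialogue1.wav">
  <move type="instruct" speaker="giver" start="0.00" end="1.85">
    <w pos="VB">go</w>
    <w pos="IN">around</w>
    <w pos="DT">the</w>
    <w pos="NN">lake</w>
  </move>
  <move type="acknowledge" speaker="follower" start="1.90" end="2.20">
    <w pos="UH">okay</w>
  </move>
</dialogue>
"""


def summarise(xml_text):
    """Return the external audio reference and one summary dict per move."""
    root = ET.fromstring(xml_text.strip())
    moves = []
    for move in root.iter("move"):
        # Collect the word tokens nested inside this move.
        words = [w.text for w in move.iter("w")]
        duration = float(move.get("end")) - float(move.get("start"))
        moves.append({
            "speaker": move.get("speaker"),
            "type": move.get("type"),
            "duration": round(duration, 2),
            "text": " ".join(words),
        })
    # The audio attribute points at an external file; it is not opened here.
    return {"audio": root.get("audio"), "moves": moves}


if __name__ == "__main__":
    for m in summarise(ANNOTATION)["moves"]:
        print(m["speaker"], m["type"], m["duration"], m["text"])
```

Because the structure is explicit in the mark-up, the same file could serve a different analysis (for instance, counting part-of-speech tags) without changing the data, which is the kind of re-use the abstract argues for.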

BibTeX format
@InBook{EDI-INF-RR-0474,
  author = {Jean Carletta and David McKelvie and Amy Isard and Andreas Mengel and Marion Klein and Morten Baun M{\o}ller},
  title = {A generic approach to software support for linguistic annotation using XML},
  booktitle = {Corpus Linguistics},
  publisher = {Continuum International},
  year = 2004,
  pages = {449-459},
  url = {http://homepages.inf.ed.ac.uk/jeanc/readings-in-corpling.final.webformat.pdf},
}
