- Abstract:
- It is often argued that in information extraction (IE), certain machine learning (ML) approaches save development time over others, or that certain ML methods (e.g. Active Learning) require less training data than others, thus saving development cost. However, such claims are not normally backed up by controlled studies showing that the savings actually occur. This situation in Language Engineering (LE) contrasts with Software Engineering in general, where many studies investigating system development cost have been carried out. We argue for the need for controlled studies that measure actual system development time in LE. To this end, we carry out an experiment in resource monitoring for an IE task: three named entity taggers for the same "surprise" domain are developed in parallel, using competing methods. Their human development time is accounted for using a logging facility. We report development cost results and present a breakdown of the development time for the three alternative methods. We are not aware of previous parallel studies that detail how system development time in IE is spent.
- Copyright:
- 2007 by Linguit Ltd. and the University of Edinburgh. All Rights Reserved.
- Bibtex format
@InProceedings{EDI-INF-RR-0995,
  author = {Jochen Leidner},
  title = {Resource Monitoring in Information Extraction},
  booktitle = {The 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 23-27 July 2007, Amsterdam},
  publisher = {ACM},
  year = 2007,
  month = {Jul},
  pages = {779--780},
  doi = {10.1145/1277741.1277905},
  url = {http://www.iccs.inf.ed.ac.uk/~s0239229/documents/leidner-2007-SIGIR.pdf},
}