Report EDI-INF-RR-1024

Informatics Report Series

Title: Improving Biomedical Text Categorisation with NLP

Authors: Michael Matthews

Date: Aug 2006

Publication Title: Proceedings of the SIGs, The Joint BioLINK-Bio-Ontologies Meeting, ISMB 2006

Publisher: ISMB

Publication Type: Conference Paper
Publication Status: Published

Volume No: 1
Page Nos: 93-96

Abstract:

Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, named-entity recognition and relationship extraction can be combined with traditional text categorisation techniques to improve the classification of documents containing protein-protein interactions.

Conclusions: A system that combines the output of an NLP system with the standard techniques of text categorisation can produce results that exceed the performance of either system on its own. The F1 of a system that combined features of an NLP system with standard text categorisation features was 68.1, compared with 62.0 using text categorisation alone and 61.9 using relationship extraction alone.
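
As an illustration only, the general idea of combining bag-of-words features with features derived from an NLP system might be sketched as follows. This is not the paper's implementation: the function names, the feature names, and the shape of `nlp_output` (lists of recognised proteins and extracted interaction pairs) are all assumptions made for the example. The F1 helper uses the standard definition, the harmonic mean of precision and recall.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercased whitespace tokens, prefixed to avoid clashing with NLP features."""
    return Counter("bow=" + token for token in text.lower().split())


def combine_features(text, nlp_output):
    """Merge bag-of-words counts with counts taken from a (hypothetical) NLP system.

    `nlp_output` is an assumed dict such as
    {"proteins": ["RAD51", "BRCA2"], "interactions": [("RAD51", "BRCA2")]}.
    """
    features = bag_of_words(text)
    features["nlp=num_proteins"] = len(nlp_output.get("proteins", []))
    features["nlp=num_interactions"] = len(nlp_output.get("interactions", []))
    return features


def f1_score(tp, fp, fn):
    """Standard F1: harmonic mean of precision and recall, 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    example = combine_features(
        "RAD51 binds BRCA2",
        {"proteins": ["RAD51", "BRCA2"], "interactions": [("RAD51", "BRCA2")]},
    )
    print(dict(example))
```

The combined feature dictionary could then be passed to any standard classifier. Prefixing the feature names keeps the two feature families distinguishable, so either one can be dropped to compare against a single-system baseline.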

Link to Paper: http://bio-ontologies.man.ac.uk/2006/download/MatthewsJBB06.pdf

Bibtex format
@InProceedings{EDI-INF-RR-1024,
  author = {Michael Matthews},
  title = {Improving Biomedical Text Categorisation with NLP},
  booktitle = {Proceedings of the SIGs, The Joint BioLINK-Bio-Ontologies Meeting, ISMB 2006},
  publisher = {ISMB},
  year = 2006,
  month = {Aug},
  volume = {1},
  pages = {93-96},
  url = {http://bio-ontologies.man.ac.uk/2006/download/MatthewsJBB06.pdf},
}

Please mail <reports@inf.ed.ac.uk> with any changes or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh