Informatics Report Series




Title: Improving Biomedical Text Categorisation with NLP
Authors: Michael Matthews
Date: Aug 2006
Publication Title: Proceedings of the SIGs, The Joint BioLINK-Bio-Ontologies Meeting, ISMB 2006
Publication Type: Conference Paper
Publication Status: Published
Volume No: 1
Page Nos: 93-96
Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that these techniques are not adequate in some biological domains. Here we examine how chunking, named-entity recognition and relationship extraction can be combined with traditional text categorisation techniques to improve the classification of documents containing protein-protein interactions.

Conclusions: A system that combines the output of an NLP system with the standard techniques of text categorisation can produce results that exceed the performance of either system on its own. The F1 of a system that combined features of an NLP system with standard text categorisation features was 68.1, compared with 62.0 using text categorisation alone and 61.9 using relationship extraction alone.
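
The idea of combining the two feature sources can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the feature names, and the example sentence are all hypothetical, and the NLP outputs (protein mentions and extracted relations) are assumed to come from some upstream pipeline that is not shown. The sketch only shows one plausible way to merge bag-of-words features with NLP-derived features into a single feature map, plus the standard F1 computation (on a 0 to 1 scale, whereas the abstract reports it as a percentage).

```python
from collections import Counter


def bag_of_words(text):
    """Standard text-categorisation features: lowercase token counts."""
    return Counter("bow=" + tok for tok in text.lower().split())


def nlp_features(proteins, relations):
    """Features a hypothetical NLP pipeline might emit for one document.

    proteins:  protein names found by named-entity recognition
    relations: (protein_a, protein_b) pairs found by relationship extraction
    """
    feats = Counter()
    feats["nlp=num_proteins"] = len(set(proteins))
    feats["nlp=num_relations"] = len(relations)
    feats["nlp=has_relation"] = int(bool(relations))
    return feats


def combined_features(text, proteins, relations):
    """Merge both feature sets; the prefixes keep the namespaces disjoint."""
    feats = bag_of_words(text)
    feats.update(nlp_features(proteins, relations))
    return feats


def f1_score(gold, predicted):
    """F1 for the positive class, from parallel lists of booleans."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative input only; not data from the paper.
    feats = combined_features(
        "RAD51 binds BRCA2",
        proteins=["RAD51", "BRCA2"],
        relations=[("RAD51", "BRCA2")],
    )
    print(dict(feats))
    print(f1_score([True, True, False, False], [True, False, True, False]))
```

The combined feature map could then be passed to any standard classifier. Keeping the `bow=` and `nlp=` prefixes separate makes it straightforward to train on either feature set alone and compare the result with the combined system.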
BibTeX format
author = { Michael Matthews },
title = {Improving Biomedical Text Categorisation with NLP},
booktitle = {Proceedings of the SIGs, The Joint BioLINK-Bio-Ontologies Meeting, ISMB 2006},
publisher = {ISMB},
year = 2006,
month = {Aug},
volume = {1},
pages = {93-96},
url = {},


Unless explicitly stated otherwise, all material is copyright The University of Edinburgh