- Abstract:
- Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, named-entity recognition and relationship extraction can be combined with traditional text categorisation techniques to improve the classification of documents containing protein-protein interactions. Conclusions: A system that combines the output of an NLP system with the standard techniques of text categorisation can produce results that exceed the performance of either system on its own. The F1 of a system that combined features of an NLP system with standard text categorisation features was 68.1 compared with 62.0 using text categorisation alone and 61.9 using relationship extraction alone.
- BibTeX format
@InProceedings{EDI-INF-RR-1024,
  author = {Michael Matthews},
  title = {Improving Biomedical Text Categorisation with NLP},
  booktitle = {Proceedings of the SIGs, The Joint BioLINK-Bio-Ontologies Meeting, ISMB 2006},
  publisher = {ISMB},
  year = 2006,
  month = {Aug},
  volume = {1},
  pages = {93--96},
  url = {http://bio-ontologies.man.ac.uk/2006/download/MatthewsJBB06.pdf},
}