- Abstract:
The volume of electronically stored information increases exponentially as the state of the art progresses. Automated Information Filtering (IF) and Information Retrieval (IR) systems are therefore rapidly gaining prominence. However, such systems sacrifice efficiency to boost effectiveness, and they typically have to cope with sets of vectors of many tens of thousands of dimensions. Rough Set (RS) theory can be applied to reduce the dimensionality of data used in IF/IR tasks by providing a measure of the information content of datasets with respect to a given classification. This can aid IF/IR systems that rely on the acquisition of large numbers of term weights or other measures of relevance.
This paper investigates the applicability of RS theory to the IF/IR application domain and compares this applicability with respect to various existing Text Categorisation (TC) techniques. The ability of the approach to generalise from a minimum of training data is also addressed. The background of RS theory is presented, with an illustrative example that demonstrates the operation of RS-based dimensionality reduction. A modular system is proposed that allows this technique to be integrated with a wide variety of IF/IR approaches. The example application, the categorisation of E-mail messages, is described. Systematic experiments and their results are reported and analysed.
- Copyright:
- 2002 by The University of Edinburgh. All Rights Reserved
- Bibtex format
@Misc{EDI-INF-RR-0121,
  author = {Alexios Chouchoulas and Qiang Shen},
  title = {Rough Set-Aided Keyword Reduction for Text Categorisation},
  year = 2001,
  month = {May},
  volume = {15(9)},
  pages = {843-873},
}