January 2002. Frederick Ducatelle and Chris Williams. School of
Informatics, University of Edinburgh; revised 2003.
This tutorial is about classifiers: we will look at kNN, decision
trees, Naive Bayes and SVMs. We will mainly work in Weka, and we will
again use the Landsat data
sattrn.arff
and
sattst.arff .
- Start up Weka and read in the Landsat data. Then choose the
'Classify' tab sheet.
- We will first consider a decision tree classifier.
Click on the label under 'Classifier' and choose
'j48.J48' from the drop-down menu. This is the C4.5 decision tree
algorithm. The option 'minNumObj' defines the minimum number of
objects in the leaf nodes. There are two forms of pruning. The
traditional form uses post-pruning based on confidence intervals. This
is selected by setting the option 'unpruned' to 'False' and choosing a
confidenceFactor. The confidenceFactor should lie in (0,1).
The other form of pruning is
'reducedErrorPruning'. With this option, a validation set is set apart
before training, and the tree is evaluated on this set after
training. Then nodes are removed to maximally improve the performance
on this validation set.
Build a tree with the default pruning method
(i.e. using confidence intervals).
Under 'Test options' choose 'Supplied test set', click on
'Set...' and select 'sattst.arff'.
Try out different confidence factor values (e.g. 0.005, 0.05, 0.5).
Take a look at the size of the tree produced and the error rate, and
study the confusion matrix. Which classes cause the
difficulties in classification? Does this confirm what you expected
from the visualisation exercises we did in previous tutorials?
Record the best performance you obtained with j48.J48
and the parameter settings used to achieve it.
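The same experiment can also be scripted through Weka's Java API instead of
the GUI. The sketch below is only illustrative: it assumes a reasonably recent
Weka release, where the classifier lives in weka.classifiers.trees.J48 and
ARFF files are loaded with ConverterUtils.DataSource (the version this
tutorial was written for lists the class as 'j48.J48', so the package path may
differ for you).

// Illustrative sketch: train J48 on the Landsat training set and
// evaluate it on the supplied test set, as in the GUI steps above.
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Landsat {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("sattrn.arff").getDataSet();
        Instances test = new DataSource("sattst.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);  // class is the last attribute
        test.setClassIndex(test.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.05f);  // try e.g. 0.005, 0.05, 0.5
        tree.setMinNumObj(2);             // minimum number of objects per leaf
        tree.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println("Tree size: " + tree.measureTreeSize());
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());  // confusion matrix
    }
}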
- Now use a very small confidence factor such as
0.0005 (giving very strong pruning), and set minNumObj to something
quite large (e.g. 10).
This will leave you with a fairly simple tree which still has
acceptable performance. Such a tree illustrates one of the
strengths of decision tree algorithms: they produce classifiers which
are understandable to humans.
This can be an important asset in real-life applications (people
are seldom prepared to do what a computer program tells them if there
is no clear explanation). Also, remember that C4.5 builds the tree by
choosing the most discriminating attributes one by one. It judges
these attributes using the information gain measure, so in effect it
performs information-gain attribute selection.
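If you want to see this attribute ranking explicitly, Weka's attribute
selection classes can compute the information gain of each attribute with
respect to the class. A rough sketch, again assuming a recent Weka release
(the attribute selection package may be organised differently in older
versions):

// Illustrative sketch: rank the Landsat attributes by information gain.
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class InfoGainRanking {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("sattrn.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());  // information gain w.r.t. the class
        selector.setSearch(new Ranker());                     // rank all attributes
        selector.SelectAttributes(train);
        System.out.println(selector.toResultsString());
    }
}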
- Let's now have a look at Naive Bayes classification. This
classification scheme builds a Bayesian network, using two simplifying
assumptions (hence the term naive): it assumes that all predictive
attributes are conditionally independent given the class, and that no
hidden variables influence the prediction process. Choose 'NaiveBayes'
under 'Classifier'. You will see that you get to set one option,
'useKernelEstimator'. Leave this option at its default ('False') and
run the classifier.
Record the performance you obtained with Naive Bayes
and the parameter settings used to achieve it.
Compare this result with those from previous classifiers.
- Now we come back to the 'useKernelEstimator' option. In
traditional Naive Bayes classification, a numeric attribute is modelled
using a normal distribution. This is often a good choice (and an easy
one, as only the mean and the standard deviation have to be
calculated). However, sometimes the distribution is clearly
non-Gaussian. In that case it can be worthwhile to approximate the
more complex distribution shape using kernel estimation (the paper
Estimating Continuous Distributions
in Bayesian Classifiers (John and Langley, 1995)
was the source for this Weka
functionality). Try this option. Does this improve the classification?
Can you think of other reasons why Naive Bayes would not perform very
well on this particular data set?
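If you are scripting the experiments, toggling the kernel estimator
corresponds to a single setter on the classifier. A minimal sketch, assuming a
recent Weka release where the class lives in weka.classifiers.bayes.NaiveBayes:

// Illustrative sketch: evaluate Naive Bayes on the Landsat test set,
// once with the default normal-distribution model and once with
// kernel estimation for the numeric attributes.
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesLandsat {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("sattrn.arff").getDataSet();
        Instances test = new DataSource("sattst.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        for (boolean useKernel : new boolean[] {false, true}) {
            NaiveBayes nb = new NaiveBayes();
            nb.setUseKernelEstimator(useKernel);
            nb.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(nb, test);
            System.out.println("useKernelEstimator=" + useKernel + ": "
                    + eval.pctCorrect() + "% correct");
        }
    }
}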
- Now let's look at a kNN classifier. Click on the label under
'Classifier', and then choose 'IBk' from the drop-down menu. You will
get a range of possible options. 'KNN' is the most important: it
defines the number of neighbours. The
'distanceWeighting' option allows you to adapt the influence of the
neighbours according to their distance. 'noNormalization' means that
the attributes are not normalised before classification.
Try out different values for 'KNN' (using default values for the other
options). You might choose values of 1, 3 and 5.
Under 'Test options' choose 'Supplied test set', click on
'Set...' and select 'sattst.arff'. You should be able to get very good
results (certainly if you consider the fact that there are 6 different
classes, and always choosing the most common class would only
give you 24%). Can you think of any features of this dataset that make
it particularly appropriate for classification with kNN?
Record the best performance you obtained with kNN
and the parameter settings used to achieve it.
You can choose to let the program
find the best value for k automatically, by
cross-validation (setting 'crossValidate' to
'True'). 'KNN' is then the maximum number of neighbours to be tried
out. Obviously this option leads to longer running times.
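The kNN runs can be scripted in the same way; the sketch below (assuming a
recent Weka release, where the class lives in weka.classifiers.lazy.IBk) loops
over a few values of k with default settings for the other options:

// Illustrative sketch: evaluate IBk (kNN) on the Landsat test set
// for several values of k.
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KnnLandsat {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("sattrn.arff").getDataSet();
        Instances test = new DataSource("sattst.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        for (int k : new int[] {1, 3, 5}) {
            IBk knn = new IBk(k);  // k nearest neighbours
            knn.buildClassifier(train);
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(knn, test);
            System.out.println("k=" + k + ": " + eval.pctCorrect() + "% correct");
        }
    }
}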
- Weka has only limited support for SVMs: only polynomial kernels are
possible, and no regression is supported (only 2-class
classification). Therefore, if you want to try SVMs, it is better
to use the special-purpose program SVMTorch. Read through the short
introduction to SVMTorch in A small user
guide for SVMTorch. The program uses a different data format
from Weka and XGobi; see the files
sattrn.svm
and
sattst.svm .
In these files, I left out the empty
class 6. Also, the numbering of the classes starts at 0. Build a
classifier for the Landsat data by typing:
SVMTorch -multi -c 10 -t 2 -std 15 sattrn.svm model
In this line, the option -multi indicates that you have more
than two classes. The option -c is the parameter C in
the non-separable SVM optimisation formula (the trade-off between
maximising the margin and minimising the slack variables). The option -t
chooses the kernel function (2 is Gaussian), and the other options set
the parameters of the kernel function (for a Gaussian kernel, there is
only one parameter: the standard deviation). The rest of the command
line specifies
the training data file and the name for the files that will contain the
developed models.
Now test the model by typing:
SVMTest -multi model sattst.svm
How good is the result? How does it compare to the other
classification schemes? Try to improve the classification rate by
fine-tuning the parameters -c and -std [Hint: you might
explore making std larger]. As the
performance improves, you will also see that the classification goes
faster.
Record the best performance you obtained with SVMs
and the parameter settings used to achieve it.
- You have now trained and tested different classification schemes
by running them one by one and comparing them. When you are doing your
project, it could be useful to automate this process: to have
different algorithms (or the same algorithm with different parameter
values) trained and compared to each other in batch. Weka has a
special functionality for this, the 'Experimenter'. It can be started
from the GUI chooser (the little window with the bird you get at
the beginning). Follow the Weka
Experimenter tutorial, and make sure you understand how the
environment works. Try, for example, to compare different values of k
in the kNN algorithm: are the performance differences statistically
significant?