Download the CSV version of the Landsat dataset here. You can download these files with Shift+left-click (or by right-clicking the link and choosing the "Save link target as" option). Then start GGobi by typing 'ggobi satgobi.csv' in the directory that contains the dataset.
You can also get 3-D plots of the data by selecting 'Rotation' under GGobi's 'View' menu. As with 2-D scatter plots, you assign a variable to each of the 3 axes. GGobi auto-rotates the 3-D plot so you can easily get a feel for the data's distribution across the 3 selected variables. You can change the rotation's speed or pause it completely using GGobi's interface. Play a bit with this functionality to get used to it. Try to discover what each element of GGobi's interface does in the 'Rotation' mode. If you have any difficulty, just ask!
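GGobi's rotation is a GUI feature, but the idea behind it is simple: spin the three chosen variables about an axis and draw the result on the flat screen. The Python sketch below assumes rotation about the vertical axis and a plain orthographic projection (the depth coordinate is simply dropped); GGobi's actual rotation may use other axes and scaling. The `speed` parameter (radians per frame) is our own hypothetical name for the speed control, and a speed of 0 behaves like pausing.

```python
import math


def rotate_and_project(points, angle):
    """Rotate 3-D points about the vertical (y) axis by `angle` radians and
    drop the depth coordinate, giving 2-D screen positions."""
    c, s = math.cos(angle), math.sin(angle)
    screen = []
    for x, y, z in points:
        # Rotation about the y axis mixes x and z; y stays unchanged.
        x_rot = c * x + s * z
        screen.append((x_rot, y))
    return screen


def animate(points, speed, n_frames):
    """Yield one projected frame per step. `speed` is the angle added per
    frame in radians (hypothetical name); speed=0 behaves like pausing."""
    for frame in range(n_frames):
        yield rotate_and_project(points, frame * speed)


# Each point holds the values of the three variables assigned to the axes.
points = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
for screen in animate(points, math.pi / 2, 2):
    print(screen)
```

In practice you would centre and scale the three variables first, so that no single variable dominates the picture.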
Brushing can be used for outlier detection: you can mark points that are outliers in one view and then check whether they are also outliers in other views. Outlier detection can help to detect fraud, or simply to clean faulty instances out of your dataset.
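Brushing itself is interactive, but the logic of "mark in one view, then look in another" can be sketched in a few lines of Python. Here a point counts as an outlier in a 2-D view if either of its two variables lies more than `threshold` standard deviations from the mean. That z-score rule and the names `brush_outliers` and `linked_outliers` are illustrative assumptions, not what GGobi does; in GGobi you decide by eye which points to brush.

```python
import statistics


def z_scores(values):
    """Standardise one column; a constant column gets all-zero scores."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd else 0.0 for v in values]


def brush_outliers(rows, view, threshold=2.0):
    """Indices of rows that look extreme in a 2-D view. `view` is a pair of
    column indices; a row is marked if either coordinate is far from its mean."""
    zs = [z_scores([row[col] for row in rows]) for col in view]
    return {i for i in range(len(rows)) if any(abs(z[i]) > threshold for z in zs)}


def linked_outliers(rows, view_a, view_b, threshold=2.0):
    """Rows brushed in view_a that are also outlying in view_b."""
    return brush_outliers(rows, view_a, threshold) & brush_outliers(rows, view_b, threshold)


# Toy data (made up): row 9 is extreme in columns 0 and 2, row 4 only in column 1.
rows = [(0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1), (0, 50, 1),
        (1, 1, 0), (0, 1, 1), (1, 0, 0), (0, 0, 0), (50, 0, 50)]
print(brush_outliers(rows, (0, 1)))                # both rows 4 and 9 are marked
print(linked_outliers(rows, (0, 1), (0, 2)))      # only row 9 survives both views
```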
Repeat the exercise using the K-means clusterer.
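When K-means is evaluated against the class attribute, Weka reports "Incorrectly Clustered Instances". The plain-Python sketch below shows the two ingredients: Lloyd's k-means with Euclidean distance, and an error count that tries every one-to-one mapping of clusters to classes and keeps the best one. We believe this is close to Weka's classes-to-clusters evaluation, but treat it as an assumption: Weka's SimpleKMeans also normalises attributes and picks its initial centroids in its own way, so the numbers will not match Weka exactly. All function names are hypothetical.

```python
import itertools
import math
import random
from collections import Counter


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means with Euclidean distance; returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # k random rows as seeds
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # converged: no point changed cluster
            break
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def incorrectly_clustered(assign, labels, k):
    """Map each cluster to a distinct class (or to None, meaning 'no class',
    when there are more clusters than classes), keep the mapping with the most
    matches, and return the number of instances it gets wrong."""
    counts = Counter(zip(assign, labels))
    n_classes = len(set(labels))
    classes = sorted(set(labels)) + [None] * max(0, k - n_classes)
    best = max(
        sum(counts[(c, cls)] for c, cls in enumerate(mapping))
        for mapping in itertools.permutations(classes, k)
    )
    return len(labels) - best


# Made-up example: two tight groups, one instance of class "b" sits in the "a" group.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
labels = ["a", "a", "b", "b", "b", "b"]
assign = kmeans(points, 2)
print(incorrectly_clustered(assign, labels, 2))
```

With 6 clusters and 6 classes the brute-force search checks 720 mappings, which is still instant.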
And what happens if you replace pixel4_1 by pixel5_4 in the reduced dataset? You can do this easily: load the initial sattrn.arff dataset again. In the Preprocess tab, select the checkboxes next to attributes pixel5_1, pixel5_2, pixel5_4, pixel6_1, pixel6_2 and the class attribute. Click 'Invert', so that every attribute except those 6 is selected, and then click 'Remove'. Run K-means again (with 6 clusters) and observe the change in Incorrectly Clustered Instances. Does this confirm what we mentioned about the attribute selection method in the previous tutorial? If you want, use 'Undo' to return to the initial dataset and apply the same procedure again, this time keeping other combinations of 5 attributes.
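The Preprocess steps (tick the attributes to keep, click 'Invert', click 'Remove') amount to keeping a subset of columns, and trying "other combinations of 5 attributes" is a loop over combinations. A plain-Python sketch follows. The header, the `class` attribute name and the pool of six candidate attributes are assumptions for illustration, not read from sattrn.arff, and the function names are hypothetical.

```python
import itertools


def keep_only(header, rows, ticked):
    """Mimic Preprocess: tick `ticked`, click 'Invert', then 'Remove'."""
    ticked = set(ticked)
    # After 'Invert', every attribute that was NOT ticked is selected...
    selected = {name for name in header if name not in ticked}
    # ...and 'Remove' deletes the selected attributes, leaving the ticked ones.
    kept = [i for i, name in enumerate(header) if name not in selected]
    return [header[i] for i in kept], [[row[i] for i in kept] for row in rows]


def attribute_subsets(candidates, size=5, class_attr="class"):
    """Every way of keeping `size` of the candidate attributes, plus the class."""
    for combo in itertools.combinations(candidates, size):
        yield list(combo) + [class_attr]


# Hypothetical header: the attributes named in the text plus an assumed class column.
header = ["pixel4_1", "pixel5_1", "pixel5_2", "pixel5_4", "pixel6_1", "pixel6_2", "class"]
rows = [[1, 2, 3, 4, 5, 6, "c1"]]
for subset in attribute_subsets(header[:-1]):
    print(keep_only(header, rows, subset)[0])
```

Each reduced dataset would then be clustered with K-means (6 clusters) and its Incorrectly Clustered Instances compared with the others.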
The 2D tour presents you with a continuous sequence of 2-dimensional projections of n-dimensional data (these are linear projections: both 2-D coordinates are linear combinations of all n attributes). These projections are meant to be representative of all possible projections. When you see an interesting projection (where some classes are clearly set apart from others), you can pause the tour and, if you want, see the projection's coefficients (check 'Show Projection Vals' under the 'Tour2D' menu).
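A minimal sketch of what a 2D tour computes, in plain Python: a 2-D frame is a pair of orthonormal n-vectors `a` and `b`, each data row `x` is drawn at `(a·x, b·x)`, and the frame's entries are the coefficients that 'Show Projection Vals' displays. To move between random target frames, this sketch blends two frames and re-orthonormalises them; that is a simplification chosen for brevity, not GGobi's interpolation method, and all function names are hypothetical.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def orthonormal_frame(a, b):
    """Gram-Schmidt: turn two n-vectors into an orthonormal pair (a 2-D frame)."""
    a = normalize(a)
    dot = sum(x * y for x, y in zip(a, b))
    b = normalize([y - dot * x for x, y in zip(a, b)])
    return a, b


def random_frame(n, rng):
    return orthonormal_frame([rng.gauss(0, 1) for _ in range(n)],
                             [rng.gauss(0, 1) for _ in range(n)])


def project(row, frame):
    """Both screen coordinates are linear combinations of all n attributes."""
    a, b = frame
    return (sum(w * x for w, x in zip(a, row)), sum(w * x for w, x in zip(b, row)))


def tour(n, n_targets, steps, seed=0):
    """Yield frames moving from one random target frame to the next
    (linear blend plus Gram-Schmidt; a simplification, not GGobi's path)."""
    rng = random.Random(seed)
    current = random_frame(n, rng)
    for _ in range(n_targets):
        target = random_frame(n, rng)
        for s in range(1, steps + 1):
            t = s / steps
            blend_a = [(1 - t) * x + t * y for x, y in zip(current[0], target[0])]
            blend_b = [(1 - t) * x + t * y for x, y in zip(current[1], target[1])]
            yield orthonormal_frame(blend_a, blend_b)
        current = target


data = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]  # made-up 4-attribute rows
for frame in tour(n=4, n_targets=1, steps=3):
    print([project(row, frame) for row in data])
```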
GGobi offers one extra functionality on top of this: projection pursuit, which looks for interesting 2-dimensional projections of the data. It does this by optimising a projection index (e.g. by maximising the 'empty' area between the data points). Click on 'Projection pursuit' and then on 'Optimize'. Different indices (under PP Index) will lead to different optimal projections. Also, bear in mind that these are local optima, so unselecting 'Optimize' and reselecting it later may give you a different view.
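Projection pursuit replaces the tour's random choice of frame with an optimiser. The plain-Python sketch below uses a holes-style index in an assumed, unnormalised form (one minus the average of exp(-|y|²/2) over the projected points, so it grows when the centre of the projection is empty) and simple hill climbing from a random start. GGobi's own indices and optimiser differ; the sketch only illustrates that each start can end at a different local optimum, as the text warns. Function names are hypothetical, and the data is assumed to be centred and scaled beforehand.

```python
import math
import random


def gram_schmidt(a, b):
    """Return an orthonormal pair spanning the same plane as a and b."""
    na = math.sqrt(sum(x * x for x in a))
    a = [x / na for x in a]
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]
    nb = math.sqrt(sum(x * x for x in b))
    return a, [x / nb for x in b]


def holes_index(data, frame):
    """Holes-style index (assumed form): large when few projected points sit
    near the origin, i.e. the projection has an 'empty' middle."""
    a, b = frame
    total = 0.0
    for row in data:
        y1 = sum(w * x for w, x in zip(a, row))
        y2 = sum(w * x for w, x in zip(b, row))
        total += math.exp(-(y1 * y1 + y2 * y2) / 2)
    return 1 - total / len(data)


def pursue(data, seed, iters=500, step=0.1):
    """Hill climbing: keep a random perturbation of the frame only if it
    raises the index. Different seeds may stop at different local optima."""
    rng = random.Random(seed)
    n = len(data[0])
    frame = gram_schmidt([rng.gauss(0, 1) for _ in range(n)],
                         [rng.gauss(0, 1) for _ in range(n)])
    best = holes_index(data, frame)
    for _ in range(iters):
        cand = gram_schmidt([x + rng.gauss(0, step) for x in frame[0]],
                            [x + rng.gauss(0, step) for x in frame[1]])
        score = holes_index(data, cand)
        if score > best:
            frame, best = cand, score
    return frame, best


# Made-up data: a ring in the first two attributes, small noise in the third.
data = [(2 * math.cos(2 * math.pi * i / 12), 2 * math.sin(2 * math.pi * i / 12),
         0.1 * (i % 3 - 1)) for i in range(12)]
for seed in range(3):
    print(seed, pursue(data, seed)[1])
```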
Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB, Scotland, UK
Tel: +44 131 651 5661, Fax: +44 131 651 1426, E-mail: school-office@inf.ed.ac.uk. Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh.