SUMMARY: Two generic and parallelisable data-mining primitives (RI & IBL) were designed and implemented on both parallel and sequential hardware. These primitives were employed in two data-mining operations each, and their parallel and sequential run times were compared on a set of databases.

HYPOTHESES: These were not explicitly stated, but appear to be:

1. The two primitives were generic, in the sense that they could be used in a wide variety of data-mining operations.

2. The parallel implementations gave a roughly linear speed-up over the sequential ones.

3. The KDD framework is scalable over large databases.

FLAWS:

a) Hypothesis 1 was only tested on two very similar data-mining operations for each primitive, which is not very strong confirmation.

b) The evidence for hypothesis 2 was very dependent on the details of the data-mining operation, with linear speed-up occurring only for some operations. However, very little detail was given for the more negative results (and only a high-level summary of the positive results). This variability also provides negative evidence for hypotheses 1 & 3.

c) The authors claim to be providing a framework for data-parallel KDD, but very little is then said about this framework.

d) No comparison is made with related work, despite the extensive bibliography.

e) No details are given of the data-mining operations; the reader is assumed to be familiar with them already.

f) The claimed linear speed-up relies on the single sequential processor being roughly equal in power to each of the parallel transputers. This rough equivalence is asserted without proof. It would have been more convincing to run the sequential experiments on the parallel machine in sequential mode, i.e. with just one transputer. Also, the linear speed-up was claimed on the basis of only two data points: 10 machines vs 1 machine. It would have been more convincing to run the parallel experiments with a variety of transputer counts, e.g. 1 vs 2 vs 5 vs 10.
g) The speed-up in the TDIDT experiment was found to be inversely proportional to the number of nodes in the induced tree. This throws some doubt on hypothesis 3.

The work was reported in only an extended abstract, for which there was apparently a 3-page limit, so it was impossible to give full details. Perhaps this outcome is a warning not to publish work in outlets with overly tight space limits.

COMMON MISTAKES: Some people clearly had a great deal of difficulty understanding just what this paper was about. The authors probably deserve the major part of the blame for this; it was not very clearly written.

a) The paper did not set out to contrast the two primitives; they were complementary members of a toolkit of primitives.

b) Although it is not explicitly stated, it is clear from the context that the authors had hypotheses 1, 2 and 3 in mind from the outset and then set out to prove them. Thus this work tests existing hypotheses, rather than being an exploratory study to identify hypotheses, as several people claimed.

c) The other main contribution of the paper is the invention of the generic RI and IBL primitives, i.e. the invention of new techniques. Several people omitted this.

d) Some people are still not justifying relevance by reference to the remit of the publication outlet. It is clear from the title alone of the IEE conference in which this was published that the work is relevant to it.

e) Since the paper concerns *knowledge* discovery and uses machine learning techniques, it is relevant to AI as well as CS.

f) Proposing a framework is not really a claim/hypothesis -- more an aim. A claim might assert some property of this framework, e.g. that it is scalable, generic or gives linear speed-up.

g) Hypothesis 2 was really the heart of the paper. Some people missed this central point.
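The stronger experimental design suggested in flaw (f) -- measuring run time at several transputer counts and checking that the speed-up stays near linear -- can be sketched as follows. This is a minimal illustration only: the timings and function names are hypothetical, not taken from the paper.

```python
# Sketch of the multi-point speed-up check suggested in flaw (f).
# All timings below are hypothetical illustrative values.

def speedups(timings):
    """Map each processor count p to speed-up S(p) = T(1) / T(p)."""
    t1 = timings[1]
    return {p: t1 / t for p, t in timings.items()}

def is_roughly_linear(timings, min_efficiency=0.8):
    """Linear speed-up means efficiency S(p)/p stays close to 1 for every p."""
    return all(s / p >= min_efficiency for p, s in speedups(timings).items())

# Hypothetical run times (seconds) at 1, 2, 5 and 10 transputers.
times = {1: 100.0, 2: 52.0, 5: 21.5, 10: 11.0}

for p, s in sorted(speedups(times).items()):
    print(f"p={p:2d}  speed-up={s:5.2f}  efficiency={s / p:.2f}")
print("roughly linear:", is_roughly_linear(times))
```

With only the two data points used in the paper (1 vs 10 transputers), a single anomalous timing could make the speed-up look linear or non-linear; with four or five counts, a near-constant efficiency S(p)/p is much harder to obtain by accident.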