Feedback on paper by Ng Cheong Vee et al.

FLAWS

* The analysis of errors is rather anecdotal and superficial, and the explanations of the errors are thin and vague.
* There is no quantitative analysis, e.g. statistics showing the frequency of different kinds of error.
* It is not clear how complete the analysis is, i.e. to what extent the classification of errors covers all the errors that occur.
* The experimental methodology is not described in sufficient detail to enable it to be reproduced, and the results thus confirmed/refuted, by other researchers.
* Several claims are made in passing for which no evidence is provided, e.g. that the course was improved as a result of this study; that there are no significant differences between languages; that these results could be used to inform the design of an intelligent tutoring system (ITS).
* The Birkbeck subjects were not really novices.
* The related work discussion is short and superficial, with no real attempt to compare and contrast previous work with that described in the paper.
* Terminology is frequently used before it is defined -- if it is defined at all.
* The path diagram notation is not described and remains rather cryptic.
* "Inheritance" appears in the list of errors, but does not appear to be a type of error, just an observation on the early use of an advanced feature. Similarly, the "extra variables" are more unnecessary verbosity than an error.

HYPOTHESES

This is an exploratory experiment looking for hypotheses -- in this case, what types of errors novice OO programmers make. A list of error types is given; each is described at a fairly high level and then briefly discussed.

It's not clear what conclusions to draw from this study. The work seems to be at an early stage of development. It is planned to use the study to inform the design of an intelligent tutoring system, but how this will help its design is not discussed in any detail.

That the analysis of errors led to improvements in the course could be considered a claim, but the remark is made in passing and no evidence is provided to support it, so it does not seem to be the main theme of the paper. Similar remarks apply to the potential claim that automated methods of data collection are better than manual ones.

There is no evaluation of any hypotheses, since these are the output of the study and not the input. One could say that the exploratory experiment *was* the evaluation that led to these hypotheses.

COMMON MISTAKES

* Many people claimed that the paper was evaluating a hypothesis, but I think it was an exploratory investigation to *find* a hypothesis. That is what the bulk of the paper described.
* Essentially the same people struggled to identify a claim in the paper and fastened on what were passing remarks rather than central claims. They then criticised the paper a bit unfairly for not providing evidence for something that the authors never seriously intended to evaluate. [This should be a lesson for you in preparing papers of your own: don't make vague, passing claims that you don't then rigorously defend. Your reviewers might take these remarks too seriously. Make it explicit that such claims are to be evaluated in future work, not in the current paper.] To be fair to the authors, you should look to see what evidence, if any, they do provide, as an indication of the claim(s) they are seriously intending to defend.
* If a claim were being evaluated by comparison of the different subject groups, e.g.
by showing that an effect seen in one set-up was not seen in the other, then it would be a methodological fault that the groups differed in more than one respect. Some of you identified such a fault. However, in an exploratory investigation, the fact that similar findings emerged from a variety of set-ups can be seen as a feature, not a bug, i.e. it shows that the findings are robust under minor variations.