Feedback on paper by Ng Cheong Vee et al.

FLAWS

* The analysis of errors is rather anecdotal and superficial, and the explanations of the errors are thin and vague.
* There is no quantitative analysis, e.g. statistics showing the frequency of different kinds of error.
* It is not clear how complete the analysis is, i.e. to what extent the classification of errors covers all the errors that occur.
* The experimental methodology is not described in sufficient detail to enable it to be reproduced, and the results thus confirmed/refuted, by other researchers.
* Several claims are made in passing for which no evidence is provided, e.g. that the course was improved as a result of this study; that there are no significant differences between languages; that these results could be used to inform the design of an intelligent tutoring system (ITS).
* The Birkbeck subjects were not really novices.
* The related work discussion is short and superficial, with no real attempt to compare and contrast previous work with that described in the paper.
* Terminology is frequently used before it is defined -- if it is defined at all.
* The path diagram notation is not described and remains rather cryptic.
* "Inheritance" appears in the list of errors, but does not appear to be a type of error, just an observation on the early use of an advanced feature. Similarly, the "extra variables" are more unnecessary verbosity than an error.

HYPOTHESES

This is an exploratory experiment looking for hypotheses -- in this case, what types of errors novice OO programmers make. A list of error types is given; each is described at a fairly high level and then briefly discussed.

It's not clear what conclusions to draw from this study. The work seems to be at an early stage of development. It is planned to use the study to inform the design of an intelligent tutoring system, but how this will help its design is not discussed in any detail.

That the analysis of errors led to improvements in the course could be considered a claim, but the remark is made in passing and no evidence is provided to support it, so it does not seem to be the main theme of the paper. Similar remarks apply to the potential claim that automated methods of data collection are better than manual ones.

There is no evaluation of any hypotheses, since these are the output of the study and not the input. One could say that the exploratory experiment *was* the evaluation that led to these hypotheses.

COMMON MISTAKES

* Many people claimed that the paper was evaluating a hypothesis, but I think it was an exploratory investigation to *find* a hypothesis. That is what the bulk of the paper described.
* Essentially the same people struggled to identify a claim in the paper and fastened on what were passing remarks rather than central claims. They then criticised the paper a bit unfairly for not providing evidence for something that the authors never seriously intended to evaluate. [This should be a lesson for you in preparing papers of your own: don't make vague, passing claims that you don't then rigorously defend. Your reviewers might take these remarks too seriously. Make it explicit that such claims are to be evaluated in future work, not in the current paper.] To be fair to the authors, you should look to see what evidence, if any, they do provide, as an indication of the claim(s) they are seriously intending to defend.
* If a claim were being evaluated by comparison of the different subject groups, e.g.
by showing that an effect seen in one set-up was not seen in the other, then it would be a methodological fault that the groups differed in more than one respect. Some of you identified such a fault. However, in an exploratory investigation, the fact that similar findings emerged from a variety of set-ups can be seen as a feature, not a bug, i.e. it shows that the findings are robust under minor variations.