Feedback on Reviews of Eisenstadt Paper

HYPOTHESES: This was clearly an exploratory study: the bugs were collected and analysed, and then some hypotheses were formed.

1. That a large percentage of these thorny bugs occur when (a) there is a large chasm between cause and effect, or (b) the use of standard debugging tools is hampered.
2. That in these cases data-gathering techniques are of special relevance.
3. That there is a niche for new tools to address these classes of problems.
4. That it is surprising how forthcoming programmers are in providing what are apparently accurate bug descriptions, with little explicit incentive to do so.
5. That an online repository of bugs might be more useful than an FAQ.

Further study is needed to confirm these hypotheses: patterns inevitably show up in any data collection, and it is not clear whether these were artefacts of the data collected or a general phenomenon, especially when the amount of data is small, as it is here.

FLAWS:

1. The author set out to collect "particularly thorny bugs in large pieces of software". By definition, these are a very small fraction of all bugs. On the basis of his analysis of these bugs he identifies "a winning niche for future [debugging] tools", but the market for such tools may be too small for them to be profitable. This casts doubt on hypothesis 3.
2. The trawl will have been received by many people, yet only 59 usable anecdotes were collected. Obviously, the people who sent these anecdotes were "forthcoming", but many people did not respond. We do not know how many of them had anecdotes they could have sent, so it is unclear whether hypothesis 4 is true.
3. Hypothesis 1(b) is almost tautologous: debugging will necessarily be harder when no debugging tools are applicable. Nevertheless, it is still useful to make this observation explicit, and it is especially useful to know *when* and *why* standard debugging tools become inapplicable. Hypotheses 1(a) and 2 are also not very surprising in retrospect. Since "data-gathering techniques" are what we are forced to use when all else fails, hypothesis 2 is also almost tautologous in the context of 1(b).
4. The methodology of the data collection and analysis was rather sloppy. There was no attempt to verify the accuracy of the anecdotes or to classify the respondents by their programming skills. The classification within each dimension was invented by the author, and there was no confirmation of the assignment of anecdotes to categories. Assigning an equal fraction of each anecdote that satisfied multiple categories to each of those categories was a bit crude. The sample size was small: it was reported to contain 59 anecdotes, but the total given in table 4 is 55. The basis for the statistical analysis is unclear. Such sloppiness would be unacceptable in a hypothesis *testing* experiment, but it is less harmful in an exploratory study.

COMMON ERRORS: The following were common errors made in reviews.

1. This paper is not a survey as that term is usually understood, i.e. a summary of the literature in a sub-field. The author confuses the issue by saying he "conducted a survey", but this refers to his data-gathering exercise. In fact, to the best of my knowledge, this kind of analysis of thorny bugs is novel: certainly no similar study is described. The references are mainly to people proposing debugging tools that might meet some of the needs identified.
2. The paper is an exploratory study rather than hypothesis testing. From the author's account, it seems that he did not come to it with a preconceived idea of what he would find.
3. Given that this was an exploratory study and not hypothesis testing, I thought some of you were too harsh about the sloppiness.