Feedback on Reviews of Eisenstadt Paper

HYPOTHESES: This was clearly an exploratory study: the bugs were collected and analysed, and then some hypotheses were formed.

1. That a large percentage of these thorny bugs occur when (a) there is a large chasm between cause and effect, or (b) the use of standard debugging tools is hampered.
2. That in these cases data-gathering techniques are of special relevance.
3. That there is a niche for new tools to address these classes of problems.
4. That it is surprising how forthcoming programmers are in providing what are apparently accurate bug descriptions, with little explicit incentive to do so.
5. That an online repository of bugs might be more useful than an FAQ.

Further study is needed to confirm these hypotheses: patterns inevitably show up in any data collection, and it is not clear whether these were artefacts of the data collected or a general phenomenon, especially when the amount of data is small, as it is here.

FLAWS:

1. The author set out to collect "particularly thorny bugs in large pieces of software". By definition, these are a very small fraction of all bugs. On the basis of his analysis of these bugs he identifies "a winning niche for future [debugging] tools", but the market for such tools may be too small for them to be profitable. This casts doubt on hypothesis 3.
2. The trawl will have been received by many people, yet only 59 usable anecdotes were collected. Obviously, the people who sent these anecdotes were "forthcoming", but many people did not respond. We do not know how many of them had anecdotes they could have sent, so it is unclear whether hypothesis 4 is true.
3. Hypothesis 1(b) is almost tautologous: debugging will necessarily be harder when no debugging tools are applicable. Nevertheless, it is still useful to make this observation explicit, and it is especially useful to know *when* and *why* standard debugging tools become inapplicable. Hypotheses 1(a) and 2 are also not very surprising in retrospect. Since "data-gathering techniques" are what we are forced to use when all else fails, hypothesis 2 is also almost tautologous in the context of 1(b).
4. The methodology of the data collection and analysis was rather sloppy. There was no attempt to verify the accuracy of the anecdotes or to classify the respondents by their programming skills. The classification within each dimension was invented by the author, and there was no confirmation of the assignment of anecdotes to categories. Assigning an equal fraction of each anecdote that satisfied multiple categories to each of those categories was a bit crude. The sample size was small: it was reported to contain 59 anecdotes, but the total given in table 4 is 55. The basis for the statistical analysis is unclear. Such sloppiness would be unacceptable in a hypothesis *testing* experiment, but it is less harmful in an exploratory study.

COMMON ERRORS: The following were common errors made in reviews.

1. This paper is not a survey as that term is usually understood, i.e. a summary of the literature in a sub-field. The author confuses the issue by saying he "conducted a survey", but this refers to his data-gathering exercise. In fact, to the best of my knowledge, this kind of analysis of thorny bugs is novel: certainly no similar study is described. The references are mainly to people proposing debugging tools that might meet some of the needs identified.
2. The paper is an exploratory study rather than hypothesis testing. From the author's account, it seems that he did not come to it with a preconceived idea of what he would find.
3. Given that this was an exploratory study and not hypothesis testing, I thought some of you were too harsh about the sloppiness.