Feedback on the paper by Ritchie

TYPE OF PAPER

This is a very different kind of paper from those you have reviewed so far. I think of this kind of paper as 'reasoned argument'. It is a style you will also find in Philosophy and Linguistics, for instance, especially where a framework is being proposed and the hypothesis is that this is the right kind of framework for the job. Consider, for instance, a proposal for a grammar of English or a logic of belief. The proposer will advance some rules, and then try to argue that they generate all and only the objects in question, e.g., well-formed sentences of English or theorems about beliefs. The argument might be to illustrate some correct objects that they do generate and some incorrect ones that they don't. Some alternative rules might also be considered, only to show that they generate incorrect objects or fail to generate correct ones. Unfortunately for reviewers, there is usually no one key point in the paper where you can identify that the argument is won or lost. Rather, both the claim and the evaluation are interleaved and spread throughout the paper.

Another way in which this paper is unusual is that it is methodological. That is, it is proposing a set of criteria by which the products of an area of computing research should be evaluated. How should it itself be evaluated? I suggest, in much the same way as you would assess a proposed grammar or logic: how well does it encompass all and only the objects it intends to encompass? In this case, the objects it tries to encompass are evaluations of creative computer programs.

This paper is the extended, revised journal version of an earlier, minor conference paper (Ritchie 2001). Such upgrading of an earlier conference paper is a rare case where double publication is approved of, even encouraged, in Informatics research. There is a strong argument, however, that you should not then include both papers on your publication list, since they are in some sense the same paper. The history is that, after (Ritchie 2001) appeared, so did the various papers he cites in 'Related Proposals' and 'Applications of the Criteria', most of which build on his earlier work. He is thus in the rare situation of being able to cite work that effectively comments on his current paper. One should not normally consider that (Ritchie 2001) undermines the claim of originality of the current paper, since it can be considered a final version of an earlier draft, written when the work was still in progress.

HYPOTHESIS AND EVALUATION

No hypothesis is explicitly stated, but the following extract from the 'Use of the Criteria' subsection on p90 comes close: "The aim of the original presentation of the criteria (Ritchie 2001) was to show how to make precise some factors which are of interest when assessing a potentially creative program, in order to illustrate a range of possibilities which would-be assessors of programs could select from, add to, or modify in a systematic way." Two sentences later, this proposal is described as "a framework". Ritchie acknowledges that many of the criteria are in tension, that each system will require a different subset of criteria, and that it will sometimes be necessary to modify the criteria or invent new ones to suit the circumstances. He deliberately offers no way to combine the measures of the separate criteria into a single overall measure, but does suggest a way of grouping them into measures for different aspects of creativity.
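To make the flavour of the framework concrete, here is a small illustrative sketch in Python. It is my own construction, not Ritchie's exact definitions: the function ratio_criterion and the ratings are invented, but they follow the general shape of his criteria, in which each output receives a typicality rating in [0,1] and a criterion asks whether the proportion of outputs rated above a threshold alpha itself exceeds a threshold theta.

    # Hypothetical sketch of a Ritchie-style threshold criterion;
    # the function name and the ratings below are invented for illustration.
    def ratio_criterion(ratings, alpha, theta):
        """True if the proportion of ratings above alpha exceeds theta."""
        above = [r for r in ratings if r > alpha]
        return len(above) / len(ratings) > theta

    ratings = [0.9, 0.7, 0.6, 0.4, 0.3]  # typicality scores for five outputs
    print(ratio_criterion(ratings, alpha=0.5, theta=0.5))   # True: 3/5 > 0.5
    print(ratio_criterion(ratings, alpha=0.65, theta=0.5))  # False: 2/5 <= 0.5

Note that the same result set passes the criterion under one setting of alpha and fails under another; this sensitivity to parameter settings is taken up in point 3 of the critique below.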
One way to evaluate the framework is to see to what extent evaluators of allegedly creative computer programs found it useful as a basis for their evaluations. Case studies of such evaluations are provided in the 'Applications of the Criteria' section starting on p85. Ritchie also conducts evaluation in parallel with the presentation of each criterion, where its relevance is argued for and it is compared (favourably) with some alternative criteria. In particular, the 'Related Proposals' section presents some rival criteria proposed by others and compares them, mostly unfavourably, with his own.

CRITIQUE

How well does the proposed framework succeed?

1. One might question the initial assumptions. Is it really right to ignore how the program works? A program that just output pre-constructed results, e.g., one in which the set I was stored and its elements output, would score well on criterion 9 and some other criteria. Should this be considered creative? [One might also argue that, by including I in some of the criteria, the principle of ignoring how the program works has been partly violated.]

2. In a similar vein, can creativity be adequately assessed by a set of scores in the interval [0,1]? I imagine many human art critics would consider this an inadequate measure.

3. As the case studies show, the measures produced by the various criteria are highly dependent on the settings chosen for the various thresholds: alpha, beta, gamma, theta, etc. (as the sketch above illustrates). Also, since no semantics is given for the various measures, there is scope for human subjectivity in assigning them. So it is easy to manipulate the criteria to give good (or bad) scores, as required.

4. There is little discussion of how the evaluators of creative programs assessed the proposed evaluation framework. Mostly it is just reported that they used it, without any assessment of how well they thought it did the task. There are some criticisms, explicit and implicit, in that the users made changes to the framework, but some degree of change was already implicitly sanctioned by its presentation as a smorgasbord of criteria. So it is impossible to declare the framework a success or failure on the strength of these case studies, unless you regard the fact that some users did adopt it as evidence of success.

YOUR REVIEWS

Given the unusual nature of this paper, it is perhaps not surprising that some people found it quite difficult to review and missed the main points. Here are some of the more common errors.

* Relatively few people focused on what I took to be the main deficiency of the paper, namely that the framework was not very convincingly evaluated.

* This was not an 'exploratory investigation to suggest a hypothesis'. It is clear that Ritchie had a hypothesis in mind from the start, although he states it rather late in the paper.

* You can think of the proposed framework as a new technique being applied to the new problem of evaluating allegedly creative programs.

* The paper was not trying to claim that computer programs could be creative, but to suggest a framework for evaluating whether, and to what degree, they were.