Feedback on the paper by Ritchie

TYPE OF PAPER

This is a very different kind of paper from those you have reviewed so far. I think of this kind of paper as 'reasoned argument'. It is a style you will also find in Philosophy and Linguistics, for instance, especially where a framework is being proposed and the hypothesis is that this is the right kind of framework for the job. Consider, for instance, a proposal for a grammar of English or a logic of belief. The proposer will advance some rules, and then try to argue that they generate all and only the objects in question, e.g., well-formed sentences of English or theorems about beliefs. The argument might be to illustrate some correct objects that they do generate and some incorrect ones that they don't. Some alternative rules might also be considered, only to show that they generate incorrect objects or fail to generate correct ones. Unfortunately for reviewers, there is usually no one key point in the paper where you can identify that the argument is won or lost. Rather, both the claim and the evaluation are interleaved and spread throughout the paper.

Another way in which this paper is unusual is that it is methodological. That is, it is proposing a set of criteria by which the products of an area of computing research should be evaluated. How should it itself be evaluated? I suggest, in much the same way as you would assess a proposed grammar or logic: how well does it encompass all and only the objects it intends to encompass? In this case, the objects it tries to encompass are evaluations of creative computer programs.

This paper is the extended, revised journal version of an earlier, minor conference paper (Ritchie 2001). Such upgrading of an earlier conference paper is a rare case where double publication is approved of, even encouraged, in Informatics research. There is a strong argument, however, that you should not then include both papers on your publication list, since they are in some sense the same paper. The history is that, after (Ritchie 2001) appeared, so did the various papers he cites in 'Related Proposals' and 'Applications of the Criteria', most of which build on his earlier work. He is thus in the rare situation of being able to cite work that effectively comments on his current paper. One should not normally consider that (Ritchie 2001) undermines the claim of originality of the current paper, since it can be considered a final version of an earlier draft, written when the work was still in progress.

HYPOTHESIS AND EVALUATION

No hypothesis is explicitly stated, but the following extract from the 'Use of the Criteria' subsection on p90 comes close: "The aim of the original presentation of the criteria (Ritchie 2001) was to show how to make precise some factors which are of interest when assessing a potentially creative program, in order to illustrate a range of possibilities which would-be assessors of programs could select from, add to, or modify in a systematic way." Two sentences later, this proposal is described as "a framework". Ritchie acknowledges that many of the criteria are in tension, that each system will require a different subset of criteria, and that it will sometimes be necessary to modify the criteria or invent new ones to suit the circumstances. He deliberately offers no way to combine the measures of the separate criteria into a single overall measure, but does suggest a way of grouping them into measures for different aspects of creativity.
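To make the flavour of the framework concrete, here is a small illustrative sketch in Python. It is my own construction, not Ritchie's exact definitions: the function ratio_criterion and the ratings are invented, but they follow the general shape of his criteria, in which each output receives a typicality rating in [0,1] and a criterion asks whether the proportion of outputs rated above a threshold alpha itself exceeds a threshold theta.

    # Hypothetical sketch of a Ritchie-style threshold criterion;
    # the function name and the ratings below are invented for illustration.
    def ratio_criterion(ratings, alpha, theta):
        """True if the proportion of ratings above alpha exceeds theta."""
        above = [r for r in ratings if r > alpha]
        return len(above) / len(ratings) > theta

    ratings = [0.9, 0.7, 0.6, 0.4, 0.3]  # typicality scores for five outputs
    print(ratio_criterion(ratings, alpha=0.5, theta=0.5))   # True: 3/5 > 0.5
    print(ratio_criterion(ratings, alpha=0.65, theta=0.5))  # False: 2/5 <= 0.5

Note that the same result set passes the criterion under one setting of alpha and fails under another; this sensitivity to parameter settings is taken up in point 3 of the critique below.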
One way to evaluate the framework is to see to what extent evaluators of allegedly creative computer programs found it useful as a basis for their evaluations. Case studies of such evaluations are provided in the 'Applications of the Criteria' section starting on p85. Ritchie also conducts evaluation in parallel with the presentation of each criterion, where its relevance is argued for and it is compared (favourably) with some alternative criteria. In particular, the 'Related Proposals' section presents some rival criteria proposed by others and compares them, mostly unfavourably, with his own.

CRITIQUE

How well does the proposed framework succeed?

1. One might question the initial assumptions. Is it really right to ignore how the program works? A program that just output pre-constructed results, e.g., one in which the set I was stored and its elements output, would score well on criterion 9 and some other criteria. Should this be considered creative? [One might also argue that, by including I in some of the criteria, the principle of ignoring how the program works has been partly violated.]

2. In a similar vein, can creativity be adequately assessed by a set of scores in the interval [0,1]? I imagine many human art critics would consider this an inadequate measure.

3. As the case studies show, the measures produced by the various criteria are highly dependent on the settings chosen for the various thresholds: alpha, beta, gamma, theta, etc. (as the sketch above illustrates). Also, since no semantics is given for the various measures, there is scope for human subjectivity in assigning them. So it is easy to manipulate the criteria to give good (or bad) scores, as required.

4. There is little discussion of how the evaluators of creative programs assessed the proposed evaluation framework. Mostly it is just reported that they used it, without any assessment of how well they thought it did the task. There are some criticisms, explicit and implicit, in that the users made changes to the framework, but some degree of change was already implicitly sanctioned by its presentation as a smorgasbord of criteria. So it is impossible to declare the framework a success or failure on the strength of these case studies, unless you regard the fact that some users did adopt it as evidence of success.

YOUR REVIEWS

Given the unusual nature of this paper, it is perhaps not surprising that some people found it quite difficult to review and missed the main points. Here are some of the more common errors.

* Relatively few people focused on what I took to be the main deficiency of the paper, namely that the framework was not very convincingly evaluated.

* This was not an 'exploratory investigation to suggest a hypothesis'. It is clear that Ritchie had a hypothesis in mind from the start, although he states it rather late in the paper.

* You can think of the proposed framework as a new technique being applied to the new problem of evaluating allegedly creative programs.

* The paper was not trying to claim that computer programs could be creative, but to suggest a framework for evaluating whether, and to what degree, they were.