Feedback on paper of Shepherd et al

This is an interesting paper for you to review. On the face of it, it appears to be full of technical terms from bioinformatics and would seem to require someone expert in that area to do it justice. I maintain, however, that some basic understanding of informatics methodology is sufficient to do quite a good review job and, to a first approximation, the detailed bioinformatics material can be skimmed. Informaticians have to be 'jacks of all trades', so you need practice in operating outside your comfort zone. You do, of course, need to know something about databases, but I assume this is shared knowledge among all informaticians.

HYPOTHESES

For once, I think there is one main hypothesis and it is clearly (if not entirely explicitly) stated. The last sentence of the penultimate paragraph of the Results section on the first page says:

"The resultant database is both fast and flexible."

I think that's it. Earlier parts of this same section even give the 'because' clauses.

Fast: "A potential drawback with this approach — poor performance caused by the number of joins across meta-level tables — is avoided by implementing the PFDB with materialized views using the mature relational database technology of Oracle 8i."

Flexible: "The explicit representation of relationships at the meta-level has a number of advantages, including flexibility — both in terms of the range of queries that can be formulated and the ability to integrate new biological entities within the existing design."

EVALUATION

When it comes to evaluation, however, things are not so clear.

Fast: Let's take the 'fast' claim first, as this is the clearer case. What you might expect here would be some empirical data, e.g., some timing information on a range of queries, perhaps with some favourable comparisons with similar data from rival systems. But, apart from a range of 2-10 seconds being given for the preformulated questions, there is nothing of this sort.
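As an illustration of the kind of timing evidence meant here, the following is a minimal sketch and not anything taken from the paper. It assumes a made-up meta-level schema (the tables `entity`, `relation_type`, `relation` and `relation_flat` are all hypothetical), uses SQLite in place of Oracle 8i, and imitates a materialized view by storing the join result as an ordinary table. It times the same question asked two ways: through the joins, and against the precomputed table.

```python
import sqlite3
import statistics
import time


def build_demo_db(n_rows=2000):
    """Create an in-memory database with a hypothetical meta-level schema."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript(
        """
        CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE relation_type (id INTEGER PRIMARY KEY, label TEXT);
        CREATE TABLE relation (
            src INTEGER REFERENCES entity(id),
            dst INTEGER REFERENCES entity(id),
            type_id INTEGER REFERENCES relation_type(id)
        );
        """
    )
    cur.executemany("INSERT INTO entity VALUES (?, ?)",
                    [(i, f"entity_{i}") for i in range(n_rows)])
    cur.executemany("INSERT INTO relation_type VALUES (?, ?)",
                    [(0, "contains"), (1, "interacts_with")])
    cur.executemany("INSERT INTO relation VALUES (?, ?, ?)",
                    [(i, (i * 7 + 1) % n_rows, i % 2) for i in range(n_rows)])
    # Assumption: SQLite has no materialized views, so the join result is
    # stored as a plain table to stand in for one.
    cur.execute(
        """
        CREATE TABLE relation_flat AS
        SELECT s.name AS src_name, d.name AS dst_name, t.label AS label
        FROM relation r
        JOIN entity s ON s.id = r.src
        JOIN entity d ON d.id = r.dst
        JOIN relation_type t ON t.id = r.type_id
        """
    )
    conn.commit()
    return conn


# The same question asked via joins and via the precomputed table.
JOIN_QUERY = """
    SELECT s.name, d.name FROM relation r
    JOIN entity s ON s.id = r.src
    JOIN entity d ON d.id = r.dst
    JOIN relation_type t ON t.id = r.type_id
    WHERE t.label = 'contains'
"""
FLAT_QUERY = "SELECT src_name, dst_name FROM relation_flat WHERE label = 'contains'"


def time_query(conn, sql, repeats=5):
    """Return (row_count, median_seconds) over several runs of one query."""
    timings = []
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        timings.append(time.perf_counter() - start)
    return len(rows), statistics.median(timings)


def run_benchmark(n_rows=2000):
    """Time both query forms on a freshly built demo database."""
    conn = build_demo_db(n_rows)
    results = {name: time_query(conn, sql)
               for name, sql in [("joins", JOIN_QUERY), ("precomputed", FLAT_QUERY)]}
    conn.close()
    return results


if __name__ == "__main__":
    for name, (count, seconds) in run_benchmark().items():
        print(f"{name:12s} rows={count:5d} median={seconds * 1000:.2f} ms")
```

On a toy database like this the absolute numbers mean little. The point is the shape of the evidence: the same queries, repeated runs, a summary statistic, and a like-for-like comparison that could be extended to rival systems.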
Rather, the main evidence seems to be at the beginning of the "Implementing the PFDB" section (p1668), where a design choice is presented between relational and object databases, and a pilot study with a particular object database shows poor performance. No comparative data is provided, nor any evidence that the results from that particular object database are representative of object databases in general.

Furthermore, later in the Discussion (bottom p1670, 2nd col) we read:

"Preformulated queries are easy to use, they can be highly optimized to guarantee fast response times, and they prevent users from running queries that are inefficient and/or require excessive amount of CPU time. However, preformulated queries do not offer the kind of flexibility that many users desire."

This suggests that either you can have 'fast', provided you stick to preformulated queries, or you can have the flexibility of user-designed queries, but at the risk of slow performance. This observation rather undermines the conjunction in the main claim above, i.e., that you can have both.

Similar remarks apply to the penultimate paragraph of the Discussion, which says:

"The absence of atomic-level data from the PFDB points to another of its key characteristics. Rather than attempt to be comprehensive, the PFDB is by design selective in the data it allows users to search on, preferring high-level information to vast quantities of low-level information (such as atomic-level data). This selectivity has clear performance benefits."

Again, it seems flexibility has been traded off in favour of efficiency.

Flexible: The main body of the paper, from the 2nd column of p1666 to the 1st column of p1668, describes the many uses to which PFDB has been put and the many other DBs it interacts with. Apart from giving background information about the importance of the system, does this material provide any evidence for the main claim? Sort of.
We could be generous and argue that it demonstrates flexibility by: "the ability to integrate new biological entities within the existing design", as taken from the second 'because' clause above. No evidence is provided, however, that this flexibility arises from: "The explicit representation of relationships at the meta-level", which is the reason claimed in that 'because' clause.

Where the advantages of the meta-level representation /are/ discussed is in the 4 bullets at the bottom of p1668 and the top of p1669. Here we see examples of a (presumably) unusually wide range of query types and, in the last bullet, a claim about the easy introduction of new entities. This is backed up by the claim in the Discussion that:

"The changes that need to be made to the PFDB schema in order to establish explicit, well-defined relationships to entities in MSD are negligible, being confined to a small number of base tables."

The usefulness of the wide range of queries is, however, rather undermined by the provision of only preformulated queries in the web interface, which is (presumably) intended for use by the majority of users. A forthcoming /flexible/ interface is promised at the top of p1671, 1st column, which makes one question whether the claim of flexibility might have been a bit premature. There's no discussion of how this flexible interface will protect the user from asking questions requiring "excessive amount of CPU time", which was part of the justification for restricting users to preformulated queries.

RELATED WORK

There is, essentially, no discussion of related work. This is needed both to establish the originality of the work and to provide a baseline against which to assess the speed and flexibility of PFDB. Without this discussion it is not possible to assess how significant the results are.

YOUR REVIEWS

On the whole, the standard of the reviews was very high. It's good to see several people improving significantly on their first review performance.
* Some people missed the main claim about speed and flexibility, but sometimes identified subsidiary claims. Nevertheless, they were often able to spot the main criticisms about poor evaluation.

* When identifying claims, try to spot and avoid: claims about the context within which the work is framed; claims that are well-known parts of the 'folklore' of the field, for which there is usually nothing that can be cited because it's considered too obvious to merit publication; descriptions of what a system does or how it works.

* Many people omitted 'experimental evidence to support a hypothesis' as one kind of contribution. Perhaps this reflected the fact that the authors didn't do a very good job of evaluation, but I think they did try, if ineffectually.

* Not everyone drew attention to the effective absence of a related work discussion. Several people /did/ spot examples of systems PFDB /should/ have been compared with, such as SCOP.

* Some of you went to a lot of trouble to find out about the current state of PFDB and its rivals. Well done to you, although I wasn't expecting this.