Feedback on Beale's paper

Beale's paper describes seven smart phone apps he has built. The claims for this programme of research and the individual apps are left implicit and vague, but can sometimes be teased out from the text. The evaluations of these claims are patchy and sketchy, partly, I suspect, because the claims are vague. The methodology is partly engineering (the construction of the apps) and partly experimental psychology (their evaluation). Related work is also sketchy, with a rather unsystematic and undeveloped citation of some other work. Curiously, given its technical limitations, the choice of Bluetooth as the underlying technology is never explicitly discussed, although the fact that it is free and the advantages of close physical proximity are occasionally alluded to.

HYPOTHESES AND THEIR EVALUATION

Perhaps the best statement of a claim for the whole programme of work comes in the final paragraph: "The smart phone's ability to support so many enhanced forms of communication suggests that it's both pervasive and flexible. Furthermore, as our research has demonstrated, it can improve users' sociability. By using the device's technical characteristics, we can produce designs that exploit the infrastructure and can thus develop new approaches to supporting interpersonal communication."

Similar remarks are made in the subheading of the paper: "The smart phone offers communication, connectivity, content consumption, and content creativity. Seven different systems exemplify its ability to support a wide range of social interactions, helping make pervasive computing a reality."

On p35, 2nd column, we also have: "The systems I report on here support social interaction between people - in particular, interaction that would be difficult if not impossible to achieve without the smart phone technology."

The evaluation comes from the whole programme of work.
The diversity of applications and their (assumed) ease of implementation establish the 'pervasive and flexible' nature of smart phones, but we are not given evidence on the degree of implementation difficulty. The evidence for improved sociability and support for interpersonal communication comes from the evaluations of the individual apps. The 'impossible ... without' claim does not seem to be explicitly supported, i.e., there was no (unsuccessful) attempt to achieve the same goal via a different device, but this point was implicitly addressed in the evaluation of Bluedating.

We then have some claims and evaluations for each of the 7 systems.

1. Bluedating:

Claim: All I could find as a claim was the rather generic opening remark that: "To support interaction between individuals (1-1 interaction), a system should offer people a new experience or at least provide them with an easier solution to an old problem."

Evaluation: The following quote offers some informal evidence for this claim: "We ran trials of the Bluedating system at the University of Birmingham. We provided the system to everyone who wanted it and gave questionnaires and performed informal interviews with those who agreed to try it. Reactions were positive. Most users had initial reservations about such systems, as they did about similar online services, but found Bluedating more acceptable because it alerted them subtly, letting them interact as they chose." No details of the evaluation are given, so it is impossible to say how convincing the evidence is, except for the negative remark that "Because few users have worked with the system with any degree of seriousness, we can't as yet report any statistically significant results."

2. BT Communities:

Claim: The best candidate for a claim I could find is summarised in the following quote: "The new system, called BT Communities, provides a software framework that can run and manage multiple Bluetooth services from a single application."
Evaluation: The evidence for the utility of this framework was that they built the JokeSwap and chat facility apps in it and also re-implemented Bluedating. It would have provided further supporting evidence if they'd compared the difficulty of implementing the original Bluedating app with that of its (presumed) easier re-implementation in the framework.

3. BT Share:

Claim: The claim here seems to be that a similar approach can be scaled up to N-N interactions: "Our relative success using Bluetooth to support individuals led us to extend the approach to groups, thereby supporting N-N interactions."

Evaluation: The evidence for the claim seems to be rather anecdotal: "We've observed informally that the sharing mechanisms encourage users to share images and content that they would otherwise not explicitly send to other users. When users select an item, they provide a topic of conversation, thus enhancing interpersonal communication."

4&5. PublicSpace & ShareSpace:

These two systems seemed to be merged in terms of claims and evaluation.

Claim: They extend the BT Share claim to wider groups: "We also designed two systems that support wider group communication: group to group (N-N) or, more generally, group to world (N-oo)."

Evaluation: The evaluation of the initial implementation was not wholly successful: "However, letting anyone post messages meant that material soon overwhelmed the available display space, ..." The system was then upgraded to filter and organise the material better, and the final evaluation was more positive, although it completely omits any objective evidence: "The system became something of a cross between a bulletin board and instant messaging. However, by limiting the uploading of comments or new items to only when users were in the coffee area or its close vicinity, we retained a community feel."

6.
IMMS:

Claim: The closest I could find to a claim, rather than just an explanation of what the system is meant to do, was the following quote: "Because these devices are placed at the locus of traditional interaction between participants, they are situated, and they afford communication possibilities because users arrive at the location with that express purpose in mind."

Evaluation: This evaluation seemed to come closest to a methodologically sound experiment (perhaps because this is the only application that was cited as being reported in a technical conference paper, #10). Unfortunately, it did not seem to evaluate the specific claim above, but a much more generic one about usability. "We evaluated the system with the help of six students representing a cross-section of potential users. Using a scale from 0 (bad) to 10 (good), we asked them to score the interface's look and usability (9.33, standard deviation 0.67), the display's content and appropriateness (7.83, s.d. 0.94), and the system's usefulness and functionality (9.00, s.d. 0.42). The results indicate that the users saw value in the system, which clearly met our goals in providing a pervasive and appropriate way for students and staff to communicate with each other."

7. SmartBlog:

Claim: The report of the requirements capture exercise (the only one reported) gives some indication of what they wanted the system to demonstrate, albeit a rather vague one. "We interviewed some users to determine what features we needed to provide. All interviewees were active bloggers, and all felt that a decent mobile blogging system should be more than a way to publish camera phone images on a Web site. Blogs are fairly immediate, thus requiring more frequent revising than crafted Web sites, so editing and management abilities are important. The users also all wanted to use multimedia if it were simple, but 67 percent were concerned about the potential costs."
Evaluation: This evaluation was also a standard usability one, albeit with a lot less detail than that for IMMS. "Six users tested SmartBlog, and they found it fun and easy to use, leading them to post more images to their blogs than usual. No one reported problems with installation or the interface, though most had problems setting up Bluetooth connectivity (a known problem with Nokia's PC Suite router). Regardless, 67 percent favored Bluetooth because it's free and fast. Moreover, all of the users were satisfied with the system's performance, especially because it didn't block any other phone functionality."

YOUR REVIEWS

On the whole your reviews did a good job of teasing out the main claim and critiquing the poverty of its evaluation. Some specific points were:

* As some of you pointed out, the paper was written in 2005, when smart phones were in their infancy, and it should be assessed within the context of that era, rather than our current environment, which is rich in smart phone apps.

* The journal is of a magazine style with short articles, so perhaps one should not be too critical of the lack of scientific and engineering details. [On the other hand, and I did not expect you to know this, it was submitted as one of the author's top 4 research papers over a 7-year period to RAE 2008, which was known to assess papers on originality, significance and rigour - so the rather informal methodology would have counted against it here.]

* If you claim that a paper describes or extends a technique, or combines two or more techniques, be sure to explicitly state which techniques you have in mind. Similarly, if you claim that a system models a natural system, say what this natural system is.

* Some people complained about the lack of user testing of BT Communities. This misunderstands the appropriate kind of hypothesis for a framework, namely that it is easy and quick to implement a wide variety of systems in it, leading to dependable and efficient applications.
Three systems were implemented in it, but more discussion was needed of the benefits of using the framework as opposed to stand-alone implementation.

* As I hope I have convinced you, there are very many possible hypotheses that could be advanced about a particular project. It is easy to criticise a paper for failing to supply evidence to support a claim that has not been made, but this is unfair. Some people did do this.

* This was not an exploratory project trying to identify a hypothesis. Even if they were only stated implicitly, the author clearly had some claims in mind from the start of the project, so was not seeking to identify possible claims.

* Nor was this a model of a natural system or computational modelling. If it were, what was being modelled? Surely not a natural system, such as the human mind.

* Sentences that use the word 'must', e.g., "For smart phones to become successful pervasive system components, they must support and enhance various user activities ..." are not naturally read as hypotheses, although some of you read them that way. They are rather assertions of the author's opinion.

* Several people, rightly, listed a lack of discussion of related work as a deficiency, but then did not expand on this point. There were some references to related work, so why weren't these enough? Related work discussion is part of the evaluation and helps establish the work's originality by contrasting it with earlier work. This was not done in sufficient detail. In particular, it should have been used to establish the claim on p35 about the impossibility of achieving the same level of interaction without smart phones.