HCI Course Exercise: Level 10

Speech Synthesis Demo for Edinburgh Science Festival

In this practical exercise you will be designing, building and evaluating an application to demonstrate Speech Synthesis. The best demos produced may be invited to feature in the Edinburgh International Science Festival. The Science Festival is aimed at the general public, and particularly at children. This extract from their web site gives an idea of the context in which the demo will be used, the level at which to pitch your demo, and the impact required from it.

Our aims are to give the children of Edinburgh and Scotland experiences of science that are inspiring and confidence building and to engage all of society in the wonder and value of science. Our passion is creating "Ah-ha" moments that illuminate the mysteries of our world. All our people strive hard to offer science and technology that children and grown-ups love to do.

To be effective, the demo should be engaging and entertaining as well as interesting and informative. Most importantly, the demo has to be easy to use --- as near to a walk-up-and-use interface as possible. Below we provide more details on the required and optional capabilities that the demo should or could include.

You should complete the exercise in pairs; this will allow you to divide the work up and discuss design decisions. Please find a partner for the practical and contact us immediately if you cannot.

The demo must be implemented in Java, for example, using JFC/Swing. We have provided a simple example interface which you may use to get started.

There are two parts to the practical: (1) design and implementation and (2) user evaluation. You should submit your working system and a report explaining the design process and decisions at the end of week 6 (Friday 31st October) and a second report describing the results of the user evaluation at the end of week 9 (Friday 21st November). Further details about the expected contents of these reports and the submission process are given below. Please read through the whole document before starting work.

Background

For a general background on speech synthesis, have a look at the Wikipedia entry for Speech Synthesis. The speech synthesis system we will use in this practical is provided by an Edinburgh-based speech synthesis company, CereProc. It is a concatenative, unit-selection system (see the Wikipedia article for an explanation of these terms) capable of producing very natural-sounding speech, as you can hear in these example recordings of CereVoice.

Please look at the paper The CereVoice Characterful Speech Synthesiser SDK for more details.

Example Interface

Copying files and setting up

First, copy the files containing the simple demo and supporting software.

Compiling and running the GUI

For these steps your working directory should be the hci_practical/ directory you copied above.

These scripts invoke javac and java with the correct path settings. You may need to adjust these scripts later to include additional Java source files.

As the GUI starts, you will see some output in the terminal window, which should include:

  -----------------------------------------------

  Loading Cerevoice libraries...
  ** Loaded cerevoice_aud library
  ** Loaded cerevoice library
  ** Loaded expat library
  ** Loaded cerevoice_io library
  -----------------------------------------------
  ***************Eliza read script**********************

Note that the voice won't be ready to speak until you see:

INFO: finished loading voice, starting server on port 1314
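
If you extend the demo and want your code (rather than you) to notice when the voice is ready, one option is to poll that port. The following is a minimal sketch, assuming the server only starts listening on port 1314 once loading is complete, as the log message suggests:

  import java.io.IOException;
  import java.net.InetSocketAddress;
  import java.net.Socket;

  // Illustrative only: blocks until the voice server accepts a TCP
  // connection on localhost:1314. Assumes the server only starts
  // listening once the voice has finished loading.
  public class WaitForVoice {
      public static void main(String[] args) throws InterruptedException {
          while (true) {
              try (Socket probe = new Socket()) {
                  probe.connect(new InetSocketAddress("localhost", 1314), 500);
                  System.out.println("Voice server is up.");
                  return;
              } catch (IOException notReady) {
                  Thread.sleep(500); // not listening yet; retry shortly
              }
          }
      }
  }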

Put your headphones on and check the volume levels using the desktop volume control.

Troubleshooting

Interacting with the example interface

The provided interface has a text field to type in, a text area for a history of user input and chatbot output, and some buttons. Notice that the Exit button speaks if you hover the mouse over it, and again when you move the mouse off it.

The basic operation is to enter text and have it read back by the voice using the Synthesize button.
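
In Swing this pattern amounts to a text field plus a button whose action listener hands the text over for synthesis. Here is a minimal sketch of that wiring; the speak() method is a stand-in for whatever call the provided SpeechClient code actually exposes, not the real API:

  import javax.swing.*;
  import java.awt.BorderLayout;

  // Minimal sketch of the enter-text-and-synthesize pattern. speak() is
  // a stand-in for the real call into the provided speech client code.
  public class SynthPanel extends JPanel {
      private final JTextField input = new JTextField(30);
      private final JTextArea history = new JTextArea(10, 40);

      public SynthPanel() {
          super(new BorderLayout());
          history.setEditable(false);
          JButton synthesize = new JButton("Synthesize");
          synthesize.addActionListener(e -> {
              String text = input.getText().trim();
              if (!text.isEmpty()) {
                  history.append("You: " + text + "\n"); // keep a history
                  speak(text);                           // read it back
              }
          });
          JPanel bottom = new JPanel();
          bottom.add(input);
          bottom.add(synthesize);
          add(new JScrollPane(history), BorderLayout.CENTER);
          add(bottom, BorderLayout.SOUTH);
      }

      private void speak(String text) {
          // Placeholder: route this through the provided SpeechClient.
          System.out.println("Would synthesise: " + text);
      }

      public static void main(String[] args) {
          JFrame frame = new JFrame("Sketch");
          frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
          frame.add(new SynthPanel());
          frame.pack();
          frame.setVisible(true);
      }
  }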

The next level of interaction is to tune the synthesis output to provide different tones of voice, pauses, and variant renderings. At present this can only be done by inserting tags into the text: the basic task for this practical is for you to implement a more user-friendly way of allowing this (e.g. via buttons). To get an idea of what different tags do, try some of the following:

Keep in mind that these different tags have different characteristics (e.g. categorical, continuous, numerical), so you might want to choose different interaction methods for them in your interface. One possibility is sketched below.
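
For instance, a button could wrap the currently selected text in a tag, so users never type markup by hand. In this sketch the tag name and attribute syntax are hypothetical placeholders; substitute the real CereVoice tags listed above:

  import javax.swing.*;
  import javax.swing.text.JTextComponent;

  // Sketch of a button that wraps the selected text in a markup tag.
  // The tag name and attribute syntax below are placeholders, not
  // necessarily real CereVoice markup; use the tags listed above.
  public final class TagButtons {
      public static JButton wrapSelectionButton(JTextComponent field,
                                                String tag, String value) {
          JButton button = new JButton(tag + "=" + value);
          button.addActionListener(e -> {
              String selected = field.getSelectedText();
              if (selected != null) {
                  // replaceSelection() swaps the selected range for new text
                  field.replaceSelection("<" + tag + " value='" + value + "'>"
                          + selected + "</" + tag + ">");
              }
          });
          return button;
      }
  }

A call like wrapSelectionButton(inputField, "variant", "2") would then yield a ready-made button; sliders or combo boxes may suit the continuous and numerical tags better than a row of buttons.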

The Info button sends a file to be synthesised. Files sent for synthesis should be in XML format as in the provided example in files_io/in/info/wikipedia_speech_synthesis.xml.

The speaking Exit button suggests one way that speech can be incorporated into the interface, but also illustrates how such features can be annoying or confusing if set off unintentionally by the user.
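
If you want to reproduce that behaviour, a MouseListener on the button is enough. A minimal sketch, again with speak() standing in for the real synthesis call:

  import javax.swing.*;
  import java.awt.event.MouseAdapter;
  import java.awt.event.MouseEvent;

  // Sketch of a speaking button like the example Exit button. speak()
  // is again a placeholder for the real synthesis call.
  public class SpeakingButtonDemo {
      static void speak(String text) {
          System.out.println("Would say: " + text); // placeholder
      }

      public static void main(String[] args) {
          JButton exit = new JButton("Exit");
          exit.addMouseListener(new MouseAdapter() {
              @Override public void mouseEntered(MouseEvent e) {
                  speak("This is the exit button.");  // on hover
              }
              @Override public void mouseExited(MouseEvent e) {
                  speak("Leaving the exit button.");  // on mouse-off
              }
          });
          JFrame frame = new JFrame("Speaking button");
          frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
          frame.add(exit);
          frame.pack();
          frame.setVisible(true);
      }
  }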

Finally, the sample interface provides a 'fun' extension in which the speech produced is not what you typed in, but the reply produced by the Eliza program. Type "Hello" and then press Chat with Eliza for a conversation.

How it works

The Java files for the GUI are in the package/directory uk/ac/ed/inf/hci_synth_gui. The main class in SpeechSynthesisDemo.java starts the demo and loads the voice. To see how the GUI itself works, look at UserInterface.java. You can either use this file as the basis for making the changes for your interface, or start from scratch.

InputPacker.java puts input into the correct format to send for text normalisation. The following files, which deal with normalisation, homograph disambiguation, and speech synthesis, should not be edited: NormFunctions.java, SpeechClient.java, SpeechRequestHandler.java, SynthFunctions.java, SpeechServer.java.

The chatbot is based on Joseph Weizenbaum's Eliza program. Look at the script in files_io/in/script and the chatbot code in net/chayden/eliza/ to see how the conversation works. (This was adapted from code available online: overview and instructions on how to modify the script).
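
The underlying idea is keyword matching: the script maps patterns in the user's input to canned (or partially recycled) responses. The toy sketch below shows only that core idea; the real net.chayden.eliza code reads a full script with decomposition and reassembly rules:

  import java.util.LinkedHashMap;
  import java.util.Map;

  // Toy illustration of Eliza-style keyword matching. The real
  // net.chayden.eliza code is considerably richer than this.
  public class TinyEliza {
      private static final Map<String, String> RULES = new LinkedHashMap<>();
      static {
          RULES.put("hello", "How do you do. Please state your problem.");
          RULES.put("i feel", "Do you often feel that way?");
          RULES.put("because", "Is that the real reason?");
      }

      public static String reply(String input) {
          String lower = input.toLowerCase();
          for (Map.Entry<String, String> rule : RULES.entrySet()) {
              if (lower.contains(rule.getKey())) {
                  return rule.getValue();
              }
          }
          return "Please go on."; // default when nothing matches
      }

      public static void main(String[] args) {
          System.out.println(reply("Hello"));        // greeting rule fires
          System.out.println(reply("I feel tired")); // keyword "i feel"
      }
  }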

For more details on how the speech synthesis system we are using works, please look at the paper The CereVoice Characterful Speech Synthesiser SDK. It includes an overview of the system and has some examples using tags. Note that the Java-wrapped version of CereVoice which we are using in this practical does not yet include the <voice emotion='happy'> or sig (amplitude, f0 and rate) tags mentioned in that paper. But you can use all the tags we have shown in the instructions above, as well as lex tags to set pronunciations. If you would like to experiment with setting pronunciations (not required, but useful if you want to synthesise unusual words or to override the pronunciation that the lexicon returns), have a look at the file hci_practical/scottish_phones.txt for more information, including the characters you can use to set pronunciations with the Scottish voice we are using in this practical.

Exercise

This practical is in two parts: (1) the design and implementation, and (2) the evaluation.

Part 1. Design and Implementation

In this part you will design and build the demo system. The requirements are to:

The interface extension is open-ended. You should plan the design process carefully: you may try to involve a target user and use 'lo-fi' (e.g., pencil-and-paper) prototypes for testing. See remarks below on time management and using an IDE.

You should submit the relevant code and files to allow us to compile and run the demo, accompanied by a report of no more than 2000 words describing the design process, justifying design decisions (e.g. explaining features that you considered but decided not to implement), and assessing the strengths and weaknesses of what you have produced. To submit, one of the pair should use the electronic submission command like this:

submit cs4 hci-4 1 hci-practical.zip report1.pdf

The zip file must contain shell scripts compile.sh and run.sh so we can compile and run your interface. The report must be in PDF format and on the first page should give the name and matriculation number of both members of the pair. Please stick to the suggested file names for our convenience.

We will normally allocate both people the same mark. To notify us of an alternative allocation you have agreed, please put a table on the first page of the report with a percentage split. However, please only do this in extraordinary cases, and be aware that a clever design idea in this practical may have more impact on the mark than many hours of coding, so allocation by hours spent may not be the most appropriate.

The deadline for Part 1 is Friday 31st October, 4pm.

Part 2. User Evaluation

Evaluate your interface with users in one, but preferably both, of the following ways:

One option you might like to consider is collaborating with another group to either evaluate each other's system, or to use the same measurement protocol to make a comparison between your two systems. You will still need to submit one report per pair.

You should submit a report of no more than 2000 words explaining the evaluation methods chosen, reporting the results, and discussing them in relation to your previous assessment of the system. To submit, one of the pair should use the electronic submission command like this:

submit cs4 hci-4 2 report2.pdf

The report must be in PDF format and on the first page should give the name and matriculation number of both members of the pair. Please stick to the suggested file name for our convenience.

As in Part 1, we will normally allocate both people the same mark. To notify us of an alternative allocation you have agreed, please put a table on the first page of the report with a percentage split.

The deadline for Part 2 is Friday 21st November, 4pm.

Time Management

This practical is worth 30% of your mark for the course; it should not take you significantly more than 30 hours per person to complete both parts. Design tasks can consume an arbitrary amount of time, so you need to be disciplined and plan how much time to allocate. Time constraints will have a big impact on what you can implement, so familiarise yourself early on with the GUI framework to get an idea of the degree of difficulty of any proposed design. Start your design on paper, as you should have a clear idea of what you want before you begin coding. Make sure the process is iterative --- test (and perhaps evaluate) each part as you build it. Avoid getting obsessed with perfecting low-level aesthetics (e.g. choosing fonts or designing icons) at the expense of good interaction design and usability.

References

Here are some references which may be helpful; please also consult the lecture notes.

Java Swing and GUI Components

Swing is the recommended GUI framework. See:

Using an IDE

You may, if you wish, use an IDE with an interface builder, such as NetBeans or Eclipse. Note that Eclipse does not include an interface builder by default, although plugins are available for assisting with SWT/JFace or Swing/AWT UI design.

Here are some pointers for NetBeans (the version installed on DICE is NetBeans IDE 5.5):

Important note: If you use an IDE, you must ensure that you can produce a standalone Java application which can be compiled and run outside the IDE, similarly to the one we have provided, so that we can build and run your code easily. It must compile and run on a standard DICE machine. Check that you can do this before getting too far. If you are unsure how to do it, or are concerned it will take too much time, please use hand-crafted Swing code.

Design and Usability Guidelines

Here are some well-established examples of design guidelines: Jakob Nielsen's website has plenty of useful usability and design recommendations, for example:

Speech Synthesis

The pointers recommended above are:

Contact for problems

The TA supporting the practical is Eleanor Sim, email E.K.Sim@sms.ed.ac.uk. You could also consult your fellow students using the newsgroup eduni.inf.course.hci.


Practical prepared by Mark Fraser, David Aspinall and Barbara Webb.