Inf1 Cognitive Science: Week 4 Lab

Welcome Back!

The goal of this lab is to see how the story about phonological rules works in practice: to get a feel for how those rules work and to learn about phonetic alphabets, which are used to represent the pronunciation of words. You'll also get experience using the Unix stream editor sed to implement a simulation of phonological rules using simple string manipulation.

In this lab session you are going to be learning about:

the Arpabet phonetic alphabet
the sed UNIX command
using sed to implement phonological rules

If you don't understand any of the instructions below, if things don't seem to be working out properly, or if you have any other questions, please put up your hand and wait till someone comes round to help you!

Also, this session will be more fun and productive if you work through it with a partner!

At the end of each section, there is a link to the solutions to the exercises. You are of course free to click there immediately and finish the lab in 7 minutes, but that is obviously not the aim of the lab (and you probably didn't put so much effort into climbing 5 flights of stairs just to leave without having learnt anything new). You should try your best to solve the exercises with your partner and use the solutions only to check your answers. If you feel completely stuck after you have already tried to solve a problem, have a critical look at its solution and try to understand why you couldn't do it by yourself.

1. The Arpabet phonetic alphabet

This shouldn't be news to you, but English words are not always spelled as they are pronounced, and vice versa. This is one of the reasons why linguists have developed phonetic alphabets, which try to directly represent the pronunciation of words.

The most famous phonetic alphabet is the International Phonetic Alphabet (IPA), developed during the late 19th century. Most dictionaries give the phonetic spelling for every entry. For example, open the Oxford English Dictionary and check the phonetic spelling for the following words:

vision: __________________________________________
nation: __________________________________________
think: __________________________________________

As you probably have already figured out, one problem with the IPA is that it uses lots of different symbols that are not easy to represent in computer terminals, and that don't have keys on computer keyboards, for example ə, ɔ and ʃ.

Over time, different computer-friendly representations have been developed. One solution is to allow a phoneme to be represented by up to two characters, and to use spaces to separate individual phonemes. This is the approach taken by the Arpabet phonetic alphabet.

Do the following:

Read through the Wikipedia page for Arpabet. You can ignore the stuff about using numbers to represent word stress. What Arpabet symbol is used to represent:
- the middle consonants in the words "vision" and "nation"? _____ _____
- the first consonants in the words "there", "think", "pink", and "chink"? _____ _____
- the vowel sounds in the words "mate", "mat", "met" and "might"? _____ _____
Play with the CMU Pronouncing Dictionary, which converts English words from standard spelling into their pronunciation. For example, if you type "sausages" into the box and click "Look up", you get the answer back "S AO S IH JH IH Z". Type in a few more words and see what output you get.
Type in the regular past tense forms "talked", "jogged" and "wanted", to see how the past tense suffix -ed can get pronounced in three different ways.
- talked: __________________________________________
- jogged: __________________________________________
- wanted: __________________________________________

Note that the representation of vowel phonemes in the CMU Pronouncing Dictionary has a strong North American bias.

2. The `sed` command

Do the following:

Open up a terminal window on your machine.
Enter the following commands (hitting "enter" after each one)
- echo "bananarama" | sed 's/n/f/'
- echo "bananarama" | sed 's/n/f/g'
- echo "bananarama" | sed 's/a/i/g'
- echo "bananarama" | sed 's/n//g'
- echo "bananarama" | sed 's/a//g'
- echo "bananarama" | sed 's/na/ni/g'
- echo "bananarama" | sed 's/$/s/g'

The first part of these commands, i.e. the subcommand echo "bananarama" followed by the pipe symbol |, just means something like "Take the word 'bananarama' and do the following thing to it".

The second part, i.e. the thing that is done to the word, always takes the following form in the above examples:

      sed 's/.../.../g'

sed is the UNIX stream editor. It applies basic editing commands to each line of the text it is given. Although basic, it is extremely powerful.

Can you figure out what sed 's/.../.../g' means? What do 's' and 'g' stand for? __________________________________________

You can assign a name to these substitutions, using the UNIX command alias:

  alias substitute='sed "s/$/s/g"'

and use it as before:

  echo "bananarama" | substitute
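In bash, aliases are by default only expanded in interactive shells, so the alias above may not work if you save these commands in a script file. A minimal sketch of an equivalent shell function follows; the name `add_s` is made up for illustration and is not part of the lab materials.

```shell
# Hypothetical function name; it does the same job as the alias above.
add_s() {
  # Append the letter "s" at the end ($) of every input line.
  sed 's/$/s/g'
}

echo "bananarama" | add_s   # prints: bananaramas
```

A function is used exactly like the alias: it goes after the pipe symbol.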

You can get a local copy of a file and store its name in a shell variable by running the following commands:

      wget http://www.inf.ed.ac.uk/teaching/courses/inf1-cg/labs/lab4/HelloWorld.txt
      HW=HelloWorld.txt

Now when you type in the following (including the '$'):

      cat $HW

you should get the content of the file HelloWorld.txt printed on the screen.
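If the download is not available, the same idea can be tried on a file you create yourself. The sketch below is only an illustration: the file name `sample.txt`, its contents and the variable names `DEMO_DIR` and `SAMPLE` are made up, and the substitution is deliberately a different one from the exercise.

```shell
# Create a small two-line file with made-up contents in a temporary directory.
DEMO_DIR=$(mktemp -d)
SAMPLE="$DEMO_DIR/sample.txt"
printf 'hello\nworld\n' > "$SAMPLE"

# The variable holds the file name; "$SAMPLE" expands to it.
cat "$SAMPLE"

# sed reads its input one line at a time and applies the command to each line.
cat "$SAMPLE" | sed 's/o/0/g'
# prints:
# hell0
# w0rld
```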

If you apply the previous substitution to $HW: what do you get?

_______________________________________________________________________________

If you have already completed the task, or if you are really (really, really) stuck, have a look at the Solutions

3. Using `sed` to implement morphological rules

Now look at this list of nouns

This lists the contents of a file, called 'nouns.txt'. You should see a list of 15 nouns (on separate lines) denoting kinds of animal, represented in Arpabet. Can you figure out what each animal is? Remember the phonemes are represented with their North American pronunciation!

Let's get a local copy of this file and give it a name (NOUNS)

_______________________________________________________________________________

Print the content of the entire file on the screen (tip: man cat)

_______________________________________________________________________________

You should get the same list of animals as you saw in the browser.

Do you remember the English Morphological Rule to produce the plural form of a noun? (tip: /z/)

______________________________________________________________________________

Can you work out how to apply this substitution (whoops... morphological rule) to the list of nouns in $NOUNS?

______________________________________________________________________________

Give the morphological rule a name ("plural") and apply the rule to the nouns in $NOUNS

_______________________________________________________________________________

Can you see any problems with the way this rule has been implemented here?

_______________________________________________________________________________

If you have already completed the task, or if you are really (really, really) stuck, have a look at the Solutions

4. Using `sed` to implement phonological rules

As you noticed, the output we get from applying the morphological rule from section 3 to the nouns in the file is still not perfect (try pronouncing the sequences of sounds out loud to hear what is wrong).

You should recall from last week's lectures that the output of the plural morphological rule in English is itself the input to a system of phonological rules, which change some of the sounds to make the whole thing easier to articulate with the tongue.

Two phonological rules were proposed:

If a word contains the sequence of sounds 's z', then interpolate a schwa between the 's' and the 'z'.
If a word contains the sequence of sounds 'C z', where 'C' denotes one of the unvoiced consonants like 'p' or 't' or 'k' or 's' etc., then replace the 'z' with its unvoiced counterpart 's'.

The first of these is called 'anaptyxis'. How would you implement this rule (call it anaptyxis)?

_______________________________________________________________________________

Try to add this new rule to the plural morphological rule from the previous section (tip: remember to use the "|")

_______________________________________________________________________________

In other words, first we perform the morphological rule from section 3, then we do the phonological rule on its result. Try it out yourself and see what happens. Is the output better than before?

The second phonological rule is called 'devoicing'. Write this rule. Note that we can create more complex sed commands by combining them using the semicolon ';' (sed "s/.../.../g; s/.../.../")
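As a small illustration of the semicolon syntax, here is a sketch on the familiar test word rather than on the nouns file. The two substitutions are arbitrary examples and the variable names are made up; the point is that the second substitution runs on the output of the first.

```shell
# Two substitutions in one sed call: first every a becomes i, then every n becomes m.
combined=$(echo "bananarama" | sed 's/a/i/g; s/n/m/g')
echo "$combined"   # prints: bimimirimi

# The same thing written as two sed calls joined with a pipe.
piped=$(echo "bananarama" | sed 's/a/i/g' | sed 's/n/m/g')
echo "$piped"      # prints: bimimirimi
```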

_______________________________________________________________________________

Try to add this new rule to the plural morphological rule from the previous section (tip: remember to use the "|")

_______________________________________________________________________________

Finally, we can combine the two phonological rules to run one after the other:

_______________________________________________________________________________

Try this out. Does it work OK? What happens if you apply the two phonological rules in the wrong order?

_______________________________________________________________________________
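Rule order can matter whenever one rule changes something that another rule looks for. Here is a toy sketch with two made-up rules (`rule_a` and `rule_b` are hypothetical names, unrelated to the phonological rules above):

```shell
rule_a() { sed 's/a/b/'; }   # replace the first a on the line with b
rule_b() { sed 's/b/c/'; }   # replace the first b on the line with c

echo "ab" | rule_a | rule_b   # prints: cb
echo "ab" | rule_b | rule_a   # prints: bc
```

In the first pipeline, rule_a creates a new 'b' that rule_b then changes; in the second, rule_b has already run by the time that 'b' appears.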

We still can't capture the fact that irregular nouns like 'goose' and 'sheep' don't accept regular plural suffixes. Can you add yet another step to the sequence to sort this out?

_______________________________________________________________________________

If you have already completed the task, or if you are really (really, really) stuck, have a look at the Solutions

5. Using `sed` to implement regular past tense inflection

Now look at another list of words

You should see a list of 15 verbs, represented in Arpabet. Can you figure out what each verb is?

Let's get a local copy of this file and give it a name (VERBS)

Recall from the lectures that regular past tense inflection can be captured by positing one morphological rule and two phonological rules:

The past tense form of a verb is created by adding the phoneme 'd' to the end of the verb's base form.
If a word contains the sequence of sounds 't d' or 'd d', then interpolate a schwa between the two consonants.
If a word contains the sequence of sounds 'C d', where 'C' denotes one of the unvoiced consonants like 'p' or 't' or 'k' etc., then replace the 'd' with its unvoiced counterpart 't'.

Implement this system of rules using a sed command for each rule, as you did in the previous section.

alias pasttense: _______________________________________________________________________________

alias VBanaptyxis: _______________________________________________________________________________

alias VBdevoicing: _______________________________________________________________________________

Put it all together: _______________________________________________________________________________

If you have already completed the task, or if you are really (really, really) stuck, have a look at the Solutions

6. Vowel harmony in Old English

Read slide 1 from lecture 8.

Implement the morphological rule which adds an 'iy' phoneme to the end of a singular noun like "f ow t". Call it "oldplural".

We can now implement the "vowel harmony" phonological rule, but it's going to take a bit more work:

_______________________________________________________________________________

It might help to know that:

The round brackets are used for grouping, so '( ow )(t)( iy)' would match parts of a word which consist of THREE phonemes in a row, i.e. 'ow t iy'
When we use round brackets in this way to denote grouping, we need to "escape" them using a preceding backslash \, i.e. '\( ow \)\(t\)\( iy\)'. This is so the sed command doesn't get confused between an actual ( and a ( used to do grouping.
The bit [a-z]* means "any phoneme at all", i.e. any sequence of letters.
The $ marks the very end of a word.
The \2 denotes the SECOND part of the previous pattern, in this case [a-z]*.
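To see these pieces working together outside the exercise, here is a toy sketch that swaps the last two words of a line. The input and the variable name `swapped` are made up; the pattern uses escaped brackets for grouping, `[a-z]*`, `$`, and the back-references `\1` and `\2`.

```shell
# \( ... \) marks a group; \1 and \2 in the replacement stand for whatever
# the first and second group matched. $ anchors the match at the end of the line.
swapped=$(echo "one two three" | sed 's/\([a-z]*\) \([a-z]*\)$/\2 \1/')
echo "$swapped"   # prints: one three two
```

Because of the `$`, only the final two words can match, so "one" is left untouched.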

Can you apply the vowel harmony rule to the output of the old plural rule to turn the singular "f ow t" into the Old English plural "f ey t iy"?

_______________________________________________________________________________

If you have already completed the task, or if you are really (really, really) stuck, have a look at the Solutions

7. The Great Vowel Shift

Consider the following pairs of words as they are pronounced in Modern English:

      man - mane
      Sam - same
      rat - rate
      shack - shake
      bit - bite
      snip - snipe
      rid - ride
      met - mete
      pet - Pete

As any five year old will tell you, when you add "magic e" to the end of one of these words, the vowel sound changes: (a) from ae to ey; (b) from ih to ai; and (c) from eh to iy.

The magic e effect is the result of the Great Vowel Shift in Middle English. This resulted in lots of vowels being "raised" up in the mouth, depending on whether or not they were followed by an unstressed syllable.

So, "mane" used to be pronounced as "m ae n uh", but as a result of the GVS the final 'uh' was dropped, and the 'ae' raised to 'ey', giving the new pronunciation "m ey n".

Write a phonological rule, called 'gvs', which implements this part of the historical Great Vowel Shift. It should work as follows:

      echo "m ae n uh" | gvs
      m ey n
      echo "m ae n" | gvs
      m ae n
      echo "s n ih p uh" | gvs
      s n ai p
      echo "s n ih p" | gvs
      s n ih p
      echo "m eh t uh" | gvs
      m iy t
      echo "m eh t" | gvs
      m eh t

_______________________________________________________________________________

If you have already completed the task, or if you are really (really, really) stuck, have a look at the Solutions
