Past tense experiments

1. Regular and irregular verbs in the Brown Corpus

The ten most commonly occurring verbs in the million-word Brown Corpus:

be (39,175)
have (12,458)
do (4,367)
say (2,765)
make (2,312)
go (1,844)
take (1,575)
come (1,561)
see (1,513)
get (1,486)

Note that all of these are irregular.

The top four are even irregular in the present tense - is, has, does, says.

The first ten (in alphabetical order) of the least commonly occurring verbs in the Brown Corpus:

abate
abbreviate
abhor
ablate
abridge
abrogate
acclimatize
acculturate
admix
adulterate

Note that all of these are regular.

Of the 877 verbs which occur just once in the Brown Corpus:

860 are regular
16 are prefixed irregulars (e.g. bethink, forswear, inbreed, misread, outfight)
only one is a basic irregular verb - "smite".

Rarity appears to hurt irregular verbs, but not regular ones.

2. Memory and irregular verbs

Basic property of memory - the more often you hear something, the better you remember it

Uncommon words have weak memory entries and are harder to retrieve.

The most common verbs in English (as well as in other languages) are irregular

They have to be re-memorised every generation
The most commonly occurring verbs are the easiest to memorise.
If an irregular verb slips in popularity, it is likely to be turned into a regular verb by a succeeding generation of speakers.

Old English had three times as many strong verbs as Modern English

e.g. abide-abode, chide-chid, cleave-clove, geld-gelt

Joan Bybee studied 33 Old English strong verbs that survive in Modern English:

Those which are still irregular occur an average of 515 times in the Brown Corpus
Those which have become regular occur an average of 21 times.

3. Rare irregulars, cont'd

Some rare irregular past tense forms are sliding out of English as we speak

e.g. smite-smote, heave-hove, slay-slew, thrive-throve
These past tense forms no longer sound as "natural" as they used to.
But they are still strong enough to block the regular past tense forms.

In many cases, the unnaturalness is relevant only to the past tense form, rather than to the verb itself

e.g. "forwent" (cf. "forgoed") is much less natural than "forgo".

Irregular past tense verbs can part company from their stems, and accrue different degrees of familiarity

This is what we would expect if they were stored as separate entries in memory.

4. Naturalness of past tense verb forms

Michael Ullman and Steven Pinker investigated the "gut reactions" of 99 adult English speakers to different verbs, including past tense forms.

Participants were asked to rate the naturalness of different verb forms, on a scale of 1 (unnatural) to 7 (natural).

Verb stems and past tense forms were judged separately, to distinguish between

past tense forms which are intrinsically unnatural
past tense forms which are unnatural because the verb stem is itself unnatural.

For irregular past tense verb forms, the rating depended on the frequency of the past tense forms themselves in the language

The more common the past tense form, the more participants liked it.
The naturalness rating of the past tense form was less dependent on the frequency of the relevant verb stem.

For regular past tense forms, the rating was independent of the frequency of the past tense forms in the language

A relatively rare form like "maimed" was just as natural as a relatively common form like "walked".

These results support the hypothesis that the mind handles regular and irregular past tense forms differently

irregular past tense forms are memorised independently of the stem
regular past tense forms are not.

5. Language production experiments

Sandeep Prasada, William Snyder and Steven Pinker investigated how quickly English speakers could produce past tense forms.

Participants sat at a computer, had verb stems flashed at them, and had to say the relevant past tense form as quickly as they could.

A voice-operated trigger was used to time exactly how long it took them to read the stem, mentally compute the past tense form, and say it out loud.

With irregular verbs, the time required depended on the frequency of the past tense form (rather than the frequency of the stem).
More frequent irregular past tense forms (e.g. "rang") were produced more quickly than less frequent ones (e.g. "strove"), even when the verb stems have equivalent frequency (as do "ring" and "strive").
With regular verbs, there was no such correlation between the frequency of the past tense form and the time taken to produce it.

Again, these results support the hypothesis that irregular past tense forms are stored in memory but regular past tense forms aren't.

More frequently occurring irregular past tense forms are "stronger" in memory and hence are easier to retrieve.

These results have been replicated consistently by other teams of researchers.

6. Word recognition experiments

Lexical decision tasks - participants see or hear a sequence of real words and fake words (e.g. "narse" or "bluck") and have to press one button for a real word and a different button for a fake word.

This standardised task allows psychologists to capture the precise moment when participants recognise a word.
Or at least recognise that a word is actually a word.

Lexical decision tasks tell us something about how the mental lexicon is organised.

Repetition priming - if participants are given a word, and then a short time later are given it again, they are faster at recognising it the second time around.

The priming effect also extends to related words, e.g. from "doctor" to "nurse", or from "duck" to "goose".

Words appear to be "hot-linked" in memory.
When one word is "turned on", it becomes easier to turn on related words.

7. Priming word recognition

Robert Stanners investigated using past tense forms to prime verb stems

Regular past tense forms are more effective at priming their verb stems than are irregular past tense forms.
In fact, regular past tense forms are as effective at priming the verb stem as is the verb stem itself.

These results suggest that the lexical entry retrieved when recognising a regular past tense verb form is the corresponding verb stem entry itself.

On the other hand, the entry for an irregular past tense form is separate from, but hot-linked to, that of its verb stem.

Again these results have been reproduced many times, including in experimental settings where brain activity is measured directly, by electrodes pasted to the scalp.

Note that it is not the case that the priming effect is caused by mere phonological overlap

No priming effect was found between words like "market" and "mark", or between "gravy" and "grave".
Priming appears to rely on a combination of sound, meaning and grammar, i.e. a lexical entry.

8. Cross-modal priming

Can we use the acoustic form of a word to prime its written representation?

For example, experimental participants in lexical decision tests hear some words and see others on the computer screen.

These examples of cross-modal priming provide evidence that priming occurs deep within the mind, rather than at the shallow levels of perception.

William Marslen-Wilson and Lorraine Tyler investigated cross-modal priming with regular and irregular past tense verb forms.

Regular past tense verb forms (e.g. "asked") were found to cross-modally prime the relevant verb stem (i.e. "ask").
Irregular past tense verb forms (e.g. "gave") were significantly less effective at cross-modally priming their verb stems.

These results were confirmed using subliminal priming

Where the past tense verb form is flashed up on screen so quickly that participants do not consciously recognise it.

Marslen-Wilson and Tyler also showed that the associative link between an irregular past tense form (e.g. "gave") and its stem (i.e. "give") is stronger than that between semantically related words like "duck" and "goose"

Semantic priming only works if the two words are presented in immediate succession.
Grammatical priming can work over a longer period, for example several minutes.

9. Recap: the words-and-rules model

The simple words-and-rules model proposes that:

Irregular past tense verb forms are stored as words in the mental lexicon, independent of the verb stems
- i.e. there are separate lexical entries for "swim" and "swam", "bring" and "brought" etc.
Regular past tense verb forms are not stored as words, but are formed by a productive rule
- i.e. there is a lexical entry for the past tense suffix "-d", and a rule which affixes it to any verb stem.

The blocking principle is used to resolve potential conflicts

If the speaker can retrieve a past tense form from memory, then the application of the rule joining the "-d" suffix to the verb stem is blocked.
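The interaction between memory, rule and blocking can be sketched in a few lines of Python. This is only an illustrative sketch, not Pinker's own formalisation: the names IRREGULAR_PAST, regular_rule and past_tense are hypothetical, the lexicon holds just three verbs, and the rule works on simplified spelling rather than on sounds.

```python
# Hypothetical mini-lexicon standing in for memory: stored irregular forms only.
IRREGULAR_PAST = {"swim": "swam", "bring": "brought", "go": "went"}


def regular_rule(stem: str) -> str:
    """The productive rule: copy the whole stem and attach the "-d" suffix.

    Spelling is simplified (assumption): only a final "e" is special-cased.
    """
    return stem + "d" if stem.endswith("e") else stem + "ed"


def past_tense(stem: str, lexicon: dict[str, str] = IRREGULAR_PAST) -> str:
    """Use a stored form if one can be retrieved; otherwise apply the rule."""
    stored = lexicon.get(stem)
    if stored is not None:
        return stored  # retrieval succeeded, so the rule is blocked
    return regular_rule(stem)  # nothing retrieved, so the rule applies


print(past_tense("swim"))       # stored word
print(past_tense("walk"))       # rule
print(past_tense("bring", {}))  # retrieval fails, so the rule is not blocked
```

Passing an empty lexicon in the last call imitates a failed memory retrieval: with nothing to block it, the rule produces an over-regularised form.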

This is a dual mechanism model of cognition, since it presupposes two completely different kinds of "mental tissue":

memory
computation

Together, the two mechanisms give rise to a system which is both:

expressive - every verb gets a past tense form
efficient - the most common past tense forms are stored as words and can thus be retrieved more quickly.

10. Recap: the words-and-rules model - evaluation

The words-and-rules model provides an explanation for the amazing productivity of the regular past:

Both children and adults will generally create a regular past tense form for verbs whose past tense forms they have not come across before.

However, the basic words-and-rules model cannot explain observations relating to the patterns found among the irregular verbs:

People are slightly less happy forming regular past forms from unknown verbs which are similar to lots of known irregular verbs, e.g. "gling", "glend", "sprit", "queep", "brow"
A few regular verbs have eventually turned irregular for this reason, e.g. "ring", "dig", "quit".

The patterns found among the irregular verbs are not just of etymological interest - they appear to be active (in some way) inside the minds of present-day English speakers.

11. Recap: the SPE model

Chomsky & Halle present a single mechanism model of past tense inflection:

The patterns inherent in irregular past tense verb forms are handled the same way as regular ones, i.e. using rules.
Essentially, there are no irregular past tense forms.

However, SPE was never meant to be taken as a theory of how linguistic knowledge is stored in the brain, put to use in language production and understanding, or acquired by children.

SPE is not a theory of psycholinguistics.
It is a theory of what we know, rather than how we know it.

12. Recap: the connectionist model

Connectionist models (e.g. Rumelhart and McClelland's) present another single mechanism model of past tense inflection.

Regular and irregular past tense morphology are again handled in the same way

But this time using a neural network (i.e. a pattern association memory) to capture the mapping from stems to past tense forms.
The underlying assumption is that verbs which share more phonological properties (i.e. sound similar) are more likely to form the past tense in analogous ways.

Connectionist models are very good at learning the patterns inherent in irregular past tense morphology.

And given just the right mixtures of regular and irregular verbs during particular phases of training, they can be made to mimic the U-shaped curve of child language acquisition.

though not in a way that is sufficiently robust to account for how children learn the difference between regular and irregular verbs in languages with different proportions of each.

13. Recap: the connectionist model - downsides

However, even the most sophisticated connectionist models of English past tense inflection exhibit much lower accuracy with regular verbs.

This is fundamentally because sound similarity is not an important feature for regular verbs in English:

All irregular families have regular interlopers:
- hit-hit, split-split, versus pit-pitted
- grow-grew, blow-blew, versus glow-glowed
- take-took, shake-shook, versus fake-faked
Some irregular verbs even have homophonous regular verbs:
- fit-fit versus fit-fitted
- meet-met versus mete-meted
- lie-lay versus lie-lied
Some regular verbs are so phonologically unlikely (because they are derived from foreign loanwords), that a pattern associator has absolutely no idea what to do with them, unlike a human being:
- e.g. "Yeltsin out-Gorbachev'd Gorbachev."
- "We rhumba'd and chacha'd all night long."

No connectionist model has been able to successfully learn the default nature of the regular past inflection.

They are unable to generalise their training to words that don't sound like any they have been trained on.
For example, here is some typical output, when a trained neural network is confronted with nonsense verbs:
- brilth-prevailed
- ploag-pleaded
- trilb-treelit
- smeej-leafloag

Pattern associator memories cannot exploit variables - the basic gadget of computation.

They cannot simply copy over the whole of a stem and apply a suffix to it.

14. The augmented words-and-rules model

Pinker proposes an "augmented" version of the basic words-and-rules model:

Irregular past tense verb forms are still memorised as separate words in the mental lexicon.
Regular past tense verb forms are still formed by a productive rule.
The blocking principle is still used to resolve potential conflicts.

But memory itself is not a list of unrelated slots (like computer RAM).

Memory is assumed to be associative.

In associative memory, words are linked to other, similar words

semantic similarity, e.g. from "duck" to "goose"
phonological similarity, e.g. from "blow" to "blend", from "blow" to "grow", etc.

In this kind of model, families of irregular verbs are easier to store and retrieve, since these verbs repeatedly strengthen their shared associations.

15. The augmented words-and-rules model (ctd.)

The augmented words-and-rules model combines the best bits of all the previous models

The semi-productive irregular verb patterns are handled by the associative memory.
The completely productive, default regular inflections are handled by the rules.

Together these two mechanisms provide an explanation for all the ways in which the mind appears to process irregular verb inflection differently to regular verb inflection.

The naturalness of irregular past tense forms is independent of the naturalness of the associated verb stem, unlike the case with regular verbs.
The time taken to produce an irregular past tense form depends on its frequency in the language, unlike with regular verbs.
Regular past tense forms are better at priming their stems than are irregular past tense forms - even with cross-modal priming.

They also provide an explanation for the characteristic U-shaped development when a child learns past tense morphology:

Stage 1: children learn past tense forms as independent words.
Stage 2: children have learned the regular past tense rule, but lack sufficiently strong memory associations to block it when needed.
Stage 3: children get better at blocking as the memory associations for irregular past tense forms get gradually stronger, with repetition.

16. Neologisms

We've mentioned neologisms, i.e. new words entering the language.

Now we're going to look at them more closely

Six kinds of derived word can never have irregular inflected forms, even if they resemble other irregular words phonologically.

1. Onomatopoeic words, i.e. those which are perceived to resemble sounds:

ping-pinged, not ping-pang
beep-beeped, not beep-bept

2. Quotations, i.e. "mentioned" words:

"I found three 'man's on page 1."
Not: "I found three 'men' on page 1."

3. Names, i.e. words derived from proper names:

"Why aren't there more Michael Foots in the Labour Party?" (not "Michael Feet")
"Mae Jemison out-Sally-Rided Sally Ride." (not "out-Sally-Rode")

4. Foreign loanwords:

deride-derided, not deride-derode
succumb-succumbed, not succumb-succame

5. Abbreviations and truncations:

synch-synched, not synch-sanch (short for "synchronise")

6. Derived words, i.e. converted from other parts-of-speech, e.g.:

"Powell ringed the city with artillery" (not "rang")
"I steeled myself for a visit to the doctor" (not "stole")
"The batter flied out" (not "flew out")

17. A brief history of 'fly'

Pinker spends a lot of time discussing "fly" and "flied", so it's worth a little bit of time picking it apart:

The word "fly" started out as a straightforward irregular verb, meaning to "move through the air, without touching the ground", i.e. "fly-flew".
Baseball players and fans then used it as a deverbal common noun in a compound with "ball", meaning the kind of shot that just goes high up into the air - "Babe Ruth hit a fly ball".
This is commonly shortened: "A-rod hits a long fly towards the right-field line"
And finally the deverbal noun "fly" was converted back into a denominal verb meaning "to hit a fly", e.g. "Babe Ruth flied out" - he hit a fly ball that was then caught by a fielder.

18. Systematic regularisations

There are lots of other examples of irregular words that get systematically regularised when used in certain ways:

"All my daughter's friends are lowlifes", not "lowlives"
"I'm sick of all the Mickey Mouses in this administration", not "Mickey Mice"
"The Maple Leafs", not "Maple Leaves" (Toronto ice hockey team)

These can be explained through the interaction between words and rules.

The regular inflection rules step in here, not because the irregular forms cannot be retrieved from memory, but because the derived words themselves are not stored in the normal, "canonical" format.

19. Word structure theory

The systematic regularisations discussed above contrast with other examples of derived verbs that do take irregular past forms:

"overeat"-"overate", not "overeated"
"remake"-"remade", not "remaked"
"preshrink"-"preshrank", not "preshrinked"
"outfly"-"outflew", not "outflied"

What is the difference between these two kinds of word formation?

Morphologists claim that a prefixed verb like "outfly" is both:

rooted, i.e. linked directly to the base verb "fly", meaning "travel through the air, without touching the ground"
headed, i.e. the meaning of the prefixed verb as a whole is a transparent combination of the meaning of the prefix and the head.

In other words, "outflying" is a particular kind of "flying".

However, denominal verbs like "fly (out)" do not have these two properties:

"flying out" is not a kind of "flying", but rather a kind of "hitting"
There is no direct, semantically transparent link between the meaning of the derived verb (i.e. to hit a ball in a particular way), and the meaning of the basic root (i.e. to travel in a particular way).
i.e. there is no way to figure out what "flying out" means, simply by considering its component parts.

The same thing goes for the other examples of systematic regularisation:

a lowlife is not a kind of life, but rather a kind of person
a Mickey Mouse is not a kind of mouse, but again a kind of person
the Maple Leafs are a collection of sportsmen, not a collection of leaves.

This explanation depends on having a distinction between words and rules:

words have the property of being rooted or not, i.e. depending on whether they are directly associated with a canonical word in the lexicon.
rules have the property of being headed or not, depending on whether the meaning of the whole depends on the meaning of the head component.
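One way to picture this explanation is as a lookup that only reaches the irregular form when the derived word is still headed by its irregular root. The Python below is a rough sketch under stated assumptions: the Verb class, the head_verb field and the tiny IRREGULAR_PAST table are hypothetical illustrations, and the regular rule works on simplified spelling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-lexicon of irregular roots stored in memory.
IRREGULAR_PAST = {"fly": "flew", "eat": "ate", "make": "made", "sink": "sank"}


@dataclass
class Verb:
    form: str                        # surface form, e.g. "outfly"
    head_verb: Optional[str] = None  # the root verb heading the word, if any


def regular_rule(stem: str) -> str:
    """Simplified spelling version of the regular "-d" rule (assumption)."""
    if stem.endswith("e"):
        return stem + "d"
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in "aeiou":
        return stem[:-1] + "ied"
    return stem + "ed"


def past_tense(verb: Verb) -> str:
    """Inherit the irregular form only through a verb root that heads the word."""
    root = verb.head_verb
    if root is not None and root in IRREGULAR_PAST and verb.form.endswith(root):
        prefix = verb.form[: len(verb.form) - len(root)]
        return prefix + IRREGULAR_PAST[root]
    # No verb root to pass the stored form along: the default rule steps in.
    return regular_rule(verb.form)


print(past_tense(Verb("outfly", head_verb="fly")))  # headed by the verb "fly"
print(past_tense(Verb("fly")))                      # baseball verb, built on the noun "fly"
```

The baseball verb is entered without a head_verb because it is built on the noun "fly", so the stored form "flew" is never reached and the rule supplies "flied".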

Lab experiments have shown that people do systematically regularise brand-new denominal verbs they have never heard before, even if they sound like normal irregular verbs:

e.g. "John sinked the glasses" (i.e. put them in the sink)

But they don't regularise "semantically stretched" verbs in the same way:

e.g. "John's hopes sank".
"Not so much overlooked as underthought"

20. Count nouns and mass nouns

English common nouns divide up into two main classes:

mass nouns - denote "substances", e.g. "mud", "water", "celery", "furniture", "evidence"
count nouns - denote "things", e.g. "goose", "chair", "tomato", "idea"

Count nouns all have plural forms, denoting a group of two or more of the relevant things

"geese", "chairs", "tomatoes", "ideas"

Mass nouns do not have plural forms:

*There are three evidences for this theory.

Caveat 1: mass nouns can often be repackaged as count nouns

Tom drank three beers last night.
Belgium has over 400 beers.

Caveat 2: count nouns can often be repackaged as mass nouns

There was dog all over the road.

Caveat 3: a few plural nouns don't have singular base forms:

"trousers", "scissors", "tights"

21. Regular plurals in English

Regular plurals in English are remarkably similar to regular past tense forms.

A single suffix morpheme is realised using three distinct, phonologically conditioned allomorphs:

[əz] - after stems ending in a sibilant, e.g. "horses", "causes", "dishes", "stitches", "gorges"
[s] - after stems ending in a voiceless (non-sibilant) consonant, e.g. "hawks", "bits", "hops"
[z] - after stems ending in a voiced (non-sibilant) consonant or a vowel, e.g. "dogs", "sheds", "tubs", "trays"

These three allomorphs can be captured by the usual phonological rules of anaptyxis and devoicing:

[hɔrs+z] => [hɔrsəz] (i.e. anaptyxis)
[hɔ:k+z] => [hɔ:ks] (i.e. devoicing)
[dɔ:g+z] => [dɔ:gz]
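The choice between the three allomorphs can be sketched as a small Python function that starts from an underlying "z" and applies anaptyxis or devoicing depending on the final sound of the stem. This is a minimal sketch: the names SIBILANTS, VOICELESS and plural are hypothetical, the two sound sets are assumed to cover only the usual English consonants, and stems are written as lists of segments so that affricates such as "tʃ" count as one sound.

```python
# Assumed sound classes for English (illustrative, not exhaustive).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ"}  # voiceless non-sibilant consonants


def plural(stem: list[str]) -> list[str]:
    """Attach the plural suffix, choosing the allomorph from the final segment."""
    final = stem[-1]
    if final in SIBILANTS:
        return stem + ["ə", "z"]  # anaptyxis: a vowel is inserted before "z"
    if final in VOICELESS:
        return stem + ["s"]       # devoicing: "z" becomes "s"
    return stem + ["z"]           # voiced consonant or vowel: "z" is unchanged


print("".join(plural(["h", "ɔ", "r", "s"])))  # horses
print("".join(plural(["h", "ɔ:", "k"])))      # hawks
print("".join(plural(["d", "ɔ:", "g"])))      # dogs
```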

22. Irregular plurals in English

Seven commonly used English nouns form their plural by changing the internal vowel

"men", "women", "feet", "geese", "teeth", "mice", "lice"

Three nouns have kept the Anglo-Saxon plural suffix -en:

children, oxen, brethren

Some nouns denoting "gregarious animals that are hunted, gathered or farmed" are identical in the singular and plural

"fish", "salmon", "deer", "sheep", "grouse", "quail"
Other languages have proper plural forms for these nouns, e.g. "moutons", "poissons".

Some nouns voice the final [f], [θ] or [s] consonant of the stem, before adding the plural suffix:

"calves", "elves", "dwarves", "knives", "wives", "mouths", "youths", "houses"
but not: "beliefs", "briefs", "spoofs", "births", "earths", "months"

Some "academic" nouns borrowed from Latin keep their original plural forms:

-us/-i: "alumni", "cacti", "fungi", "foci", "nuclei", "stimuli"
-us/-era or -us/-ora: "genera", "corpora"
-a/-ae: "algae", "antennae", "formulae", "vertebrae"
-um/-a: "addenda", "bacteria", "data", "strata", "millennia"
-ex/-ices: "indices", "appendices", "matrices", "vortices"

As do some borrowed from Greek:

-is/-es: "analyses", "axes", "diagnoses", "hypotheses", "theses"
-on/-a: "criteria", "phenomena", "ganglia", "automata"

But many other Latin and Greek nouns take normal regular plurals:

"bonuses", "campuses", "circuses", "sinuses", "choruses"
"areas", "arenas", "dilemmas", "diplomas", "dramas", "eras"
"albums", "aquariums", "forums", "museums"