Past tense experiments
Mark McConville
Henry S. Thompson
8 March 2012
1. Regular and irregular verbs in the Brown corpus
The ten most commonly occurring verbs in the million-word Brown Corpus:
- be (39,175)
- have (12,458)
- do (4,367)
- say (2,765)
- make (2,312)
- go (1,844)
- take (1,575)
- come (1,561)
- see (1,513)
- get (1,486)
Note that all of these are irregular
- The top four are even irregular in the present tense - is, has, does, says.
The first ten (alphabetically) of the least commonly occurring verbs in the Brown Corpus:
- abate
- abbreviate
- abhor
- ablate
- abridge
- abrogate
- acclimatize
- acculturate
- admix
- adulterate
Note that all of these are regular.
Of the 877 verbs which occur just once in the Brown Corpus:
- 860 are regular
- 16 are prefixed irregulars (e.g. bethink, forswear, inbreed, misread, outfight)
- only one is a basic irregular verb - "smite".
Rarity appears to hurt irregular verbs, but not regular ones.
2. Memory and irregular verbs
Basic property of memory - the more often you hear something, the better you remember it
- Uncommon words have weak memory entries and are harder to retrieve.
Irregular verbs are among the most common verbs in English (as well as in other languages)
- They have to be re-memorised every generation
- The most commonly occurring verbs are the easiest to memorise.
- If an irregular verb slips in popularity, it is likely
to be turned into a regular verb by a succeeding generation of
speakers.
Old English had three times as many strong verbs as Modern English
- e.g. abide-abode, chide-chid, cleave-clove, geld-gelt
Joan Bybee studied 33 Old English strong verbs that survive in Modern English:
- Those which are still irregular occur an average of 515 times in the Brown Corpus
- Those which have become regular occur an average of 21 times.
3. Rare irregulars, cont'd
Some rare irregular past tense forms are sliding out of English as we speak
- e.g. smite-smote, heave-hove, slay-slew, thrive-throve
- These past tense forms no longer sound as "natural" as they used to.
- But they are still strong enough to block the regular past tense forms.
In many cases, the unnaturalness is relevant only to the past tense form, rather than to the verb itself
- e.g. "forwent" (cf. "forgoed") is much less natural than "forgo".
Irregular past tense verbs can part company from their stems, and accrue different degrees of familiarity
- This is what we would expect if they were stored as separate entries in memory.
4. Naturalness of past tense verb forms
Michael Ullman and Steven Pinker investigated the "gut
reactions" of 99 adult English speakers to different verbs, including
past tense forms.
Participants were asked to rate the naturalness
of different verb forms, on a scale of 1 (unnatural) to 7
(natural).
Verb stems and past tense forms were judged separately, to
distinguish between
- past tense forms which are intrinsically unnatural
- past tense forms which are unnatural because the verb
stem is itself unnatural.
For irregular past tense verb forms, the rating depended on
the frequency of the past tense forms themselves in the
language
- The more common the past tense form, the more
participants liked it.
- The naturalness rating of the past tense form was less
dependent on the frequency of the relevant verb stem.
For regular past tense forms, the rating was independent of
the frequency of the past tense forms in the language
- A relatively rare form like "maimed" was just as natural
as a relatively common form like "walked".
These results support the hypothesis that the mind handles
regular and irregular past tense forms differently
- irregular past tense forms are memorised independently of the stem
- regular past tense forms are not.
5. Language production experiments
Sandeep Prasada, William Snyder and Steven Pinker investigated
how quickly English speakers could produce past tense forms.
Participants sat at a computer, had verb stems flashed at
them, and had to say the relevant past tense form as quickly as
they could.
A voice-operated trigger was used to time exactly how long it
took them to read the stem, mentally compute the past tense form,
and say it out loud.
- With irregular verbs, the time required depended on the
frequency of the past tense form (rather than the frequency of
the stem).
- More frequent irregular past tense forms (e.g. "rang")
were produced more quickly than less frequent ones
(e.g. "strove"), even when the verb stems
have equivalent frequency (as do "ring" and "strive").
- With regular verbs, there was no such correlation between
the frequency of the past tense form and the time taken to
produce it.
Again, these results support the hypothesis that irregular
past tense forms are stored in memory but regular past tense
forms aren't.
- More frequently occurring irregular past tense forms are
"stronger" in memory and hence are easier to retrieve.
These results have been replicated consistently by other teams
of researchers.
6. Word recognition experiments
Lexical decision tasks - participants see or hear
a sequence of real words and fake words (e.g. "narse" or "bluck")
and have to press one button for a real word and a different
button for a fake word.
- This standardised task allows psychologists to capture
the precise moment when participants recognise a
word.
- Or at least recognise that a word is actually a word.
Lexical decision tasks tell us something about how the mental
lexicon is organised.
Repetition priming - if participants are given a
word, and then a short time later are given it again, they are
faster at recognising it the second time around.
The priming effect also extends to related words,
e.g. from "doctor" to "nurse", or from "duck" to "goose".
- Words appear to be "hot-linked" in memory.
- When one word is "turned on", it becomes easier to turn on related words.
7. Priming word recognition
Robert Stanners investigated using past tense forms to prime verb stems
- Regular past tense forms are more effective at priming
their verb stems than are irregular past tense forms.
- In fact, regular past tense forms are as effective at
priming the verb stem as is the verb stem itself.
These results suggest that the lexical entry retrieved when
recognising a regular past tense verb form is the corresponding
verb stem entry itself.
On the other hand, the entry for an irregular past tense
form is separate from, but hot-linked to, that of its verb
stem.
Again these results have been reproduced many times, including
in experimental settings where brain activity is measured
directly, by electrodes pasted to the scalp.
Note that the priming effect is not caused
by mere phonological overlap
- No priming effect was found between words like "market"
and "mark", or between "gravy" and "grave".
- Priming appears to rely on a combination of sound,
meaning and grammar, i.e. a lexical entry.
8. Cross-modal priming
Can we use the acoustic form of a word to prime its written
representation?
For example, experimental participants in lexical decision
tests hear some words and see others on
the computer screen.
These examples of cross-modal priming provide evidence that
priming occurs deep within the mind, rather than at the shallow levels
of perception.
William Marslen-Wilson and Lorraine Tyler investigated
cross-modal priming with regular and irregular past tense verb
forms.
- Regular past tense verb forms (e.g. "asked") were found
to cross-modally prime the relevant verb stem
(i.e. "ask").
- Irregular past tense verb forms (e.g. "gave") were
significantly less effective at cross-modally priming their
verb stems.
These results were confirmed using subliminal priming
- Where the past tense verb form is flashed up on screen so
quickly that participants do not consciously recognise
it.
Marslen-Wilson and Tyler also showed that the associative link between an irregular past tense form (e.g. "gave") and its stem (i.e. "give") is stronger than the link between semantically related words like "duck" and "goose"
- Semantic priming only works if the two words are presented in immediate succession.
- Grammatical priming can work over a longer period, for example several minutes.
9. Recap: the words-and-rules model
The simple words-and-rules model proposes that:
- Irregular past tense verb forms are stored as
words in the mental lexicon, independent of the
verb stems
- i.e. there are separate lexical entries for "swim"
and "swam", "bring" and "brought" etc.
- Regular past tense verb forms are not stored as words,
but are formed by a productive rule
- i.e. there is
a lexical entry for the past tense suffix "-d", and a rule
which affixes it to any verb stem.
The blocking principle is used to resolve
potential conflicts
- If the speaker can retrieve a past tense form from
memory, then the application of the rule joining the "-d"
suffix to the verb stem is blocked.
This is a dual mechanism model of cognition,
since it presupposes two completely different kinds of "mental
tissue": a memory for words and a system of rules.
Together, the two mechanisms give rise to a system which is both:
- expressive - every verb gets a past tense form
- efficient - the most common past tense forms are stored
as words and can thus be retrieved more quickly.
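The interaction of lexicon, rule and blocking can be sketched in Python. This is only a toy illustration: the mini-lexicon, the function names and the spelling adjustments in `regular_past` are assumptions made for the example, not part of the model as stated.

```python
# Hypothetical mini-lexicon: only irregular past tense forms are stored as words.
IRREGULAR_PAST = {
    "swim": "swam",
    "bring": "brought",
    "go": "went",
    "ring": "rang",
}


def regular_past(stem: str) -> str:
    """Rule route: copy the whole stem and attach the "-d" suffix.

    The spelling adjustments below are a simplification added for this sketch.
    """
    if stem.endswith("e"):
        return stem + "d"
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in "aeiou":
        return stem[:-1] + "ied"
    return stem + "ed"


def past_tense(stem: str) -> str:
    """Memory route first; a successful retrieval blocks the rule."""
    stored = IRREGULAR_PAST.get(stem)
    if stored is not None:
        return stored  # blocking: the rule is never applied
    return regular_past(stem)


print(past_tense("swim"))   # swam
print(past_tense("walk"))   # walked
print(past_tense("ploag"))  # ploaged
```

Because the rule operates on a variable (any stem at all), every verb gets a past tense form, including nonsense verbs that resemble nothing in the lexicon.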
10. Recap: the words-and-rules model - evaluation
The words-and-rules model provides an explanation for the
amazing productivity of the regular past:
- Both children and adults will generally create a regular
past tense form for verbs whose past tense forms they have not
come across before.
However, the basic words-and-rules model cannot explain
observations relating to the patterns found among
the irregular verbs:
- People are slightly less happy forming regular past forms
from unknown verbs which are similar to lots of known irregular
verbs, e.g. "gling", "glend", "sprit", "queep", "brow"
- A few regular verbs have eventually turned irregular for
this reason, e.g. "ring", "dig", "quit".
The patterns found among the irregular verbs are not just of
etymological interest - they appear to be active (in some way)
inside the minds of present-day English speakers.
11. Recap: the SPE model
Chomsky & Halle present a single mechanism
model of past tense inflection:
- The patterns inherent in irregular past tense verb forms
are handled the same way as regular ones, i.e. using
rules.
- Essentially, there are no irregular past
tense forms.
However, SPE was never meant to be taken as a theory of how
linguistic knowledge is stored in the brain, put to use in
language production and understanding, or acquired by
children.
- SPE is not a theory of psycholinguistics.
- It is a theory of what we know, rather than
how we know it.
12. Recap: the connectionist model
Connectionist models (e.g. Rumelhart and McClelland's) present
another single mechanism model of past tense
inflection.
Regular and irregular past tense morphology are again handled
in the same way
- But this time using a neural network (i.e. a
pattern association memory) to capture the mapping from stems
to past tense forms.
- The underlying assumption is that verbs which share more
phonological properties (i.e. sound similar) are more likely to
form the past tense in analogous ways.
Connectionist models are very good at learning the patterns
inherent in irregular past tense morphology.
And given just the right mixtures of regular and irregular
verbs during particular phases of training, they can be made to
mimic the U-shaped curve of child language acquisition.
- though not in a way that is sufficiently robust to
account for how children learn the difference between regular
and irregular verbs in languages with different proportions of
each.
13. Recap: the connectionist model - downsides
However, even the most sophisticated connectionist models of
English past tense inflection exhibit much lower accuracy with
regular verbs.
This is fundamentally because sound similarity is not an
important feature for regular verbs in English:
- All irregular families have regular interlopers:
- hit-hit, split-split, versus pit-pitted
- grow-grew, blow-blew, versus glow-glowed
- take-took, shake-shook, versus fake-faked
- Some irregular verbs even have homophonous regular verbs:
- fit-fit versus fit-fitted
- meet-met versus mete-meted
- lie-lay versus lie-lied
- Some regular verbs are so phonologically unlikely
(because they are derived from foreign loanwords), that a
pattern associator has absolutely no idea what to do with
them, unlike a human being:
- e.g. "Yeltsin out-Gorbachev'd Gorbachev."
- "We rhumba'd and chacha'd all night long."
No connectionist model has been able to successfully learn the
default nature of the regular past inflection.
- They are unable to generalise their training to words
that don't sound like any they have been trained on.
- For example, here is some typical output, when a trained
neural network is confronted with nonsense verbs:
- brilth-prevailed
- ploag-pleaded
- trilb-treelit
- smeej-leafloag
Pattern associator memories cannot exploit
variables - the basic gadget of computation.
- They cannot simply copy over the whole of a
stem and apply a suffix to it.
14. The augmented words-and-rules model
Pinker proposes an "augmented" version of the basic
words-and-rules model:
- Irregular past tense verb forms are still memorised as
separate words in the mental lexicon.
- Regular past tense verb forms are still formed by a
productive rule.
- The blocking principle is still used to resolve potential
conflicts.
But memory itself is not a list of unrelated slots (like
computer RAM).
- Memory is assumed to be associative.
In associative memory, words are linked to other, similar words
- semantic similarity, e.g. from "duck" to "goose"
- phonological similarity, e.g. from "blow" to "blend", from "blow" to "grow", etc.
In this kind of model, families of irregular verbs are easier
to store and retrieve, since these verbs repeatedly strengthen
their shared associations.
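A very rough way to picture an irregular "family" is to group stored irregular stems by their final rhyme. The Python sketch below uses spelling as a crude stand-in for phonological similarity. The word list, the `rime` heuristic and the function names are all assumptions made for illustration, and the heuristic does not handle silent "e" or "qu" spellings.

```python
VOWELS = "aeiou"

# Hypothetical sample of irregular stems held in memory.
IRREGULAR_STEMS = [
    "cling", "fling", "sling", "sting", "ring", "sing",
    "bend", "lend", "send", "spend",
    "hit", "split",
    "blow", "grow", "know", "throw",
]


def rime(word: str) -> str:
    """Return the spelling from the last run of vowels to the end of the word."""
    i = len(word)
    # Step back over the trailing consonants.
    while i > 0 and word[i - 1] not in VOWELS:
        i -= 1
    # Step back over the vowel run itself.
    while i > 0 and word[i - 1] in VOWELS:
        i -= 1
    return word[i:]


def irregular_neighbours(stem: str) -> list:
    """List the stored irregular stems that share a rime with the given stem."""
    target = rime(stem)
    return [w for w in IRREGULAR_STEMS if w != stem and rime(w) == target]


print(irregular_neighbours("gling"))  # ['cling', 'fling', 'sling', 'sting', 'ring', 'sing']
print(irregular_neighbours("ploag"))  # []
```

On this picture, a novel verb like "gling" lands in a crowded irregular neighbourhood, whereas "ploag" has no irregular neighbours at all.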
15. The augmented words-and-rules model (ctd.)
The augmented words-and-rules model combines the best bits of
all the previous models
- The semi-productive irregular verb patterns are handled
by the associative memory.
- The completely productive, default regular inflections
are handled by the rules.
Together these two mechanisms provide an explanation for all
the ways in which the mind appears to process irregular verb
inflection differently to regular verb
inflection.
- The naturalness of irregular past tense
forms is independent of the naturalness of the associated verb
stem, unlike the case with regular verbs.
- The time taken to produce an irregular past
tense form depends on its frequency in the language, unlike
with regular verbs.
- Regular past tense forms are better at
priming their stems than are irregular past tense
forms - even with cross-modal priming.
They also provide an explanation for the characteristic
U-shaped development when a child learns past tense
morphology:
- Stage 1: children learn past tense forms as independent
words.
- Stage 2: children have learned the regular past tense
rule, but lack sufficiently strong memory associations to block
it when needed.
- Stage 3: children get better at blocking as the memory
associations for irregular past tense forms get gradually
stronger, with repetition.
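The three stages can be illustrated with a small Python sketch in which the chance of retrieving an irregular form grows with the number of exposures. The saturating formula, the `half_point` value and the exposure counts are assumptions chosen purely for illustration, not measured quantities.

```python
def retrieval_probability(exposures: int, half_point: float = 10.0) -> float:
    """Assumed chance of retrieving a stored irregular form after some exposures."""
    return exposures / (exposures + half_point)


def overregularisation_rate(exposures: int, has_rule: bool) -> float:
    """Expected share of errors like "goed" for one irregular verb."""
    if not has_rule:
        # Stage 1: no rule yet, so a failed retrieval cannot produce "goed".
        return 0.0
    # Stages 2 and 3: whenever retrieval fails, the rule applies unblocked.
    return 1.0 - retrieval_probability(exposures)


# Hypothetical exposure counts for the three stages.
stages = [("Stage 1", 5, False), ("Stage 2", 10, True), ("Stage 3", 200, True)]
for name, exposures, has_rule in stages:
    print(name, round(overregularisation_rate(exposures, has_rule), 2))
```

Under these assumptions the error rate starts at zero, jumps once the rule is acquired, and falls again as repetition strengthens the memory entry, which is the U-shaped pattern seen from the point of view of correct performance.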
16. Neologisms
We've mentioned neologisms, i.e. new
words entering the language.
- Now we're going to look at them more closely.
Six kinds of derived word can never have irregular inflected
forms, even if they resemble other irregular words
phonologically.
1. Onomatopoeic words, i.e. those which are
perceived to resemble sounds:
- ping-pinged, not ping-pang
- beep-beeped, not beep-bept
2. Quotations, i.e. "mentioned" words:
- "I found three 'man's on page 1."
- Not: "I found three 'men' on page 1."
3. Names, i.e. words derived from proper names:
- "Why aren't there more Michael Foots in the Labour
Party?" (not "Michael Feet")
- "Mae Jemison out-Sally-Rided Sally Ride." (not "out-Sally-Rode")
4. Foreign loanwords:
- deride-derided, not deride-derode
- succumb-succumbed, not succumb-succame
5. Abbreviations and truncations:
- synch-synched, not synch-sanch (short for "synchronise")
6. Derived words, i.e. converted from other parts-of-speech, e.g.:
- "Powell ringed the city with artillery" (not "rang")
- "I steeled myself for a visit to the doctor" (not "stole")
- "The batter flied out" (not "flew out")
17. A brief history of 'fly'
Pinker spends a lot of time discussing "fly" and "flied", so
it's worth a little bit of time picking it apart:
- The word "fly" started out as a straightforward irregular
verb, meaning to "move through the air, without touching the
ground", i.e. "fly-flew".
- Baseball players and fans then used it as a
deverbal common noun in a compound with "ball", meaning the kind of shot that just goes
high up into the air - "Babe Ruth hit a fly ball".
- This is commonly shortened: "A-rod hits a long fly towards the
right-field line"
- And finally the deverbal noun "fly" was converted back
into a denominal verb meaning "to hit a fly", e.g. "Babe Ruth
flied out" - he hit a fly ball that was then caught by a
fielder.
18. Systematic regularisations
There are lots of other examples of irregular words that get
systematically regularised when used in certain ways:
- "All my daughter's friends are lowlifes", not "lowlives"
- "I'm sick of all the Mickey Mouses in this administration", not "Mickey Mice"
- "The Maple Leafs", not "Maple Leaves" (Toronto ice hockey team)
These can be explained through the interaction between words
and rules.
The regular inflection rules step in here, not because the
irregular forms cannot be retrieved from memory, but because the
derived words themselves are not stored in the normal, "canonical"
format.
19. Word structure theory
The systematic regularisations discussed above contrast with
other examples of derived verbs that do take
irregular past forms:
- "overeat"-"overate", not "overeated"
- "remake"-"remade", not "remaked"
- "preshrink"-"preshrank", not "preshrinked"
- "outfly"-"outflew", not "outflied"
What is the difference between these two kinds of word
formation?
Morphologists claim that a prefixed verb like "outfly" is
both:
- rooted, i.e. linked directly to the base
verb "fly", meaning "travel through the air, without touching
the ground"
- headed, i.e. the meaning of the prefixed
verb as a whole is a transparent combination of the meaning of
the prefix and the head.
In other words, "outflying" is a particular kind of "flying".
However, denominal verbs like "fly (out)" do not have these
two properties:
- "flying out" is not a kind of "flying", but rather a kind
of "hitting"
- There is no direct, semantically transparent link between
the meaning of the derived verb (i.e. to hit a ball in a
particular way), and the meaning of the basic root (i.e. to
travel in a particular way).
- i.e. there is no way to figure out what "flying out"
means, simply by considering its component parts.
The same thing goes for the other examples of systematic
regularisation:
- a lowlife is not a kind of life, but rather a kind of person
- a Mickey Mouse is not a kind of mouse, but again a kind of person
- the Maple Leafs are a collection of sportsmen, not a collection of leaves.
This explanation depends on having a distinction between words and rules:
- words have the property of being rooted or
not, i.e. depending on whether they are directly associated
with a canonical word in the lexicon.
- rules have the property of being headed or
not, depending on whether the meaning of the whole depends on
the meaning of the head component.
Lab experiments have shown that people do systematically
regularise brand-new denominal verbs they have never heard
before, even if they sound like normal irregular verbs:
- e.g. "John sinked the glasses" (i.e. put them in the sink)
But they don't regularise "semantically stretched" verbs in the same way:
- e.g. "John's hopes sank".
- "Not so much overlooked as underthought"
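The contrast between headed verbs and denominal verbs can be sketched in Python. In this toy version, a verb inherits an irregular past form only if it records an irregular verb as its head. The `Verb` class, its fields, the small lexicon and the spelling adjustments are illustrative assumptions, not a claim about how word structure is actually represented.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample of stored irregular past tense forms.
IRREGULAR_PAST = {"fly": "flew", "eat": "ate", "sink": "sank", "make": "made"}


@dataclass
class Verb:
    form: str                   # the verb as spelled, e.g. "outfly"
    prefix: str = ""            # prefix attached to the head, if any
    head: Optional[str] = None  # verb root the word is headed by; None if denominal


def regular_past(stem: str) -> str:
    """Regular rule with simplified spelling adjustments."""
    if stem.endswith("e"):
        return stem + "d"
    if stem.endswith("y") and len(stem) > 1 and stem[-2] not in "aeiou":
        return stem[:-1] + "ied"
    return stem + "ed"


def past_tense(verb: Verb) -> str:
    """Use the head's stored irregular form if there is one, else the rule."""
    if verb.head is not None and verb.head in IRREGULAR_PAST:
        return verb.prefix + IRREGULAR_PAST[verb.head]
    return regular_past(verb.form)


print(past_tense(Verb("outfly", prefix="out", head="fly")))  # outflew
print(past_tense(Verb("fly")))                               # flied (denominal: "to hit a fly")
print(past_tense(Verb("sink")))                              # sinked (denominal: "put in the sink")
print(past_tense(Verb("sink", head="sink")))                 # sank
```

The denominal verbs have no verb root to pass an irregular form up to them, so the regular rule applies by default.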
20. Count nouns and mass nouns
English common nouns divide up into two main classes:
- mass nouns - denote "substances", e.g. "mud", "water", "celery", "furniture", "evidence"
- count nouns - denote "things", e.g. "goose", "chair", "tomato", "idea"
Count nouns all have plural forms, denoting a group of two or more of the relevant things
- "geese", "chairs", "tomatoes", "ideas"
Mass nouns do not have plural forms:
- *There are three evidences for this theory.
Caveat 1: mass nouns can often be repackaged as count nouns
- Tom drank three beers last night.
- Belgium has over 400 beers.
Caveat 2: count nouns can often be repackaged as mass nouns
- There was dog all over the road.
Caveat 3: a few plural nouns don't have singular base forms:
- "trousers", "scissors", "tights"
21. Regular plurals in English
Regular plurals in English are remarkably similar to regular past tense forms.
A single suffix morpheme is realised using three distinct, phonologically conditioned allomorphs:
- [əz] - after stems ending in a sibilant, e.g. "horses", "causes", "dishes", "stitches", "gorges"
- [s] - after stems ending in a voiceless (non-sibilant) consonant, e.g. "hawks", "bits", "hops"
- [z] - after stems ending in a voiced (non-sibilant) consonant or a vowel, e.g. "dogs", "sheds", "tubs", "trays"
These three allomorphs can be captured by the usual phonological rules of anaptyxis and devoicing:
- [hɔrs+z] => [hɔrsəz] (i.e. anaptyxis)
- [hɔ:k+z] => [hɔ:ks] (i.e. devoicing)
- [dɔ:g+z] => [dɔ:gz]
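The choice between the three allomorphs can be sketched in Python, assuming that stems are given as lists of phoneme symbols. The function name `plural` and the two phoneme sets are assumptions made for this example, which is not a complete phonology of English.

```python
from typing import List

# Assumed phoneme classes for the sketch.
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS_NON_SIBILANTS = {"p", "t", "k", "f", "θ"}


def plural(stem: List[str]) -> List[str]:
    """Attach the plural suffix, starting from an underlying [z]."""
    final = stem[-1]
    if final in SIBILANTS:
        suffix = ["ə", "z"]  # anaptyxis: a vowel is inserted before the suffix
    elif final in VOICELESS_NON_SIBILANTS:
        suffix = ["s"]       # devoicing: the suffix agrees with the voiceless stem
    else:
        suffix = ["z"]       # voiced consonant or vowel: the suffix is unchanged
    return stem + suffix


print("".join(plural(["h", "ɔ", "r", "s"])))  # hɔrsəz
print("".join(plural(["h", "ɔ:", "k"])))      # hɔ:ks
print("".join(plural(["d", "ɔ:", "g"])))      # dɔ:gz
```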
22. Irregular plurals in English
Seven commonly used English nouns form their plural by changing the internal vowel
- "men", "women", "feet", "geese", "teeth", "mice", "lice"
Three nouns have kept the Anglo-Saxon plural suffix -en:
- "oxen", "children", "brethren"
Some nouns denoting "gregarious animals that are hunted, gathered or farmed" are identical in the singular and plural
- "fish", "salmon", "deer", "sheep", "grouse", "quail"
- Other languages have proper plural forms for these nouns, e.g. "moutons", "poissons".
Some nouns voice the final [f], [θ] or [s] consonant of the stem, before adding the plural suffix:
- "calves", "elves", "dwarves", "knives", "wives", "mouths", "youths", "houses"
- but not: "beliefs", "briefs", "spoofs", "births", "earths", "months"
Some "academic" nouns borrowed from Latin keep their original plural forms:
- -us/-i: "alumni", "cacti", "fungi", "foci", "nuclei", "stimuli"
- -us/-era or -us/-ora: "genera", "corpora"
- -a/-ae: "algae", "antennae", "formulae", "vertebrae"
- -um/-a: "addenda", "bacteria", "data", "strata", "millennia"
- -ex/-ices: "indices", "appendices", "matrices", "vortices"
As do some borrowed from Greek:
- -is/-es: "analyses", "axes", "diagnoses", "hypotheses", "theses"
- -on/-a: "criteria", "phenomena", "ganglia", "automata"
But many other Latin and Greek nouns take normal regular plurals:
- "bonuses", "campuses", "circuses", "sinuses", "choruses"
- "areas", "arenas", "dilemmas", "diplomas", "dramas", "eras"
- "albums", "aquariums", "forums", "museums"