Inf1 CogSci 2012: Lecture 19: Artificial neural networks, child development and the 'U-shaped curve'

1. Parallel distributed processing

Starting in the 1980s, Rumelhart and McClelland promoted the multilayer feed-forward perceptron

as the basis of an architecture for cognitive modelling
under the name Parallel Distributed Processing
or PDP for short

One of their early targets was...

Regular and irregular past tenses in English!

And they didn't even use hidden units

At least, not at first

2. McClelland and Rumelhart and English past tense verbs

Their model was pretty radical

no lexicon of words
no rules
Just a two-level fully-connected feed-forward perceptron network

The input to the network is the verb's base form, e.g. [dans], [sInk]

The output from the network should be the past tense form, i.e. [danst], [sank]

3. McClelland and Rumelhart: the details

The design of the input and output was crucial

M&R followed Chomsky and Halle, and used features

Triples of sets of features, in fact
Known as Wickelfeatures
- See Words and Rules for a discussion of where these came from
So, covering three phonemes in a row
For example St-Hv-St
- For stop+high vowel+stop
or N-St-]
- For nasal+stop+word-final
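
As a rough illustration of how a word turns into context-sensitive triples, the sketch below slides a three-symbol window over a phoneme string. It is a simplification: each phoneme gets a single coarse class label, whereas the slide describes triples of sets of features. Only St, Hv, N and the word-final "]" come from the slide; the labels Lv and Fr, the word-initial "[" marker, the phoneme table and the function name are assumptions made for the example.

```python
# Toy phoneme-to-class table. St, Hv and N are the labels used on the slide;
# Lv (low vowel) and Fr (fricative) are hypothetical labels added for this sketch.
FEATURE_CLASS = {
    "d": "St", "t": "St", "k": "St", "p": "St",
    "n": "N", "m": "N",
    "I": "Hv",
    "a": "Lv",
    "s": "Fr",
}


def context_triples(phonemes):
    """Return one class triple per three-symbol window, including word boundaries.

    "[" marks the start of the word (an assumption) and "]" marks the end
    (as in the N-St-] example on the slide).
    """
    labels = ["["] + [FEATURE_CLASS[p] for p in phonemes] + ["]"]
    return ["-".join(labels[i:i + 3]) for i in range(len(labels) - 2)]


print(context_triples("sInk"))  # ['[-Fr-Hv', 'Fr-Hv-N', 'Hv-N-St', 'N-St-]']
print(context_triples("tIp"))   # ['[-St-Hv', 'St-Hv-St', 'Hv-St-]']
```

Because every triple carries its left and right neighbour, the order of phonemes is encoded without any explicit position slots, which is what lets a fixed-size input layer represent words of different lengths.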

Both input and output were labelled with 460 different such triples

211600 connections (and weights)

They trained it with 420 input/output pairs

over 200 iterations
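
The figures above can be checked with a minimal sketch of a network with no hidden units. The sizes (460 units, 420 pairs, 200 iterations) are from the slide; the deterministic threshold units, the error-correction update and the learning rate are simplifying assumptions for illustration, not a reconstruction of the original implementation.

```python
N_FEATURES = 460   # Wickelfeature units on the input side and on the output side
N_PAIRS = 420      # input/output training pairs
N_ITERATIONS = 200 # passes over the training set


def make_network(n_in, n_out):
    """One weight per input/output pair of units, plus one threshold per output unit."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    thresholds = [0.0] * n_out
    return weights, thresholds


def forward(weights, thresholds, x):
    """Output unit j switches on when its summed input exceeds its threshold."""
    return [1 if sum(w * xi for w, xi in zip(row, x)) > t else 0
            for row, t in zip(weights, thresholds)]


def train_step(weights, thresholds, x, target, rate=1.0):
    """Perceptron-style error correction: adjust only the units that were wrong."""
    y = forward(weights, thresholds, x)
    for j, (yj, tj) in enumerate(zip(y, target)):
        err = tj - yj
        if err:
            for i, xi in enumerate(x):
                if xi:
                    weights[j][i] += rate * err
            thresholds[j] -= rate * err


weights, thresholds = make_network(N_FEATURES, N_FEATURES)
n_connections = sum(len(row) for row in weights)
n_presentations = N_PAIRS * N_ITERATIONS

print(n_connections)    # 211600
print(n_presentations)  # 84000
```

Full connectivity between 460 input units and 460 output units gives 460 × 460 = 211,600 weights, and presenting 420 pairs on each of 200 iterations gives 84,000 pattern presentations in total.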

4. And the result was...

After the 84,000 training presentations (420 input/output pairs × 200 iterations), the settled network worked well for almost all the 420 verbs in the training set

It performed adequately for a separate test set of 86 other verbs

3/4 of regular verb stems were assigned the correct past tense form
Most irregular verb stems were assigned overgeneralised regular past tense forms (e.g. "digged", "catched")

During training, the network behaved in an impressively childlike manner

After a period of outputting "gave" correctly, it shifted to the incorrect "gived"
It learned to be reluctant to stick -ed on a verb stem ending in [t] or [d]
It made lots of characteristic childlike errors, e.g. cling/clang, sip/sept
In other words, the M&R model appears to capture stem-stem similarity

More after we recapitulate the human data

5. Overregularisations

Young children make errors with past tense forms.

In particular, they overregularise

e.g. singed, bleeded

Children appear not only to learn rules, but also to overapply them.

However, the patterns of error that children make with past tense forms are actually highly complex

the appearance of simplicity is deceptive
any model that attempts to give an explanation needs to focus on the details.

We now shift from static language description to development:

how do children learn regular and irregular past tense verb forms?

6. Stages of language acquisition

Around 18 months, children start to produce two-word microsentences

e.g. "See baby!", "More cereal!"

Some of these are original productions, rather than telegraphic renditions of their parents' speech, e.g.

"Allgone sticky!" (i.e. "My hands are clean")
"Circle toast!" (i.e. "I want a bagel")

Around 2 years, children start producing longer, more complicated sentences.

Simultaneously, they start to use grammatical morphemes

inflectional suffixes, e.g. -ed, -s, -ing
auxiliary verbs, e.g. have, be, do, will

7. Stages of language acquisition (ctd.)

Around 3 years, children start to make errors, by attaching the past tense suffix -ed to irregular verb stems (e.g. "singed", "bleeded").

Around the same time, they start to pass the "wug" test, in experimental settings:

e.g. rick - ricked, bing - binged

Children don't just make errors adding -ed to irregular verb stems

they attach it to irregular past tense forms, e.g. "broked", "ated"
they attach it to their own neologisms, e.g. "pooked", "lightninged", "spidered"
they attach it to past tense forms that already have a suffix, e.g. "pukeded", "presseded"

8. Overzealous grammarians

Children don't just overgeneralise from regular past tense forms

they overuse the plural suffix -s, e.g. "mans", "foots", "tooths", "mouses"
they overuse the regular third person singular suffix -s, e.g. "haves", "do's", "be's"
they overuse the comparative and superlative suffixes -er and -est, e.g. "specialer", "powerfullest", "gooder"
they overuse the regular ordinal suffix -th on numerals, e.g. "oneth", "twoth", "threeth"

Children are overzealous grammarians, finding regularity in the oddest places

Parent: No booze in the house!
Child: What's a "boo"?

Kid: "I don't care who the Yankees are going to verse!"
[From "the Yankees versus the Red Sox"]

Note that adults occasionally do the same thing

e.g. misanalysing "kudos" as a plural noun.

9. U-shaped development

With almost everything, children's performance gets better as they get older.

However, inflectional morphology appears to be different

children appear to get worse before getting better.

This is what child psychologists call U-shaped development.

Stage 1: children produce both regular and irregular past tense forms with very few errors.

Stage 2: after a certain amount of time, the error rate appears to increase significantly

children start to make more and more errors, adding the regular past tense suffix -ed to irregular verb stems
even with verbs whose past tense forms they had previously mastered.

Stage 3: the error rate slowly decreases, as the child gets older, until almost no errors are made.

U-shaped development fascinates child psychologists

the sudden deterioration in performance appears to be evidence for mental reorganisation
the child has inferred a new generalisation involving previously unrelated concepts
e.g. that the -ed suffix on a verb signifies "pastness".

10. Children versus adults

Stage 2 children and adults both have internalised the rule which says "add -ed to form the past tense"

but only children make overregularisation errors like "bleeded" and "singed".

Therefore:

Adults must have something that stage 2 children lack.
Children must gradually attain whatever that thing is, as they grow older.

First guess:

Adults are able to communicate their thoughts more clearly than children.
Children improve their past tense verb performance by slowly learning to communicate more clearly.

This theory is clearly wrong:

There is nothing intrinsically unclear about overregularised forms like "bleeded" and "singed"
Many correct past tense forms are identical to the verb stem, and hence are less clear than the overregularised form, e.g. "hit", "cut", "put", "set"

Second guess:

Adults don't say "bleeded" and "singed" because they don't hear other adults saying these words.

But: adults are not just parrots

they are capable of producing regular past tense forms they have never heard before (e.g. "moshed", "Borked", "faxed")
so there is no reason why they can't do the same with "bleeded" or "singed".

Third guess: Adults have learned the blocking principle

The memorised "sang" blocks the past-tense suffixation rule from applying to "sing"

The challenge for any cognitive science model of language processing is to actually demonstrate that it in some way predicts blocking

11. Blocking

Why don't adults overgeneralise the -ed suffix to irregular verb stems?

not because they haven't heard the overregularised forms (i.e. "bleeded" or "singed")
but because they have heard the irregular forms (i.e. "bled" and "sang").

Some component of adult psychology means that the experience of having heard the irregular forms "bled" and "sang" inhibits the application of the -ed rule to create "bleeded" and "singed".

Previously, we called this "blocking": the existence of "bled" and "sang" in the mental lexicon blocks the production of "bleeded" and "singed".
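
Blocking can be pictured as a lookup that takes priority over the rule. The sketch below is one minimal way to write that down, assuming a dictionary stands in for the mental lexicon and that the rule is plain "append -ed" in spelling; the function names are hypothetical, and nothing here is a claim about how the mind implements it.

```python
# Stored irregular forms (the examples used on the slides).
IRREGULAR_PAST = {"sing": "sang", "bleed": "bled"}


def regular_past(stem):
    """The default rule, simplified to spelling: add -ed to the stem."""
    return stem + "ed"


def past_tense(stem, lexicon=IRREGULAR_PAST):
    """A stored irregular form, if there is one, blocks the regular rule."""
    stored = lexicon.get(stem)
    if stored is not None:
        return stored
    return regular_past(stem)


print(past_tense("sing"))              # sang   (rule blocked)
print(past_tense("mosh"))              # moshed (no stored form, rule applies)
print(past_tense("sing", lexicon={}))  # singed (nothing stored, nothing to block)
```

The last call shows why the absence of "singed" in the input is not what matters: the rule is stopped only by the presence of "sang".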

Two competing hypotheses:

Children lack the blocking mechanism and have to learn it.
Children have the blocking principle but lack the relevant experience needed to use it effectively.

12. Learning how to block

How could a child learn the blocking principle from scratch?

They would need to learn explicitly that overregularised forms like "bleeded" and "singed" are ungrammatical.

it is not enough that they haven't heard these forms
they need to have negative evidence to solve the problem
i.e. information about what is not in the language.

In other words, children would need to receive negative feedback from somewhere whenever they used an incorrect past tense form

an explicit correction
or some indirect signal of disapproval (a frown, a puzzled look, a slap)
or a failure to achieve some non-linguistic goal

However, there is no evidence that negative feedback has any effect on children's language acquisition.

The child psychologist Karin Stromswold has a dramatic example of this

a boy who has never talked (for some unknown neurological reason)
but who has nonetheless learned the blocking principle for past tense verb forms
this child cannot have learned this based on parental correction to his own errors.

13. Blocking as innate knowledge

Maybe the blocking principle is part of innate linguistic knowledge

so children do not need to learn it from scratch.

Children don't learn the blocking principle from evidence that "singed" is not in English

rather they deduce that "singed" is not in English from the blocking principle.

Why do adults use blocking more effectively than children are able to?

Because adults have more experience than children.
In particular, they have heard irregular past tense verb forms being used more often than children have.
And memory retrieval improves through repetition.

Stage 2 children and adults are actually very similar

they both have access to the innate blocking principle
they both have irregular verb forms stored in the mental lexicon

The difference is that adults can retrieve the irregular verb forms from memory more quickly, and hence blocking is more likely to happen.

In other words, children are "little adults with bad memories".
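
The "bad memories" idea can be sketched by making retrieval of the stored form probabilistic, so that the rule steps in whenever retrieval fails. The saturating curve, its half_point parameter and the exposure counts below are invented for illustration; the slides say only that retrieval improves with repetition.

```python
import random

IRREGULAR_PAST = {"sing": "sang", "bleed": "bled"}


def retrieval_probability(exposures, half_point=20.0):
    """Hypothetical curve: 0 with no exposure, approaching 1 with many exposures."""
    return exposures / (exposures + half_point)


def produce_past(stem, exposures, rng):
    """Use the stored form if it is retrieved in time; otherwise fall back on the rule."""
    stored = IRREGULAR_PAST.get(stem)
    if stored is not None and rng.random() < retrieval_probability(exposures):
        return stored
    return stem + "ed"


def overregularisation_rate(stem, exposures, trials=10_000, seed=0):
    """Fraction of simulated productions of an irregular verb that come out regularised."""
    rng = random.Random(seed)
    errors = sum(produce_past(stem, exposures, rng) != IRREGULAR_PAST[stem]
                 for _ in range(trials))
    return errors / trials


# Made-up exposure counts: a speaker who has heard "sang" a handful of times
# versus one who has heard it thousands of times.
print(overregularisation_rate("sing", exposures=5))
print(overregularisation_rate("sing", exposures=2000))
```

Both simulated speakers have the same lexicon and the same blocking mechanism; only the reliability of retrieval differs, which is the sense in which stage 2 children and adults are "very similar".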

14. Little adults with bad memories?

Analysis of spontaneous conversations between children and their parents supports the "little adults with bad memories" theory.

Stage 2 children's average error rate for past tense verb forms is actually surprisingly low

only 4% of past tense forms produced by a child are incorrect
adults tend to overestimate the error rate, since errors are more noteworthy than correct forms.

No irregular verb is immune to overregularisation errors

even verbs whose past tense form the child has already produced correctly

Similarly, no verb is consistently erred on.

This suggests that children are not ignorant of the correct irregular past tense forms

they are just fallible when it comes to retrieving them.

The key factor appears to be parental input

the more the parents use an irregular past tense verb form, the less likely the child is to overregularise the verb.

Children appear to know that their errors are errors

they will correct an adult who uses an incorrect form, even if they have just used that form themselves.

So, children appear to know the correct irregular past tense forms

errors are runtime slip-ups caused by the fact that the correct form cannot always be retrieved in real time.

15. Rule learning

What triggers the mental transition from stage 1 to stage 2?

i.e. from making no over-regularisation errors to making some.

Simplest theory - this is the point at which the child has learned the relevant rule.

Evidence shows that stage 1 children often use the base form of a verb for both the present and past tense

e.g. "Yesterday we walk"
the transition from using the stem to using the regular past tense form (i.e. "Yesterday we walked") coincides with the start of over-regularisation.

Also, the data suggests that, during the transition to stage 2, it is not necessarily the case that over-regularisations (e.g. "singed") are driving out correct forms (i.e. "sang")

rather errors of commission (e.g. "Yesterday we singed") are, as often as not, driving out errors of omission (i.e. "Yesterday we sing").
so focusing on the rate of over-regularisations by itself, rather than on, say, the combined rate of errors of omission and commission, may actually be misleading.
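
A small worked example makes the point concrete. The counts below are entirely made up for illustration (they are not data from any study); they show a case where the over-regularisation rate jumps while the combined error rate does not move at all.

```python
def error_rates(correct, omission, commission):
    """Rates of each error type, and of both together, over all past-tense contexts."""
    total = correct + omission + commission
    return {
        "omission": omission / total,      # e.g. "Yesterday we sing"
        "commission": commission / total,  # e.g. "Yesterday we singed"
        "combined": (omission + commission) / total,
    }


# Invented tallies per 100 irregular past-tense contexts.
stage1 = error_rates(correct=60, omission=40, commission=0)
stage2 = error_rates(correct=60, omission=20, commission=20)

print(stage1["commission"], stage2["commission"])  # 0.0 0.2
print(stage1["combined"], stage2["combined"])      # 0.4 0.4
```

In this invented case the correct form "sang" is used exactly as often as before; "singed" has replaced "sing", not "sang".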

Finally, evidence from language development in identical and fraternal twins suggests that the timing of the transition from stage 1 to stage 2 is simply a matter of chance

when one twin starts to over-regularise, an identical twin is no more likely to follow suit than a fraternal twin.

16. Pattern association memory

Maybe a rule is not necessary to explain over-regularisations in children's acquisition of inflectional morphology

maybe they simply analogise from verbs they already know
e.g. from correct forms like "folded", "molded", "scolded" to over-regularisations like "holded".

This is the basis of Rumelhart and McClelland's pattern association memory model which we discussed last week

the network learned hundreds of regular and irregular past tense verb forms, using a simple two-layer perceptron network
it seemed to follow the U-shaped curve during training.
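
To see how analogy alone could produce "holded" without any explicit rule, here is a deliberately crude nearest-neighbour sketch. It is not the pattern associator network: it simply copies the ending of the known verb that sounds most like the new one, using shared final letters as a stand-in for phonological similarity. The function names and the similarity measure are assumptions for illustration.

```python
KNOWN = {"fold": "folded", "mold": "molded", "scold": "scolded"}


def common_suffix_len(a, b):
    """Number of characters the two strings share at their ends."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def past_by_analogy(stem, known):
    """Copy the stem-to-past change of the most similar known verb.

    Only handles known pairs where the past form is the stem plus an ending.
    """
    candidates = [k for k in known if known[k].startswith(k)]
    best = max(candidates, key=lambda k: common_suffix_len(stem, k))
    ending = known[best][len(best):]
    return stem + ending


print(past_by_analogy("hold", KNOWN))  # holded
```

Nothing in this sketch states "add -ed to form the past tense"; the regular-looking output falls out of similarity to stored examples, which is the intuition behind the pattern association approach.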

How did R&M get their neural network to learn in such an authentically child-like manner?

17. Discontinuous training

R&M made use of a couple of empirical observations:

children learn common verbs first, and rarer verbs later
i.e. they tend to learn irregular verbs before regular ones
children's vocabulary grows very quickly all of a sudden, a few months after they start learning words, i.e. at some point they get a huge spurt of regular verbs.

With this in mind, R&M trained their network in a slightly extreme way:

they first trained it on just 10 verbs, all at once, 8 of which were irregular
they then trained it on a further 410 verbs, again all at once, 80% of which were regular.

Since pattern association memories are highly sensitive to changes in the statistics of their input, it is not surprising that error rates increase dramatically at the start of the second training phase, before recovering gradually.
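
The size of the change in input statistics can be worked out from the numbers above. The sketch assumes that "80% regular" applies exactly to the 410 new verbs; whether the second phase also re-presents the original 10 verbs is not stated here, so both readings are computed.

```python
PHASE1_VERBS = 10
PHASE1_IRREGULAR = 8

PHASE2_NEW_VERBS = 410
PHASE2_NEW_REGULAR = PHASE2_NEW_VERBS * 80 // 100          # 80% regular
PHASE2_NEW_IRREGULAR = PHASE2_NEW_VERBS - PHASE2_NEW_REGULAR

# Share of irregular verbs the network sees in each phase.
irregular_share_phase1 = PHASE1_IRREGULAR / PHASE1_VERBS

# Reading 1: phase 2 contains only the 410 new verbs.
irregular_share_new_only = PHASE2_NEW_IRREGULAR / PHASE2_NEW_VERBS

# Reading 2: phase 2 contains all 420 verbs (the original 10 plus the new 410).
irregular_share_all = (PHASE1_IRREGULAR + PHASE2_NEW_IRREGULAR) / (
    PHASE1_VERBS + PHASE2_NEW_VERBS
)

print(PHASE2_NEW_REGULAR, PHASE2_NEW_IRREGULAR)  # 328 82
print(irregular_share_phase1)                    # 0.8
print(irregular_share_new_only)                  # 0.2
print(round(irregular_share_all, 3))             # 0.214
```

On either reading, irregular verbs drop from 80% of the training set to roughly 20% overnight, so the weights are suddenly pulled towards the regular pattern.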

18. Vocabulary expansion as trigger

There is one key question that we need to pose in order to evaluate the R&M approach to learning:

Do children begin to over-generalise in response to a sudden influx of regular verbs?

Data from spontaneous conversations involving children shows no evidence for this:

a child's vocabulary appears to spurt about a year too soon to account for the onset of over-regularisation
i.e. mid-to-late ones versus mid-to-late twos
when children start to make over-regularisation errors, new regular verbs are actually coming in more slowly than they were previously.

R&M's training regime is extremely fragile

it requires a very particular combination of input factors to mimic the child-like learning curve.

In reality, language learning is a fairly robust process

children exhibit very similar learning curves, even with vastly different patterns of data to learn from
for example, children appear to learn English plural inflection in a very similar way to past tense inflection, despite the relative proportions of irregular to regular forms being vastly different in the two cases.