Page 1

Page 2

AND Network

Input    Output
0 0      0
1 0      0
0 1      0
1 1      1

Page 3

OR Network

NETWORK CONFIGURED BY TLEARN
# weights after 10000 sweeps
# WEIGHTS
# TO NODE 1
-1.9083807468    ## bias to 1
 4.3717832565    ## i1 to 1
 4.3582129478    ## i2 to 1
 0.0000000000

Input    Output
0 0      0
1 0      1
0 1      1
1 1      1
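A quick way to see that these weights really do implement OR is to push the four input patterns through a single logistic unit. The sketch below assumes TLEARN's usual logistic (sigmoid) activation; the function and variable names are illustrative, not part of TLEARN itself.

    import math

    def logistic(x):
        # standard sigmoid activation, as used by TLEARN units
        return 1.0 / (1.0 + math.exp(-x))

    # weights read off the dump above: bias, i1, i2 (all feeding node 1)
    bias, w1, w2 = -1.9083807468, 4.3717832565, 4.3582129478

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        out = logistic(bias + w1 * i1 + w2 * i2)
        print(i1, i2, round(out, 2))
    # prints roughly 0.13, 0.92, 0.92, 1.00 -- the OR truth table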

Page 4

XOR Network

Page 5

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

Page 6

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Page 7

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Page 8

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Page 9

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

-4.4429202080    ## bias to output
 9.0652370453    ## 1 to output
 8.9045801163    ## 2 to output

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Page 10

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

-4.4429202080    ## bias to output
 9.0652370453    ## 1 to output
 8.9045801163    ## 2 to output

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Input    Output
0 0      0
1 0      1
0 1      1
1 1      1

Page 11

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

-4.4429202080    ## bias to output
 9.0652370453    ## 1 to output
 8.9045801163    ## 2 to output

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Input    Output
0 0      0
1 0      1
0 1      1
1 1      1

The mapping from the hidden units to the output is an OR network that never receives a [1 1] input.
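Putting the three weight blocks together gives the whole 2-2-1 network. The sketch below (again assuming logistic activations; the function and variable names are mine, not TLEARN's) reproduces the tables above: hidden unit 1 detects [1 0], hidden unit 2 detects [0 1], and the output unit ORs them, which is XOR over the original inputs.

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def xor_net(i1, i2):
        # hidden unit 1: on for input [1 0]
        h1 = logistic(-3.0456776619 + 5.5165352821 * i1 - 5.7562727928 * i2)
        # hidden unit 2: on for input [0 1]
        h2 = logistic(-3.6789164543 - 6.4448370934 * i1 + 6.4957633018 * i2)
        # output unit: ORs the hidden units (it never sees h1 = h2 = 1)
        out = logistic(-4.4429202080 + 9.0652370453 * h1 + 8.9045801163 * h2)
        return round(h1, 2), round(h2, 2), round(out, 2)

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(i1, i2, xor_net(i1, i2))
    # output column reads roughly 0.02, 0.98, 0.98, 0.02 -- XOR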

Page 12

Page 13

Page 14

The Past Tense and Beyond

Page 15

Classic Developmental Story

• Initial mastery of regular and irregular past tense forms

• Overregularization appears only later (e.g. goed, comed)

• ‘U-Shaped’ developmental pattern taken as evidence for learning of a morphological rule

V + [+past] --> stem + /d/

Page 16

Rumelhart & McClelland 1986

Model learns to classify regulars and irregulars, based on sound similarity alone. Shows U-shaped developmental profile.

Page 17

What is really at stake here?

• Abstraction

• Operations over variables

• Learning based on input

Page 18

What is not at stake here

• Feedback, negative evidence, etc.

Page 19

Who has the most at stake here?

• Those who deny the need for rules/variables in language have the most to lose here

• …but if they are successful, they bring with them a simple and attractive learning theory, and mechanisms that can readily be grounded at the neural level

• However, if the advocates of rules/variables succeed here or elsewhere, they face the more difficult challenge at the neuroscientific level

Page 20

Questions about Lab 2b

• How did the network perform?

• How well did the network generalize to novel stems?

• What was the effect of the frequency manipulation?

• Does the network need to internalize a Blocking Principle?

• Does the network explicitly represent a default form?

Page 21

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 22

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 23

(Pinker & Ullman 2002)

Page 24

Beyond Sound Similarity

• Zero-derived denominals are regular

– Soldiers ringed the city

– *Soldiers rang the city

– high-sticked, grandstanded, …

– *high-stuck, *grandstood, …

• Productive in adults & children

• Shows sensitivity to morphological structure: [[stem N] ø V]-ed

• Provides good evidence that sound similarity is not everything

• But nothing prevents a model from using a richer similarity metric
– morphological structure (for ringed)

– semantic similarity (for low-lifes)

Page 25

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 26

Regulars & Associative Memory

• Regulars are productive, need not be stored

• Irregulars are not productive, must be stored

• But are regulars immune to effects of associative memory?
– frequency

– over-irregularization

• Pinker & Ullman:
– regulars may be stored

– but they can also be generated on-the-fly

– ‘race’ can determine which of the two routes wins

– some tasks more likely to show effects of stored regulars
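As a toy illustration of the 'words and rules' race just described (my own sketch, not Pinker & Ullman's implementation), stored forms and the default rule V + [+past] --> stem + -ed can be treated as two routes, with successful memory retrieval blocking the rule:

    # stored forms: irregulars, plus possibly high-frequency regulars
    stored_past = {"go": "went", "come": "came", "find": "found",
                   "walk": "walked"}

    def past_tense(stem):
        # route 1: memory retrieval; if it wins the race, it blocks route 2
        if stem in stored_past:
            return stored_past[stem]
        # route 2: the default rule, applicable to any stem, even novel ones
        return stem + "ed"

    print(past_tense("go"))     # went    (retrieval blocks *goed)
    print(past_tense("walk"))   # walked  (may be both stored and rule-generated)
    print(past_tense("blick"))  # blicked (novel stem, default route)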

Page 27

Child vs. Adult Impairments

• Specific Language Impairment
– Early claims that regulars show greater impairment than irregulars are not confirmed

• Pinker & Ullman 2002b
– ‘The best explanation is that language-impaired people are indeed impaired with rules, […] but can memorize common regular forms.’

– Regulars show consistent frequency effects in SLI, not in controls.

– ‘This suggests that children growing up with a grammatical deficit are better at compensating for it via memorization than are adults who acquired their deficit later in life.’

Page 28

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 29

Neuropsychological Dissociations

• Ullman et al. 1997
– Alzheimer’s disease patients
  • Poor memory retrieval
  • Poor irregulars
  • Good regulars
– Parkinson’s disease patients
  • Impaired motor control, good memory
  • Good irregulars
  • Poor regulars

• Striking correlation involving laterality of effect

• Marslen-Wilson & Tyler 1997
– Normals
  • past tense primes stem
– 2 Broca’s patients
  • irregulars prime stems
  • inhibition for regulars
– 1 patient with bilateral lesion
  • regulars prime stems
  • no priming for irregulars or semantic associates

Page 30

Morphological Priming

• Lexical Decision Task
– CAT, TAC, BIR, LGU, DOG
– press ‘Yes’ if this is a word

• Priming
– facilitation in decision times when related word precedes target (relative to unrelated control)
– e.g., {dog, rug} - cat

• Marslen-Wilson & Tyler 1997
– Regular: {jumped, locked} - jump
– Irregular: {found, shows} - find
– Semantic: {swan, hay} - goose
– Sound: {gravy, sherry} - grave

Page 31

Page 32

Neuropsychological Dissociations

• Bird et al. 2003
– complain that arguments for selective difficulty with regulars are confounded with the phonological complexity of the word-endings

• Pinker & Ullman 2002
– weight of evidence still supports dissociation; Bird et al.’s materials contained additional confounds

Page 33

Brain Imaging Studies

• Jaeger et al. 1996, Language
– PET study of past tense
– Task: generate past from stem
– Design: blocked conditions
– Result: different areas of activation for regulars and irregulars

• Is this evidence decisive?
– task demands very different
– difference could show up in network
– doesn’t implicate variables

• Münte et al. 1997
– ERP study of violations
– Task: sentence reading
– Design: mixed
– Result:
  • regulars: ~LAN
  • irregulars: ~N400

• Is this evidence decisive?
– allows possibility of comparison with other violations

Page 34

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 35

Low-Frequency Defaults

• German Plurals
– die Straße   die Straßen
  die Frau     die Frauen
– der Apfel    die Äpfel
  die Mutter   die Mütter
– das Auto     die Autos
  der Park     die Parks
  die Schmidts

• -s plural low frequency, used for loan-words, denominals, names, etc.

• Response
– frequency is not the critical factor in a system that focuses on similarity
– distribution in the similarity space is crucial
– similarity space with islands of reliability
  • network can learn islands
  • or network can learn to associate a form with the space between the islands
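A toy sketch of the 'islands of reliability' idea (the 2-D similarity space, the exemplar points, and the distance threshold are invented for illustration; real models use phonological feature vectors): items that fall near a stored island get that island's plural class, while items far from every island receive the low-frequency default -s.

    # toy exemplars: (x, y) points standing in for phonological similarity
    islands = {
        "-(e)n":  [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)],   # Strasse/Frau-like
        "umlaut": [(4.0, 4.0), (4.2, 3.8), (3.9, 4.1)],   # Apfel/Mutter-like
    }

    def plural_class(point, threshold=1.0):
        best_class, best_dist = None, float("inf")
        for cls, members in islands.items():
            for x, y in members:
                d = ((point[0] - x) ** 2 + (point[1] - y) ** 2) ** 0.5
                if d < best_dist:
                    best_class, best_dist = cls, d
        # nothing nearby: fall back to the default, the space between the islands
        return best_class if best_dist <= threshold else "-s (default)"

    print(plural_class((1.1, 1.1)))   # -(e)n         (inside an island)
    print(plural_class((8.0, 0.5)))   # -s (default)  (far from every island)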

Page 36

Similarity Space

Page 37

Similarity Space

Page 38

Arabic Broken Plural

• CvCC
– nafs      nufuus      ‘soul’
– qidh      qidaah      ‘arrow’

• CvvCv(v)C
– xaatam    xawaatim    ‘signet ring’
– jaamuus   jawaamiis   ‘buffalo’

• Sound Plural
– shuway?ir   shuway?ir-uun   ‘poet (dim.)’
– kaatib      kaatib-uun      ‘writing (participle)’
– hind        hind-aat        ‘Hind (fem. name)’
– ramadaan    ramadaan-aat    ‘Ramadan (month)’

Page 39

German Plurals

(Hahn & Nakisa 2000)

Page 40

Page 41

Syntax, Semantics, & Statistics

Page 42

Page 43

Page 44

Page 45

Starting Small Simulation

• How well did the network perform?

• How did it manage to learn?

Page 46

Page 47

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

Page 48

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

1 1 1 1 (Humans)

1 1 1 0 (Network)

Page 49

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

• Generalization fails because learning is local

1 1 1 1 (Humans)

1 1 1 0 (Network)
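This failure can be reproduced with a very small simulation. The sketch below uses an assumed setup (independent sigmoid output units trained with the delta rule on the four patterns; it illustrates the point rather than replicating any particular published model): the fourth output unit never sees a target of 1 during training, so it stays off for the novel input 1 1 1 1.

    import numpy as np

    X = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 0]], dtype=float)
    Y = X.copy()                      # identity mapping task

    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.1, (4, 4))  # one weight row per output unit
    b = np.zeros(4)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(5000):             # batch delta-rule training
        out = sigmoid(X @ W.T + b)
        delta = (Y - out) * out * (1.0 - out)
        W += 0.5 * delta.T @ X
        b += 0.5 * delta.sum(axis=0)

    test = np.array([1.0, 1.0, 1.0, 1.0])
    print(np.round(sigmoid(W @ test + b), 2))
    # prints values close to [1, 1, 1, 0]: the fourth unit has only ever been
    # trained toward 0 (and its input line was never active), so it stays off
    # for 1 1 1 1 -- generalization fails because learning is local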

Page 50

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

• Generalization succeeds because representations are shared

1 1 1 1 (Humans)

1 1 1 1 (Network)

Page 51

Negative Evidence

• Standard Doctrine
– Language learners do not receive negative evidence
– They must therefore learn only from positive examples
– This forces the learner to make constrained generalizations
– [even with negative evidence, generalizations must be constrained]

• Common Suggestion
– ‘Implicit Negative Evidence’: the absence of certain examples in the input is taken to be significant
– Who do you think John saw __?      Who do you think that John saw __?
  Who do you think __ saw John?      *Who do you think that __ saw John?

• Challenge: how to classify and store input appropriately, in order to detect appropriate generalizations; large memory needed
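One way to make 'implicit negative evidence' concrete (a toy calculation with invented counts, not a model from the literature; it uses the give/donate contrast that comes up again on the Baker 1979 slide below): if a verb has had many opportunities to appear in a frame and never has, a smoothed estimate of that frame's probability shrinks toward zero, which can stand in for the inference that the frame is disallowed.

    # (total occurrences, occurrences in the double-object frame) -- toy counts
    counts = {
        "give":   (1000, 300),
        "donate": (1000, 0),
    }

    def double_object_rate(verb, prior=1.0):
        total, frame_uses = counts[verb]
        # add-one style smoothing: never attested despite many chances -> near 0
        return (frame_uses + prior) / (total + 2 * prior)

    print(round(double_object_rate("give"), 3))     # ~0.3
    print(round(double_object_rate("donate"), 3))   # ~0.001, implicit negative evidence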

Page 52

Feedback via Prediction

• Simple Recurrent Network
– Prediction task provides feedback

– System does not need to explicitly store large amounts of data

– Challenge is to appropriately encode the feedback signal

– If learning rate is set low, learner is protected against noisy input data
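A minimal sketch of the idea (an Elman-style SRN forward pass with a made-up vocabulary and random, untrained weights; this is not Elman's or Rohde & Plaut's actual model): the 'feedback' is simply the discrepancy between the predicted next word and the word that actually arrives, so no external teacher is required.

    import numpy as np

    vocab = ["boy", "girl", "chases", "sees", "."]
    V, H = len(vocab), 8
    rng = np.random.default_rng(0)
    W_xh = rng.normal(0, 0.1, (H, V))   # input -> hidden
    W_hh = rng.normal(0, 0.1, (H, H))   # context (previous hidden) -> hidden
    W_hy = rng.normal(0, 0.1, (V, H))   # hidden -> next-word prediction

    def one_hot(word):
        v = np.zeros(V)
        v[vocab.index(word)] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    h = np.zeros(H)                               # context units start empty
    sentence = ["boy", "chases", "girl", "."]
    for t in range(len(sentence) - 1):
        h = np.tanh(W_xh @ one_hot(sentence[t]) + W_hh @ h)
        p = softmax(W_hy @ h)                     # predicted next-word distribution
        error = one_hot(sentence[t + 1]) - p      # prediction error = teaching signal
        print(sentence[t], "-> predicted:", vocab[int(np.argmax(p))],
              "| error norm:", round(float(np.linalg.norm(error)), 3))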

Page 53

Feedback via Prediction

• The prediction problem is very relevant
– can be viewed as representing many kinds of syntactic relations
– typically, many predictions held simultaneously; if there’s an explicit representation of hierarchy and abstraction, then order of completion is easily predicted
– challenge is to avoid spurious competition among dependencies

Page 54

Page 55

• Elman --> Rohde & Plaut
– whatever happens, agreement is not the decisive case

• Seidenberg on locatives
– locatives across languages are highly constrained; generalizations go beyond surface patterns; the Seidenberg/Allen model is given the outline of the solution to the problem
– semantics, not statistics, is critical

• How to use SRN ideas effectively
– Structured prediction device encodes hypotheses
– How to encode and act upon error signals?
– Partially matching features can lead to long-distance feedback
– Prediction of alternatives can lead to confirmation or disconfirmation of one choice

Page 56

Infants and Statistical Learning

Page 57

Saffran, Aslin, & Newport (1996)

• 8-month-old infants
– Passive exposure to continuous speech sequence for 2 minutes
  bidakupadotigolabubidaku…
– Test (Experiment #2)
  bidakubidakubidakubidakubidaku…
  kupadokupadokupadokupadokupado…
– Infants listen longer to unfamiliar sequences
– Transitional Probabilities
  bi →(1.0) da →(1.0) ku →(.33) pa →(1.0) do →(1.0) ti
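The transitional-probability computation itself is simple: P(Y | X) = frequency(XY) / frequency(X). The sketch below builds a toy stream from the three words used on the slide (the stream-generation details are my own; the real stimuli were more carefully controlled) and recovers the 1.0 vs. .33 contrast.

    import random
    from collections import Counter

    words = ["bidaku", "padoti", "golabu"]
    random.seed(0)
    stream = []
    for _ in range(300):
        w = random.choice(words)
        stream += [w[i:i + 2] for i in range(0, len(w), 2)]   # split into syllables

    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])

    def tp(x, y):
        # transitional probability P(y | x) = freq(xy) / freq(x)
        return pair_counts[(x, y)] / first_counts[x]

    print(round(tp("bi", "da"), 2))   # 1.0   (within a word)
    print(round(tp("ku", "pa"), 2))   # ~0.33 (across a word boundary)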

Page 58

Marcus et al. (1999)

• Training

– ABA: ga na ga, li ti li

– ABB: ga na na, li ti ti

• Testing

– ABA: wo fe wo

– ABB: wo fe fe
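What the infants appear to extract is a relation stated over variables, the identity of the first and third items, rather than anything tied to particular syllables. A trivial sketch of that relation (purely illustrative):

    def pattern(triple):
        a, b, c = triple
        if a == c and a != b:
            return "ABA"
        if b == c and a != b:
            return "ABB"
        return "other"

    # generalizes to novel syllables precisely because it refers to variables
    print(pattern(["ga", "na", "ga"]))   # ABA (training)
    print(pattern(["wo", "fe", "wo"]))   # ABA (novel test item)
    print(pattern(["wo", "fe", "fe"]))   # ABB (novel test item)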

Page 59

Aslin & Newport (in press)

• While some adjacent statistical regularities can be learned, other types of statistical regularities cannot

Page 60

We have recently been developing a statistical approach to language acquisition and investigating the abilities of human adults, infants, and nonhuman primates to perform the computations that would be required for acquiring properties of natural languages by such a method. Our studies (initially in collaboration with Jenny Saffran) have shown that human adults and infants are capable of performing many of these computations online and with remarkable speed, during the presentation of controlled speech streams in the laboratory. We have also found that adults and infants can perform similar computations on nonlinguistic materials (e.g., music), and (in collaboration with Marc Hauser) that nonhuman primates can perform the simplest of these computations. However in our recent studies, when tested on more complex computations involving non-adjacent sounds, humans show strong selectivities (they can perform certain computations, but fail at others), corresponding to the patterns which natural languages do and do not exhibit. Primates are not capable of performing some of these more difficult computations. Additional recent studies examine how statistics can be used to form non-statistical generalizations and to regularize irregular structures, and how the computations we have hypothesized for word segmentation extend to acquiring syntactic phrase structure. Overall we feel that this approach may provide an important mechanism for learning certain aspects of language, particularly when combined with an understanding of the ways in which input statistics may be selectively extracted or altered as they are acquired. In addition, the constraints of learners in performing differing types and complexities of computations may provide part of the explanation for which learners can acquire human languages, and why languages have some of the properties they have.

(Newport & Aslin, Dec. 2003)

Page 61

A Little Syntax

Page 62

Gold (1967)

• Hypothetical classes of languages
– #1: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA}
– #2: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA, …, A∞}

• How could a learner figure out the target language, based on positive-only examples (‘text presentation’)?
– #1:
– #2:

• Under class #2, there’s no way to guarantee convergence
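The difference between the two classes can be seen with a conservative 'guess the largest sentence seen so far' learner (an illustration of the flavour of Gold's argument, not the proof itself; the encodings are mine):

    def learner(text):
        longest = 0
        for sentence in text:
            longest = max(longest, len(sentence))
            yield longest              # conjecture: {A, AA, ..., A^longest}

    # Class #1 target {A, AA, AAA}: once 'AAA' has appeared, the guess is
    # correct and never changes again -- identification in the limit.
    finite_text = ["A", "AAA", "AA", "A", "AAA", "AA"]
    print(list(learner(finite_text)))          # [1, 3, 3, 3, 3, 3]

    # Class #2 adds the infinite language {A, AA, AAA, ...}. On a text for it
    # this learner never converges, and jumping to the infinite conjecture
    # early would overgeneralize a finite target with no positive datum to
    # force a retreat -- hence no guaranteed convergence for class #2.
    infinite_text = ("A" * n for n in range(1, 8))
    print(list(learner(infinite_text)))        # [1, 2, 3, 4, 5, 6, 7]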

Page 63

Baker (1979)

• Alternating Verbs
– John gave a cookie to the boy.
  John gave the boy a cookie.
– Mary showed some photos to her family.
  Mary showed her family some photos.

• Non-Alternating Verbs
– John donated a painting to the museum.
  *John donated the museum a painting.
– Mary displayed her art collection to the visitors.
  *Mary displayed the visitors her art collection.

• Learnability problem: how to avoid overgeneralization

Page 64

Seidenberg (1997, Science)

• Locative Verb Constructions
– John poured the water into the cup
  *John poured the cup with water
– *Sue filled the water into the glass
  Sue filled the glass with water
– Bill loaded the apples onto the truck
  Bill loaded the truck with apples

• “Connectionist networks are well suited to capturing systems with this character. Importantly, a network configured as a device that learns to perform a task such as mapping from sound to meaning will act as a discovery procedure, determining which kinds of information are relevant. Evidence that such models can encode precisely the right combinations of probabilistic constraints is provided by Allen (42), who implemented a network that learns about verbs and their argument structures from naturalistic input.” (p. 1602)

Page 65

Seidenberg (Science, 3/14/97)

• “Research on language has arrived at a particularly interesting point, however, because of important developments outside of the linguistic mainstream that are converging on a different view of the nature of language. These developments represent an important turn of events in the history of ideas about language.” (p. 1599)

Page 66

Seidenberg (Science, 3/14/97)

• “A second implication concerns the relevance of poverty-of-the-stimulus arguments to other aspects of language. Verbs and their argument structures are important, but they are language specific rather than universal properties of languages and so must be learned from experience.” (p. 1602)

Page 67

Allen’s Model

• Learns associations between (i) specific verbs & argument structures and (ii) semantic representations

• Feature encoding for verbs, 360 features
  [eat]: +act, +cause, +consume, etc.
  [John]: +human, +animate, +male, +automotive, -vehicle

Page 68

Allen’s Model

• Learns associations between (i) specific verbs & argument structures and (ii) semantic representations

• Training set: 1200 ‘utterance types’ taken from caretaker speech in Peter corpus (CHILDES)

Page 69

Allen’s Model

• Fine-grained distinction between hit, carry
  John kicked Mary the ball
  *John carried Mary the basket

• [kick]: +cause, +apply-force, +move, +travel, +contact, +hit-with-foot, +strike, +kick, +instantaneous-force, +hit

• [carry]: +cause, +apply-force, +move, +travel, +contact, +carry, +support, +continuous-force, +accompany
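To see why the quote on page 73 matters, the two feature sets can be turned into binary vectors (the encoding and the Jaccard measure are my own illustration, not Allen's implementation): the verbs share all of the coarse force/motion features, so a learner sensitive only to overall similarity has no basis for their opposite behaviour; some features, such as instantaneous vs. continuous force, must end up weighted more heavily than others.

    # feature sets as listed on the slide
    kick = {"cause", "apply-force", "move", "travel", "contact",
            "hit-with-foot", "strike", "kick", "instantaneous-force", "hit"}
    carry = {"cause", "apply-force", "move", "travel", "contact",
             "carry", "support", "continuous-force", "accompany"}

    features = sorted(kick | carry)                  # shared feature inventory
    v_kick = [int(f in kick) for f in features]
    v_carry = [int(f in carry) for f in features]
    print(v_kick)
    print(v_carry)

    shared = kick & carry
    jaccard = len(shared) / len(kick | carry)
    print(sorted(shared))        # the coarse force/motion features
    print(round(jaccard, 2))     # ~0.36 overall overlap
    # raw overlap alone cannot explain why 'kick' allows the double-object
    # frame and 'carry' does not; particular features must matter more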

Page 73

Allen’s Model

• Fine-grained distinction between hit, carry
  John kicked Mary the ball
  *John carried Mary the basket

• “This behavior shows crucially that the network is not merely sensitive to overall semantic similarity: rather, the network has organized the semantic space such that some features are more important than other.” (p. 5)

Page 74

Challenges

• Allen’s results are impressive; the model is interesting in the way that it poses the learning task as a selection process (the linking rules do not emerge from nowhere)

• Fine-grained distinctions in English
• ‘Concealed’ distinctions in Korean
• Reason for universals

Page 75

Challenges

• Fine-grained distinctions, e.g. in English

• pour the water into the glass
  pour the water
  the poured water

• stand the lamp on the floor
  *stand the lamp
  *the stood lamp

Page 76

Challenges

• ‘Concealed’ distinctions, e.g. in Korean

• pour the water into the glass
  *pour the glass with water

• pile the books onto the shelf
  *pile the shelf with books

• *pour-put the glass with water
  pile-put the shelf with books

Page 77

Challenges

• Universals, parametric connections - why should they exist and be stable?

Page 78

(Pena et al. 2002)

