Page 1

Page 2

AND Network

Input    Output
0 0      0
1 0      0
0 1      0
1 1      1

Page 3

OR Network

NETWORK CONFIGURED BY TLEARN
# weights after 10000 sweeps
# WEIGHTS
# TO NODE 1
-1.9083807468    ## bias to 1
 4.3717832565    ## i1 to 1
 4.3582129478    ## i2 to 1
 0.0000000000

Input    Output
0 0      0
1 0      1
0 1      1
1 1      1
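A quick way to see that these weights really do implement OR is to push the four input patterns through a single logistic unit. The sketch below assumes TLEARN's usual logistic (sigmoid) activation; the function and variable names are illustrative, not part of TLEARN itself.

    import math

    def logistic(x):
        # standard sigmoid activation, as used by TLEARN units
        return 1.0 / (1.0 + math.exp(-x))

    # weights read off the dump above: bias, i1, i2 (all feeding node 1)
    bias, w1, w2 = -1.9083807468, 4.3717832565, 4.3582129478

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        out = logistic(bias + w1 * i1 + w2 * i2)
        print(i1, i2, round(out, 2))
    # prints roughly 0.13, 0.92, 0.92, 1.00 -- the OR truth table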

Page 4

XOR Network

Page 5

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

Page 6

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Page 7

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Page 8

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Page 9

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

-4.4429202080    ## bias to output
 9.0652370453    ## 1 to output
 8.9045801163    ## 2 to output

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Page 10

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

-4.4429202080    ## bias to output
 9.0652370453    ## 1 to output
 8.9045801163    ## 2 to output

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Input    Output
0 0      0
1 0      1
0 1      1
1 1      1

Page 11

XOR Network

-3.0456776619    ## bias to 1
 5.5165352821    ## i1 to 1
-5.7562727928    ## i2 to 1

-3.6789164543    ## bias to 2
-6.4448370934    ## i1 to 2
 6.4957633018    ## i2 to 2

-4.4429202080    ## bias to output
 9.0652370453    ## 1 to output
 8.9045801163    ## 2 to output

Input    Output
0 0      0
1 0      1
0 1      0
1 1      0

Input    Output
0 0      0
1 0      0
0 1      1
1 1      0

Input    Output
0 0      0
1 0      1
0 1      1
1 1      1

The mapping from the hidden units to the output is an OR network that never receives a [1 1] input.
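Putting the three weight blocks together gives the whole 2-2-1 network. The sketch below (again assuming logistic activations; the function and variable names are mine, not TLEARN's) reproduces the tables above: hidden unit 1 detects [1 0], hidden unit 2 detects [0 1], and the output unit ORs them, which is XOR over the original inputs.

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def xor_net(i1, i2):
        # hidden unit 1: on for input [1 0]
        h1 = logistic(-3.0456776619 + 5.5165352821 * i1 - 5.7562727928 * i2)
        # hidden unit 2: on for input [0 1]
        h2 = logistic(-3.6789164543 - 6.4448370934 * i1 + 6.4957633018 * i2)
        # output unit: ORs the hidden units (it never sees h1 = h2 = 1)
        out = logistic(-4.4429202080 + 9.0652370453 * h1 + 8.9045801163 * h2)
        return round(h1, 2), round(h2, 2), round(out, 2)

    for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(i1, i2, xor_net(i1, i2))
    # output column reads roughly 0.02, 0.98, 0.98, 0.02 -- XOR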

Page 12

Page 13

Page 14

The Past Tense and Beyond

Page 15

Classic Developmental Story

• Initial mastery of regular and irregular past tense forms

• Overregularization appears only later (e.g. goed, comed)

• ‘U-Shaped’ developmental pattern taken as evidence for learning of a morphological rule

V + [+past] --> stem + /d/

Page 16

Rumelhart & McClelland 1986

Model learns to classify regulars and irregulars, based on sound similarity alone. Shows U-shaped developmental profile.

Page 17

What is really at stake here?

• Abstraction

• Operations over variables

• Learning based on input

Page 18

What is not at stake here

• Feedback, negative evidence, etc.

Page 19

Who has the most at stake here?

• Those who deny the need for rules/variables in language have the most to lose here

• …but if they are successful, they bring with them a simple and attractive learning theory, and mechanisms that can readily be grounded at the neural level

• However, if the advocates of rules/variables succeed here or elsewhere, they face the more difficult challenge at the neuroscientific level

Page 20

Questions about Lab 2b

• How did the network perform?

• How well did the network generalize to novel stems?

• What was the effect of the frequency manipulation?

• Does the network need to internalize a Blocking Principle?

• Does the network explicitly represent a default form?

Page 21

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 22

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 23

(Pinker & Ullman 2002)

Page 24

Beyond Sound Similarity

• Zero-derived denominals are regular

– Soldiers ringed the city

– *Soldiers rang the city

– high-sticked, grandstanded, …

– *high-stuck, *grandstood, …

• Productive in adults & children

• Shows sensitivity to morphological structure: [[stem N] ø V]-ed

• Provides good evidence that sound similarity is not everything

• But nothing prevents a model from using a richer similarity metric
– morphological structure (for ringed)

– semantic similarity (for low-lifes)

Page 25

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 26

Regulars & Associative Memory

• Regulars are productive, need not be stored

• Irregulars are not productive, must be stored

• But are regulars immune to effects of associative memory?
– frequency

– over-irregularization

• Pinker & Ullman:
– regulars may be stored

– but they can also be generated on-the-fly

– ‘race’ can determine which of the two routes wins

– some tasks more likely to show effects of stored regulars
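As a toy illustration of the 'words and rules' race just described (my own sketch, not Pinker & Ullman's implementation), stored forms and the default rule V + [+past] --> stem + -ed can be treated as two routes, with successful memory retrieval blocking the rule:

    # stored forms: irregulars, plus possibly high-frequency regulars
    stored_past = {"go": "went", "come": "came", "find": "found",
                   "walk": "walked"}

    def past_tense(stem):
        # route 1: memory retrieval; if it wins the race, it blocks route 2
        if stem in stored_past:
            return stored_past[stem]
        # route 2: the default rule, applicable to any stem, even novel ones
        return stem + "ed"

    print(past_tense("go"))     # went    (retrieval blocks *goed)
    print(past_tense("walk"))   # walked  (may be both stored and rule-generated)
    print(past_tense("blick"))  # blicked (novel stem, default route)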

Page 27

Child vs. Adult Impairments

• Specific Language Impairment
– Early claims that regulars show greater impairment than irregulars are not confirmed

• Pinker & Ullman 2002b
– ‘The best explanation is that language-impaired people are indeed impaired with rules, […] but can memorize common regular forms.’

– Regulars show consistent frequency effects in SLI, not in controls.

– ‘This suggests that children growing up with a grammatical deficit are better at compensating for it via memorization than are adults who acquired their deficit later in life.’

Page 28

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 29

Neuropsychological Dissociations

• Ullman et al. 1997
– Alzheimer’s disease patients
  • Poor memory retrieval
  • Poor irregulars
  • Good regulars
– Parkinson’s disease patients
  • Impaired motor control, good memory
  • Good irregulars
  • Poor regulars

• Striking correlation involving laterality of effect

• Marslen-Wilson & Tyler 1997
– Normals
  • past tense primes stem
– 2 Broca’s patients
  • irregulars prime stems
  • inhibition for regulars
– 1 patient with bilateral lesion
  • regulars prime stems
  • no priming for irregulars or semantic associates

Page 30

Morphological Priming

• Lexical Decision Task
– CAT, TAC, BIR, LGU, DOG
– press ‘Yes’ if this is a word

• Priming
– facilitation in decision times when related word precedes target (relative to unrelated control)
– e.g., {dog, rug} - cat

• Marslen-Wilson & Tyler 1997
– Regular: {jumped, locked} - jump
– Irregular: {found, shows} - find
– Semantic: {swan, hay} - goose
– Sound: {gravy, sherry} - grave

Page 31

Page 32

Neuropsychological Dissociations

• Bird et al. 2003
– complain that arguments for selective difficulty with regulars are confounded with the phonological complexity of the word-endings

• Pinker & Ullman 2002
– weight of evidence still supports dissociation; Bird et al.’s materials contained additional confounds

Page 33

Brain Imaging Studies

• Jaeger et al. 1996, Language
– PET study of past tense
– Task: generate past from stem
– Design: blocked conditions
– Result: different areas of activation for regulars and irregulars

• Is this evidence decisive?
– task demands very different
– difference could show up in network
– doesn’t implicate variables

• Münte et al. 1997
– ERP study of violations
– Task: sentence reading
– Design: mixed
– Result:
  • regulars: ~LAN
  • irregulars: ~N400

• Is this evidence decisive?
– allows possibility of comparison with other violations

Page 34

1. Are regulars different?
2. Do regulars implicate operations over variables?

Neuropsychological Dissociations

Other Domains of Morphology

Beyond Sound Similarity

Regulars and Associative Memory

Page 35

Low-Frequency Defaults

• German Plurals
– die Straße   die Straßen
  die Frau     die Frauen
– der Apfel    die Äpfel
  die Mutter   die Mütter
– das Auto     die Autos
  der Park     die Parks
  die Schmidts

• -s plural low frequency, used for loan-words, denominals, names, etc.

• Response
– frequency is not the critical factor in a system that focuses on similarity
– distribution in the similarity space is crucial
– similarity space with islands of reliability
  • network can learn islands
  • or network can learn to associate a form with the space between the islands
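A toy sketch of the 'islands of reliability' idea (the 2-D similarity space, the exemplar points, and the distance threshold are invented for illustration; real models use phonological feature vectors): items that fall near a stored island get that island's plural class, while items far from every island receive the low-frequency default -s.

    # toy exemplars: (x, y) points standing in for phonological similarity
    islands = {
        "-(e)n":  [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)],   # Strasse/Frau-like
        "umlaut": [(4.0, 4.0), (4.2, 3.8), (3.9, 4.1)],   # Apfel/Mutter-like
    }

    def plural_class(point, threshold=1.0):
        best_class, best_dist = None, float("inf")
        for cls, members in islands.items():
            for x, y in members:
                d = ((point[0] - x) ** 2 + (point[1] - y) ** 2) ** 0.5
                if d < best_dist:
                    best_class, best_dist = cls, d
        # nothing nearby: fall back to the default, the space between the islands
        return best_class if best_dist <= threshold else "-s (default)"

    print(plural_class((1.1, 1.1)))   # -(e)n         (inside an island)
    print(plural_class((8.0, 0.5)))   # -s (default)  (far from every island)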

Page 36

Similarity Space

Page 37

Similarity Space

Page 38

Arabic Broken Plural

• CvCC
– nafs      nufuus      ‘soul’
– qidh      qidaah      ‘arrow’

• CvvCv(v)C
– xaatam    xawaatim    ‘signet ring’
– jaamuus   jawaamiis   ‘buffalo’

• Sound Plural
– shuway?ir   shuway?ir-uun   ‘poet (dim.)’
– kaatib      kaatib-uun      ‘writing (participle)’
– hind        hind-aat        ‘Hind (fem. name)’
– ramadaan    ramadaan-aat    ‘Ramadan (month)’

Page 39

German Plurals

(Hahn & Nakisa 2000)

Page 40

Page 41

Syntax, Semantics, & Statistics

Page 42

Page 43

Page 44

Page 45

Starting Small Simulation

• How well did the network perform?

• How did it manage to learn?

Page 46

Page 47

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

Page 48

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

1 1 1 1 (Humans)

1 1 1 0 (Network)

Page 49

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

• Generalization fails because learning is local

1 1 1 1 (Humans)

1 1 1 0 (Network)
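This failure can be reproduced with a very small simulation. The sketch below uses an assumed setup (independent sigmoid output units trained with the delta rule on the four patterns; it illustrates the point rather than replicating any particular published model): the fourth output unit never sees a target of 1 during training, so it stays off for the novel input 1 1 1 1.

    import numpy as np

    X = np.array([[1, 0, 1, 0],
                  [0, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 0]], dtype=float)
    Y = X.copy()                      # identity mapping task

    rng = np.random.default_rng(0)
    W = rng.normal(0.0, 0.1, (4, 4))  # one weight row per output unit
    b = np.zeros(4)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(5000):             # batch delta-rule training
        out = sigmoid(X @ W.T + b)
        delta = (Y - out) * out * (1.0 - out)
        W += 0.5 * delta.T @ X
        b += 0.5 * delta.sum(axis=0)

    test = np.array([1.0, 1.0, 1.0, 1.0])
    print(np.round(sigmoid(W @ test + b), 2))
    # prints values close to [1, 1, 1, 0]: the fourth unit has only ever been
    # trained toward 0 (and its input line was never active), so it stays off
    # for 1 1 1 1 -- generalization fails because learning is local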

Page 50

Generalization

• Training Items
– Input: 1 0 1 0   Output: 1 0 1 0
– Input: 0 1 0 0   Output: 0 1 0 0
– Input: 1 1 1 0   Output: 1 1 1 0
– Input: 0 0 0 0   Output: 0 0 0 0

• Test Item
– Input: 1 1 1 1   Output: ? ? ? ?

• Generalization succeeds because representations are shared

1 1 1 1 (Humans)

1 1 1 1 (Network)

Page 51

Negative Evidence

• Standard Doctrine
– Language learners do not receive negative evidence
– They must therefore learn only from positive examples
– This forces the learner to make constrained generalizations
– [even with negative evidence, generalizations must be constrained]

• Common Suggestion
– ‘Implicit Negative Evidence’: the absence of certain examples in the input is taken to be significant
– Who do you think John saw __?      Who do you think that John saw __?
  Who do you think __ saw John?      *Who do you think that __ saw John?

• Challenge: how to classify and store input appropriately, in order to detect appropriate generalizations; large memory needed
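One way to make 'implicit negative evidence' concrete (a toy calculation with invented counts, not a model from the literature; it uses the give/donate contrast that comes up again on the Baker 1979 slide below): if a verb has had many opportunities to appear in a frame and never has, a smoothed estimate of that frame's probability shrinks toward zero, which can stand in for the inference that the frame is disallowed.

    # (total occurrences, occurrences in the double-object frame) -- toy counts
    counts = {
        "give":   (1000, 300),
        "donate": (1000, 0),
    }

    def double_object_rate(verb, prior=1.0):
        total, frame_uses = counts[verb]
        # add-one style smoothing: never attested despite many chances -> near 0
        return (frame_uses + prior) / (total + 2 * prior)

    print(round(double_object_rate("give"), 3))     # ~0.3
    print(round(double_object_rate("donate"), 3))   # ~0.001, implicit negative evidence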

Page 52

Feedback via Prediction

• Simple Recurrent Network
– Prediction task provides feedback

– System does not need to explicitly store large amounts of data

– Challenge is to appropriately encode the feedback signal

– If learning rate is set low, learner is protected against noisy input data
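A minimal sketch of the idea (an Elman-style SRN forward pass with a made-up vocabulary and random, untrained weights; this is not Elman's or Rohde & Plaut's actual model): the 'feedback' is simply the discrepancy between the predicted next word and the word that actually arrives, so no external teacher is required.

    import numpy as np

    vocab = ["boy", "girl", "chases", "sees", "."]
    V, H = len(vocab), 8
    rng = np.random.default_rng(0)
    W_xh = rng.normal(0, 0.1, (H, V))   # input -> hidden
    W_hh = rng.normal(0, 0.1, (H, H))   # context (previous hidden) -> hidden
    W_hy = rng.normal(0, 0.1, (V, H))   # hidden -> next-word prediction

    def one_hot(word):
        v = np.zeros(V)
        v[vocab.index(word)] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    h = np.zeros(H)                               # context units start empty
    sentence = ["boy", "chases", "girl", "."]
    for t in range(len(sentence) - 1):
        h = np.tanh(W_xh @ one_hot(sentence[t]) + W_hh @ h)
        p = softmax(W_hy @ h)                     # predicted next-word distribution
        error = one_hot(sentence[t + 1]) - p      # prediction error = teaching signal
        print(sentence[t], "-> predicted:", vocab[int(np.argmax(p))],
              "| error norm:", round(float(np.linalg.norm(error)), 3))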

Page 53

Feedback via Prediction

• The prediction problem is very relevant
– can be viewed as representing many kinds of syntactic relations
– typically, many predictions held simultaneously; if there’s an explicit representation of hierarchy and abstraction, then order of completion is easily predicted
– challenge is to avoid spurious competition among dependencies

Page 54

Page 55

• Elman --> Rohde & Plaut
– whatever happens, agreement is not the decisive case

• Seidenberg on locatives
– locatives across languages are highly constrained; generalizations go beyond surface patterns; the Seidenberg/Allen model is given the outline of the solution to the problem
– semantics, not statistics, is critical

• How to use SRN ideas effectively
– Structured prediction device encodes hypotheses
– How to encode and act upon error signals?
– Partially matching features can lead to long-distance feedback
– Prediction of alternatives can lead to confirmation or disconfirmation of one choice

Page 56

Infants and Statistical Learning

Page 57

Saffran, Aslin, & Newport (1996)

• 8-month-old infants
– Passive exposure to continuous speech sequence for 2 minutes
  bidakupadotigolabubidaku…
– Test (Experiment #2)
  bidakubidakubidakubidakubidaku…
  kupadokupadokupadokupadokupado…
– Infants listen longer to unfamiliar sequences
– Transitional Probabilities
  bi →(1.0) da →(1.0) ku →(.33) pa →(1.0) do →(1.0) ti
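The transitional-probability computation itself is simple: P(Y | X) = frequency(XY) / frequency(X). The sketch below builds a toy stream from the three words used on the slide (the stream-generation details are my own; the real stimuli were more carefully controlled) and recovers the 1.0 vs. .33 contrast.

    import random
    from collections import Counter

    words = ["bidaku", "padoti", "golabu"]
    random.seed(0)
    stream = []
    for _ in range(300):
        w = random.choice(words)
        stream += [w[i:i + 2] for i in range(0, len(w), 2)]   # split into syllables

    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])

    def tp(x, y):
        # transitional probability P(y | x) = freq(xy) / freq(x)
        return pair_counts[(x, y)] / first_counts[x]

    print(round(tp("bi", "da"), 2))   # 1.0   (within a word)
    print(round(tp("ku", "pa"), 2))   # ~0.33 (across a word boundary)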

Page 58

Marcus et al. (1999)

• Training

– ABA: ga na ga, li ti li

– ABB: ga na na, li ti ti

• Testing

– ABA: wo fe wo

– ABB: wo fe fe
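What the infants appear to extract is a relation stated over variables, the identity of the first and third items, rather than anything tied to particular syllables. A trivial sketch of that relation (purely illustrative):

    def pattern(triple):
        a, b, c = triple
        if a == c and a != b:
            return "ABA"
        if b == c and a != b:
            return "ABB"
        return "other"

    # generalizes to novel syllables precisely because it refers to variables
    print(pattern(["ga", "na", "ga"]))   # ABA (training)
    print(pattern(["wo", "fe", "wo"]))   # ABA (novel test item)
    print(pattern(["wo", "fe", "fe"]))   # ABB (novel test item)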

Page 59

Aslin & Newport (in press)

• While some adjacent statistical regularities can be learned, other types of statistical regularities cannot

Page 60

We have recently been developing a statistical approach to language acquisition and investigating the abilities of human adults, infants, and nonhuman primates to perform the computations that would be required for acquiring properties of natural languages by such a method. Our studies (initially in collaboration with Jenny Saffran) have shown that human adults and infants are capable of performing many of these computations online and with remarkable speed, during the presentation of controlled speech streams in the laboratory. We have also found that adults and infants can perform similar computations on nonlinguistic materials (e.g., music), and (in collaboration with Marc Hauser) that nonhuman primates can perform the simplest of these computations. However in our recent studies, when tested on more complex computations involving non-adjacent sounds, humans show strong selectivities (they can perform certain computations, but fail at others), corresponding to the patterns which natural languages do and do not exhibit. Primates are not capable of performing some of these more difficult computations. Additional recent studies examine how statistics can be used to form non-statistical generalizations and to regularize irregular structures, and how the computations we have hypothesized for word segmentation extend to acquiring syntactic phrase structure. Overall we feel that this approach may provide an important mechanism for learning certain aspects of language, particularly when combined with an understanding of the ways in which input statistics may be selectively extracted or altered as they are acquired. In addition, the constraints of learners in performing differing types and complexities of computations may provide part of the explanation for which learners can acquire human languages, and why languages have some of the properties they have.

(Newport & Aslin, Dec. 2003)

Page 61

A Little Syntax

Page 62

Gold (1967)

• Hypothetical classes of languages
– #1: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA}
– #2: {A}, {A, AA}, {A, AA, AAA}, {A, AA, AAA, AAAA, …, A∞}

• How could a learner figure out the target language, based on positive-only examples (‘text presentation’)?
– #1:
– #2:

• Under class #2, there’s no way to guarantee convergence
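The difference between the two classes can be seen with a conservative 'guess the largest sentence seen so far' learner (an illustration of the flavour of Gold's argument, not the proof itself; the encodings are mine):

    def learner(text):
        longest = 0
        for sentence in text:
            longest = max(longest, len(sentence))
            yield longest              # conjecture: {A, AA, ..., A^longest}

    # Class #1 target {A, AA, AAA}: once 'AAA' has appeared, the guess is
    # correct and never changes again -- identification in the limit.
    finite_text = ["A", "AAA", "AA", "A", "AAA", "AA"]
    print(list(learner(finite_text)))          # [1, 3, 3, 3, 3, 3]

    # Class #2 adds the infinite language {A, AA, AAA, ...}. On a text for it
    # this learner never converges, and jumping to the infinite conjecture
    # early would overgeneralize a finite target with no positive datum to
    # force a retreat -- hence no guaranteed convergence for class #2.
    infinite_text = ("A" * n for n in range(1, 8))
    print(list(learner(infinite_text)))        # [1, 2, 3, 4, 5, 6, 7]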

Page 63

Baker (1979)

• Alternating Verbs
– John gave a cookie to the boy.
  John gave the boy a cookie.
– Mary showed some photos to her family.
  Mary showed her family some photos.

• Non-Alternating Verbs
– John donated a painting to the museum.
  *John donated the museum a painting.
– Mary displayed her art collection to the visitors.
  *Mary displayed the visitors her art collection.

• Learnability problem: how to avoid overgeneralization

Page 64

Seidenberg (1997, Science)

• Locative Verb Constructions
– John poured the water into the cup
  *John poured the cup with water
– *Sue filled the water into the glass
  Sue filled the glass with water
– Bill loaded the apples onto the truck
  Bill loaded the truck with apples

• “Connectionist networks are well suited to capturing systems with this character. Importantly, a network configured as a device that learns to perform a task such as mapping from sound to meaning will act as a discovery procedure, determining which kinds of information are relevant. Evidence that such models can encode precisely the right combinations of probabilistic constraints is provided by Allen (42), who implemented a network that learns about verbs and their argument structures from naturalistic input.” (p. 1602)

Page 65

Seidenberg (Science, 3/14/97)

• “Research on language has arrived at a particularly interesting point, however, because of important developments outside of the linguistic mainstream that are converging on a different view of the nature of language. These developments represent an important turn of events in the history of ideas about language.” (p. 1599)

Page 66

Seidenberg (Science, 3/14/97)

• “A second implication concerns the relevance of poverty-of-the-stimulus arguments to other aspects of language. Verbs and their argument structures are important, but they are language specific rather than universal properties of languages and so must be learned from experience.” (p. 1602)

Page 67

Allen’s Model

• Learns associations between (i) specific verbs & argument structures and (ii) semantic representations

• Feature encoding for verbs, 360 features
  [eat]: +act, +cause, +consume, etc.
  [John]: +human, +animate, +male, +automotive, -vehicle

Page 68

Allen’s Model

• Learns associations between (i) specific verbs & argument structures and (ii) semantic representations

• Training set: 1200 ‘utterance types’ taken from caretaker speech in Peter corpus (CHILDES)

Page 69

Allen’s Model

• Fine-grained distinction between hit, carry
  John kicked Mary the ball
  *John carried Mary the basket

• [kick]: +cause, +apply-force, +move, +travel, +contact, +hit-with-foot, +strike, +kick, +instantaneous-force, +hit

• [carry]: +cause, +apply-force, +move, +travel, +contact, +carry, +support, +continuous-force, +accompany
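To see why the quote on page 73 matters, the two feature sets can be turned into binary vectors (the encoding and the Jaccard measure are my own illustration, not Allen's implementation): the verbs share all of the coarse force/motion features, so a learner sensitive only to overall similarity has no basis for their opposite behaviour; some features, such as instantaneous vs. continuous force, must end up weighted more heavily than others.

    # feature sets as listed on the slide
    kick = {"cause", "apply-force", "move", "travel", "contact",
            "hit-with-foot", "strike", "kick", "instantaneous-force", "hit"}
    carry = {"cause", "apply-force", "move", "travel", "contact",
             "carry", "support", "continuous-force", "accompany"}

    features = sorted(kick | carry)                  # shared feature inventory
    v_kick = [int(f in kick) for f in features]
    v_carry = [int(f in carry) for f in features]
    print(v_kick)
    print(v_carry)

    shared = kick & carry
    jaccard = len(shared) / len(kick | carry)
    print(sorted(shared))        # the coarse force/motion features
    print(round(jaccard, 2))     # ~0.36 overall overlap
    # raw overlap alone cannot explain why 'kick' allows the double-object
    # frame and 'carry' does not; particular features must matter more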

Page 73

Allen’s Model

• Fine-grained distinction between hit, carry
  John kicked Mary the ball
  *John carried Mary the basket

• “This behavior shows crucially that the network is not merely sensitive to overall semantic similarity: rather, the network has organized the semantic space such that some features are more important than other.” (p. 5)

Page 74

Challenges

• Allen’s results are impressive; the model is interesting in the way that it poses the learning task as a selection process (the linking rules do not emerge from nowhere)

• Fine-grained distinctions in English
• ‘Concealed’ distinctions in Korean
• Reason for universals

Page 75

Challenges

• Fine-grained distinctions, e.g. in English

• pour the water into the glass
  pour the water
  the poured water

• stand the lamp on the floor
  *stand the lamp
  *the stood lamp

Page 76

Challenges

• ‘Concealed’ distinctions, e.g. in Korean

• pour the water into the glass
  *pour the glass with water

• pile the books onto the shelf
  *pile the shelf with books

• *pour-put the glass with water
  pile-put the shelf with books

Page 77

Challenges

• Universals, parametric connections - why should they exist and be stable?

Page 78

(Pena et al. 2002)

