
Connectionist modeling of sentence comprehension as mental simulation

in simple microworld

Igor Farkaš, Department of Applied Informatics

Comenius University, Bratislava

AI seminar, October 2009, FIIT STU Bratislava

How does human cognition work?

[Diagram: cognition arising from brain-body-environment interaction, linked by perception and action]

● What is cognition?

● Where and how is knowledge represented?

Symbolic knowledge representation

● Properties

– symbols, transduced from perceptual inputs

– (conceptual) symbols are amodal (a new representational language)

– mental narratives using “inner speech” or words

– cognition separated from perception

● Virtues (of this expressively powerful type of KR)

– Productivity, type-token distinction, categorical inferences, accounts for abstractness, compositionality, propositions

● Problems

– lack of empirical evidence; symbol grounding problem (no explanation of transduction); weak integration with other sciences

– neither parsimonious nor falsifiable (post-hoc accounts)

Embodied knowledge representation

● Properties

– Symbols are perceptual, derived from perceptual inputs

– (conceptual) symbols are modal

– mental narratives are modality-tied (e.g. perceptual simulations)

– cognition overlaps with perception

● Virtues

– accumulated empirical evidence, symbol grounding solved, accounts for abstractness, makes predictions for experiments

● Difficulties

– abstractness, type-token distinctions, categorical inferences

Amodal vs Perceptual Symbol System

(Barsalou, 1999)

Meaning - a key concept for cognition

● What is meaning?

– content carried by signs during communication with environment

● realist semantics

● Extensional – meanings as objects in the world (Frege, Tarski)

● Intensional - meanings as mappings to possible worlds (Kripke)

● cognitive semantics

● meanings as mental entities (Barsalou, Lakoff, Rosch, ...)

● Meanings go beyond language

– linguistic view too restricted

– cf. functionalist semantics (Wittgenstein,...), speech acts

Meanings in language comprehension

● Are propositions necessary?

– Barsalou: yes, but he believes they can be realized by (mental) simulators

● Mental simulation as alternative theory

● empirical evidence, e.g. Stanfield & Zwaan 2001

– “John put the pencil in the cup / drawer”

– How to get from in(pencil, cup) to orientation(pencil, vertical)?

● theory of text understanding:

3 levels of representation (Kintsch & van Dijk, 1978)

● surface level – e.g. Pencil is in cup. / There is a pencil in the cup.

● propositional level – e.g. in(pencil, cup)

● situational level – goes beyond language

Sentence comprehension in neural nets

● typically off-line training mode (no autonomy)

● distributed representations involved

● earlier NN models – use propositional representations (usually prepared beforehand)

– e.g. Hadley, Desai, Dominey, St. John & McClelland, Miikkulainen, Mayberry et al., …

● our approach – based on (distributed) situational representations

– motivated by Frank et al.'s (2003-) work

InSOMnet

(Mayberry & Miikkulainen, 2003, in press)

Minimal Recursion Semantics framework

Situation space of a microworld

● situational space is built from example situations, exploiting their statistical properties (constraints), in self-organized way

● representations are analogical (cf. Barsalou's perceptual symbols) and non-compositional

● microworld of Frank et al (2003-)

– 3 persons, engaged in various activities at various places, jointly or independently

– Situation ~ consists of basic events

– operates on a 'higher' level, using amodal representations

● Our (object) microworld

– max. 2 objects in a scene, with various positions, identities and colors

– Situation ~ consists of object properties (rather than events)

– hence, representations are modal

Microworld properties / constraints

● small 2D grid microworld (3x3)

● max. 2 objects (blocks, pyramids) simultaneously present in a situation, two colours (red, blue)

● Microworld constraints:

– all objects are subject to gravity

– only one object at a time can be held in the air (by an arm)

– pyramid is an unstable object (cannot support another object)

=> objects are more likely to be on the ground

Building a situational space

● train a self-organizing map (SOM) with possible example situations

● Situations presented to SOM in the form of binary proposition vectors – specifying object position & features (two visual streams)

– [x1 y1 x2 y2 | id1 id2 clr1 clr2]   (“where” | “what”)

– e.g. [0110 1100 0011 0011 | 01 10 00 11]

Situational representations = (non-propositional) distributed output activations of SOM

Position encoding (4-bit codes, shared by x and y): left/bottom = 1100, middle = 0110, right/up = 0011

Property encoding: block = [10], pyramid = [01]; red = [10], blue = [01]

Each proposition vector is 24-dimensional; each SOM unit i carries a membership vector μ_i = [μ_i(p), ..., μ_i(q)] over these properties (a sketch of this encoding follows below)

(Kohonen, 1995)
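As a concrete illustration of this encoding, here is a minimal Python sketch that assembles the 24-dim proposition vector for one situation. The 4-bit position and 2-bit property codes follow the slide; the function name encode_situation and the zeroing of an absent second object are my assumptions.

import numpy as np

# 4-bit position codes shared by x and y (from the slide):
# left/bottom = 1100, middle = 0110, right/up = 0011
POS = {"left": [1, 1, 0, 0], "middle": [0, 1, 1, 0], "right": [0, 0, 1, 1],
       "bottom": [1, 1, 0, 0], "up": [0, 0, 1, 1]}
ID = {"block": [1, 0], "pyramid": [0, 1]}
CLR = {"red": [1, 0], "blue": [0, 1]}

def encode_situation(obj1, obj2=None):
    """Build the 24-dim vector [x1 y1 x2 y2 | id1 id2 clr1 clr2].

    Each object is a tuple (x, y, identity, colour); an absent second
    object is encoded with all-zero codes (an assumption, not from the slides).
    """
    def fields(obj):
        if obj is None:
            return [0] * 4, [0] * 4, [0] * 2, [0] * 2
        x, y, ident, colour = obj
        return POS[x], POS[y], ID[ident], CLR[colour]

    x1, y1, i1, c1 = fields(obj1)
    x2, y2, i2, c2 = fields(obj2)
    return np.array(x1 + y1 + x2 + y2 + i1 + i2 + c1 + c2, dtype=float)

# e.g. a red block in the middle and a blue pyramid up right:
v = encode_situation(("middle", "bottom", "block", "red"),
                     ("right", "up", "pyramid", "blue"))
assert v.shape == (24,)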

Propositions – occurrence of properties

● Microworld is described by example situations (non-linguistic description)

● Each situation j: proposition vector = a boolean combination of 24 basic properties: b_j = [b_j(p), b_j(q), ...]

– b_j(p) indicates whether basic property p occurs in situation j

– there exist dependencies between components (properties)

● Rules of fuzzy logic apply for combining properties (see the sketch after the formulas):

b_j(¬p) = 1 − b_j(p)

b_j(p ∧ q) = b_j(p) · b_j(q)   (we used min{b_j(p), b_j(q)} instead)

b_j(p ∨ q) = b_j(p) + b_j(q) − b_j(p ∧ q)
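The combination rules translate directly into code; a minimal sketch (function names are mine):

def f_not(bp):
    # b_j(not p) = 1 - b_j(p)
    return 1.0 - bp

def f_and(bp, bq):
    # the product rule b_j(p) * b_j(q) is standard; following the slide,
    # min{b_j(p), b_j(q)} was used instead
    return min(bp, bq)

def f_or(bp, bq):
    # b_j(p or q) = b_j(p) + b_j(q) - b_j(p and q)
    return bp + bq - f_and(bp, bq)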

Probabilities and beliefs about properties

● A priori probability of the occurrence of property p in the microworld:

Prob(p) = (1/K) Σ_{j=1..K} b_j(p)

(SOM size: n = 12×12 units; K = 275 example situations)

[Figure: SOM membership maps for the “where” and “what” properties]

SOM accurately approximates microworld probabilities by beliefs in the DSS (CorrCoef ≈ 0.98):

microworld probabilities → DSS beliefs (dimensionality reduction from K to n)

μ(p) = (1/n) Σ_{i=1..n} μ_i(p)
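Both quantities reduce to simple averages; a sketch, assuming B is the K×24 matrix of proposition vectors and mu the n×24 matrix of SOM membership values:

import numpy as np

def prior_prob(B):
    """A priori probability Prob(p) = (1/K) * sum_j b_j(p), one value per property."""
    return B.mean(axis=0)

def dss_belief(mu):
    """DSS belief mu(p) = (1/n) * sum_i mu_i(p), one value per property."""
    return mu.mean(axis=0)

# the slide reports CorrCoef ~ 0.98 between these two 24-dim vectors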

SOM representations of basic properties

Extracting beliefs from SOM output

P(p|X) = P(p ∧ X) / P(X)

μ(p|X) = Σ_i min{μ_i(p), x_i} / Σ_i x_i

SOM: neurons i = 1, 2, ..., n

For each proposition p and each neuron i:
membership value μ_i(p) = extent to which neuron i contributes to representing property p

The whole map: μ(p) = [μ_1(p), μ_2(p), ..., μ_n(p)]

Belief in p in situation X: assume a generated situation vector (SOM output) X = [x_1, x_2, ..., x_n]; the belief μ(p|X) approximates the conditional probability P(p|X).
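The belief formula above is a one-liner; a sketch, where mu_p is the membership vector [μ_1(p), ..., μ_n(p)] and x the SOM output X:

import numpy as np

def belief(mu_p, x):
    """Belief mu(p|X) = sum_i min{mu_i(p), x_i} / sum_i x_i."""
    return np.minimum(mu_p, x).sum() / x.sum()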

Modeling text comprehension

● microlanguage with 13 words:

red, blue, block, pyramid, left, right, on-top, up, in-middle, bottom, above, just, '.'

● Sentence length: 4-5 words (1 object), 7-8 words (2 objects)

● Word encoding: localist

● Standard Elman network, with 13-h-144 units

● trained via error back-propagation learning algorithm

● (in general) a rather complex mapping; a simplified scheme was used (1 sentence ~ 1 situation)

Example input sequence: red block in-middle blue pyramid up right .
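A minimal numpy sketch of the Elman architecture described above (13 localist word inputs, a hidden/context layer, 144 DSS outputs); biases and the back-propagation training loop are omitted, and the class itself is hypothetical:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ElmanNet:
    """Simple recurrent network: 13 (localist words) -> h -> 144 (DSS)."""

    def __init__(self, n_in=13, n_hid=110, n_out=144, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda shape: rng.uniform(-0.1, 0.1, shape)  # range per the slides
        self.W_ih = init((n_hid, n_in))    # input -> hidden
        self.W_hh = init((n_hid, n_hid))   # context -> hidden (recurrent copy)
        self.W_ho = init((n_out, n_hid))   # hidden -> output
        self.h = np.zeros(n_hid)           # context layer state

    def step(self, word_vec):
        """Process one localist word vector; return the estimated DSS vector."""
        self.h = sigmoid(self.W_ih @ word_vec + self.W_hh @ self.h)
        return sigmoid(self.W_ho @ self.h)

    def reset(self):
        """Clear the context layer before a new sentence."""
        self.h = np.zeros_like(self.h)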

Rules for sentence generation

● Object 1 - always specified with absolute position

● If object 2 shares one coordinate with object 1,

then object 2 is given relative position

– e.g. “red block in-middle red pyramid above .”

otherwise absolute position

– e.g. “red block in-middle red pyramid up right .”

● If an object lies alone at the bottom, its posY is not specified by any word.

● For relative positions: “just left” (distance 1) or “left” (distance 2)

● In-middle – ambiguous (applies to both coordinates); a sketch of these rules follows below
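A hypothetical sketch of these rules for the second object's position words (3×3 grid, coordinates 0..2); the vertical word choice and the handling of the bottom row are my assumptions:

def position_words(x1, y1, x2, y2):
    """Position words for object 2, given object 1 at (x1, y1)."""
    if x1 != x2 and y1 != y2:
        # no shared coordinate -> absolute position, e.g. "up right"
        xw = ["left", "in-middle", "right"][x2]
        yw = ["bottom", "in-middle", "up"][y2]
        return [yw, xw] if y2 > 0 else [xw]   # posY omitted at the bottom
    if y1 == y2:
        # shared row -> relative: "just left/right" (distance 1),
        # plain "left/right" (distance 2)
        word = "left" if x2 < x1 else "right"
        return [word] if abs(x2 - x1) == 2 else ["just", word]
    # shared column -> vertical relative wording (assumed choice)
    return ["on-top" if abs(y2 - y1) == 1 else "above"]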

Simulation setup

● hidden layer size manipulated (60-120 units)

● logistic units at hidden and output layers

● all network weights randomly initialized in (−0.1, +0.1)

● constant learning rate = 0.05

● weights updated at each step

● target (DSS vector) fixed during sentence presentation

● average results reported (over 3 random splits)

● training set: 200 sentences, test set: 75 sentences

● training: 4000 epochs

Sentence comprehension score

Evolution of the comprehension score during sentence processing (110 hidden units), evaluated at the end of sentences.

Comprehension score =
(μ(p|S) − Prob(p)) / (1 − Prob(p))   if μ(p|S) > Prob(p)
(μ(p|S) − Prob(p)) / Prob(p)         otherwise
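In code, the score is a direct transcription of the formula above (a sketch; argument names are mine):

def comprehension_score(belief_pS, prob_p):
    """Comprehension score of property p after sentence S.

    Positive when the belief mu(p|S) exceeds the prior Prob(p),
    negative otherwise; bounded in [-1, 1].
    """
    if belief_pS > prob_p:
        return (belief_pS - prob_p) / (1.0 - prob_p)
    return (belief_pS - prob_p) / prob_p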

Merging syntax with semantics

● NN forced to simultaneously learn to predict next words (in addition to situational representation)

● internal representations shared

Prediction measure: normalized negative log-likelihood, NNL ∝ −⟨log p(w_next | context)⟩ (network outputs are first converted to probabilities of the next word).

[Architecture: current word as input; next-word prediction as an additional output, with a delayed copy of the hidden layer serving as context]
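A sketch of the NNL computation; the slide only says outputs are first converted to probabilities, so the simple normalization used here is an assumption:

import numpy as np

def nnl(outputs, targets):
    """Normalized negative log-likelihood of next-word prediction.

    outputs: T x 13 array of network outputs over a sentence
    targets: length-T array of next-word indices
    """
    probs = outputs / outputs.sum(axis=1, keepdims=True)  # convert to probs
    return -np.log(probs[np.arange(len(targets)), targets]).mean()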

Prediction results

Model 1: without next-word prediction; Model 2: with next-word prediction.

# hidden units | Model 1 compreh. score (trn/tst) | Model 2 compreh. score (trn/tst) | Model 2 NNL (trn/tst)
90             | .61 / .42                        | .61 / .44                        | .34 / .42
100            | .62 / .47                        | .67 / .40                        | .30 / .41
110            | .64 / .43                        | .64 / .44                        | .31 / .37

The lower the NNL, the better the prediction.

Breaking down comprehension score

Most difficult testing predictions

Lowest compreh. score (<.1):

● Situations with two objects, at least one not at the bottom.

● Situations that differed more from all training sentences (by 2 properties).

=> 2 degrees of generalization (underlying systematicity)

Summary

● We presented a connectionist approach to (simple) sentence comprehension based on (mental simulation of) distributed situational representations of the block microworld.

● Situational representations are grounded in vision (what+where info), constructed online from example situations.

● Sentence understanding was evaluated by a comprehension score, which was positive in all cases.

● The model can learn both semantics and syntax at the same time.

● Questions: Scaling up (non-propositional reps)? How about abstract concepts?

Thank you for your attention.