
ATILA – 2013

An algorithm for generating child–adult interaction data

Yevgen Matusevych Afra Alishahi

Contents

1. Input to CLA models.

2. Natural vs. generated input.

3. Hybrid approach.

4. Improving the algorithm.

Overview

• Computational models of child language acquisition (CLA) often take utterance–scene pairs as input, for example in modeling cross-situational word learning:

Utterance (linguistic input): Take the ball!

Scene (visual input): {ball, car, rattle, book}

• Existing collections of child-directed speech (e.g., CHILDES) provide the linguistic input, but not the visual input.

Input to CLA models

Two possibilities:

1. Use a small manually annotated dataset.

- But only relatively small amounts of such data exist.

2. Generate visual input automatically.

- But what about its statistical properties?

Input to a cognitively plausible model must have the same statistical properties as the naturalistic data. So we need to compare the two sources.

Manually annotated sample

• 3 short fragments (~10 min. each) of video recordings of 13-month-old children playing with toys together with adults.

• Annotated: adult's and child's gaze directions, utterances and actions.

• Scene at step 3: [adult, child, book, car, open, point, play]

#   Who?    Looks where?  Does what?  To what?  Says what?
1.  Adult   child         point       book      FROG. CROAK-CROAK
2.  Child   car           play        car       [babbling]
3.  Adult   book          open        book      CROAK-CROAK


Automatically prepared data

Fazly, Alishahi et al., 2010: use semantic symbols that correspond to the words in the utterance. Referential uncertainty is simulated by merging the representations of two consecutive scenes and pairing them with only one of the utterances.

Utt1: But it is very boring.
Scene1: [but, it, is, very, boring, are, we, going, to, play, now]

Utt2: Are we going to play now?

Utt3: Did you get fed up … ?
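This pairing scheme can be sketched in a few lines. A minimal illustration (function name and tokenization are ours, not from the original implementation): each word becomes a semantic symbol, and each utterance is paired with the union of its own symbols and the next utterance's symbols.

```python
def make_pairs(utterances):
    """Pair each utterance with a merged scene built from two
    consecutive utterances, simulating referential uncertainty."""
    pairs = []
    for i, utt in enumerate(utterances):
        words = [w.strip("?.!,") for w in utt.lower().split()]
        next_words = []
        if i + 1 < len(utterances):
            next_words = [w.strip("?.!,") for w in utterances[i + 1].lower().split()]
        scene = sorted(set(words) | set(next_words))
        pairs.append((utt, scene))
    return pairs

pairs = make_pairs(["But it is very boring.",
                    "Are we going to play now?"])
print(pairs[0])  # first utterance paired with the merged Scene1 symbols
```

Note how the first pair reproduces Scene1 from the slide above: the scene contains symbols from both Utt1 and Utt2, but is paired with Utt1 only.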

Statistical measures

• Measuring statistical properties ($S_i$: current scene, $U_i$: current utterance, $S_{i+1}$: next following scene):

1. Scene stability, or the overlap between every pair of consecutive scenes:

$$\mathrm{overlap}(S_i, S_{i+1}) = \frac{|S_i \cap S_{i+1}|}{|S_i \cup S_{i+1}|}$$

2. Noise, or the normalized number of words that refer to something not present in the scene:

$$\mathrm{noise}(U_i) = \frac{|U_i - (U_i \cap S_i)|}{|U_i|}$$

3. Referential certainty, or the normalized number of the scene elements that are referred to in the utterance:

$$\mathrm{certainty}(S_i) = \frac{|U_i \cap S_i|}{|S_i|}$$
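Treating scenes and utterances as sets of symbols, the three measures translate directly into code. A small sketch (the example sets are ours, loosely based on the annotated sample above):

```python
def overlap(scene, next_scene):
    """Scene stability: |S_i ∩ S_{i+1}| / |S_i ∪ S_{i+1}|."""
    return len(scene & next_scene) / len(scene | next_scene)

def noise(utterance, scene):
    """Words referring to nothing in the scene: |U_i - (U_i ∩ S_i)| / |U_i|."""
    return len(utterance - scene) / len(utterance)

def certainty(utterance, scene):
    """Scene elements referred to in the utterance: |U_i ∩ S_i| / |S_i|."""
    return len(utterance & scene) / len(scene)

S1 = {"adult", "child", "book", "car", "open", "point", "play"}
S2 = {"adult", "child", "book", "frog", "open", "point"}
U1 = {"frog", "croak"}

print(overlap(S1, S2))    # 5 shared / 8 total -> 0.625
print(noise(U1, S1))      # both words absent from S1 -> 1.0
print(certainty(U1, S1))  # 0 of 7 scene elements mentioned -> 0.0
```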

Statistical measures

[Figure: bar chart (y-axis 0–0.7) comparing scene stability, noise, and referential certainty for the manual vs. automatic data.]

The hybrid approach

A framework that uses a small data sample as input and generates a meaningful stream of adult–child interaction.

Context: puzzle, duck, bin, ball, frog

Turn  Agent  Action        Utterance
1.    Adult  play puzzle   —
2.    Child  play duck     babbling
3.    Adult  point puzzle  Duck fits here.
4.    Child  touch bin     babbling
5.    Adult  play puzzle   Yes?

The hybrid approach

[Figure: bar chart (y-axis 0–0.7) comparing scene stability, noise, and referential certainty for the manual, automatic, and generated data.]


The hybrid approach

• The hybrid approach – generating the data based on a small manually annotated sample – provides better data. So how does it work?

• Based on co-occurrence frequencies. If two items co-occur often, they must be related, e.g.:

— Adults react to children's babbling and actions.

— Utterances often accompany actions.

— Objects are associated with certain actions.

A  manipulate  book  FROG. CROAK-CROAK
C  close       book  [babbling]
A  open        book  CROAK-CROAK

Improved algorithm

• A manually specified system of dependencies, using information from the n previous feature values.

1. Processing.

2. Generation.

#   Who?    Looks where?  Does what?  To what?  Says what?
1.  Adult   child         point       book      FROG. CROAK-CROAK
2.  Child   car           play        car       [babbling]
3.  Adult   book          open        book      CROAK-CROAK

Improved algorithm: processing

Each turn from the table above is converted into a feature record:

ADULT  gazeA: child  actionA: point  argument1A: book  argument2A: ⌀  utteranceA: FROG. CROAK-CROAK
CHILD  gazeC: car    actionC: play   argument1C: car   argument2C: ⌀  utteranceC: babbling
ADULT  gazeA: book   actionA: open   …

For each new feature value, co-occurrence counts with the feature values of the previous turn are collected:

Count (gazeA (n+1) = book | gazeA (n) = child)

Count (gazeA (n+1) = book | actionA (n) = point) …
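The counting step can be sketched as follows. A simplified illustration (feature names are collapsed into one dictionary per turn; the actual system distinguishes adult and child features): for every consecutive pair of turns, each feature value of the current turn is counted against each feature value of the previous turn.

```python
from collections import Counter

# Simplified turn records, loosely based on the annotated sample above.
turns = [
    {"agent": "adult", "gaze": "child", "action": "point",
     "arg1": "book", "utterance": "FROG. CROAK-CROAK"},
    {"agent": "child", "gaze": "car", "action": "play",
     "arg1": "car", "utterance": "babbling"},
    {"agent": "adult", "gaze": "book", "action": "open",
     "arg1": "book", "utterance": "CROAK-CROAK"},
]

counts = Counter()
for prev, cur in zip(turns, turns[1:]):
    for f_prev, v_prev in prev.items():
        for f_cur, v_cur in cur.items():
            # Count(F_cur(n+1) = v_cur | F_prev(n) = v_prev)
            counts[(f_cur, v_cur, f_prev, v_prev)] += 1

print(counts[("gaze", "book", "gaze", "car")])  # 1
```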

Improved algorithm: processing

Adult  child  point  book  FROG. CROAK-CROAK
Child  car    play   car   [babbling]
Adult  book   open   book  CROAK-CROAK

Improved algorithm: processing

Counts of GAZEA values (rows) given the previous ACTIONC value (columns):

GAZEA \ ACTIONC  play  point  open
book             1     4      0
child            7     2      0
car              0     5      3
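Such counts can be turned into transitional probabilities by normalizing each conditioning column. A minimal sketch using the table above (the dictionary layout is ours):

```python
# Count(gazeA = row | actionC = column), transcribed from the table.
counts = {
    "play":  {"book": 1, "child": 7, "car": 0},
    "point": {"book": 4, "child": 2, "car": 5},
    "open":  {"book": 0, "child": 0, "car": 3},
}

# Normalize each column so the gazeA values sum to 1 per actionC value.
probs = {}
for action, row in counts.items():
    total = sum(row.values())
    probs[action] = {gaze: n / total for gaze, n in row.items()}

print(probs["play"])  # {'book': 0.125, 'child': 0.875, 'car': 0.0}
```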

Improved algorithm: generating

Features: {gazeA, actionA, object1A, object2A, utteranceA, gazeC, actionC, object1C, object2C, utteranceC}

A. Assume the features are independent?

B. Markov chain with memory m = 10?

$$P(F_n = value \mid F_{n-1} = v_{n-1},\, F_{n-2} = v_{n-2},\, \ldots,\, F_{n-10} = v_{n-10})$$

C. Assume that each feature depends on some features, but not on the others?

$$\prod_{F_n \in features} P(F_i = value_i \mid F_n = value_n)$$

Improved algorithm: generating

A distribution of values:

book: 0.025
car: 0.005
child: 0.01
…

So we can sample a value using the probabilities as weights.
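The weighted sampling above is a one-liner with the standard library; `random.choices` normalizes the weights internally. A minimal sketch using the (truncated) distribution from this slide:

```python
import random

# Truncated example distribution from the slide; weights need not sum to 1.
dist = {"book": 0.025, "car": 0.005, "child": 0.01}

values, weights = zip(*dist.items())
sampled = random.choices(values, weights=weights, k=1)[0]
print(sampled)  # one of 'book', 'car', 'child'
```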

Conclusions & future work

• Data generated using the hybrid approach have statistical properties closer to those of naturalistic data.

• The algorithm can be improved by automatically collecting implicit statistical information and transforming it into transitional probabilities.

• We need to find an optimal way to represent the relations between the features:
- Which distribution to use?
- Assign weights?
- Replace sparse features like UTTERANCE with their categories? (This means more manual work.)

Questions?