Knowledge Representation: Spaces, Trees, Features

Announcements

● Optional section 1: Introduction to Matlab

– Tonight, 8:00 pm

● Problem Set 1 is available

The best statistical graphic ever?

Image removed due to copyright considerations. Please see: Tufte, Edward. The Visual Display of Quantitative Information. Cheshire CT: Graphics Press, 2001. ISBN: 0961392142.

The worst statistical graphic ever?

Image removed due to copyright considerations. Please see: Tufte, Edward. The Visual Display of Quantitative Information. Cheshire CT: Graphics Press, 2001. ISBN: 0961392142.

Knowledge Representation

● A good representation should:
– be parsimonious
– pick out important features
– make common operations easy
– make less common operations possible

Mental Representations

● Pick a domain: say, animals
● Consider everything you know about that domain.
● How is all of that knowledge organized?
– a list of facts?
– a collection of facts and rules?
– a database of statements in first-order logic?

Two Questions

1. How can a scientist figure out the structure of people's mental representations?

2. How do people acquire their representations?

[Diagram: World, Scientist, Learner]

Q: How can a scientist figure out the structure of people's mental representations?

A: Ask them for similarity ratings

[Diagram: Scientist, Learner; matrix labelled "objects"]

Q: How do people acquire their mental representations?

A: They build them from raw features – features that come for free

[Diagram: Learner, World; matrix labelled "objects" and "raw features"]

Outline

● Spatial representations
– Multidimensional scaling
– Principal component analysis

● Tree representations
– Additive trees
– Hierarchical agglomerative clustering

● Feature representations
– Additive clustering

Multidimensional scaling (MDS)

Image removed due to copyright considerations.

Marr’s three levels

● Level 1: Computational theory
– What is the goal of the computation, and what is the logic by which it is carried out?
● Level 2: Representation and algorithm
– How is information represented and processed to achieve the computational goal?
● Level 3: Hardware implementation
– How is the computation realized in physical or biological hardware?

MDS: Computational Theory

$d_{ij}$: distance in a low-dimensional space

$\delta_{ij}$: human dissimilarity ratings

● Classical MDS:

● Metric MDS:

● Non-metric MDS: rank order of the $d_{ij}$ should match rank order of the $\delta_{ij}$

MDS: Computational Theory

● Cost function

– Classical MDS: cost =

MDS: Algorithm

● Minimize the cost function using standard methods (solve an eigenproblem if possible; if not, use gradient-based methods)
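
The eigenproblem route can be sketched for the classical case. This is a minimal illustration using the standard double-centring construction, not necessarily the exact procedure intended on the slide; the function name `classical_mds`, the variable `delta`, and the toy rectangle data are assumptions made for the example.

```python
import numpy as np

def classical_mds(delta, n_dims=2):
    """Embed n objects in n_dims dimensions from an n x n dissimilarity matrix."""
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    # Double-centre the squared dissimilarities to obtain an inner-product matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (delta ** 2) @ J
    # eigh returns eigenvalues in ascending order, so pick the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical toy data: the four corners of a 3 x 4 rectangle.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
delta = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(delta, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
distances_match = np.allclose(recovered, delta)
```

The recovered configuration reproduces the input distances, but only up to rotation and reflection of the axes. When the dissimilarities are not exactly Euclidean, some eigenvalues can be negative and a gradient-based method on an explicit cost function is the usual fallback.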

Choosing the dimensionality

● Elbow method

Colours

Phonemes

What MDS achieves

● Sometimes discovers meaningful dimensions
● Are the dimensions qualitatively new? Does MDS solve Fodor's problem?

What MDS doesn't achieve

● Solution (usually) invariant under rotation of the axes

● The algorithm doesn't know what the axes mean. We look at the low-dimensional plots and find meaning in them.

ideonomy.mit.edu

Image removed due to copyright considerations. Please See: http://ideonomy.mit.edu/slides/16things.html

Patrick Gunkel


Two Questions

1. How can a scientist figure out the structure of people's mental representations?

2. How do people acquire their representations?

[Diagram: World, Scientist, Learner]

Principal Components Analysis (PCA)

[Diagram: two matrices, raw features × objects and new features × objects]

● Computational Theory
– find a low-dimensional subspace that preserves as much of the variance as possible
● Algorithm
– based on the Singular Value Decomposition (SVD)

SVD

[Diagrams: the raw features × objects matrix factorized by SVD, giving a new features × objects matrix]
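
A minimal sketch of PCA via the SVD, assuming a matrix laid out as on the slides (rows are raw features, columns are objects). The synthetic data, the choice of `k = 2`, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 5 raw features x 40 objects, generated from 2 latent
# dimensions plus a little noise.
n_features, n_objects = 5, 40
latent = rng.normal(size=(2, n_objects))
mixing = rng.normal(size=(n_features, 2))
X = mixing @ latent + 0.01 * rng.normal(size=(n_features, n_objects))

# Centre each raw feature across objects, then factorize.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
# New features: project the objects onto the top-k left singular vectors.
new_features = U[:, :k].T @ Xc            # shape (k, n_objects)
# Fraction of total variance preserved by the k-dimensional subspace.
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
```

Because the toy data were built from two latent dimensions, almost all of the variance is preserved with `k = 2`. With real data the squared singular values show how quickly the preserved variance grows with `k`.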

PCA and MDS

PCA on a raw feature matrix = Classical MDS on Euclidean distances between feature vectors
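
The equivalence can be checked numerically. The sketch below uses random data and, for convenience, stores objects as rows (the transpose of the layout on the slides); the data and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))      # hypothetical: 6 objects (rows) x 4 raw features
Xc = X - X.mean(axis=0)          # centre each feature

# PCA: co-ordinates of the objects on the top two principal components.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U[:, :2] * s[:2]

# Classical MDS on the Euclidean distances between the feature vectors.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:2]
mds_coords = vecs[:, idx] * np.sqrt(vals[idx])

# Each axis is determined only up to sign, so compare absolute values.
same_up_to_sign = np.allclose(np.abs(pca_scores), np.abs(mds_coords))
```

The two sets of co-ordinates agree because the double-centred matrix `B` equals the inner-product matrix of the centred data, whose eigenvectors are the left singular vectors used by PCA.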

Applications: Politics

[Diagram: roll-call votes × politicians matrix] ≈ co-ordinates of politicians in ideology space

Congress

US Senate, 2003

(Stephen Weis) Courtesy of Stephen Weis. Used with permission.

US Senate, 1990

(Stephen Weis) Courtesy of Stephen Weis. Used with permission.

Applications: Personality

[Diagram: answers to questions on personality test × people matrix] ≈ co-ordinates of people in personality space

● The Big 5
– Openness
– Conscientiousness
– Extraversion
– Agreeableness
– Neuroticism

Applications: Face Recognition

[Diagram: pixel values × images matrix] ≈ co-ordinates of images in face space

Original faces

Principal Components

Face Recognition

● PCA has been discussed as a model of human perception
– not just an engineering solution
– Hancock, Burton and Bruce (1996). Face processing: human perception and principal components analysis

Latent Semantic Analysis (LSA)

[Diagram: log word frequencies × documents matrix] ≈ co-ordinates of documents in semantic space

● New documents can be located in semantic space
● Similarity between documents is the angle between their vectors in semantic space
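
A minimal sketch of the LSA pipeline under stated assumptions: the tiny word-by-document count matrix is invented, the add-one inside the logarithm is one common way to handle zero counts (not necessarily the weighting used in the lecture), and the helper names `cosine` and `fold_in` are hypothetical.

```python
import numpy as np

# Hypothetical counts: rows are words, columns are documents.
vocab = ["cat", "dog", "pet", "stock", "market", "trade"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)

X = np.log(counts + 1.0)                      # log word frequencies (add-one assumed)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                         # dimensionality of the semantic space
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T      # one row of co-ordinates per document

def cosine(a, b):
    """Cosine of the angle between two vectors in semantic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def fold_in(word_counts):
    """Locate a new document by projecting it onto the top-k word directions."""
    x = np.log(np.asarray(word_counts, dtype=float) + 1.0)
    return U[:, :k].T @ x

new_doc = fold_in([1, 1, 1, 0, 0, 0])         # a new document about pets
sim_pets = cosine(doc_coords[0], doc_coords[1])
sim_mixed = cosine(doc_coords[0], doc_coords[2])
```

In this toy corpus the two pet documents end up at a smaller angle to each other than to the finance document, and the folded-in document lands near the pet documents.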

LSA: Applications

● Essay grading

● Synonym test

LSA as a cognitive theory

● Do brains really carry out SVD?

– Irrelevant: the proposal is at the level of computational theory

● A solution to Fodor's problem?

– Are the dimensions that LSA finds really new?

Compare with the Bruner reading

“striped and three borders”: conjunctive concept

Figure by MIT OCW.

● Bruner Reading:
– Raw features: texture (striped, black), shape (cross, circle), number
– Disjunctive and conjunctive combinations allowed

● LSA:
– Raw features: words
– Linear combinations of raw features allowed (new dimensions are linear combinations of the raw features)

LSA as a cognitive theory

● Do brains really carry out SVD?

– Irrelevant: the proposal is at the level of computational theory

● A solution to Fodor's problem?

– Are the dimensions that LSA finds really new?

● What the heck do the dimensions even mean?

Non-Negative Matrix Factorization

● PCA: [raw features × objects matrix] ≈ [factorization]
– entries can be negative
● NMF (Lee and Seung): [raw features × objects matrix] ≈ [factorization]
– entries must be non-negative
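
A minimal sketch of NMF using the multiplicative update rules for squared error described by Lee and Seung. The function name `nmf`, the synthetic matrix `V`, the rank, and the small constant `eps` that avoids division by zero are assumptions made for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V by W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H)))
    return W, H, errors

# Hypothetical data: 8 raw features x 12 objects, non-negative and rank 3.
rng = np.random.default_rng(42)
V = rng.random((8, 3)) @ rng.random((3, 12))

W, H, errors = nmf(V, rank=3)
```

Unlike the PCA factors, every entry of `W` and `H` stays non-negative, so each object is described as an additive combination of parts rather than a mixture of positive and negative contributions.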

Dimensions found by NMF

Please see: Lee, D. D., and H. S. Seung. "Algorithms for non-negative matrix factorization." Advances in Neural Information Processing 13. Proc. NIPS*2000, MIT Press, 2001.

See also Tom Griffiths' work on finding topics in text


Outline

● Feature representations
– Additive clustering

Tree Representations

● Library of Congress system
● Q335.R86

Q: Science
  Q1-Q385: General Science
    Q300-336: Cybernetics
      Q331-Q335: Artificial Intelligence
        Q335.R86: Russell & Norvig, AIMA

[Figure panels: 5-year-old’s ontology; 7-year-old’s ontology]

Keil, Frank C. Concepts, Kinds, and Cognitive Development. Cambridge, MA: MIT Press, 1989.

● We find hierarchical representations very natural. Why?

● Hierarchical representations are not always obvious. The work of Linnaeus was a real breakthrough.

Today:

● Trees with objects located only at leaf nodes

ADDTREE (Sattath and Tversky)

● Input: a dissimilarity matrix
● Output: an unrooted binary tree
● Computational Theory
– $d_{ij}$: distance in a tree
– $\delta_{ij}$: human dissimilarity ratings
● Algorithm:
– search the space of trees using heuristics

ADDTREE: example

ADDTREE

● Tree-distance is a metric

● Can think of a tree as a space with an unusual kind of geometry
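
To make the metric claim concrete, the sketch below computes tree-distances as path lengths in a small weighted tree and checks the triangle inequality over the leaves. The tree, its edge lengths, and the function name `tree_distance` are invented for illustration; this is not the ADDTREE fitting procedure itself.

```python
from itertools import permutations

# Hypothetical unrooted binary tree: leaves a, b, c, d; internal nodes u, v.
edges = {("a", "u"): 1.0, ("b", "u"): 2.0, ("u", "v"): 3.0,
         ("c", "v"): 1.5, ("d", "v"): 0.5}

adjacency = {}
for (x, y), w in edges.items():
    adjacency.setdefault(x, []).append((y, w))
    adjacency.setdefault(y, []).append((x, w))

def tree_distance(start, goal):
    """Length of the unique path between two nodes, found by depth-first search."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, w in adjacency[node]:
            if nxt != parent:            # never walk back along the same edge
                stack.append((nxt, node, dist + w))
    raise ValueError("nodes are not connected")

leaves = ["a", "b", "c", "d"]
d = {(i, j): tree_distance(i, j) for i in leaves for j in leaves}

# Triangle inequality for every ordered triple of distinct leaves.
is_metric = all(d[i, k] <= d[i, j] + d[j, k] + 1e-12
                for i, j, k in permutations(leaves, 3))
```

For example, the distance from `a` to `c` is the sum of the three edges on the path a-u-v-c. Objects sit only at the leaves, so the internal nodes act as extra points that shape the geometry without being objects themselves.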

Hierarchical Clustering

● Input: a dissimilarity matrix
● Output: a rooted binary tree
● Computational Theory
– ? (but see Kamvar, Klein and Manning, 2002)
● Algorithm:
– Begin with one group per object
– Merge the two closest groups
– Continue until only one group remains

Hierarchical Clustering

How close are two groups?

● Single-link clustering
● Complete-link clustering
● Average-link clustering
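
A minimal sketch of the merge-the-closest-groups procedure with the three usual answers to "how close are two groups?": single-link uses the smallest pairwise dissimilarity between members, complete-link the largest, and average-link the mean. The function name `agglomerate`, the animal labels, and the dissimilarity values are assumptions made for the example.

```python
def agglomerate(dissim, labels, linkage="average"):
    """Build a rooted binary tree (nested tuples) by repeatedly merging the two closest groups."""
    # Each group is (subtree, list of member indices); begin with one group per object.
    groups = [(label, [i]) for i, label in enumerate(labels)]

    def group_distance(a, b):
        pair_d = [dissim[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair_d)
        if linkage == "complete":
            return max(pair_d)
        if linkage == "average":
            return sum(pair_d) / len(pair_d)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(groups) > 1:
        # Find the closest pair of groups under the chosen linkage.
        _, p, q = min(
            (group_distance(groups[p][1], groups[q][1]), p, q)
            for p in range(len(groups))
            for q in range(p + 1, len(groups))
        )
        merged = ((groups[p][0], groups[q][0]), groups[p][1] + groups[q][1])
        groups = [g for idx, g in enumerate(groups) if idx not in (p, q)] + [merged]
    return groups[0][0]

# Hypothetical dissimilarities between four animals.
labels = ["dolphin", "whale", "chimp", "gorilla"]
dissim = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 6.5, 7.5],
    [6.0, 6.5, 0.0, 2.0],
    [7.0, 7.5, 2.0, 0.0],
]

tree = agglomerate(dissim, labels, linkage="single")
```

With data this well separated all three linkages return the same tree. They tend to differ on less tidy data, where single-link is prone to chaining groups together and complete-link favours compact groups.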

Hierarchical Clustering: Example

Tree-building as feature discovery

[Figure labels: cetacean, primate]

Outline

● Feature representations
– Additive clustering

WARNING: additive clustering is not about trees

Additive Clustering

● Representation: an object is a collection of discrete features
– e.g. Elephant = {grey, wrinkly, has_trunk, is_animal ...}
● Additive clustering is about discovering features from similarity data

Additive clustering

$$ s_{ij} \approx \sum_k w_k f_{ik} f_{jk} $$

$s_{ij}$: similarity of stimuli $i$, $j$
$w_k$: weight of cluster $k$
$f_{ik}$: membership of stimulus $i$ in cluster $k$ (1 if stimulus $i$ is in cluster $k$, 0 otherwise)

Equivalent to similarity as a weighted sum of common features (Tversky, 1977).

Additive clustering

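[Diagram: the objects × objects similarity matrix expressed in terms of an objects × feats membership matrix]

$$ s_{ij} \approx \sum_k w_k f_{ik} f_{jk} $$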

Additive clustering for the integers 0-9:

$$ s_{ij} \approx \sum_k w_k f_{ik} f_{jk} $$

Rank  Weight  Stimuli in cluster (0 1 2 3 4 5 6 7 8 9)  Interpretation
1     .444    * * *                                    powers of two
2     .345    * * *                                    small numbers
3     .331    * * *                                    multiples of three
4     .291    * * * *                                  large numbers
5     .255    * * * * *                                middle numbers
6     .216    * * * * *                                odd numbers
7     .214    * * * *                                  smallish numbers
8     .172    * * * * *                                largish numbers
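
The sketch below shows how the additive clustering model turns cluster weights into a predicted similarity. It uses two weights from the table, but the cluster memberships are read off the interpretation labels ("multiples of three", "odd numbers") and are an assumption, as is the function name `similarity`.

```python
# (weight, members) pairs; memberships are assumed from the interpretation labels.
clusters = [
    (0.331, {3, 6, 9}),          # "multiples of three"
    (0.216, {1, 3, 5, 7, 9}),    # "odd numbers"
]

def similarity(i, j):
    """s_ij = sum_k w_k * f_ik * f_jk, where f_ik = 1 if stimulus i is in cluster k."""
    return sum(w for w, members in clusters if i in members and j in members)

s_39 = similarity(3, 9)   # 3 and 9 share both clusters
s_36 = similarity(3, 6)   # 3 and 6 share only "multiples of three"
s_24 = similarity(2, 4)   # 2 and 4 share neither of these two clusters
```

Each shared cluster adds its weight to the predicted similarity, so a pair that shares both clusters scores higher than a pair that shares one. A full model would include all eight clusters; only two are used here to keep the arithmetic easy to follow.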

General Questions

● We've seen several types of representations. How do you pick the right representation for a domain?
– related to the statistical problem of model selection
– to be discussed later

Next Week

● More complex representations
