

High Dimensional Nonparametric Modeling Using Two-Dimensional Polynomial Cascades

Greg Grudic
University of Colorado, Boulder
grudic@cs.colorado.edu
www.cs.colorado.edu/~grudic


Outline

• Applications of Very High Dimensional Nonparametric Modeling

• Define the problem domain

• One solution: Polynomial Cascade Algorithm

• Conclusion


Applications of High Dimensional Nonparametric Models

1. Human-to-Robot Skill Transfer (ICRA96)
2. Mobile Robot Localization (IROS98)
3. Strength Prediction of Boards
4. Defect Classification in Lumber
5. Activity Recognition for the Cognitively Impaired

The same PC algorithm is used in all of these applications (with no parameter tuning).


Human-to-Robot Skill Transfer (ICRA96)

• Problem: Human demonstrates a task via teleoperation.
  – Object locate-and-approach task.
  – 1024 raw pixel inputs and 2 actuator outputs.
• Learning Data: 4 demonstrations of the task sequence.
  – 2000 to 5000 learning examples (~2 to 5 min).
• Learning Time: ~5 min on a SPARC 20.
• Model Size / Evaluation Speed: < 500 KB, ~5 Hz.
• Autonomous control of the robot using the model:
  – No failures in 30 random trials.


Mobile Robot Localization (IROS98)

• Goal: Use on-board camera images to obtain position/orientation.
• Workspace: 6 x 5 meters in a research lab.
• Desired accuracy: 0.2 meters in position and 10 degrees in orientation.
• Inputs: 3 raw pixel images (160 by 120) => 19,200 inputs.
• Learning Data: 2000 image inputs with robot position/orientation.
• Learning time: ~2 hours on a SPARC 20.
• Model Size / Evaluation speed: ~2.0 MB, 7 Hz.


Strength Prediction of Boards

• Goal: Predict the strength of a board (2x4) using nondestructive scans (Slope of Grain, Elasticity, X-Ray).
• Current wood processing industry standard: correlation of 0.5 to 0.65.
• Learning Data: Scanned 300 boards and broke each in 3 to 4 different places.
• Model Inputs: ~5000 statistical features.
• Learning time and model size: ~40 min / ~1 MB.
• Model Accuracy (correlation): 0.8.


Defect Classification in Lumber

• Problem: Classify board defects using “images”.
• < 10 ms per classification (speed ~12 ft/sec).
• ~20 classes: 4 types of knots, pitch pockets, etc.
• Many attempted solutions: analytical methods, learning methods, etc.
• Model Inputs: > 1000.
• Learning Examples: > 1000.
• Model Accuracy: > 92%.


Activity Recognition for the Cognitively Impaired

• Goal:
  – Keep track of what activity a person is doing using cameras
    • e.g. which room is the person in; what are they doing; what have they completed?
  – Minimal engineering of the environment
• Solution: Attach a video camera to the person as tasks are accomplished
  – Label the camera images accordingly
  – Build a model that classifies the images
    • ~4000 raw pixels as inputs
• Preliminary results: 90% success rate for identifying 4 different tasks


Problem Domain Characteristics

• thousands of relevant input variables
  – each contributing a small but significant amount to the final model

• no subset of these variables can adequately describe the desired function

• the relevant variables are confounded by thousands of irrelevant variables


Why is this a difficult domain?

• Very large input space!

• Problem is intrinsically nonparametric
  – Don’t know which inputs are significant
  – Don’t know an optimal model structure

• Problems are in general nonlinear


Constructing Models from Data

• Given $N$ input/output examples of some phenomenon (a regression/classification function $y = f(\mathbf{x})$):

$$ \{ (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N) \} $$

• Construct an approximate mapping $\hat{f}$ such that, for some unseen $\mathbf{x}_{\mathrm{new}}$:

$$ \hat{y} = \hat{f}(\mathbf{x}_{\mathrm{new}}) $$


Polynomial Cascade Algorithm: Conceptual Motivation (IJCAI 97)

• Problem #1: Simultaneous construction of the model is infeasible.
  – Solution: use low dimensional projections (building blocks).
    • simplest approach: 2 dimensional blocks $g_l(u,v)$
• Problem #2: Finding the best low dimensional projection is infeasible.
  – Solution: Don’t find the best; use selection criteria that are independent of dimension.
    • simplest approach: random building block selection.


PC Algorithm: Conceptual Motivation (continued)

• Problem #3: Low dimensional projections tend to be flat (i.e. $g_l(u,v) = C$ for some constant $C$).
  – Solution: Subdivide the input space.
    • simplest approach: random subdivision (bootstrap samples).
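As a concrete illustration of random subdivision, here is a minimal bootstrap-sampling sketch in Python/NumPy (not from the original slides; the function name and toy data are illustrative):

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw n examples with replacement from an n-example learning set."""
    n = len(y)
    idx = rng.integers(0, n, size=n)   # indices sampled with replacement
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 100))       # toy data: 2000 examples, 100 inputs
y = rng.normal(size=2000)
Xb, yb = bootstrap_sample(X, y, rng)   # one random subdivision of the data
```

On average a bootstrap sample leaves out roughly a third of the examples, so successive building blocks see different subsets of the data rather than all fitting the same flat projection.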


Polynomial Cascade Structure

[Figure: cascade structure. Raw inputs $x_{k_0}, x_{k_1}, x_{k_2}, \ldots, x_{k_L}$ feed a chain of building blocks $g_1, g_2, \ldots, g_L$; each block output is weighted by $a_1, a_2, \ldots, a_L$ and summed ($\Sigma$) to give the output $y$.]
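As one concrete reading of this structure, the following sketch (Python; the exact wiring of the levels is my assumption from the figure, not code from the slides) evaluates a cascade in which the first block takes two raw inputs, each later block takes the previous block's output plus one more input, and the prediction is the $a_l$-weighted sum of block outputs:

```python
import numpy as np

def eval_cascade(x, k0, levels):
    """Evaluate a polynomial cascade on one input vector x.

    k0     -- index of the raw input feeding the first block,
    levels -- list of (k, g, a): input index, fitted block g_l(u, v),
              and level weight a_l (normalized inverse MSE).
    """
    u = x[k0]                  # first block sees two raw inputs: x[k0], x[k1]
    y = 0.0
    for k, g, a in levels:
        u = g(u, x[k])         # block output feeds the next level
        y += a * u             # weighted sum forms the cascade output
    return y

# toy usage with two hand-written blocks in place of fitted polynomials
levels = [(1, lambda u, v: u + v, 0.6), (2, lambda u, v: u * v, 0.4)]
print(eval_cascade(np.array([1.0, 2.0, 3.0]), 0, levels))
```

In the actual algorithm each `g` would be one of the fitted third-order polynomials described on the next slide, and each `a` the normalized inverse-MSE weight from the training procedure.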


Main PC Algorithm Characteristics

1. Building blocks $g_l(u,v)$: 3rd order polynomials

$$ g_l(u,v) = a_{00} + a_{01}v + a_{10}u + a_{11}uv + a_{20}u^2 + a_{02}v^2 + a_{21}u^2 v + a_{12}uv^2 + a_{30}u^3 + a_{03}v^3 $$

2. Blocks $g_l(u,v)$ are added one at a time, in order

3. Random (repeated) order of inputs $(x_1, \ldots, x_d)$

4. Each $g_l(u,v)$ is constructed using a bootstrap sample
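A least-squares fit of one such building block might look like the sketch below (Python/NumPy; the helper names are illustrative, and plain least squares is my assumption, since the slides do not specify the fitting procedure):

```python
import numpy as np

def poly2_features(u, v):
    """The ten third-order terms of g_l(u, v), matching the equation above."""
    return np.stack([np.ones_like(u), v, u, u * v, u**2, v**2,
                     u**2 * v, u * v**2, u**3, v**3], axis=1)

def fit_block(u, v, y):
    """Least-squares estimate of the coefficients a_00 ... a_03."""
    coeffs, *_ = np.linalg.lstsq(poly2_features(u, v), y, rcond=None)
    return coeffs

# fit one block on toy data and evaluate it
rng = np.random.default_rng(0)
u, v = rng.normal(size=500), rng.normal(size=500)
y = 1.0 + 2.0 * u * v + 0.5 * v**3 + 0.01 * rng.normal(size=500)
c = fit_block(u, v, y)
y_hat = poly2_features(u, v) @ c      # block output g_l(u, v)
```

Each block has only ten coefficients, so a single fit costs the same no matter how high the full input dimension is.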


PC Algorithm:

STEP 1: Initialize the algorithm:
  – Divide the learning data into a training set and a validation set
  – Choose a random order of inputs

STEP 2: Construct a new section (multiple levels):
  – Use a bootstrap sample to fit each $g_l(u,v)$
  – Set $a_l$ to the normalized inverse MSE of $g_l(u,v)$ on the training set
  – Stop when the error on the validation set stops decreasing


PC Algorithm: (continued)

STEP 3: Prune the section: prune back to the block with the smallest error on the validation set

STEP 4: Update the learning outputs: replace the outputs with their residual errors,

$$ y_i \leftarrow y_i - \sum_{l=1}^{s} a_l\, g_{li} $$

where $g_{li}$ is the output of block $l$ on example $i$ and $s$ is the number of blocks kept in the section

STEP 5: Check the stopping condition: GOTO STEP 2 if further error reduction is possible, otherwise STOP
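Putting STEPs 1 through 5 together, one hypothetical end-to-end sketch is below (Python/NumPy). It makes several simplifying assumptions the slides leave open: the first block of each section takes two raw inputs, $a_l$ is the normalized inverse training MSE, and a section stops growing after the validation error rises twice in a row. `poly2_features` is repeated from the earlier sketch so the block stands alone:

```python
import numpy as np

def poly2_features(u, v):
    """Ten third-order terms of the building block g_l(u, v)."""
    return np.stack([np.ones_like(u), v, u, u * v, u**2, v**2,
                     u**2 * v, u * v**2, u**3, v**3], axis=1)

def train_pc(X, y, rng, max_sections=10):
    n, d = X.shape
    perm = rng.permutation(n)                  # STEP 1: train/validation split
    tr, va = perm[: n // 2], perm[n // 2:]
    resid = y.astype(float).copy()             # outputs, replaced by residuals
    model = []
    for _ in range(max_sections):
        order = rng.permutation(d)             # random order of inputs
        u_tr, u_va = X[tr, order[0]], X[va, order[0]]   # first block: raw input
        blocks, weights, outs_tr, outs_va, val_errs = [], [], [], [], []
        for k in order[1:]:                    # STEP 2: grow the section
            boot = rng.integers(0, len(tr), len(tr))    # bootstrap sample
            Phi = poly2_features(u_tr[boot], X[tr, k][boot])
            c, *_ = np.linalg.lstsq(Phi, resid[tr][boot], rcond=None)
            u_tr = poly2_features(u_tr, X[tr, k]) @ c
            u_va = poly2_features(u_va, X[va, k]) @ c
            mse = np.mean((resid[tr] - u_tr) ** 2)
            blocks.append((k, c))
            weights.append(1.0 / max(mse, 1e-12))       # inverse training MSE
            outs_tr.append(u_tr.copy())
            outs_va.append(u_va.copy())
            a = np.array(weights) / np.sum(weights)     # normalized weights
            val_errs.append(np.mean((resid[va] - a @ np.array(outs_va)) ** 2))
            if len(val_errs) > 2 and val_errs[-1] >= val_errs[-2] >= val_errs[-3]:
                break                          # validation error stopped decreasing
        s = int(np.argmin(val_errs)) + 1       # STEP 3: prune to the best block
        a = np.array(weights[:s]) / np.sum(weights[:s])
        if val_errs[s - 1] >= np.mean(resid[va] ** 2):
            break                              # STEP 5: no further error reduction
        model.append((order[0], blocks[:s], a))
        resid[tr] -= a @ np.array(outs_tr[:s]) # STEP 4: replace with residuals
        resid[va] -= a @ np.array(outs_va[:s])
    return model
```

Prediction with the returned model sums each section's weighted block outputs, since every section was fit to the residuals left behind by the sections before it.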


Why does PC work?

• Overfitting is avoided via appropriate injection of randomness, i.e. like Random Forests (Breiman, 1999):
  – Bootstrap sampling
  – Random order of inputs
• Irrelevant inputs are not excluded from the cascade
  – Treated as noise and averaged out

• No explicit variable selection is used


Why does PC work? (continued)

• Produces stable high dimensional models
  – Projections onto 2 dimensional structures
• Low dimensional projections are unlikely to be flat
  – Bootstrap sampling avoids $g_l(u,v) \equiv C$
  – The PC algorithm effectively deals with parity problems of greater than 2 dimensions
    • e.g. a 10 bit parity problem, where $g_l(u,v) \equiv C$ for all levels without random sampling


PC Effective on Low Dimensional Problems (surprise?)

• Does as well as or better than most algorithms on low dimensional regression problems (IJCAI97)

• Produces competitive models without the need for parameter tuning or kernel selection

• HOWEVER:
  – Models are not sparse!


Theoretical Results

1. PCs are universal approximators

2. Conditions for convergence to zero error:
  • Uncorrelated errors from level to level

• Similar to bagging and random forests (Breiman)

3. Rate of convergence (to some local error minimum), as a function of the number of learning examples, is independent of the dimension of the input space


Conclusion

• There are many application areas for very high dimensional, nonlinear, nonparametric modeling algorithms!

• Cascaded low dimensional polynomials produce effective nonparametric models

• Polynomial Cascades are most effective in problem domains characterized by
  – thousands of relevant input variables
    • each contributing a small but significant amount to the final model
  – no subset of these variables can adequately describe the desired function
  – the relevant variables are confounded by thousands of irrelevant variables