

High Dimensional Nonparametric Modeling Using Two-Dimensional Polynomial Cascades

Greg Grudic
University of Colorado, Boulder
grudic@cs.colorado.edu
www.cs.colorado.edu/~grudic


Outline

• Applications of Very High Dimensional Nonparametric Modeling

• Define the problem domain

• One solution: Polynomial Cascade Algorithm

• Conclusion


Applications of High Dimensional Nonparametric Models

1. Human-to-Robot Skill Transfer (ICRA96)
2. Mobile Robot Localization (IROS98)
3. Strength Prediction of Boards
4. Defect Classification in Lumber
5. Activity Recognition for the Cognitively Impaired

The same PC algorithm is used in all of these applications (with no parameter tuning).


Human-to-Robot Skill Transfer (ICRA96)

• Problem: Human demonstrates a task via teleoperation.
  – Object locate-and-approach task.
  – 1024 raw pixel inputs and 2 actuator outputs.
• Learning Data: 4 demonstrations of the task sequence.
  – 2000 to 5000 learning examples (~2 to 5 min).
• Learning Time: ~5 min on a SPARC 20.
• Model Size / Evaluation Speed: < 500 KB, ~5 Hz.
• Autonomous control of the robot using the model:
  – No failures in 30 random trials.


Mobile Robot Localization (IROS98)

• Goal: Use on-board camera images to obtain position/orientation.
• Workspace: 6 x 5 meters in a research lab.
• Desired accuracy: 0.2 meters in position and 10 degrees in orientation.
• Inputs: 3 raw pixel images (160 by 120) => 19,200 inputs.
• Learning Data: 2000 image inputs with robot position/orientation.
• Learning time: ~2 hours on a SPARC 20.
• Model Size / Evaluation speed: ~2.0 MB, 7 Hz.


Strength Prediction of Boards

• Goal: Predict the strength of a board (2x4) using nondestructive scans (Slope of Grain, Elasticity, X-Ray).
• Current wood processing industry standard: correlation of 0.5 to 0.65.
• Learning Data: Scanned 300 boards and broke each in 3 to 4 different places.
• Model Inputs: ~5000 statistical features.
• Learning time and model size: ~40 min / ~1 MB.
• Model Accuracy (correlation): 0.8.


Defect Classification in Lumber

• Problem: Classify board defects using “images”.
• < 10 ms per classification (speed ~12 ft/sec).
• ~20 classes: 4 types of knots, pitch pockets, etc.
• Many attempted solutions: analytical methods, learning methods, etc.
• Model Inputs: > 1000.
• Learning Examples: > 1000.
• Model Accuracy: > 92%.


Activity Recognition for the Cognitively Impaired

• Goal:
  – Keep track of what activity a person is doing using cameras
    • e.g. which room is the person in; what are they doing; what have they completed?
  – Minimal engineering of the environment
• Solution: Attach a video camera to the person as tasks are accomplished
  – Label the camera images accordingly
  – Build a model that classifies the images
    • ~4000 raw pixels as inputs
• Preliminary results: 90% success rate for identifying 4 different tasks


Problem Domain Characteristics

• thousands of relevant input variables
  – each contributing a small but significant amount to the final model

• no subset of these variables can adequately describe the desired function

• the relevant variables are confounded by thousands of irrelevant variables


Why is this a difficult domain?

• Very large input space!

• Problem is intrinsically nonparametric
  – Don’t know which inputs are significant
  – Don’t know an optimal model structure

• Problems are in general nonlinear


Constructing Models from Data

• Given $N$ input/output examples of some phenomenon (a regression/classification function $y = f(\mathbf{x})$):

$$ \{ (\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N) \} $$

• Construct an approximate mapping $\hat{f}$ such that, for some unseen $\mathbf{x}_{\mathrm{new}}$:

$$ \hat{y} = \hat{f}(\mathbf{x}_{\mathrm{new}}) $$


Polynomial Cascade Algorithm: Conceptual Motivation (IJCAI 97)

• Problem #1: Simultaneous construction of the model is infeasible.
  – Solution: use low dimensional projections (building blocks).
    • simplest approach: 2 dimensional blocks $g_l(u,v)$
• Problem #2: Finding the best low dimensional projection is infeasible.
  – Solution: Don’t find the best; use selection criteria that are independent of dimension.
    • simplest approach: random building block selection.


PC Algorithm: Conceptual Motivation (continued)

• Problem #3: Low dimensional projections tend to be flat (i.e. $g_l(u,v) = C$ for some constant $C$).
  – Solution: Subdivide the input space.
    • simplest approach: random subdivision (bootstrap samples).
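As a concrete illustration of random subdivision, here is a minimal bootstrap-sampling sketch in Python/NumPy (not from the original slides; the function name and toy data are illustrative):

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw n examples with replacement from an n-example learning set."""
    n = len(y)
    idx = rng.integers(0, n, size=n)   # indices sampled with replacement
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 100))       # toy data: 2000 examples, 100 inputs
y = rng.normal(size=2000)
Xb, yb = bootstrap_sample(X, y, rng)   # one random subdivision of the data
```

On average a bootstrap sample leaves out roughly a third of the examples, so successive building blocks see different subsets of the data rather than all fitting the same flat projection.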


Polynomial Cascade Structure

[Figure: cascade structure. Raw inputs $x_{k_0}, x_{k_1}, x_{k_2}, \ldots, x_{k_L}$ feed a chain of building blocks $g_1, g_2, \ldots, g_L$; each block output is weighted by $a_1, a_2, \ldots, a_L$ and summed ($\Sigma$) to give the output $y$.]
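As one concrete reading of this structure, the following sketch (Python; the exact wiring of the levels is my assumption from the figure, not code from the slides) evaluates a cascade in which the first block takes two raw inputs, each later block takes the previous block's output plus one more input, and the prediction is the $a_l$-weighted sum of block outputs:

```python
import numpy as np

def eval_cascade(x, k0, levels):
    """Evaluate a polynomial cascade on one input vector x.

    k0     -- index of the raw input feeding the first block,
    levels -- list of (k, g, a): input index, fitted block g_l(u, v),
              and level weight a_l (normalized inverse MSE).
    """
    u = x[k0]                  # first block sees two raw inputs: x[k0], x[k1]
    y = 0.0
    for k, g, a in levels:
        u = g(u, x[k])         # block output feeds the next level
        y += a * u             # weighted sum forms the cascade output
    return y

# toy usage with two hand-written blocks in place of fitted polynomials
levels = [(1, lambda u, v: u + v, 0.6), (2, lambda u, v: u * v, 0.4)]
print(eval_cascade(np.array([1.0, 2.0, 3.0]), 0, levels))
```

In the actual algorithm each `g` would be one of the fitted third-order polynomials described on the next slide, and each `a` the normalized inverse-MSE weight from the training procedure.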


Main PC Algorithm Characteristics

1. Building blocks $g_l(u,v)$: 3rd order polynomials

$$ g_l(u,v) = a_{00} + a_{01}v + a_{10}u + a_{11}uv + a_{20}u^2 + a_{02}v^2 + a_{21}u^2 v + a_{12}uv^2 + a_{30}u^3 + a_{03}v^3 $$

2. Blocks $g_l(u,v)$ are added one at a time, in order

3. Random (repeated) order of inputs $(x_1, \ldots, x_d)$

4. Each $g_l(u,v)$ is constructed using a bootstrap sample
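A least-squares fit of one such building block might look like the sketch below (Python/NumPy; the helper names are illustrative, and plain least squares is my assumption, since the slides do not specify the fitting procedure):

```python
import numpy as np

def poly2_features(u, v):
    """The ten third-order terms of g_l(u, v), matching the equation above."""
    return np.stack([np.ones_like(u), v, u, u * v, u**2, v**2,
                     u**2 * v, u * v**2, u**3, v**3], axis=1)

def fit_block(u, v, y):
    """Least-squares estimate of the coefficients a_00 ... a_03."""
    coeffs, *_ = np.linalg.lstsq(poly2_features(u, v), y, rcond=None)
    return coeffs

# fit one block on toy data and evaluate it
rng = np.random.default_rng(0)
u, v = rng.normal(size=500), rng.normal(size=500)
y = 1.0 + 2.0 * u * v + 0.5 * v**3 + 0.01 * rng.normal(size=500)
c = fit_block(u, v, y)
y_hat = poly2_features(u, v) @ c      # block output g_l(u, v)
```

Each block has only ten coefficients, so a single fit costs the same no matter how high the full input dimension is.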


PC Algorithm:

STEP 1: Initialize the algorithm:
  – Divide the learning data into a training set and a validation set
  – Choose a random order of inputs

STEP 2: Construct a new section (multiple levels):
  – Use a bootstrap sample to fit each $g_l(u,v)$
  – Set $a_l$ to the normalized inverse MSE of $g_l(u,v)$ on the training set
  – Stop when the error on the validation set stops decreasing


PC Algorithm: (continued)

STEP 3: Prune the section: prune back to the block with the smallest error on the validation set

STEP 4: Update the learning outputs: replace the outputs with their residual errors,

$$ y_i \leftarrow y_i - \sum_{l=1}^{s} a_l\, g_{li} $$

where $g_{li}$ is the output of block $l$ on example $i$ and $s$ is the number of blocks kept in the section

STEP 5: Check the stopping condition: GOTO STEP 2 if further error reduction is possible, otherwise STOP
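Putting STEPs 1 through 5 together, one hypothetical end-to-end sketch is below (Python/NumPy). It makes several simplifying assumptions the slides leave open: the first block of each section takes two raw inputs, $a_l$ is the normalized inverse training MSE, and a section stops growing after the validation error rises twice in a row. `poly2_features` is repeated from the earlier sketch so the block stands alone:

```python
import numpy as np

def poly2_features(u, v):
    """Ten third-order terms of the building block g_l(u, v)."""
    return np.stack([np.ones_like(u), v, u, u * v, u**2, v**2,
                     u**2 * v, u * v**2, u**3, v**3], axis=1)

def train_pc(X, y, rng, max_sections=10):
    n, d = X.shape
    perm = rng.permutation(n)                  # STEP 1: train/validation split
    tr, va = perm[: n // 2], perm[n // 2:]
    resid = y.astype(float).copy()             # outputs, replaced by residuals
    model = []
    for _ in range(max_sections):
        order = rng.permutation(d)             # random order of inputs
        u_tr, u_va = X[tr, order[0]], X[va, order[0]]   # first block: raw input
        blocks, weights, outs_tr, outs_va, val_errs = [], [], [], [], []
        for k in order[1:]:                    # STEP 2: grow the section
            boot = rng.integers(0, len(tr), len(tr))    # bootstrap sample
            Phi = poly2_features(u_tr[boot], X[tr, k][boot])
            c, *_ = np.linalg.lstsq(Phi, resid[tr][boot], rcond=None)
            u_tr = poly2_features(u_tr, X[tr, k]) @ c
            u_va = poly2_features(u_va, X[va, k]) @ c
            mse = np.mean((resid[tr] - u_tr) ** 2)
            blocks.append((k, c))
            weights.append(1.0 / max(mse, 1e-12))       # inverse training MSE
            outs_tr.append(u_tr.copy())
            outs_va.append(u_va.copy())
            a = np.array(weights) / np.sum(weights)     # normalized weights
            val_errs.append(np.mean((resid[va] - a @ np.array(outs_va)) ** 2))
            if len(val_errs) > 2 and val_errs[-1] >= val_errs[-2] >= val_errs[-3]:
                break                          # validation error stopped decreasing
        s = int(np.argmin(val_errs)) + 1       # STEP 3: prune to the best block
        a = np.array(weights[:s]) / np.sum(weights[:s])
        if val_errs[s - 1] >= np.mean(resid[va] ** 2):
            break                              # STEP 5: no further error reduction
        model.append((order[0], blocks[:s], a))
        resid[tr] -= a @ np.array(outs_tr[:s]) # STEP 4: replace with residuals
        resid[va] -= a @ np.array(outs_va[:s])
    return model
```

Prediction with the returned model sums each section's weighted block outputs, since every section was fit to the residuals left behind by the sections before it.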


Why does PC work?

• Overfitting is avoided via appropriate injection of randomness, i.e. like Random Forests (Breiman, 1999):
  – Bootstrap sampling
  – Random order of inputs
• Irrelevant inputs are not excluded from the cascade
  – Treated as noise and averaged out

• No explicit variable selection is used


Why does PC work? (continued)

• Produces stable high dimensional models
  – Projections onto 2 dimensional structures
• Low dimensional projections are unlikely to be flat
  – Bootstrap sampling avoids $g_l(u,v) \equiv C$
  – The PC algorithm effectively deals with parity problems of greater than 2 dimensions
    • e.g. a 10 bit parity problem, where $g_l(u,v) \equiv C$ for all levels without random sampling


PC Effective on Low Dimensional Problems (surprise?)

• Does as well as or better than most algorithms on low dimensional regression problems (IJCAI97)

• Produces competitive models without the need for parameter tuning or kernel selection

• HOWEVER:
  – Models are not sparse!


Theoretical Results

1. PCs are universal approximators

2. Conditions for convergence to zero error:
  • Uncorrelated errors from level to level

• Similar to bagging and random forests (Breiman)

3. Rate of convergence (to some local error minimum), as a function of the number of learning examples, is independent of the dimension of the input space


Conclusion

• There are many application areas for very high dimensional, nonlinear, nonparametric modeling algorithms!

• Cascaded low dimensional polynomials produce effective nonparametric models

• Polynomial Cascades are most effective in problem domains characterized by
  – thousands of relevant input variables
    • each contributing a small but significant amount to the final model
  – no subset of these variables can adequately describe the desired function
  – the relevant variables are confounded by thousands of irrelevant variables