3/13/2007 University of Colorado, ML Based Robotics
Machine Learning Based Robotics
Greg Grudic
University of Colorado at Boulder
[email protected]/~grudic
Joint work with Jane Mulligan (CU)
Students in the “Intelligence in Action Lab”

- Long Term Learning: Mike Procopio
- Semi-Supervised Learning and Clustering: Adam Bates, Avleen Singh Bijral, Markus Breitenbach
- Sparse (fast) Classification/Regression Models: Abhishek Jaiantilal, Thomas Strohmann (Google)
- Learning to Plan in Image Space: Mike Otte, Scott Richardson
- Model Ensembles for Classification/Regression: Sam Reid
- Reinforcement Learning: Ben Pearre
- Computer Vision: Soumya Ghosh, Wei Xu, Jaeheon Jeong, Kris Nuttycombe
Outline
Motivation
Our application domain: ML Based Robotics
ML Approaches
Conclusion and Future Work
Grudic’s Research “Focus” to about 2005

- Classification and Regression: Grudic and Lawrence (IJCAI-97); Grudic and Lawrence (IROS-98); Strohmann and Grudic (NIPS 2002); Strohmann, Belitski, Grudic and DeCoste (NIPS 2003); Bohte, Breitenbach and Grudic (ICML 2004)
- Semi-Supervised Learning, Clustering and Manifolds: Breitenbach and Grudic (ICML 2005); Grudic and Mulligan (RSS 2005); Grudic and Mulligan (RSS 2005); Grudic and Mulligan (2005); Grudic and Mulligan (RSS 2006); Bijral, Breitenbach, Grudic (AISTATS 2007)
- Reinforcement Learning: Grudic and Ungar (ICML 2000); Grudic and Ungar (AAAI 2000); Grudic and Ungar (IJCAI 01); Grudic and Ungar (NIPS 2001); Grudic and Ungar (2004)
- Robotics: Grudic and Lawrence, IEEE Trans. Rob. and Aut. (1993); Grudic, Kumar and Ungar (IROS 2003)
Grudic’s Research Focus after 2005
Machine Learning Based Robotics
Why? I believe that if ML is to address the real problems of AI, it must begin by DIRECTLY addressing a real, unsolved AI problem.
What Type of Robotics?
“Vision-based autonomous navigation in unstructured outdoor environments”
This is the main research focus of the DARPA LAGR program. Includes 8 teams
GaTech, Netscale (NYU), SRI, NIST, API, JPL, UPenn, CU
The problem of navigating between 2 GPS waypoints (more than a few hundred metres apart) in unstructured outdoor environments is unsolved!
How is it Currently Done?
Crusher and, more recently, PerceptOR
Why is this an Unsolved Problem?
Such robotic tasks are characterized by a high-dimensional input space that represents the world as mediated by robot sensors (vision, sonar data, etc.).
The robot experiences millions of sensor readings at many frames per second, which must be processed and acted upon in real time.
The key open questions are: What information must be extracted from sensors? and, How can the robot use this information to act appropriately in the world?
Why Machine Learning?

Machine Learning techniques offer powerful tools to model complex real-world situations and produce coherent behavior. Many of the fundamental goals of Machine Learning are also those of Robotics, which establishes a synergy between the two fields that can serve as a catalyst for advancing theory and practice in both.
Efforts Taken So Far
- NIPS 2005 Workshop on Machine Learning Based Robotics in Unstructured Environments (Grudic and Mulligan, co-organizers)
- NIPS 2006 Workshop on Learning Applied to Ground Robots: Sensing and Locomotion (Grudic, Jackel and Mulligan, organizers)
- 2006 Special Issue on Machine Learning Based Robotics in Unstructured Environments, Journal of Field Robotics (Mulligan and Grudic, guest editors)
What About the DARPA Grand Challenge?
Autonomous navigation in the desert over a 132 mile course.

5 teams succeeded! http://www.darpa.mil/grandchallenge05/gcorg/index.html

This was a monumental achievement in autonomous robotics.

HOWEVER: this was not an unstructured environment! GPS waypoints were carefully chosen, sometimes less than a metre apart.
Environments that DARPA Grand Challenge winners would find challenging:
Machine Learning Based Robotics
Defn: autonomous robot controllers are learned directly from observations of sensor readings and actuator command results.

This implies:
- No hand-crafted controllers
- Minimal human bias on what controllers should do

The ultimate goal: a complete ML Based Robotics system.
Our Main Platform (LAGR)

[Figure: the LAGR robot — WAAS GPS mounted on a collapsible mount, E-Stop, IR rangefinder, bumper with dual switches, differential drive, and dual stereo cameras.]
The Problem Domain

Confident stereo estimates label parts of the image as traversable or non-traversable; ML (a Gaussian kernel SVM) then classifies the rest of the image.

[Figure: three panels — the input image, the stereo labelling, and the SVM Gaussian-kernel classification into traversable and non-traversable regions.]

PROBLEM: The entire image is being classified indiscriminately!
A Major Problem with ML Today: Most ML Algorithms Predict Blindly!

Learn a classifier between Automobiles and Tigers:

[Figure: the Auto/Tiger classifier correctly predicts “Auto” for a car image and “Tiger” for a tiger image — but for an image that is neither, it still outputs “Auto” or “Tiger” when it should predict “I don’t know!”]
Blind Prediction is a Major Problem For ML Based AI

- No single model will always be correct
- An AI system must predict when its current model set is NOT appropriate
- These predictions are needed because we need a framework to trigger the learning of new models which appropriately account for new situations

We are exploring density based classifiers to address this problem.
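A minimal sketch of the idea — a hypothetical illustration, not the lab’s actual algorithm — gates an ordinary classifier with a crude density estimate (distance to the k-th nearest training point) and answers “I don’t know” in low-density regions:

```python
import numpy as np

def fit_density_gate(X, k=2):
    """Record training points; the gate uses distance to the k-th
    nearest training point as a crude inverse-density estimate."""
    return {"X": np.asarray(X, dtype=float), "k": k}

def kth_nn_dist(gate, x):
    d = np.linalg.norm(gate["X"] - x, axis=1)
    return np.sort(d)[gate["k"] - 1]

def gated_predict(gate, classifier, x, radius):
    """Return the classifier's label only when x lies near the
    training data; otherwise admit ignorance."""
    if kth_nn_dist(gate, x) > radius:
        return "I don't know"
    return classifier(x)

# Toy example: classify 1D points, but refuse far from the data.
X_train = np.array([[0.0], [0.1], [0.9], [1.0]])
clf = lambda x: "auto" if x[0] < 0.5 else "tiger"
gate = fit_density_gate(X_train, k=2)

print(gated_predict(gate, clf, np.array([0.05]), radius=0.5))  # near data: "auto"
print(gated_predict(gate, clf, np.array([10.0]), radius=0.5))  # far away: "I don't know"
```

The `radius` threshold plays the role of the density cutoff; the density-based classifiers discussed on the following slides use learned histograms instead of this nearest-neighbor stand-in.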
The Problem Domain

Use stereo to identify a confident patch of traversable ground directly in front of the robot, then use the examples of path to label the entire image.

[Figure: the image in front of the robot; the stereo-labeled example of a path; and the path-labeled image produced by the Polynomial Mahalanobis distance.]
The Problem Domain (restated)

Stereo can give accurate readings at short range (< 10 meters?). Take confident stereo readings of traversable terrain (paths) and project them into the rest of the image (the far field). This gives far-field navigation capabilities which can greatly outperform stereo alone.

See the JFR Special Issue on “Machine Learning Based Robotics in Unstructured Environments”, Nov/Dec 2006.
The Problem Domain (restated again)

Goal: find a distance metric that efficiently discriminates path from non-path — i.e., maximize the distance between path and non-path examples.

[Figure: image with path and non-path regions marked.]
Euclidean Distance

$d_E(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|, \quad \mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^d$

Data points represent example windows of paths.

- The distance measure radiates spherically from the reference point
- The distance measure does not follow the structure of the data

[Figure: Euclidean distance map. Light means close, dark means far; the zero-distance point is in the blue square.]
Mahalanobis Distance

The Mahalanobis Distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$, with $\mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^d$, is defined by:

$d_M(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^t \, C^{-1} (\mathbf{x}_i - \mathbf{x}_j)$

where $C$ is a covariance matrix.
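The definition translates directly to code. A minimal NumPy version of the quadratic form, on illustrative data (not from the robot) stretched along one direction:

```python
import numpy as np

def mahalanobis(xi, xj, C):
    """Mahalanobis distance (quadratic form) between xi and xj,
    where C is a covariance matrix estimated from example data."""
    diff = xi - xj
    return float(diff @ np.linalg.inv(C) @ diff)

# Covariance estimated from points lying roughly along the x1 axis.
pts = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, -0.1],
                [3.0, 0.05], [4.0, 0.0], [5.0, -0.05]])
C = np.cov(pts, rowvar=False)

a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])   # step along the data
c = np.array([0.0, 1.0])                            # step across the data
# Moving along the data's linear structure costs much less than
# moving the same Euclidean distance across it.
print(mahalanobis(a, b, C) < mahalanobis(a, c, C))  # True
```

This is exactly the “follows the linear structure in the data” property shown on the next slide: equal Euclidean steps get very different Mahalanobis distances depending on their direction relative to the data.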
Mahalanobis Distance

[Figure: 2D data with the points used to estimate the covariance matrix C, and the resulting Euclidean and Mahalanobis distance maps. Light means close, dark means far; the zero-distance point is in the blue square.]
Mahalanobis Distance

Follows the linear structure in the data.

[Figure: Synthetic 2D Data 1 and Synthetic 2D Data 2, showing the points used to estimate the covariance matrix and the resulting Mahalanobis distance maps. Light means close, dark means far; the zero-distance point is in the blue square.]
What About Locally Nonlinear Data?
We are not aware of an existing distance framework that efficiently follows the local nonlinear structure in high dimensional data.
In RSS 2006, we proposed the Polynomial Mahalanobis Distance Metric
Polynomial Mahalanobis Distance

$\mathbf{z}_i$ is obtained by mapping $\mathbf{x}_i$ into all of its polynomial terms of order q or less; $\mathbf{z}_j$ is obtained by mapping $\mathbf{x}_j$ into all of its polynomial terms of order q or less.

The q-order Polynomial Mahalanobis Distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$, with $\mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^d$, is defined by:

$d_{PM}(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{z}_i - \mathbf{z}_j)^t \, C_{\mathbf{z}}^{-1} (\mathbf{z}_i - \mathbf{z}_j)$

where $C_{\mathbf{z}}$ is a covariance matrix in q-order polynomial space.
Polynomial Space Mappings

$\mathbf{z}_i, \mathbf{z}_j \in \mathbb{R}^m$, where the number of polynomial terms is

$m = \frac{(d+q)!}{d! \cdot q!}$

with d the data dimension and q the polynomial order.

Example 1: $d = 2,\; q = 2 \;\Rightarrow\; m = 5$:
$\mathbf{x} = (x_1, x_2) \;\Rightarrow\; \mathbf{z} = (x_1, x_2, x_1 x_2, x_1^2, x_2^2)$

Example 2: $d = 950,\; q = 8 \;\Rightarrow\; m > 10^{19}$

Problem: polynomial mappings are computationally prohibitive for large d and/or q!
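The blow-up in m is easy to check numerically. The sketch below counts monomials of degree at most q in d variables, excluding the constant term so that it matches Example 1 (this off-by-one reading of the formula is our assumption):

```python
from math import comb

def num_poly_terms(d, q):
    """Number of monomials of degree <= q in d variables,
    excluding the constant term (matching Example 1 above)."""
    return comb(d + q, q) - 1

print(num_poly_terms(2, 2))              # 5, as in Example 1
print(num_poly_terms(950, 8) > 10**19)   # True: far too many to enumerate
```

For a 10-by-10 window of normalized RGB pixels (d = 300 as used later in the talk), even q = 8 already produces an astronomically large feature vector, which is why the local, recursive construction on the next slide is needed.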
Computationally Efficient Polynomial Mahalanobis Distance for Large d and q

- Global distance mappings are prohibitive
- However, a local neighborhood of size N < 50 can be efficiently projected into its K ≤ N principal components
- In this K-dimensional space, q = 2 order mappings can be efficiently obtained (there will be (K+2)(K+1)/2 polynomial terms), giving a second-order Polynomial Mahalanobis Distance
- This process can be repeated to give q = 4, 8, 16, … order Polynomial Mahalanobis Distance models
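The steps above can be sketched as follows. This is a simplified, assumption-laden reconstruction (plain SVD-based PCA, a degree-2 monomial expansion without the constant term, and a small ridge term for invertibility), not the published algorithm:

```python
import numpy as np

def degree2_expand(Z):
    """Map each row z to all monomials of degree <= 2 (no constant):
    z_a and z_a * z_b for a <= b."""
    n, k = Z.shape
    cols = [Z] + [(Z[:, a] * Z[:, b])[:, None]
                  for a in range(k) for b in range(a, k)]
    return np.hstack(cols)

def polynomial_mahalanobis_space(X, K=3, levels=2, eps=1e-6):
    """Sketch of the recursive construction: repeatedly project the
    local neighborhood onto its K principal components, then add
    degree-2 terms (doubling the polynomial order each level).
    Returns the (regularized) covariance in the final expanded space."""
    Z = X - X.mean(axis=0)
    for _ in range(levels):
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        P = Vt[:K].T                    # top-K principal directions
        Z = degree2_expand(Z @ P)       # K -> K + K(K+1)/2 columns
        Z = Z - Z.mean(axis=0)
    C = np.cov(Z, rowvar=False) + eps * np.eye(Z.shape[1])
    return C

# A local neighborhood of N < 50 points in d = 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
C = polynomial_mahalanobis_space(X, K=3, levels=2)
print(C.shape)  # (9, 9): K + K(K+1)/2 columns with K = 3
```

Each level stays cheap because the expansion is always applied to only K principal-component coordinates, never to the full d-dimensional input.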
Distance Characteristics as Polynomial Order q Increases

[Figure: Synthetic 2D Data 1 with Mahalanobis (q = 1), second-order (q = 2), fourth-order (q = 4), and eighth-order (q = 8) Polynomial Mahalanobis distance maps; Synthetic 2D Data 2 with q = 1, q = 2, q = 8, and q = 32 maps. Light means close, dark means far; the zero-distance point is in the blue square.]
Outdoor Path Labeling

Image patches are used to construct the models. Samples: 10-by-10 windows of normalized RGB pixels (d = 300).

[Figure: Images 1 and 2 with Euclidean, Mahalanobis, and Polynomial Mahalanobis labelings. The level of darkness is proportional to confidence in the labeled path; white areas indicate no path; the path threshold is chosen through cross-validation.]
[Figure: Images 3, 4, and 6 with Euclidean, Mahalanobis, and Polynomial Mahalanobis labelings.]
Why Better Segmentation with the Polynomial Mahalanobis Distance?

[Figure: sorted distances of image patches to the training-set patch under the Euclidean, Mahalanobis, and Polynomial Mahalanobis metrics. Green is segmented path (chosen by a threshold on validation data); red is non-path.]

Nonlinear local structure in the data helps efficiently discriminate path from non-path through trivial threshold estimation.
Results of an Accurate Threshold Choice:
In the following two videos you will see the results of trying to discriminate
the hay bale obstacles from traversable terrain.
The Polynomial Mahalanobis Distance
Efficient, but not efficient enough for real-time robotics: we want at least 10 frames a second on a 320 by 240 RGB image.
We are currently formulating fast approximations
The Polynomial Mahalanobis Distance

Mapping data into PM space produces effective SPARSE classifiers. We use two types of PM projections (basis functions):

1. PM distance space — a local measure
2. PM high-order polynomial space — a global measure (essentially a polynomial coordinate frame)

This creates a very large basis function set, so we use a SPARSE (in number of basis functions) LINEAR algorithm to choose the best subset (Strohmann’s Ph.D. thesis). The final model contains both global and local basis functions:

$y = \operatorname{sgn}\left( \sum_{i=1}^{K} \alpha_i P_i(\mathbf{x}) + b \right)$

Sparse goal: most $\alpha_i = 0$, where the $P_i(\mathbf{x})$ are the basis functions.
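Since the specific sparse linear algorithm from Strohmann’s thesis is not given here, the sketch below substitutes a simple greedy, matching-pursuit-style selection purely to illustrate the “most α_i = 0” goal:

```python
import numpy as np

def greedy_sparse_fit(P, y, n_basis=2):
    """Greedy (matching-pursuit-style) basis selection: a stand-in for
    the sparse linear algorithm referenced above.
    P: (n_samples, n_basis_functions) matrix of basis outputs P_i(x).
    Returns weights alpha (mostly exactly zero) and bias b."""
    n, m = P.shape
    alpha = np.zeros(m)
    b = y.mean()
    residual = y - b
    chosen = []
    for _ in range(n_basis):
        # pick the basis function most correlated with the residual
        scores = np.abs(P.T @ residual)
        scores[chosen] = -np.inf
        chosen.append(int(np.argmax(scores)))
        # refit least squares on the chosen subset
        sub = P[:, chosen]
        w, *_ = np.linalg.lstsq(sub, y - b, rcond=None)
        alpha[:] = 0.0
        alpha[chosen] = w
        residual = y - b - sub @ w
    return alpha, b

def predict(alpha, b, P_row):
    return np.sign(P_row @ alpha + b)

# Toy data: y depends on only 2 of 10 basis functions.
rng = np.random.default_rng(1)
P = rng.normal(size=(200, 10))
y = np.sign(2.0 * P[:, 0] - 1.5 * P[:, 1])
alpha, b = greedy_sparse_fit(P, y, n_basis=2)
print(np.count_nonzero(alpha))  # 2: most alpha_i are zero
```

The point of the sparsity is speed at run time: only the K basis functions with nonzero α_i ever need to be evaluated on a new image patch.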
Main Research Thrust: Long Term Learning

Learning goal — learn two types of models:
- Traversable models: map images to traversable “cost” maps
- Non-traversable models: map images to non-traversable “cost” maps

These are combined with stereo to give a single cost map used for navigation.

Model properties:
- Acquire a suite of models over time
- Use models learned in run 1 on all future runs
- Efficiently incorporate new models with old in real time during a run
[Figure: an input image with the traversable model and non-traversable model outputs — the traversable “cost” and non-traversable “cost” masks. Bright: high confidence; dark: low confidence.]
Long Term Learning (Models)

Approach: learn multiple simple models — we don’t know how to learn one big model.

The models are “density based”:
- They output 0 when they “don’t know”
- They output > 0 (to a maximum of 1) when they believe they can make a prediction
Construction of a “Simple” Terrain Model

1. Gather data of terrain and non-terrain
2. Cluster the data — we use a fast, automated algorithm for clustering by ranking on manifolds (based on RSS 2005, ICML 2005)
3. For each cluster, create a (sparse) linear model for classifying terrain from non-terrain
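A toy version of steps 1–3, with plain k-means standing in for the manifold-ranking clustering cited above (which is not reproduced here) and least-squares linear models per cluster:

```python
import numpy as np

def kmeans(X, init_centers, iters=20):
    """Plain k-means, a stand-in for the manifold-ranking clustering
    cited on the slide (RSS 2005 / ICML 2005)."""
    centers = init_centers.astype(float).copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

def per_cluster_linear(X, y, labels, k):
    """Step 3: one least-squares linear classifier per cluster."""
    models = []
    for j in range(k):
        mask = labels == j
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])  # bias column
        w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models.append(w)
    return models

# Step 1: gather toy terrain (+1) / non-terrain (-1) data in two blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])
y = np.where(X[:, 0] + X[:, 1] > 5.0, 1.0, -1.0)

# Step 2: cluster; Step 3: fit one linear model per cluster.
labels, centers = kmeans(X, init_centers=X[[0, 50]])
models = per_cluster_linear(X, y, labels, k=2)
print(len(models))  # one linear model per cluster
```

Each per-cluster linear model is what the next slides wrap in a density estimate, so that it answers “I don’t know” away from its own cluster.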
Linear Models of Terrain that can predict either “Terrain” or “I don’t know”

[Figure: a linear boundary separating terrain (+) training points from non-terrain (−) training points; test points near the training data (O) should be predicted as terrain, while test points far from the training data (x) should NOT be predicted.]
Constructing Linear Models of Terrain that can predict either “Terrain” or “I don’t know”

Take the terrain data and project it into the following 2D space:
1. Signed distance from the hyperplane
2. First principal component of the terrain data when projected into the NULL SPACE of the hyperplane

Build a 2D histogram in this space and scale it to a maximum value of one. Maximum likelihood on validation data is used to determine bin sizes.

[Figure: terrain data plotted as signed distance to the hyperplane vs. the data’s first principal component in the null space of the hyperplane.]
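A minimal sketch of the histogram model: build a 2D histogram over the two features, scale it to a maximum of one, and return 0 (“I don’t know”) for queries that fall outside every populated bin. The bin count is fixed here rather than chosen by maximum likelihood as on the slide:

```python
import numpy as np

def fit_hist2d(F, bins=8):
    """Build a 2D histogram over the (signed distance, null-space PCA)
    features F (n x 2) and scale it to a maximum value of one."""
    H, xe, ye = np.histogram2d(F[:, 0], F[:, 1], bins=bins)
    return H / H.max(), xe, ye

def confidence(model, f):
    """Return the scaled histogram height at f, or 0 ('I don't know')
    when f falls outside the histogram's support."""
    H, xe, ye = model
    i = np.searchsorted(xe, f[0], side="right") - 1
    j = np.searchsorted(ye, f[1], side="right") - 1
    if 0 <= i < H.shape[0] and 0 <= j < H.shape[1]:
        return H[i, j]
    return 0.0

# Toy "terrain" training features clustered near the origin.
rng = np.random.default_rng(3)
F = rng.normal(0.0, 0.3, size=(500, 2))
model = fit_hist2d(F, bins=8)
print(confidence(model, np.array([0.0, 0.0])) > 0)   # in-distribution: True
print(confidence(model, np.array([50.0, 50.0])))     # far away: 0.0
```

Because the output is a density-style confidence rather than a probability, it goes to zero away from the training data — exactly the behavior Platt scaling lacks, as discussed on the next slide.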
This Technique Builds Effective “Yes, No, I don’t Know” Classifiers

The same procedure is used for the non-terrain models, and it is effective on a variety of data sets (in preparation for a NIPS submission).

Why not Platt’s SVM scaling to probabilities? It predicts with high probability anything far away from the boundary.

Why 2D histograms?
- 1D is not as discriminative (only histograms on distance to the boundary)
- > 2D: usually not enough data to build accurate histograms (we are working on this…)
[Figure: from an input image, stereo-labeled data marks ground, non-ground, and low-confidence regions; a ground mask and a non-traversable mask are extracted, and two models are built — one for traversable terrain and one for non-traversable terrain.]
Long Term Learning: Choosing Model Subsets (1)

The robot will learn thousands of models; not all are applicable to every image.

Given a new sensor reading, each model outputs a value between 0 and 1:
- 0 means low confidence — the model has no opinion about this part of the image
- 1 means high confidence — the model is fairly sure it has seen something like this part of the image and can make a prediction
Long Term Learning: Combining Model Subsets (2)

The relevant models for each image are all applied to the image; each model creates a single image cost map.
- The “Non-Traversable Terrain Models” are combined into a single “Non-Traversable Terrain” image cost (currently a simple max over all individual costs)
- The “Traversable Terrain Models” are combined into a single “Traversable Terrain” image cost (currently a simple max over all individual costs)
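The max-combination rule is a one-liner; a toy example with two hypothetical per-pixel cost maps:

```python
import numpy as np

# Each applicable model produced a per-pixel cost map for the image;
# the combination rule described above is a simple per-pixel max.
model_costs = [np.array([[0.1, 0.9],
                         [0.0, 0.2]]),
               np.array([[0.4, 0.1],
                         [0.3, 0.0]])]

combined = np.maximum.reduce(model_costs)
print(combined)
# [[0.4 0.9]
#  [0.3 0.2]]
```

Taking the max (rather than, say, the mean) means a single confident model is enough to mark a pixel, which suits density-based models that output 0 when they don’t know.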
Long Term Learning: Choosing Model Subsets (3)

We cannot (in real time) apply all models to an image, so the image is (partially) sampled in the near field and the far field (currently this sampling is random…).
- Models that disagree with reality (next slide) are not used
- All other models are ranked according to image relevance (confidence): the overall confidence of each model is proportional to the magnitude of its average output over these samples
- The top N most confident models are applied to produce the final cost map, where N is defined by real-time needs
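The ranking step can be sketched as follows, with toy stand-in models and patches (the real confidences would come from the density-based models described earlier):

```python
import numpy as np

def rank_models(models, samples, n_top):
    """Rank surviving models by mean confidence over the sampled image
    patches and keep the top n_top (set by real-time constraints)."""
    avg_conf = np.array([np.mean([m(s) for s in samples]) for m in models])
    order = np.argsort(avg_conf)[::-1]   # most confident first
    return order[:n_top]

# Toy models: each returns a confidence in [0, 1] for a patch.
models = [lambda s: 0.2, lambda s: 0.9, lambda s: 0.5]
samples = [None] * 10   # stand-ins for randomly sampled image patches
print(rank_models(models, samples, n_top=2))  # [1 2]
```

Only the selected n_top models are then run over the full image, keeping the per-frame cost bounded regardless of how many models the robot has accumulated.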
Long Term Learning: Agreeing with Reality

A “Traversable Model” agrees with reality iff, whenever its output is > 0, 3D-reconstruction-based sampling indicates that the region is traversable. Similarly for “Non-Traversable Models”.
- In the near field, we are currently using stereo for this “sanity check”
- For the far field (i.e. where stereo doesn’t work) we will be using structure from motion for the sanity check
- We do not use “Traversable Models” and “Non-Traversable Models” that disagree with one another where no visual clues exist
Long Term Learning: Adding New Models

A new model is added whenever one of the following happens:
- No (sanity-checked) model outputs > 0
- The models don’t explain reality — i.e. stereo (or structure from motion) indicates a traversable region where no “Traversable Model” outputs > 0 (and similarly for “Non-Traversable Models”)
Long Term Learning Challenges I

We anticipate tens of thousands of models. At 10 frames per second we can only apply about 100 of these to an image — which still leaves the problem of which 100 models should be applied to a given image.
Long Term Learning Challenges II

Possible solutions:
- Random sampling of parts of the image
- Bias the models according to how effective they were on previous images
- Maybe combine many simple models into a fast bigger model while the robot sleeps
- We are working on deep network models that attempt to extract higher-level features that identify traversable from non-traversable terrain
Loosely put, our Deep Models look like this:
- First level: learns how regions of the image change as the robot moves
- Second level: learns what kinds of changes are associated with going from non-traversable to traversable terrain
- Third level: learns how to combine second-level information to find sequences of actions that keep the robot on traversable terrain (personal communication with Fernando Pereira)
- Fourth level: ???

Key concept: each level is meant to learn a simple subset of the entire task…
Semi-Supervised Learning and Clustering Approach

- These are attractive because much of the image is unlabelled by stereo
- They can work effectively over widely varying images (Grudic and Mulligan 2005)
- However, this algorithm is not suitable for real-time control; we are currently working on faster algorithms
Image Features Used

Appearance and texture are NOT enough! We are currently evaluating the following feature types:
- Appearance (window based): normalized RGB, color histograms, etc.
- Texture
- Disparity in the near field
- Optical flow
- Structure from motion

Not every feature is appropriate for all environments, so we do online feature selection for each model — based on Strohmann’s Ph.D. thesis, which addressed fast locally linear feature selection.
Far Field Traversable Terrain Identification

[Figure: two views of the same obstacle at different distances.]

We are currently addressing this by building distance-specific models, and by using structure from motion in the far field.
Conclusion

There is a wide body of evidence that Machine Learning techniques and theory can improve the performance of autonomous outdoor robot navigation:
- JFR Special Issue (Nov/Dec 2006)
- DARPA LAGR Phase I

However, we are still not doing Machine Learning Based Robotics! Our autonomous controllers are largely hand selected, with ML only being added to small subsystems.

It is time for Robotics and ML researchers to completely reformulate the Robotics AI problem using the theoretical framework of ML! This will benefit BOTH communities.
Future Work

Currently:
- Sensing of terrain and non-terrain is all ML based
- These image cost maps are projected into the ground plane, where traditional planners (A*) are used to plan paths for the robot
- Stereo is used for a sanity check

Within the next six months:
- Monocular techniques will replace stereo
- All planning will be done in image space (RSS 2005)
- All planning will be done by learning to search for sequences of actions that get you to the goal, using sequence learning algorithms of the type used by Pereira for text and bioinformatics applications
Acknowledgements
Thanks to Dan Lee and Fernando Pereira for useful discussions
Current funding sources:
- DARPA “Learning Applied to Ground Robots”
- NSF 0535269
- NSF 0430593
- Dept. of Ed/OSERS/NIDRR