3/13/2007 University of Colorado, ML Based Robotics
Machine Learning Based Robotics
Greg Grudic
University of Colorado at Boulder
[email protected]/~grudic
Joint work with Jane Mulligan (CU)
Students in the “Intelligence in Action Lab”

- Long Term Learning: Mike Procopio
- Semi-Supervised Learning and Clustering: Adam Bates, Avleen Singh Bijral, Markus Breitenbach
- Sparse (fast) Classification/Regression Models: Abhishek Jaiantilal, Thomas Strohmann (Google)
- Learning to Plan in Image Space: Mike Otte, Scott Richardson
- Model Ensembles for Classification/Regression: Sam Reid
- Reinforcement Learning: Ben Pearre
- Computer Vision: Soumya Ghosh, Wei Xu, Jaeheon Jeong, Kris Nuttycombe
Outline
Motivation
Our application domain: ML Based Robotics
ML Approaches
Conclusion and Future Work
Grudic’s Research “Focus” to about 2005

- Classification and Regression: Grudic and Lawrence (IJCAI-97); Grudic and Lawrence (IROS-98); Strohmann and Grudic (NIPS 2002); Strohmann, Belitski, Grudic and DeCoste (NIPS 2003); Bohte, Breitenbach and Grudic (ICML 2004)
- Semi-Supervised Learning, Clustering and Manifolds: Breitenbach and Grudic (ICML 2005); Grudic and Mulligan (RSS 2005); Grudic and Mulligan (RSS 2005); Grudic and Mulligan (2005); Grudic and Mulligan (RSS 2006); Bijral, Breitenbach, Grudic (AISTATS 2007)
- Reinforcement Learning: Grudic and Ungar (ICML 2000); Grudic and Ungar (AAAI 2000); Grudic and Ungar (IJCAI 01); Grudic and Ungar (NIPS 2001); Grudic and Ungar (2004)
- Robotics: Grudic and Lawrence, IEEE Trans. Rob. and Aut. (1993); Grudic, Kumar and Ungar (IROS 2003)
Grudic’s Research Focus after 2005
Machine Learning Based Robotics
Why? I believe that if ML is to address the real problems of AI, it must begin by DIRECTLY addressing a real, unsolved AI problem.
What Type of Robotics?
“Vision-based autonomous navigation in unstructured outdoor environments”
This is the main research focus of the DARPA LAGR program. Includes 8 teams
GaTech, Netscale (NYU), SRI, NIST, API, JPL, UPenn, CU
The problem of navigating between 2 GPS waypoints (more than a few hundred metres apart) in unstructured outdoor environments is unsolved!
How is it Currently Done?
Crusher and, more recently, PerceptOR
Why is this an Unsolved Problem?
Such robotic tasks are characterized by a high-dimensional input space that represents the world as mediated by robot sensors (vision, sonar data, etc.).
The robot experiences millions of sensor readings at many frames per second, which must be processed and acted upon in real time.
The key open questions are: What information must be extracted from sensors? and, How can the robot use this information to act appropriately in the world?
Why Machine Learning?

Machine Learning techniques offer powerful tools to model complex real-world situations and produce coherent behavior. Many of the fundamental goals of Machine Learning are also those of Robotics, which establishes a synergy between the two fields that can serve as a catalyst for advancing theory and practice in both.
Efforts Taken So Far
- NIPS 2005 Workshop on Machine Learning Based Robotics in Unstructured Environments (Grudic and Mulligan, co-organizers)
- NIPS 2006 Workshop on Learning Applied to Ground Robots: Sensing and Locomotion (Grudic, Jackel and Mulligan, organizers)
- 2006 Special Issue on Machine Learning Based Robotics in Unstructured Environments, Journal of Field Robotics (Mulligan and Grudic, guest editors)
What About the DARPA Grand Challenge?
Autonomous navigation in the desert over a 132 mile course.

5 teams succeeded! http://www.darpa.mil/grandchallenge05/gcorg/index.html

This was a monumental achievement in autonomous robotics.

HOWEVER: this was not an unstructured environment! GPS waypoints were carefully chosen, sometimes less than a metre apart.
Environments that DARPA Grand Challenge winners would find challenging:
Machine Learning Based Robotics
Defn: autonomous robot controllers are learned directly from observations of sensor readings and actuator command results.

This implies:
- No hand-crafted controllers
- Minimal human bias on what controllers should do

The ultimate goal: a complete ML Based Robotics system.
Our Main Platform (LAGR)

[Figure: the LAGR robot — WAAS GPS mounted on a collapsible mount, E-Stop, IR rangefinder, bumper with dual switches, differential drive, and dual stereo cameras.]
The Problem Domain

Confident stereo estimates label parts of the image as traversable or non-traversable; ML (a Gaussian kernel SVM) then classifies the rest of the image.

[Figure: three panels — the input image, the stereo labelling, and the SVM Gaussian-kernel classification into traversable and non-traversable regions.]

PROBLEM: The entire image is being classified indiscriminately!
A Major Problem with ML Today: Most ML Algorithms Predict Blindly!

Learn a classifier between Automobiles and Tigers:

[Figure: the Auto/Tiger classifier correctly predicts “Auto” for a car image and “Tiger” for a tiger image — but for an image that is neither, it still outputs “Auto” or “Tiger” when it should predict “I don’t know!”]
Blind Prediction is a Major Problem For ML Based AI

- No single model will always be correct
- An AI system must predict when its current model set is NOT appropriate
- These predictions are needed because we need a framework to trigger the learning of new models which appropriately account for new situations

We are exploring density based classifiers to address this problem.
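A minimal sketch of the idea — a hypothetical illustration, not the lab’s actual algorithm — gates an ordinary classifier with a crude density estimate (distance to the k-th nearest training point) and answers “I don’t know” in low-density regions:

```python
import numpy as np

def fit_density_gate(X, k=2):
    """Record training points; the gate uses distance to the k-th
    nearest training point as a crude inverse-density estimate."""
    return {"X": np.asarray(X, dtype=float), "k": k}

def kth_nn_dist(gate, x):
    d = np.linalg.norm(gate["X"] - x, axis=1)
    return np.sort(d)[gate["k"] - 1]

def gated_predict(gate, classifier, x, radius):
    """Return the classifier's label only when x lies near the
    training data; otherwise admit ignorance."""
    if kth_nn_dist(gate, x) > radius:
        return "I don't know"
    return classifier(x)

# Toy example: classify 1D points, but refuse far from the data.
X_train = np.array([[0.0], [0.1], [0.9], [1.0]])
clf = lambda x: "auto" if x[0] < 0.5 else "tiger"
gate = fit_density_gate(X_train, k=2)

print(gated_predict(gate, clf, np.array([0.05]), radius=0.5))  # near data: "auto"
print(gated_predict(gate, clf, np.array([10.0]), radius=0.5))  # far away: "I don't know"
```

The `radius` threshold plays the role of the density cutoff; the density-based classifiers discussed on the following slides use learned histograms instead of this nearest-neighbor stand-in.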
The Problem Domain

Use stereo to identify a confident patch of traversable ground directly in front of the robot, then use the examples of path to label the entire image.

[Figure: the image in front of the robot; the stereo-labeled example of a path; and the path-labeled image produced by the Polynomial Mahalanobis distance.]
The Problem Domain (restated)

Stereo can give accurate readings at short range (< 10 meters?). Take confident stereo readings of traversable terrain (paths) and project them into the rest of the image (the far field). This gives far-field navigation capabilities which can greatly outperform stereo alone.

See the JFR Special Issue on “Machine Learning Based Robotics in Unstructured Environments”, Nov/Dec 2006.
The Problem Domain (restated again)

Goal: find a distance metric that efficiently discriminates path from non-path — i.e., maximize the distance between path and non-path examples.

[Figure: image with path and non-path regions marked.]
Euclidean Distance

$d_E(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|, \quad \mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^d$

Data points represent example windows of paths.

- The distance measure radiates spherically from the reference point
- The distance measure does not follow the structure of the data

[Figure: Euclidean distance map. Light means close, dark means far; the zero-distance point is in the blue square.]
Mahalanobis Distance

The Mahalanobis Distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$, with $\mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^d$, is defined by:

$d_M(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i - \mathbf{x}_j)^t \, C^{-1} (\mathbf{x}_i - \mathbf{x}_j)$

where $C$ is a covariance matrix.
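The definition translates directly to code. A minimal NumPy version of the quadratic form, on illustrative data (not from the robot) stretched along one direction:

```python
import numpy as np

def mahalanobis(xi, xj, C):
    """Mahalanobis distance (quadratic form) between xi and xj,
    where C is a covariance matrix estimated from example data."""
    diff = xi - xj
    return float(diff @ np.linalg.inv(C) @ diff)

# Covariance estimated from points lying roughly along the x1 axis.
pts = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, -0.1],
                [3.0, 0.05], [4.0, 0.0], [5.0, -0.05]])
C = np.cov(pts, rowvar=False)

a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])   # step along the data
c = np.array([0.0, 1.0])                            # step across the data
# Moving along the data's linear structure costs much less than
# moving the same Euclidean distance across it.
print(mahalanobis(a, b, C) < mahalanobis(a, c, C))  # True
```

This is exactly the “follows the linear structure in the data” property shown on the next slide: equal Euclidean steps get very different Mahalanobis distances depending on their direction relative to the data.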
Mahalanobis Distance

[Figure: 2D data with the points used to estimate the covariance matrix C, and the resulting Euclidean and Mahalanobis distance maps. Light means close, dark means far; the zero-distance point is in the blue square.]
Mahalanobis Distance

Follows the linear structure in the data.

[Figure: Synthetic 2D Data 1 and Synthetic 2D Data 2, showing the points used to estimate the covariance matrix and the resulting Mahalanobis distance maps. Light means close, dark means far; the zero-distance point is in the blue square.]
What About Locally Nonlinear Data?
We are not aware of an existing distance framework that efficiently follows the local nonlinear structure in high dimensional data.
In RSS 2006, we proposed the Polynomial Mahalanobis Distance Metric
Polynomial Mahalanobis Distance

$\mathbf{z}_i$ is obtained by mapping $\mathbf{x}_i$ into all of its polynomial terms of order q or less; $\mathbf{z}_j$ is obtained by mapping $\mathbf{x}_j$ into all of its polynomial terms of order q or less.

The q-order Polynomial Mahalanobis Distance between points $\mathbf{x}_i$ and $\mathbf{x}_j$, with $\mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^d$, is defined by:

$d_{PM}(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{z}_i - \mathbf{z}_j)^t \, C_{\mathbf{z}}^{-1} (\mathbf{z}_i - \mathbf{z}_j)$

where $C_{\mathbf{z}}$ is a covariance matrix in q-order polynomial space.
Polynomial Space Mappings

$\mathbf{z}_i, \mathbf{z}_j \in \mathbb{R}^m$, where the number of polynomial terms is

$m = \frac{(d+q)!}{d! \cdot q!}$

with d the data dimension and q the polynomial order.

Example 1: $d = 2,\; q = 2 \;\Rightarrow\; m = 5$:
$\mathbf{x} = (x_1, x_2) \;\Rightarrow\; \mathbf{z} = (x_1, x_2, x_1 x_2, x_1^2, x_2^2)$

Example 2: $d = 950,\; q = 8 \;\Rightarrow\; m > 10^{19}$

Problem: polynomial mappings are computationally prohibitive for large d and/or q!
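The blow-up in m is easy to check numerically. The sketch below counts monomials of degree at most q in d variables, excluding the constant term so that it matches Example 1 (this off-by-one reading of the formula is our assumption):

```python
from math import comb

def num_poly_terms(d, q):
    """Number of monomials of degree <= q in d variables,
    excluding the constant term (matching Example 1 above)."""
    return comb(d + q, q) - 1

print(num_poly_terms(2, 2))              # 5, as in Example 1
print(num_poly_terms(950, 8) > 10**19)   # True: far too many to enumerate
```

For a 10-by-10 window of normalized RGB pixels (d = 300 as used later in the talk), even q = 8 already produces an astronomically large feature vector, which is why the local, recursive construction on the next slide is needed.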
Computationally Efficient Polynomial Mahalanobis Distance for Large d and q

- Global distance mappings are prohibitive
- However, a local neighborhood of size N < 50 can be efficiently projected into its K ≤ N principal components
- In this K-dimensional space, q = 2 order mappings can be efficiently obtained (there will be (K+2)(K+1)/2 polynomial terms), giving a second-order Polynomial Mahalanobis Distance
- This process can be repeated to give q = 4, 8, 16, … order Polynomial Mahalanobis Distance models
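The steps above can be sketched as follows. This is a simplified, assumption-laden reconstruction (plain SVD-based PCA, a degree-2 monomial expansion without the constant term, and a small ridge term for invertibility), not the published algorithm:

```python
import numpy as np

def degree2_expand(Z):
    """Map each row z to all monomials of degree <= 2 (no constant):
    z_a and z_a * z_b for a <= b."""
    n, k = Z.shape
    cols = [Z] + [(Z[:, a] * Z[:, b])[:, None]
                  for a in range(k) for b in range(a, k)]
    return np.hstack(cols)

def polynomial_mahalanobis_space(X, K=3, levels=2, eps=1e-6):
    """Sketch of the recursive construction: repeatedly project the
    local neighborhood onto its K principal components, then add
    degree-2 terms (doubling the polynomial order each level).
    Returns the (regularized) covariance in the final expanded space."""
    Z = X - X.mean(axis=0)
    for _ in range(levels):
        _, _, Vt = np.linalg.svd(Z, full_matrices=False)
        P = Vt[:K].T                    # top-K principal directions
        Z = degree2_expand(Z @ P)       # K -> K + K(K+1)/2 columns
        Z = Z - Z.mean(axis=0)
    C = np.cov(Z, rowvar=False) + eps * np.eye(Z.shape[1])
    return C

# A local neighborhood of N < 50 points in d = 5 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
C = polynomial_mahalanobis_space(X, K=3, levels=2)
print(C.shape)  # (9, 9): K + K(K+1)/2 columns with K = 3
```

Each level stays cheap because the expansion is always applied to only K principal-component coordinates, never to the full d-dimensional input.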
Distance Characteristics as Polynomial Order q Increases

[Figure: Synthetic 2D Data 1 with Mahalanobis (q = 1), second-order (q = 2), fourth-order (q = 4), and eighth-order (q = 8) Polynomial Mahalanobis distance maps; Synthetic 2D Data 2 with q = 1, q = 2, q = 8, and q = 32 maps. Light means close, dark means far; the zero-distance point is in the blue square.]
Outdoor Path Labeling

Image patches are used to construct the models. Samples: 10-by-10 windows of normalized RGB pixels (d = 300).

[Figure: Images 1 and 2 with Euclidean, Mahalanobis, and Polynomial Mahalanobis labelings. The level of darkness is proportional to confidence in the labeled path; white areas indicate no path; the path threshold is chosen through cross-validation.]
[Figure: Images 3, 4, and 6 with Euclidean, Mahalanobis, and Polynomial Mahalanobis labelings.]
Why Better Segmentation with the Polynomial Mahalanobis Distance?

[Figure: sorted distances of image patches to the training-set patch under the Euclidean, Mahalanobis, and Polynomial Mahalanobis metrics. Green is segmented path (chosen by a threshold on validation data); red is non-path.]

Nonlinear local structure in the data helps efficiently discriminate path from non-path through trivial threshold estimation.
Results of an Accurate Threshold Choice:
In the following two videos you will see the results of trying to discriminate
the hay bale obstacles from traversable terrain.
The Polynomial Mahalanobis Distance
Efficient, but not efficient enough for real-time robotics: we want at least 10 frames a second on a 320 by 240 RGB image.
We are currently formulating fast approximations
The Polynomial Mahalanobis Distance

Mapping data into PM space produces effective SPARSE classifiers. We use two types of PM projections (basis functions):

1. PM distance space — a local measure
2. PM high-order polynomial space — a global measure (essentially a polynomial coordinate frame)

This creates a very large basis function set, so we use a SPARSE (in number of basis functions) LINEAR algorithm to choose the best subset (Strohmann’s Ph.D. thesis). The final model contains both global and local basis functions:

$y = \operatorname{sgn}\left( \sum_{i=1}^{K} \alpha_i P_i(\mathbf{x}) + b \right)$

Sparse goal: most $\alpha_i = 0$, where the $P_i(\mathbf{x})$ are the basis functions.
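Since the specific sparse linear algorithm from Strohmann’s thesis is not given here, the sketch below substitutes a simple greedy, matching-pursuit-style selection purely to illustrate the “most α_i = 0” goal:

```python
import numpy as np

def greedy_sparse_fit(P, y, n_basis=2):
    """Greedy (matching-pursuit-style) basis selection: a stand-in for
    the sparse linear algorithm referenced above.
    P: (n_samples, n_basis_functions) matrix of basis outputs P_i(x).
    Returns weights alpha (mostly exactly zero) and bias b."""
    n, m = P.shape
    alpha = np.zeros(m)
    b = y.mean()
    residual = y - b
    chosen = []
    for _ in range(n_basis):
        # pick the basis function most correlated with the residual
        scores = np.abs(P.T @ residual)
        scores[chosen] = -np.inf
        chosen.append(int(np.argmax(scores)))
        # refit least squares on the chosen subset
        sub = P[:, chosen]
        w, *_ = np.linalg.lstsq(sub, y - b, rcond=None)
        alpha[:] = 0.0
        alpha[chosen] = w
        residual = y - b - sub @ w
    return alpha, b

def predict(alpha, b, P_row):
    return np.sign(P_row @ alpha + b)

# Toy data: y depends on only 2 of 10 basis functions.
rng = np.random.default_rng(1)
P = rng.normal(size=(200, 10))
y = np.sign(2.0 * P[:, 0] - 1.5 * P[:, 1])
alpha, b = greedy_sparse_fit(P, y, n_basis=2)
print(np.count_nonzero(alpha))  # 2: most alpha_i are zero
```

The point of the sparsity is speed at run time: only the K basis functions with nonzero α_i ever need to be evaluated on a new image patch.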
Main Research Thrust: Long Term Learning

Learning goal — learn two types of models:
- Traversable models: map images to traversable “cost” maps
- Non-traversable models: map images to non-traversable “cost” maps

These are combined with stereo to give a single cost map used for navigation.

Model properties:
- Acquire a suite of models over time
- Use models learned in run 1 on all future runs
- Efficiently incorporate new models with old in real time during a run
[Figure: an input image with the traversable model and non-traversable model outputs — the traversable “cost” and non-traversable “cost” masks. Bright: high confidence; dark: low confidence.]
Long Term Learning (Models)

Approach: learn multiple simple models — we don’t know how to learn one big model.

The models are “density based”:
- They output 0 when they “don’t know”
- They output > 0 (to a maximum of 1) when they believe they can make a prediction
Construction of a “Simple” Terrain Model

1. Gather data of terrain and non-terrain
2. Cluster the data — we use a fast, automated algorithm for clustering by ranking on manifolds (based on RSS 2005, ICML 2005)
3. For each cluster, create a (sparse) linear model for classifying terrain from non-terrain
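A toy version of steps 1–3, with plain k-means standing in for the manifold-ranking clustering cited above (which is not reproduced here) and least-squares linear models per cluster:

```python
import numpy as np

def kmeans(X, init_centers, iters=20):
    """Plain k-means, a stand-in for the manifold-ranking clustering
    cited on the slide (RSS 2005 / ICML 2005)."""
    centers = init_centers.astype(float).copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

def per_cluster_linear(X, y, labels, k):
    """Step 3: one least-squares linear classifier per cluster."""
    models = []
    for j in range(k):
        mask = labels == j
        A = np.hstack([X[mask], np.ones((mask.sum(), 1))])  # bias column
        w, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models.append(w)
    return models

# Step 1: gather toy terrain (+1) / non-terrain (-1) data in two blobs.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])
y = np.where(X[:, 0] + X[:, 1] > 5.0, 1.0, -1.0)

# Step 2: cluster; Step 3: fit one linear model per cluster.
labels, centers = kmeans(X, init_centers=X[[0, 50]])
models = per_cluster_linear(X, y, labels, k=2)
print(len(models))  # one linear model per cluster
```

Each per-cluster linear model is what the next slides wrap in a density estimate, so that it answers “I don’t know” away from its own cluster.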
Linear Models of Terrain that can predict either “Terrain” or “I don’t know”

[Figure: a linear boundary separating terrain (+) training points from non-terrain (−) training points; test points near the training data (O) should be predicted as terrain, while test points far from the training data (x) should NOT be predicted.]
Constructing Linear Models of Terrain that can predict either “Terrain” or “I don’t know”

Take the terrain data and project it into the following 2D space:
1. Signed distance from the hyperplane
2. First principal component of the terrain data when projected into the NULL SPACE of the hyperplane

Build a 2D histogram in this space and scale it to a maximum value of one. Maximum likelihood on validation data is used to determine bin sizes.

[Figure: terrain data plotted as signed distance to the hyperplane vs. the data’s first principal component in the null space of the hyperplane.]
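A minimal sketch of the histogram model: build a 2D histogram over the two features, scale it to a maximum of one, and return 0 (“I don’t know”) for queries that fall outside every populated bin. The bin count is fixed here rather than chosen by maximum likelihood as on the slide:

```python
import numpy as np

def fit_hist2d(F, bins=8):
    """Build a 2D histogram over the (signed distance, null-space PCA)
    features F (n x 2) and scale it to a maximum value of one."""
    H, xe, ye = np.histogram2d(F[:, 0], F[:, 1], bins=bins)
    return H / H.max(), xe, ye

def confidence(model, f):
    """Return the scaled histogram height at f, or 0 ('I don't know')
    when f falls outside the histogram's support."""
    H, xe, ye = model
    i = np.searchsorted(xe, f[0], side="right") - 1
    j = np.searchsorted(ye, f[1], side="right") - 1
    if 0 <= i < H.shape[0] and 0 <= j < H.shape[1]:
        return H[i, j]
    return 0.0

# Toy "terrain" training features clustered near the origin.
rng = np.random.default_rng(3)
F = rng.normal(0.0, 0.3, size=(500, 2))
model = fit_hist2d(F, bins=8)
print(confidence(model, np.array([0.0, 0.0])) > 0)   # in-distribution: True
print(confidence(model, np.array([50.0, 50.0])))     # far away: 0.0
```

Because the output is a density-style confidence rather than a probability, it goes to zero away from the training data — exactly the behavior Platt scaling lacks, as discussed on the next slide.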
This Technique Builds Effective “Yes, No, I don’t Know” Classifiers

The same procedure is used for the non-terrain models, and it is effective on a variety of data sets (in preparation for a NIPS submission).

Why not Platt’s SVM scaling to probabilities? It predicts with high probability anything far away from the boundary.

Why 2D histograms?
- 1D is not as discriminative (only histograms on distance to the boundary)
- > 2D: usually not enough data to build accurate histograms (we are working on this…)
[Figure: from an input image, stereo-labeled data marks ground, non-ground, and low-confidence regions; a ground mask and a non-traversable mask are extracted, and two models are built — one for traversable terrain and one for non-traversable terrain.]
Long Term Learning: Choosing Model Subsets (1)

The robot will learn thousands of models; not all are applicable to every image.

Given a new sensor reading, each model outputs a value between 0 and 1:
- 0 means low confidence — the model has no opinion about this part of the image
- 1 means high confidence — the model is fairly sure it has seen something like this part of the image and can make a prediction
Long Term Learning: Combining Model Subsets (2)

The relevant models for each image are all applied to the image; each model creates a single image cost map.
- The “Non-Traversable Terrain Models” are combined into a single “Non-Traversable Terrain” image cost (currently a simple max over all individual costs)
- The “Traversable Terrain Models” are combined into a single “Traversable Terrain” image cost (currently a simple max over all individual costs)
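The max-combination rule is a one-liner; a toy example with two hypothetical per-pixel cost maps:

```python
import numpy as np

# Each applicable model produced a per-pixel cost map for the image;
# the combination rule described above is a simple per-pixel max.
model_costs = [np.array([[0.1, 0.9],
                         [0.0, 0.2]]),
               np.array([[0.4, 0.1],
                         [0.3, 0.0]])]

combined = np.maximum.reduce(model_costs)
print(combined)
# [[0.4 0.9]
#  [0.3 0.2]]
```

Taking the max (rather than, say, the mean) means a single confident model is enough to mark a pixel, which suits density-based models that output 0 when they don’t know.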
Long Term Learning: Choosing Model Subsets (3)

We cannot (in real time) apply all models to an image, so the image is (partially) sampled in the near field and the far field (currently this sampling is random…).
- Models that disagree with reality (next slide) are not used
- All other models are ranked according to image relevance (confidence): the overall confidence of each model is proportional to the magnitude of its average output over these samples
- The top N most confident models are applied to produce the final cost map, where N is defined by real-time needs
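The ranking step can be sketched as follows, with toy stand-in models and patches (the real confidences would come from the density-based models described earlier):

```python
import numpy as np

def rank_models(models, samples, n_top):
    """Rank surviving models by mean confidence over the sampled image
    patches and keep the top n_top (set by real-time constraints)."""
    avg_conf = np.array([np.mean([m(s) for s in samples]) for m in models])
    order = np.argsort(avg_conf)[::-1]   # most confident first
    return order[:n_top]

# Toy models: each returns a confidence in [0, 1] for a patch.
models = [lambda s: 0.2, lambda s: 0.9, lambda s: 0.5]
samples = [None] * 10   # stand-ins for randomly sampled image patches
print(rank_models(models, samples, n_top=2))  # [1 2]
```

Only the selected n_top models are then run over the full image, keeping the per-frame cost bounded regardless of how many models the robot has accumulated.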
Long Term Learning: Agreeing with Reality

A “Traversable Model” agrees with reality iff, whenever its output is > 0, 3D-reconstruction-based sampling indicates that the region is traversable. Similarly for “Non-Traversable Models”.
- In the near field, we are currently using stereo for this “sanity check”
- For the far field (i.e. where stereo doesn’t work) we will be using structure from motion for the sanity check
- We do not use “Traversable Models” and “Non-Traversable Models” that disagree with one another where no visual clues exist
Long Term Learning: Adding New Models

A new model is added whenever one of the following happens:
- No (sanity-checked) model outputs > 0
- The models don’t explain reality — i.e. stereo (or structure from motion) indicates a traversable region where no “Traversable Model” outputs > 0 (and similarly for “Non-Traversable Models”)
Long Term Learning Challenges I

We anticipate tens of thousands of models. At 10 frames per second we can only apply about 100 of these to an image — which still leaves the problem of which 100 models should be applied to a given image.
Long Term Learning Challenges II

Possible solutions:
- Random sampling of parts of the image
- Bias the models according to how effective they were on previous images
- Maybe combine many simple models into a fast bigger model while the robot sleeps
- We are working on deep network models that attempt to extract higher-level features that identify traversable from non-traversable terrain
Loosely put, our Deep Models look like this:
- First level: learns how regions of the image change as the robot moves
- Second level: learns what kinds of changes are associated with going from non-traversable to traversable terrain
- Third level: learns how to combine second-level information to find sequences of actions that keep the robot on traversable terrain (personal communication with Fernando Pereira)
- Fourth level: ???

Key concept: each level is meant to learn a simple subset of the entire task…
Semi-Supervised Learning and Clustering Approach

- These are attractive because much of the image is unlabelled by stereo
- They can work effectively over widely varying images (Grudic and Mulligan 2005)
- However, this algorithm is not suitable for real-time control; we are currently working on faster algorithms
Image Features Used

Appearance and texture are NOT enough! We are currently evaluating the following feature types:
- Appearance (window based): normalized RGB, color histograms, etc.
- Texture
- Disparity in the near field
- Optical flow
- Structure from motion

Not every feature is appropriate for all environments, so we do online feature selection for each model — based on Strohmann’s Ph.D. thesis, which addressed fast locally linear feature selection.
Far Field Traversable Terrain Identification

[Figure: two views of the same obstacle at different distances.]

We are currently addressing this by building distance-specific models, and by using structure from motion in the far field.
Conclusion

There is a wide body of evidence that Machine Learning techniques and theory can improve the performance of autonomous outdoor robot navigation:
- JFR Special Issue (Nov/Dec 2006)
- DARPA LAGR Phase I

However, we are still not doing Machine Learning Based Robotics! Our autonomous controllers are largely hand selected, with ML only being added to small subsystems.

It is time for Robotics and ML researchers to completely reformulate the Robotics AI problem using the theoretical framework of ML! This will benefit BOTH communities.
Future Work

Currently:
- Sensing of terrain and non-terrain is all ML based
- These image cost maps are projected into the ground plane, where traditional planners (A*) are used to plan paths for the robot
- Stereo is used for a sanity check

Within the next six months:
- Monocular techniques will replace stereo
- All planning will be done in image space (RSS 2005)
- All planning will be done by learning to search for sequences of actions that get you to the goal, using sequence learning algorithms of the type used by Pereira for text and bioinformatics applications
Acknowledgements
Thanks to Dan Lee and Fernando Pereira for useful discussions
Current funding sources:
- DARPA “Learning Applied to Ground Robots”
- NSF 0535269
- NSF 0430593
- Dept. of Ed/OSERS/NIDRR