Data Mining in Aeronautics, Science, and Exploration Systems
2007 Conference
June 26-27, 2007
Computer History Museum Mountain View, California, USA
Sponsored by:
NASA Engineering and Safety Center
Science Mission Directorate
Aeronautics Research Mission Directorate (IVHM)
Numerous disciplines, including aeronautics, physical sciences, and space exploration, have benefited from recent advances in data and text mining, machine learning, and statistics. The Data Mining in Aeronautics, Science, and Exploration Systems (DMASES) 2007 conference provides the data mining community with an opportunity to share these advances across the larger communities of engineers and scientists working in aeronautics, aerospace, and science. This single-track conference features in-depth lectures, tutorials, discussion, and a poster session.
Conference Organizers:
Ashok N. Srivastava, Ph.D., Intelligent Systems Division, NASA Ames Research Center
Dawn M. McIntosh, Intelligent Systems Division, NASA Ames Research Center
Bob Beil, Systems Engineering Office, NASA Engineering and Safety Center

Session Chairs:
Kevin H. Knuth, Ph.D. (Sciences), Department of Physics, State University of New York, Albany
Michael D. New, Capt., Ph.D. (Aeronautics), Delta Airlines, Inc.
Anindya Ghoshal, Ph.D. (Exploration Systems), United Technologies Research Center, United Technologies Corp.
Conference Agenda
Tuesday, June 26
8:00 AM Registration
8:30 AM Morning Announcements/Introductions
8:35 AM Mining Future Datascapes - Srivastava/NASA Ames Research Center
9:15 AM Ascent Summary Data Analysis Tool for Shuttle Wing Leading Edge Impact Detection - McIntosh/NASA Ames Research Center

Exploration Systems Session
9:35 AM Distributed Mobility Management for Target Tracking in Mobile Sensor Networks - Chakrabarty/Duke University
10:20 AM * break *
10:45 AM A Structural Neural System for Data Mining and Anomaly Detection - Schulz/University of Cincinnati
11:25 AM Current Trends in Performance Prognostics Using Integrated Simulation and Sensors - Baca/Sandia National Laboratories
12:25 PM * Poster Session/Lunch *

Sciences Session
2:00 PM Problem Solving Strategies: Sampling & Heuristics - Knuth/State University of New York, Albany
2:20 PM Making the Sky Searchable: Rapid Indexing for Automated Astrometry - Roweis/Google
2:30 PM Bayesian Analysis of the Cosmic Microwave Background - Jewell/NASA Jet Propulsion Laboratory
3:00 PM Efficient & Stable Gaussian Process Calculations - Foster/San Jose State University
3:30 PM * break *
4:00 PM Understanding Large-Scale Structure in Earth Science Remote Sensing Data Sets - Braverman/NASA Jet Propulsion Laboratory
4:30 PM Data-driven Modeling for Understanding Climate-Vegetation Interactions - Nemani/NASA Ames Research Center
5:00 PM End
Wednesday, June 27
8:00 AM Registration
8:30 AM Morning Announcements
8:35 AM Tutorial, Session I - Principles of Bayesian Methods - Sansó/University of California, Santa Cruz
10:00 AM * break *
10:30 AM Tutorial, Session II - Principles of Bayesian Methods - Sansó/University of California, Santa Cruz
12:30 PM * Collaboration Discussions & Networking/Lunch *

Aeronautics Session
1:30 PM National Aeronautics Research & Development Policy – Overview and Outreach - Schlickenmaier/NASA Headquarters
2:00 PM Applying Knowledge Representation to Runway Incursion - Wilczynski/University of Southern California
3:00 PM The Role of Data Mining in Aviation Safety Decision Making - McVenes/Air Line Pilots Association, International
3:30 PM * break *
4:00 PM Sifting NOAA Archived ACARS Data for Wind Variation to Improve Traffic Efficiency - Ren/Georgia Institute of Technology
4:30 PM Data & Text Mining in Boeing - Kao/Boeing Phantom Works
5:00 PM Concluding Remarks - Srivastava
5:10 PM End
Invited Presentations
Conference Coordinator Presentations
Mining Future Datascapes - Ashok Srivastava, NASA Ames Research Center
Ascent Summary Data Analysis Tool for Shuttle Wing Leading Edge Impact Detection - Dawn McIntosh, NASA Ames Research Center
NASA Engineering and Safety Center Data Mining and Trending Working Group - Bob Beil, NASA Engineering and Safety Center

Tuesday, June 26
Distributed Mobility Management for Target Tracking in Mobile Sensor Networks - Krishnendu Chakrabarty, Duke University
A Structural Neural System for Data Mining and Anomaly Detection - Mark Schulz, University of Cincinnati
Current Trends in Performance Prognostics Using Integrated Simulation and Sensors - Thomas J. Baca, Sandia National Laboratories
Problem Solving Strategies: Sampling and Heuristics - Kevin Knuth, SUNY Albany
Making the Sky Searchable: Rapid Indexing for Automated Astrometry - Sam Roweis, Google
Bayesian Analysis of the Cosmic Microwave Background - Jeff Jewell, NASA Jet Propulsion Laboratory
Efficient and Stable Gaussian Process Calculations - Leslie Foster, San Jose State University
Understanding Large-Scale Structure in Earth Science Remote Sensing Data Sets - Amy Braverman, NASA Jet Propulsion Laboratory
Data-Driven Modeling for Understanding Climate-Vegetation Interactions - Ramakrishna Nemani, NASA Ames Research Center
Efficient & Stable Gaussian Process Calculations
Leslie Foster San Jose State University
The Gaussian process technique is one popular approach for analyzing and making predictions related to large data sets. However, the traditional Gaussian process approach requires solving a system of linear equations that, in many cases, is so large that it is not practical to solve in a reasonable amount of time. We describe how low-rank approximations can be used to solve these equations approximately. The resulting algorithm is fast, accurate, numerically stable, and general. We illustrate the application of the algorithm to the prediction of redshifts using broad spectrum measurements of the light from galaxies.
EFFICIENT AND STABLE GAUSSIAN PROCESS CALCULATIONS

Leslie Foster, Nabeela Aijaz, Michael Hurley, Apolo Luis, Joel Rinsky, Chandrika Satyavolu, Alex Waagen (team leader)

Mathematics, San Jose State University

June 26, 2007, DMASES 2007
OUTLINE
I. The Problem and Background
II. Low Rank Approximation
III. Numerical Stability and Rank Selection
IV. Results
V. Conclusions
PREDICTION AND ESTIMATION
Training Data:
X – data matrix of observations – n × d
y – vector of target data – n × 1

Testing Data:
X∗ – matrix of new observations – n∗ × d

Goals:
predict y∗ corresponding to X∗
estimate y corresponding to X
Approaches for prediction with large data sets:
Traditional regression
Neural networks
Support Vector Machines
E-model
. . .
Gaussian Processes
GAUSSIAN PROCESS SOLUTION
Form the covariance matrix K (n × n) and the cross covariance matrix K∗ (n∗ × n), and select a parameter λ.

Predict y∗ using:

y∗ = K∗(λ²I + K)⁻¹y

(λ²I + K) is large – for example, 180000 × 180000.
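As a concrete illustration of the prediction formula, here is a minimal NumPy sketch of exact GP prediction. The function name and toy covariance are our own, not code from the talk:

```python
import numpy as np

def gp_predict_exact(K, K_star, y, lam):
    """Exact Gaussian process prediction: y_star = K_star (lam^2 I + K)^{-1} y.
    Solves the full n x n linear system, so the cost is O(n^3) time and
    O(n^2) memory -- impractical when n is, say, 180000."""
    n = K.shape[0]
    return K_star @ np.linalg.solve(lam**2 * np.eye(n) + K, y)
```

The n × n solve in the last line is exactly the bottleneck that the low rank approximation below avoids.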
COVARIANCE FUNCTIONS AND MATRICES
Definition: A covariance function k(x, x′) is the measure of covariance between input points x and x′.

Covariance matrix (SPD): Kij = k(xi, xj)

Examples: Polynomial, Squared Exponential, Neural Network, Rational Quadratic, Matérn Class, . . .
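One of the listed examples, the squared exponential covariance, can be sketched as follows (the hyperparameter names are our own choice):

```python
import numpy as np

def sq_exp_kernel(X, X2, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance:
    k(x, x') = signal_var * exp(-||x - x'||^2 / (2 * length_scale^2)).
    Returns the matrix with entry [i, j] equal to k(X[i], X2[j])."""
    sq_dists = ((X[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)
```

Evaluating it on the training inputs against themselves gives the SPD matrix K; against the test inputs it gives the cross covariance K∗.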
COMPUTATIONAL CHALLENGES
Memory: storing the covariance matrix – O(n²)

Time: solving the linear system – O(n³)

Numerical stability: accurate calculations
APPLICATION: REDSHIFT CALCULATION
A redshift is the change in wavelength divided by the initial wavelength; it indicates that an object is moving away from you.

For example, the sound from a train is shifted and changes pitch when moving away.
APPLICATION: REDSHIFT CALCULATION
Scientists want to determine the position of galaxies in the universe.

Useful for understanding the structure of the universe.
APPLICATION: REDSHIFT CALCULATION
Five photometric observations for each galaxy, denoted U, G, R, I, Z.
APPLICATION: REDSHIFT CALCULATION
We have 180,045 examples with known U, G, R, I, Z and redshift.

The goal is to be able to predict the redshift of a new galaxy given its U, G, R, I, Z data.

Testing set: 20,229 galaxies
BACKGROUND: LEAST SQUARES PROBLEMS
Given:
n × m matrix A, n ≥ m
n × 1 vector y
n∗ × m matrix A∗

Solve: min ||y − Ax||

Estimate y: y = Ax
Predict y∗: y∗ = A∗x
BACKGROUND: NORMAL EQUATIONS
x = (AᵀA)⁻¹Aᵀy

Advantage: fast

Disadvantage: cond(AᵀA) = cond²(A); the relative error in x is always ∝ cond²(A), so there is potential numerical instability.
BACKGROUND: ORTHOGONAL (QR) FACTORIZATION
Form A = QR where Q is n × m with orthonormal columns and R is m × m right triangular.

x = R⁻¹Qᵀy

Disadvantages: can be slower, more memory (in Matlab)

Advantages: numerically stable; can be more accurate
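The two least squares approaches side by side, as a small NumPy sketch (illustrative, not the talk's code):

```python
import numpy as np

def ls_normal(A, y):
    """Normal equations: fast, but squares the condition number of A."""
    return np.linalg.solve(A.T @ A, A.T @ y)

def ls_qr(A, y):
    """QR factorization: numerically stable least squares."""
    Q, R = np.linalg.qr(A)   # Q: n x m orthonormal columns, R: m x m triangular
    return np.linalg.solve(R, Q.T @ y)
```

On well conditioned problems the two agree; as cond(A) grows the normal equations lose roughly twice as many digits as QR.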
LOW RANK APPROXIMATION
Partition K and K∗ by the first m columns (and rows):

K = [ K11  K12 ]   where K11 is m × m and K22 is (n − m) × (n − m)
    [ K21  K22 ]

K = [ K1  K2 ]     where K1 is n × m (the first m columns of K)

K∗ = [ K∗1  K∗2 ]  where K∗1 is n∗ × m

Low rank approximations:

K  ≅ K̃  ≡ K1 K11⁻¹ K1ᵀ
K∗ ≅ K̃∗ ≡ K∗1 K11⁻¹ K1ᵀ
LOW RANK APPROXIMATION: SR FORMULA
Recall y∗ = K∗(λ²I + K)⁻¹y.

Replace K with K̃ and K∗ with K̃∗, so that

y∗ ≅ K̃∗(λ²I + K̃)⁻¹y = . . . = K∗1(λ²K11 + K1ᵀK1)⁻¹K1ᵀy

Subset of Regressors formula [Wahba, 1990]
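A minimal NumPy sketch of the SR formula (our own illustration; only the n × m block K1, the m × m block K11 and the n∗ × m block K∗1 are ever formed):

```python
import numpy as np

def sr_predict(K1, K11, K1_star, y, lam):
    """Subset of Regressors prediction:
    y_star ~= K1_star (lam^2 K11 + K1^T K1)^{-1} K1^T y.
    The system solved is only m x m, so time is O(n m^2), memory O(n m)."""
    A = lam**2 * K11 + K1.T @ K1
    return K1_star @ np.linalg.solve(A, K1.T @ y)
```

In the degenerate case m = n (K1 = K11 = K), the formula reduces algebraically to the exact prediction K∗(λ²I + K)⁻¹y.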
COMPUTATIONAL CHALLENGES OVERCOME
y∗ ≅ K∗1(λ²K11 + K1ᵀK1)⁻¹K1ᵀy

Memory: storing the covariance matrix – O(nm)

Time: solving the linear system – O(nm²)

Numerical stability: ???
SR FORMULA AND LEAST SQUARES
In the SR formula, consider the special case λ = 0:

y∗ = K∗1(K1ᵀK1)⁻¹K1ᵀy

This is exactly the normal equations solution to the least squares prediction problem: min ||y − K1x|| with y∗ = K∗1 x.

Note: can be easily extended for λ ≠ 0.

Potential numerical instability.
CURES FOR NUMERICAL INSTABILITY
1. Use a stable technique for the least squares problem:
   QR factorization
   "V method"

2. Make K1 as well conditioned as possible
THE V METHOD
Factor K1 = V V11ᵀ where V is n × m and V11 (the first m rows of V) is m × m lower triangular.

y∗ = K∗1 V11⁻ᵀ(λ²I + VᵀV)⁻¹Vᵀy

V is a rescaling of a well conditioned matrix, so the method is numerically stable; it can also be faster and need less memory.

Related to [Peters and Wilkinson, 1970], [Wahba, 1990, p. 136]
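A sketch of the V method step in NumPy (hypothetical helper name; V is assumed to come from a partial Cholesky factorization of K, so that K1 = V V11ᵀ):

```python
import numpy as np

def v_method_predict(V, K1_star, y, lam):
    """V method: with K1 = V @ V11.T (V11 = first m rows of V, lower
    triangular), predict
        y_star = K1_star V11^{-T} (lam^2 I + V^T V)^{-1} V^T y.
    The system (lam^2 I + V^T V) is only m x m and is well conditioned
    whenever V is."""
    m = V.shape[1]
    V11 = V[:m, :]                  # m x m lower triangular block
    t = np.linalg.solve(lam**2 * np.eye(m) + V.T @ V, V.T @ y)
    return K1_star @ np.linalg.solve(V11.T, t)
```

Substituting K1 = V V11ᵀ and K11 = V11 V11ᵀ into the SR formula shows the two are algebraically equivalent; the V form avoids forming the squared-condition-number matrix K1ᵀK1.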
COLUMN SELECTION
Use partial Cholesky factorization with pivoting to form V.

This selects appropriate columns for K1.

K1 will be well conditioned: cond(K1) is O(condition of the optimal low rank approximation) [Higham, 2002, pp. 196-208]
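A straightforward (unoptimized) sketch of rank-m partial Cholesky with diagonal pivoting, assuming only that K is symmetric positive definite; a production version would update the Schur-complement diagonal incrementally:

```python
import numpy as np

def partial_cholesky_pivot(K, m):
    """Rank-m partial Cholesky with diagonal pivoting.
    Returns V (n x m, with first m rows lower triangular) and the pivot
    order piv such that K[piv][:, piv] ~= V @ V.T, exact on the leading
    m x m block.  At each step the column with the largest remaining
    Schur-complement diagonal is chosen."""
    K = np.array(K, dtype=float)
    n = K.shape[0]
    piv = np.arange(n)
    V = np.zeros((n, m))
    for j in range(m):
        # diagonal of the current Schur complement
        d = np.array([K[r, r] - V[r, :j] @ V[r, :j] for r in range(n)])
        i = j + int(np.argmax(d[j:]))
        # symmetric swap of rows/columns j and i
        K[[j, i], :] = K[[i, j], :]
        K[:, [j, i]] = K[:, [i, j]]
        V[[j, i], :] = V[[i, j], :]
        piv[[j, i]] = piv[[i, j]]
        # standard Cholesky update for column j
        V[j, j] = np.sqrt(K[j, j] - V[j, :j] @ V[j, :j])
        V[j + 1:, j] = (K[j + 1:, j] - V[j + 1:, :j] @ V[j, :j]) / V[j, j]
    return V, piv
```

The pivot order piv records which columns of K were selected for K1.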
CHOICE OF RANK
For least squares problems there are efficient techniques to drop columns [Björck, 1996, p. 133], and these can be easily adapted:

Solve the GP problem with a rank m approximation.

Small additional cost to determine the accuracy of all lower rank k approximations, k = 1, . . . , m.
COMPUTING TIMES
ESTIMATING A METHOD’S ACCURACY: BOOTSTRAP
Bootstrap: a standard statistical resampling technique.

Generate multiple (100) samples to test the methods.

Determine reliability and error bounds.

Stable methods have a smaller range of error.
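The bootstrap loop can be sketched as follows (hypothetical helper, not the authors' code; `predict` stands for any of the prediction methods above, retrained on each resampled training set):

```python
import numpy as np

def bootstrap_rmse(predict, X, y, X_test, y_test, n_samples=100, seed=0):
    """Resample the training set with replacement n_samples times, retrain,
    and return the test RMSE of each sample -- the spread of these values
    indicates the reliability of the method."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rmses = []
    for _ in range(n_samples):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        y_hat = predict(X[idx], y[idx], X_test)
        rmses.append(np.sqrt(np.mean((y_hat - y_test) ** 2)))
    return np.array(rmses)
```

A stable method shows a tight cluster of RMSE values across samples; an unstable one shows a wide range.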
BOOTSTRAP RESAMPLING, n = 180045, m = 100
BOOTSTRAP RESAMPLING: V METHOD
V method with pivoting, n = 36009, m = 1000
BOOTSTRAP RESAMPLING: V + SR METHOD
V method with pivoting and SR method
BOOTSTRAP RESAMPLING: V, WITH AND W/O PIVOTING
RMSE ERROR VS. RANK
RMSE ERROR VS. NUMBER OF GALAXIES
GP VS. ALTERNATIVE METHODS
Way and Srivastava, 2006:
GP VS. ALTERNATIVE METHODS
Way and Srivastava, 2006 + our results:
SUMMARY OF RESULTS
The code solves the linear algebra issues in the Gaussian process approach:

Fast – O(nm²), m ≪ n
Accurate – good predictions
Stable – bootstrap error curves are flat
General – works for any kernel
FURTHER WORK
Outliers

Hyperparameters using the low rank approximation (we used minimize from [Rasmussen and Williams, 2006])

Additional covariance functions

Lower bound on errors (e.g., for redshift, 0.02?)
REFERENCES
Å. Björck, Numerical Methods for Least Squares Problems, SIAM, 1996.

N. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, 2002.

G. Peters and J. Wilkinson, Comput. J. (13), pp. 309-316, 1970.

C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

G. Wahba, Spline Models for Observational Data, SIAM, 1990.

M. Way and A. Srivastava, Astrophysical Journal (647), pp. 102-115, 2006.
ACKNOWLEDGEMENT
We would like to thank the Woodward Fund for financial support, and the following people for their guidance:

Drs. Michael Way, Ashok Srivastava, Tim Lee, and Paul Gazis (NASA scientists)

Dr. Tim Hsu (CAMCOS director)

Drs. Bem Cayco, Wasin So, and Steve Crunk (SJSU faculty)