
Machine Learning in 25 minutes or less
And why the HotOS folks should care...

Terran Lane
Dept. of Computer Science, University of New Mexico
terran@cs.unm.edu

Machine learning is the study of algorithms or systems that improve their performance in response to experience.


The core ML problem

[Diagram, built up over several slides: The World → Sensors → X → Model f(X) → prediction ŷ → Performance measure L(ŷ, y) → assessment. The true output y also comes from the World, and the assessment can feed back to the World as control/response, in which case the measure is L(ŷ, X').]

The World:
- Network
- CPU
- Program memory footprint
- User activity
- Multi-process performance

Sensors:
- Latency; bandwidth
- Branches taken; cache misses
- Memory allocs; object age
- Keystroke rates; recent commands
- Process throughput; cache activity; synch delays

Predictions (ŷ):
- Compression/redundancy rates
- Branch prediction
- Object lifetime
- Legitimate/hostile
- Normal/abnormal

Performance measures L(ŷ, y):
- accuracy (0/1 loss)
- squared error
- time-to-response

When the model's outputs feed back to the World as control/response actions, the measure becomes L(ŷ, X'):
- Correctness
- Stability
- Robustness
- Total system performance (throughput, latency, etc.)

Without a prediction ŷ to assess:
- ???
- Do you like the model?
- Does it make sense?
- Does it make you feel warm and fuzzy?

The ML job: find this (the model f(X)) ... so that this (the assessment) is as good as possible.
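The performance measures listed above are short enough to sketch directly; this is an illustration of my own (the function names are not from the talk):

```python
def zero_one_loss(y_hat, y):
    """0/1 loss: 1 if the prediction is wrong, 0 if correct (accuracy = 1 - mean loss)."""
    return 0 if y_hat == y else 1

def squared_error(y_hat, y):
    """Squared error for numeric predictions."""
    return (y_hat - y) ** 2

def mean_loss(loss, preds, truths):
    """Average a loss L(ŷ, y) over a batch of (prediction, truth) pairs."""
    return sum(loss(p, t) for p, t in zip(preds, truths)) / len(preds)
```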

Types of learning

• Supervised
• Reinforcement learning
• Unsupervised
• Special cases:
  • Semi-supervised
  • Anomaly detection
  • Behavioral cloning
  • etc...

Supervised Learning

• Characteristics:
  • Measure features/sensor values ⇒ X
  • Want to predict system "output", y
  • Have some source of example (X, y) pairs
    • System, human-labeling, etc.
  • Have a well-defined performance criterion
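As a toy illustration of this setting (mine, not the slides'): given example (X, y) pairs, a 1-nearest-neighbor learner simply predicts the label of the closest training point. The feature/label values here are invented:

```python
import math

# Toy training set of (features, label) pairs,
# e.g. (latency, bandwidth) -> normal/abnormal.
train = [((1.0, 2.0), "normal"),
         ((1.2, 1.9), "normal"),
         ((8.0, 0.3), "abnormal")]

def predict_1nn(x):
    """Predict the label of the single closest training point (1-NN)."""
    nearest = min(train, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]
```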

Example sup. learners

• Discriminative: only produces classifier
  • Decision tree: fast; comprehensible models
  • Support vector machine: high-dim data; accurate
  • Nearest-neighbor / k-nn: low-dim data; slow
  • Neural net: special case of SVM
• Generative: produces complete probability model
  • Naive Bayes: very simple; surprisingly accurate
  • Bayesian network: powerful; descriptive; accurate
  • Markov random field: closely related to BNs
• Meta-learners/ensemble methods: sets of models
  • Boosting
  • Bagging
  • Winnow
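To make one entry on the list concrete, here is a minimal naive Bayes classifier over binary features, with Laplace smoothing (my sketch; the slides do not give code):

```python
from collections import Counter, defaultdict

def train_nb(data):
    """Estimate P(y) and P(x_i = 1 | y) counts from (features, label) pairs."""
    class_counts = Counter(y for _, y in data)
    feat_counts = defaultdict(Counter)  # feat_counts[y][i] = # class-y examples with x_i = 1
    for x, y in data:
        for i, v in enumerate(x):
            if v:
                feat_counts[y][i] += 1
    return class_counts, feat_counts, len(data)

def predict_nb(model, x):
    """Pick the class maximizing P(y) * prod_i P(x_i | y), naively assuming
    features are independent given the class."""
    class_counts, feat_counts, n = model
    def score(y):
        p = class_counts[y] / n
        for i, v in enumerate(x):
            p_i = (feat_counts[y][i] + 1) / (class_counts[y] + 2)  # Laplace smoothing
            p *= p_i if v else (1 - p_i)
        return p
    return max(class_counts, key=score)
```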

Key assumption #1

The train/test data reflect the same data distribution that will be experienced when the learned model is embedded in the performance system.

• System not changing over time
• Model doesn't affect behavior of system

Key assumption #2

All data points are statistically independent.

• No linkage between "adjacent"/"successive" points
• No other process affecting data generation

Reinforcement learning

• Characteristics:
  • Measure features of system ⇒ X
  • Want to control sys. -- model outputs are "knobs"
  • Can interact with system/simulation
  • Have performance measure that recognizes "good" system behavior
  • Don't need to know "correct" control actions

Key criterion

• Are the sensor readings enough to completely characterize the state of the system?
  • Knowing X tells you everything relevant
• Yes:
  • "Fully observable"
  • Learning optimal performance fairly tractable (*)
• No (multiple system states produce same X):
  • "Partially observable"
  • Learning barely satisfactory performance incredibly difficult (PSPACE-complete. Or worse.)

RL: The good news

• It does everything that traditional control doesn't!
  • Stochasticity ok
  • Don't need a model
  • Don't need linearity
  • Discrete time ok
  • No messy ODEs or z-transforms!
  • Delay ok

RL: The bad news

• Low dimensions
• Discrete variables/features
• Need to know state space
• Convergence can be slow
  • Glacial
• Optimal control can be intractable

Example RL

• Fully observable systems
  • Q-learning
  • SARSA
  • Dyna
  • E3
• Partially observable
  • Reinforce
  • Utile distinction memories
  • Policy gradient methods
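For the fully observable case, the core Q-learning update is short enough to sketch. The toy two-step chain MDP below is my own construction, not from the talk:

```python
import random

# Toy MDP: states 0..2 in a chain; actions move +1/-1; reaching state 2 ends
# the episode with reward 1. Fully observable: the state index is the whole state.
N_STATES, ACTIONS = 3, (1, -1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(500):                 # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a2: Q[(s, a2)])
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The greedy policy should move right (+1) from every non-terminal state.
policy = {s: max(ACTIONS, key=lambda a2: Q[(s, a2)]) for s in range(N_STATES - 1)}
```

Note that nothing here needed a model of the dynamics, linearity, or continuous time, which is the "good news" of the previous slides; the "bad news" is that the table Q grows with the (known, discrete) state space.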

Key difference #1

Unlike supervised learning... distinct data points can be temporally correlated.

• Key parameter: how much history is necessary to characterize the system?
  • Markov order
  • 1 time unit? 2? All of them?

Key difference #2

Unlike supervised learning... the model is expected to influence the behavior of the system.

• It's a good thing...

References (partial)

• General:
  • Mitchell, Machine Learning, McGraw-Hill, 1997.
  • Duda, Hart, & Stork, Pattern Classification, Wiley, 2001.
  • Hastie, Tibshirani, & Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2001.
• Software (general; mostly supervised):
  • Weka: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/

References (partial)

• Decision trees:
  • Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
  • Breiman, Classification & Regression Trees (CART), Wadsworth, 1983.
• Support vector machines:
  • Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, 2(2), 1998.
  • Software: SVMlight http://svmlight.joachims.org/

References (partial)

• Reinforcement learning:
  • Sutton & Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.
  • Kaelbling, Littman, & Moore, "Reinforcement Learning: A Survey", Journal of Artificial Intelligence Research, 4, 1996.
  • Kaelbling, Littman, & Cassandra, "Planning and Acting in Partially Observable Stochastic Domains", Artificial Intelligence, 101, 1998.

Thank you!

Questions?

ML keywords

• Learning
• Adaptive
• Self-tuning
• State estimation
• Parameter estimation
• Data mining
• Computational statistics
• Predictive modeling
• Pattern recognition
• etc...

The Learning Loop

[Diagram: the core-ML-problem loop (The World → Sensors → X → Model f(X) → ŷ → Performance measure L(ŷ, y) → assessment, with y from the World), extended with a training path: the World generates "training" data, which a Learning module turns into the model f(X), guided by the Performance measure.]

The training process

• Gather large set of "training" data
  • Dtrain = [ (X1, y1), (X2, y2), ..., (Xn, yn) ]
• Also large set of "testing" (eval; holdout) data
  • Deval = [ (X1, y1), ..., (Xm, ym) ]
• Apply learner to training data to get model
  • f() = learn(Dtrain, L)
• Evaluate results on test set
  • [ ŷtest ] = f(Xtest)
  • assessment = L(ŷtest, ytest)
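The steps above can be sketched end to end. The trivial majority-class "learner" here is my stand-in for learn(); for 0/1 loss, the constant model that predicts the most common training label is the loss-minimizing constant model:

```python
from collections import Counter

def learn(d_train, loss):
    """Trivial 'learner': return a constant model predicting the majority
    training label (which minimizes average 0/1 training loss)."""
    majority = Counter(y for _, y in d_train).most_common(1)[0][0]
    return lambda x: majority

def zero_one(y_hat, y):
    return 0 if y_hat == y else 1

# Disjoint train and holdout sets of (X, y) pairs (toy values).
d_train = [(0, "a"), (1, "a"), (2, "b")]
d_eval = [(3, "a"), (4, "b")]

f = learn(d_train, zero_one)                      # f() = learn(Dtrain, L)
y_hat = [f(x) for x, _ in d_eval]                 # [ŷtest] = f(Xtest)
assessment = sum(zero_one(p, y)                   # assessment = L(ŷtest, ytest)
                 for p, (_, y) in zip(y_hat, d_eval)) / len(d_eval)
```

Evaluating on the held-out Deval rather than on Dtrain is what makes the assessment an honest estimate under key assumption #1 (train and test drawn from the same distribution).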