The purpose of this talk
• Not to develop a robust understanding of ML algorithms, nor to derive them
• But to provide a sufficient basis for doing applied predictive modeling
• Our goal is predictive modeling: building accurate models by applying statistical principles, feature engineering, model tuning, appropriate ML algorithms, and error analysis
Preliminary outline
• Model purpose – for prediction, for explanation
• The basic study design of machine learning
– Model Representation
– Classification vs. Regression Problems
– Supervised vs. Unsupervised Learning
• Model Assessment & Selection
– Interplay between Bias, Variance & Complexity
– Cross Validation: The wrong/correct way of doing it
• The Single Algorithm Hypothesis & Deep Learning
Ex. Models for Explanation
Wong, P. T. P. (2014). Viktor Frankl’s meaning seeking model and positive psychology.
Coursera Course, Machine learning by Andrew Ng
Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning
Receptive field in Humans
Independent Variables / Predictors / Features
Dependent Variables / Responses
To recap: some definitions
• Variance – the amount by which the prediction would change if we estimated it using a different training data set
• Bias – the error that is introduced by approximating a real-life problem with a much simpler model
– more flexible methods result in less bias, but more variance
• Flexibility = degrees of freedom ~ Complexity
– Can be modified by the regularization parameter
– or by increasing/reducing the number of features
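The definitions above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: the true function sin(2πx), the noise level, and the two toy methods (a constant predictor as the rigid method, 1-nearest-neighbour as the flexible method) are all chosen for illustration. Each method is refit on many different training sets, and the spread and offset of its predictions at one point x0 are measured.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed "real-life" relationship; hypothetical, for illustration only.
    return math.sin(2 * math.pi * x)

def draw_training_set(rng, n=30, noise_sd=0.3):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    # Rigid method: ignores x and always predicts the training mean.
    return statistics.fmean(ys)

def predict_1nn(xs, ys, x0):
    # Flexible method: copies the response of the nearest training point.
    nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[nearest]

def bias_variance(predict, x0, n_repeats=500, seed=0):
    # Refit on many training sets and look at the predictions at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(n_repeats):
        xs, ys = draw_training_set(rng)
        preds.append(predict(xs, ys, x0))
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

x0 = 0.25
rigid_bias_sq, rigid_var = bias_variance(predict_mean, x0)
flex_bias_sq, flex_var = bias_variance(predict_1nn, x0)
print("rigid    bias^2 = %.3f  variance = %.3f" % (rigid_bias_sq, rigid_var))
print("flexible bias^2 = %.3f  variance = %.3f" % (flex_bias_sq, flex_var))
```

In this setup the rigid method should show the larger squared bias and the flexible method the larger variance, which is the trade-off the bullets describe.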
Study design – training/test sets
An Introduction to Statistical Learning, Ch 5 Resampling Methods
In practice – training/CV/test set
• Training set – used to fit the models
• Validation set – used to estimate prediction error for model selection
• Test set – used to assess the generalization error of the final chosen model
The Elements of Statistical Learning, Ch 7 Model Assessment and Selection
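A three-way split can be sketched as follows. The 60/20/20 proportions, the function name `train_val_test_split`, and the fixed seed are assumptions made for this example, not recommendations from the slides.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    # Shuffle once, then cut into three disjoint parts.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                 # touched only once, at the very end
    val = rows[n_test:n_test + n_val]    # used to compare candidate models
    train = rows[n_test + n_val:]        # used to fit the models
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

The point of the design is that the test set plays no part in fitting or in model selection, so its error is an honest estimate for the final chosen model.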
Coursera Course, Machine learning by Andrew Ng
Error           θ fitted on     (x(i), y(i)) taken from
Training error  Training set    Training set
CV error        Training set    CV set
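The table can be made concrete with a short sketch: the parameters θ are fitted once, on the training set, and the same θ is then scored on two different sets of examples. The straight-line model, the synthetic data, and the halved mean squared error are assumptions for this illustration.

```python
import random

def fit_line(points):
    # Ordinary least squares for y = a + b * x; returns theta = (a, b).
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def half_mse(theta, points):
    # Halved mean squared error of theta on the given examples.
    a, b = theta
    return sum((a + b * x - y) ** 2 for x, y in points) / (2 * len(points))

rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.uniform(0, 5)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 0.5)))  # hypothetical data

train_set, cv_set = data[:40], data[40:]
theta = fit_line(train_set)                  # theta comes from the training set
training_error = half_mse(theta, train_set)  # scored on the training set
cv_error = half_mse(theta, cv_set)           # same theta, scored on the CV set
print(theta, training_error, cv_error)
```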
The Bias-Variance Trade-Off
An Introduction to Statistical Learning, Ch 5 Resampling Methods
Cross validation – single split
An Introduction to Statistical Learning, Ch 5 Resampling Methods
Cross validation – K = 10 folds
An Introduction to Statistical Learning, Ch 5 Resampling Methods
K-fold cross-validation gives a more stable estimate of test error than a single split
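A minimal K-fold sketch, assuming a generic `fit` / `predict` / `loss` interface that is invented for this example: every sample is held out exactly once, and the K held-out errors are averaged. The toy model (predict the training mean) and the data are hypothetical.

```python
import random
import statistics

def k_fold_indices(n_samples, k, seed=0):
    # Shuffle the sample indices and deal them into k disjoint folds.
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_error(xs, ys, fit, predict, loss, k=10, seed=0):
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        # Fit on the other k-1 folds only.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Score on the fold that was left out.
        errs = [loss(predict(model, xs[i]), ys[i]) for i in held_out]
        fold_errors.append(statistics.fmean(errs))
    return statistics.fmean(fold_errors)

def fit_mean(xs, ys):
    return statistics.fmean(ys)

def predict_mean(model, x):
    return model

def squared_error(prediction, target):
    return (prediction - target) ** 2

rng = random.Random(1)
xs = list(range(200))
ys = [rng.gauss(5.0, 1.0) for _ in xs]
cv_mse = cross_val_error(xs, ys, fit_mean, predict_mean, squared_error, k=10)
print(cv_mse)
```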
Compare these two CV methods: what’s different, and what’s wrong?
Left method
1. Screen the predictors – find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels
2. Build a multivariate classifier – using just this subset of predictors
3. Apply cross-validation – to estimate the unknown tuning parameters and to estimate the prediction error of the final model.
Right method
1. Divide the samples into K cross-validation folds (groups) at random
2. For each fold k = 1, 2, ..., K
a. Find a subset of “good” predictors that show fairly strong (univariate) correlation with the class labels, using all of the samples except those in fold k.
b. Using just this subset of predictors, build a multivariate classifier, using all of the samples except those in fold k.
c. Use the classifier to predict the class labels for the samples in fold k.
The predictors chosen by the left method have an unfair advantage
• they were chosen in step (1) on the basis of all of the samples.
• Leaving samples out after the variables have been selected does not correctly mimic the application of the classifier to a completely independent test set
• these predictors “have already seen” the left out samples.
The Elements of Statistical Learning, Ch 7 Model Assessment and Selection
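The difference between the two procedures can be checked on data where the labels are pure noise, so that no classifier can truly beat a 50% error rate. The sketch below is an illustration under assumptions: the sample sizes, the number of predictors kept, and the nearest-centroid classifier are choices made here, not the setup used in the book.

```python
import random
import statistics

def abs_corr(column, labels):
    # Absolute Pearson correlation between one predictor and the 0/1 labels.
    mx, my = statistics.fmean(column), statistics.fmean(labels)
    sxy = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sxx = sum((x - mx) ** 2 for x in column)
    syy = sum((y - my) ** 2 for y in labels)
    return abs(sxy) / (sxx * syy) ** 0.5

def screen(X, y, rows, n_keep):
    # Keep the n_keep predictors most correlated with y, using only `rows`.
    labels = [y[i] for i in rows]
    scores = [(abs_corr([X[i][j] for i in rows], labels), j)
              for j in range(len(X[0]))]
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_keep]]

def cv_error(X, y, k, n_keep, screen_inside_folds, seed=0):
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    if not screen_inside_folds:
        # Left method: screening has already seen every sample.
        cols_all = screen(X, y, list(range(n)), n_keep)
    mistakes = 0
    for fold in folds:
        held = set(fold)
        train_rows = [i for i in range(n) if i not in held]
        # Right method: screening uses only the samples outside fold k.
        cols = screen(X, y, train_rows, n_keep) if screen_inside_folds else cols_all
        centroids = {}
        for label in (0, 1):
            members = [i for i in train_rows if y[i] == label]
            centroids[label] = [statistics.fmean(X[i][j] for i in members)
                                for j in cols]
        for i in fold:
            dist = {label: sum((X[i][j] - c) ** 2
                               for j, c in zip(cols, centroids[label]))
                    for label in (0, 1)}
            if min(dist, key=dist.get) != y[i]:
                mistakes += 1
    return mistakes / n

rng = random.Random(0)
n, p = 50, 1000
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]  # pure noise
y = [0] * (n // 2) + [1] * (n // 2)
rng.shuffle(y)  # labels are independent of every predictor

wrong_error = cv_error(X, y, k=5, n_keep=20, screen_inside_folds=False)
right_error = cv_error(X, y, k=5, n_keep=20, screen_inside_folds=True)
print("screen first, then CV :", wrong_error)
print("screen inside each fold:", right_error)
```

With screening done before cross-validation the estimated error should come out far too optimistic, while screening inside each fold should give an estimate much closer to chance.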
Recap principles from Statistics – K-fold CV is a form of random sampling
Coursera Course, Data Analysis and Statistical Inference by Dr. Mine Çetinkaya-Rundel
ML algorithm performance is dependent on the underlying data
An Introduction to Statistical Learning, Ch 8 Tree methods
More issues to be covered in next talk
• Remedies for Severe Class Imbalance
• Measuring Predictor Importance
• Factors That Can Affect Model Performance
Back then, the prevailing wisdom
• MIT's Marvin Minsky – a “Society of Mind”
– To achieve AI, it was believed, engineers would have to build and combine thousands of individual computing units or agents.
– One group of agents, or module, would handle vision, another language, and so on…
The Single Algorithm Hypothesis
• Human intelligence stems from a single learning algorithm
– 1978 paper by Vernon Mountcastle: An Organizing Principle for Cerebral Function
– Jeff Hawkins: “Memory-prediction framework”
• Origin
– Neuroplasticity during brain development
– Potential of other cortical areas to take over function lost after brain injury (e.g. stroke)
Deep Learning - 1
• Single Algorithm
– neural networks to mimic human brain behavior
• A basic layer of artificial neurons that can detect simple things like the edges of a particular shape
• The next layer could then piece together these edges to identify the larger shape
• Then the shapes could be strung together to understand an object
• Key: the software does all this on its own
– give the system a lot of data, so it can discover by itself what some of the concepts in the world are
The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI, Wired
Deep Learning - 2
• This approach is inspired by how scientists believe that humans learn.
– The algorithm didn’t know the word “cat” (Ng had to supply that), but over time, it learned to identify the furry creatures we know as cats, all on its own.
– As babies, we watch our environments and start to understand the structure of objects we encounter, but until a parent tells us what it is, we can’t put a name to it.
• Building High-level Features Using Large Scale Unsupervised Learning
The Man Behind the Google Brain: Andrew Ng and the Quest for the New AI, Wired
Building High-level Features Using Large Scale Unsupervised Learning, QV Le, et al
References
Stanford Andrew Ng course