Logistic Regression
Jacquelyn Victoria & Tamer Wahba
1
Slide Ownership
Jacquelyn Victoria - 3 to 9
Tamer Wahba - 10 to 15
2
Regression Analysis + Classification
How can we predict a nominal class using regression analysis?
Consider a binary class:
Each instance x is a vector of feature values
Our output values or class labels are restricted to 0 or 1, i.e. f(x) ∈ {0, 1}
We need an h(x) where: 0 < h(x) < 1
We need a function which exhibits this behavior
3
Logistic Functions: the Sigmoid Function σ(x)
σ(x) = 1 / (1 + e^(-x))
Asymptotes at y = 1 and y = 0
Easy to specify a threshold (σ(0) = .5)
Results can be read as P(y = 1)
As a result, our hypothesis is:
hθ(x) = σ(θTx) = 1 / (1 + e^(-θTx))
Where θ is a vector of weights
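The hypothesis above can be sketched in plain Python (a minimal illustration; the helper names `sigmoid` and `h` are my own, not from the slides):

```python
import math

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real z into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    # Hypothesis h_theta(x) = sigmoid(theta . x); outputs P(y = 1 | x)
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

print(sigmoid(0))                    # 0.5, the natural classification threshold
print(h([0.5, -0.25], [1.0, 4.0]))   # some probability strictly between 0 and 1
```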
4
Cost Function
Need to find hθ(x), a logistic function that represents our data, i.e. we need to find θ to fit our data
Per-example cost: -log(hθ(x)) if y = 1, -log(1 - hθ(x)) if y = 0
Overall cost: J(θ) = (1/m) Σi Cost(hθ(x(i)), y(i))
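The two cost branches can be sketched in Python (an illustrative snippet; `cost` and `J` are assumed names for the per-example cost and its average over the training set):

```python
import math

def cost(h_x, y):
    # -log(h) if y == 1, -log(1 - h) if y == 0:
    # the penalty grows without bound as the prediction moves away from the label
    return -math.log(h_x) if y == 1 else -math.log(1.0 - h_x)

def J(predictions, labels):
    # Average cost over the training set
    return sum(cost(p, y) for p, y in zip(predictions, labels)) / len(labels)

# A confident correct prediction is cheap; a confident wrong one is expensive
print(cost(0.99, 1))  # small
print(cost(0.01, 1))  # large
```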
5
Gradient Descent
In order to find the minimum, we can use the partial derivative of J(θ)
do {
θj := θj - α Σi (hθ(x(i)) - y(i)) xj(i)   (simultaneously for all j)
} until θ converges
Where α is the learning rate (almost always between 0 and 1; .1-.3 is usually a good range)
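The descent loop above might look like this in Python (a toy sketch on made-up, linearly separable data; variable names are my own):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: each x has a leading 1 for the intercept term
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]

theta = [0.0, 0.0]
alpha = 0.3          # learning rate

for _ in range(5000):
    # Gradient of J(theta): partial derivative w.r.t. each theta_j
    grad = [0.0, 0.0]
    for xi, yi in zip(X, y):
        err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
        for j in range(len(theta)):
            grad[j] += err * xi[j]
    # Simultaneous update of every theta_j
    for j in range(len(theta)):
        theta[j] -= alpha * grad[j] / len(X)

# The fitted model separates the toy data around x = 1.5
print(sigmoid(theta[0] + theta[1] * 0.0))  # close to 0
print(sigmoid(theta[0] + theta[1] * 3.0))  # close to 1
```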
6
Maximum Likelihood Estimation
7
Choose θ to maximize the log-likelihood ℓ(θ) = Σi [ y(i) log hθ(x(i)) + (1 - y(i)) log(1 - hθ(x(i))) ]
do {
θj := θj + α Σi (y(i) - hθ(x(i))) xj(i)
} until θ converges
Can also be calculated using:
Iteratively Reweighted Least Squares
Multinomial data uses Softmax Regression
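For the multinomial case mentioned above, softmax generalizes the sigmoid from 2 classes to K classes (a minimal sketch; the function name and scores are illustrative):

```python
import math

def softmax(z):
    # Subtract max(z) for numerical stability; the result sums to 1
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

# Scores for 3 classes -> a probability distribution over the classes
probs = softmax([2.0, 1.0, 0.1])
print(probs)       # highest probability goes to the highest score
print(sum(probs))  # 1.0
```

With K = 2 and one score fixed at 0, softmax reduces to the sigmoid, which is why the binary model is a special case.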
Interpreting hypothesis
8
Recall that σ(0) = .5 and that hθ(x) = σ(θTx), so we predict y = 1 whenever θTx ≥ 0; the decision boundary is the set of points where θTx = 0
[Plot: decision boundary in the (x1, x2) feature plane]
Interpreting hθ
I want to create a model that gives the probability that I will pass a test, given how many hours I have studied
Hours 0.50 0.75 1.00 1.25 1.50 1.75 1.75 2.00 2.25 2.50 2.75 3.00 3.25 3.50 4.00 4.25 4.50 4.75 5.00 5.50
Pass 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 1
Using this generated model, calculate my probability of passing given I have studied 3 hours
P(passing| study time = 3) = .61
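The study-hours model can be reproduced with the gradient descent update from slide 6 (a sketch, not the slides' own code; run long enough, the weights converge to the maximum-likelihood fit and reproduce the quoted .61):

```python
import math

hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

theta0, theta1 = 0.0, 0.0   # intercept and weight for study hours
alpha = 0.1                  # learning rate

for _ in range(100_000):
    g0 = g1 = 0.0
    for x, y in zip(hours, passed):
        err = sigmoid(theta0 + theta1 * x) - y
        g0 += err
        g1 += err * x
    theta0 -= alpha * g0 / len(hours)
    theta1 -= alpha * g1 / len(hours)

# P(pass | study time = 3)
print(round(sigmoid(theta0 + theta1 * 3.0), 2))  # ~0.61
```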
9
Logistic Regression Compared to Other Classifiers
Naive Bayes
Support Vector Machines
Decision Trees
10
vs Decision Tree
Assumptions
DT: decision boundaries parallel to the axes
LR: one smooth boundary
Decision trees can be used when there are multiple decision boundaries
11
vs Naive Bayes
Feature Weights
NB: each weight is set independently, based on the feature's frequency within each class
LR: weights are set jointly, such that the decision function tends to be high for positive classes and low for negative classes
Because the weights are fit jointly, correlated features do not receive undue weight in logistic regression (as they do in Naive Bayes)
12
vs Support Vector Machine
13
Both attempt to find a hyperplane separating the training samples
SVM: finds the solution with the maximum margin
LR: finds any solution that separates the instances
SVM is a hard classifier, while LR is probabilistic
Advantages
Works well with diagonal decision boundaries
Does not give undue weight to correlated features
Probabilistic outcomes
14
Disadvantages
Requires a large sample size for stable results
Use Cases
Categorical outcomes
Large sample data
Minimal preprocessing
15
For more info...
Helpful links to go into more depth with Logistic Regression
Stanford Open Course (Logit regression section)
Logit Regression Tutorial (exercises in MATLAB)
Logit Regression Tutorial (no code)
How to use Logit Regression in Python
How to use Logit Regression in R
How to use Logit Regression in Java using Weka
16