Hand In
• It is online.
• Use the web board forum for Matlab questions.
• Comments and corrections are very welcome. I will upload new versions as we go along. Currently we are at version 3.
• Your data is coming. We might change it over time.
Impossibility of Learning!

x1 x2 x3   f(x)
 0  0  0    1
 1  0  0    0
 0  1  0    1
 1  1  0    1
 0  0  1    0
 1  0  1    ?
 0  1  1    ?
 1  1  1    ?

What is f?

There are 256 possible functions; 8 of them have in-sample error 0.
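As a quick sanity check of the counting argument, a small illustrative script (not from the course material) can enumerate all 2^(2^3) = 256 Boolean functions on three bits and count how many agree with the five observed rows:

```python
from itertools import product

# The five observed rows from the table: (x1, x2, x3) -> f(x)
observed = {
    (0, 0, 0): 1,
    (1, 0, 0): 0,
    (0, 1, 0): 1,
    (1, 1, 0): 1,
    (0, 0, 1): 0,
}

inputs = list(product([0, 1], repeat=3))              # the 8 possible inputs
all_functions = product([0, 1], repeat=len(inputs))   # 2^8 = 256 truth tables

consistent = 0
for outputs in all_functions:
    f = dict(zip(inputs, outputs))
    if all(f[x] == y for x, y in observed.items()):
        consistent += 1        # in-sample error 0 on the observed rows

print(consistent)  # prints 8: the three unseen rows are completely unconstrained
```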
Assumptions are needed
No Free Lunch"All models are wrong, but some models are useful.” George Box
Machine Learning has many different models and algorithms
Assumptions that works well in one domain may fail in another
There is no single best model that works best for all problems (No Free Lunch Theorem)
Probabilistic Approach

Repeat the experiment (a coin flip) N times independently.

Sample mean: ν = #heads/N

Sample: h, h, h, t, t, h, t, t, h

The true heads probability μ is unknown.
Hoeffding's Inequality
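For reference, the standard two-sided form of the bound (the exact statement on the original slide is assumed to be equivalent):

```latex
P\big(|\nu - \mu| > \epsilon\big) \;\le\; 2\,e^{-2\epsilon^2 N} \qquad \text{for any } \epsilon > 0
```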
The sample mean is probably approximately correct (PAC).
Classification Connection: Testing a Hypothesis

Fixed hypothesis h, unknown target f.

μ is the probability of picking x such that f(x) ≠ h(x); 1 − μ is the probability of picking x such that f(x) = h(x).

μ is the sum of the probabilities of all the points x where the hypothesis is wrong.

Probability distribution P(x) over x.

Sample mean ν = fraction of sampled points with h(x) ≠ f(x); μ is the true error rate.
Learning?
• This is only verification, not learning (the hypothesis was fixed in advance)
• For finite hypothesis sets we used the union bound
• Make sure the out-of-sample error is close to the in-sample error, and minimize the in-sample error (see the bound below)
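For a finite hypothesis set of size M, the union bound turns the single-hypothesis Hoeffding inequality into the following (standard form, writing E_in and E_out for the in- and out-of-sample errors of the chosen hypothesis g; the slide's own formula is assumed to match):

```latex
P\big(|E_{in}(g) - E_{out}(g)| > \epsilon\big) \;\le\; 2M\,e^{-2\epsilon^2 N}
```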
Error Functions
Error matrix: rows are the estimate h(x), columns are the truth f(x), with classes Lying and True. Correct decisions (the diagonal) cost 0; the two off-diagonal cells hold the penalties for the two kinds of mistakes.
Walmart: a discount for a given person. Error function:

Same structure as above: correct decisions cost 0, and the two off-diagonal cells hold the store's penalties for the two kinds of mistakes.
CIA access (Friday bar stock). Error function:

              f(x) = Lying   f(x) = True
Est. Lying         0              1
Est. True        1000             0

Falsely granting access (estimating "True" for someone who is lying) is penalized 1000; falsely denying access to a truthful person is penalized only 1.
The point: the right error measure depends on the application.
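To make this concrete, here is a small illustrative sketch (the helper name and the cost numbers for the store case are my own, not from the slides) of evaluating predictions under an application-specific cost matrix:

```python
import numpy as np

def weighted_error(y_true, y_pred, cost_false_accept, cost_false_reject):
    """Average cost of the predictions under an asymmetric error measure.

    y_true / y_pred use 1 = "lying", 0 = "true" (telling the truth).
    cost_false_accept: predicted "true" although the person is lying.
    cost_false_reject: predicted "lying" although the person is truthful.
    Correct predictions cost 0.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    false_accept = (y_pred == 0) & (y_true == 1)
    false_reject = (y_pred == 1) & (y_true == 0)
    costs = cost_false_accept * false_accept + cost_false_reject * false_reject
    return costs.mean()

y_true = [1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0]

# Same predictions, very different error depending on the application:
print(weighted_error(y_true, y_pred, cost_false_accept=1000, cost_false_reject=1))  # CIA-style
print(weighted_error(y_true, y_pred, cost_false_accept=1, cost_false_reject=10))    # store-style
```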
Final Diagram
• Unknown target
• Unknown probability distribution P(x)
• Target distribution P(y | x) (the important part we try to learn)
• Data set
• Learning algorithm
• Hypothesis set
• Final hypothesis
• Error measure e
Today
• We are still only talking about classification
• Test sets
• Working towards learning with infinite-size hypothesis spaces for classification:
  – Reinvestigate the union bound
  – Dichotomies
  – Break points
The Test Set
Fixed hypothesis h, N independent data points, and any ε>0
• Split your data into two parts, D_train and D_test
• Train on D_train and select hypothesis h
• Test h on D_test, giving the test error E_test(h)
• Apply the Hoeffding bound to E_test(h) (see below)
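Because h is fixed before D_test is touched, the single-hypothesis Hoeffding bound applies directly to the test error (standard form; E_test and E_out denote the test and out-of-sample errors, and N is the number of test points):

```latex
P\big(|E_{test}(h) - E_{out}(h)| > \epsilon\big) \;\le\; 2\,e^{-2\epsilon^2 N}
```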
Test Set
• Strong bound: with 1000 test points, the test error is within 5% of the out-of-sample error with roughly 98% probability (2e^(−2·0.05²·1000) = 2e^(−5) ≈ 0.013)
• Unbiased: the test error is just as likely to be better than the out-of-sample error as it is to be worse
• Problem: we lose data for training
• If the test error is high, it is no consolation that it will also be high in practice
• The test set can NOT be used to select h (contamination)
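A quick sketch (illustrative helper functions, not from the course) that checks the "1000 points, 5%, about 98%" claim and shows how accuracy, confidence, and test-set size trade off under the Hoeffding bound:

```python
import math

def hoeffding_delta(epsilon, n):
    """Upper bound on P(|E_test - E_out| > epsilon) for n test points."""
    return 2 * math.exp(-2 * epsilon**2 * n)

def required_test_size(epsilon, delta):
    """Smallest n such that the Hoeffding bound 2*exp(-2*eps^2*n) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon**2))

# The claim from the slide: 1000 test points, epsilon = 0.05
print(hoeffding_delta(0.05, 1000))       # ~0.0135, i.e. at least 98% confidence
print(required_test_size(0.05, 0.02))    # test points needed for 98% confidence
```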
Learning
Pick a tolerance (risk) δ of failure that you can accept.

Set the RHS, 2M·e^(−2ε²N), equal to δ and solve for ε: ε = √( ln(2M/δ) / (2N) ).
With probability at least 1 − δ, the in-sample error is within ε of the out-of-sample error.
Generalization Bound
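Written out (the standard finite-M generalization bound, presumably what the slide showed; E_in and E_out are the in- and out-of-sample errors of the chosen hypothesis g):

```latex
E_{out}(g) \;\le\; E_{in}(g) + \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}} \qquad \text{with probability at least } 1 - \delta
```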
This is why we minimize the in-sample error: for fixed N and M the second term is fixed, so a small in-sample error gives a small bound on the out-of-sample error.
Union Bound Learning
The learning algorithm picks a hypothesis h_l.

P(h_l is bad) is at most the probability that some hypothesis in the hypothesis set is bad.
We did not subtract overlapping events!!!
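Spelled out (standard union-bound chain; "bad" means the in-sample and out-of-sample errors differ by more than ε):

```latex
P\big(|E_{in}(h_l) - E_{out}(h_l)| > \epsilon\big)
\;\le\; \sum_{m=1}^{M} P\big(|E_{in}(h_m) - E_{out}(h_m)| > \epsilon\big)
\;\le\; 2M\,e^{-2\epsilon^2 N}
```

The first inequality is exactly where the slack comes from: the probabilities are simply added without subtracting the overlaps.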
Hypotheses seem correlated: for two similar hypotheses h1 and h2, if h1 is bad (poor generalization) then probably so is h2.
We hope to improve on the union-bound result.
Change
Goal
• Replace M with something like the effective number of hypotheses
• A general bound, e.g. independent of the target function and the input distribution
• Simple would be nice
Look at finite point sets
A dichotomy is a bit string of length N.

Fix a set of N points X = (x1, …, xN) and a hypothesis set H.

Each hypothesis h in H gives a dichotomy (h(x1), …, h(xN)).

This captures the "expressiveness" of the hypothesis set H on X.

How many different dichotomies do we get? At most 2^N.
Example 1: Positive Rays

1-dimensional input space (points on the real line). Each hypothesis is determined by a threshold a: points to the right of a get one class, points to the left the other.

The dichotomy only changes when a moves to a different interval between the N data points, so there are N + 1 dichotomies.
Example 2: Intervals

1-dimensional input space (points on the real line). Each hypothesis is determined by two endpoints a1 and a2: points inside the interval get one class, points outside the other.

Count dichotomies by where the endpoints fall: a1 and a2 in separate segments between the data points, plus the case where both are put in the same segment (no point inside). This gives (N+1 choose 2) + 1 dichotomies.
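A brute-force check of the two counts above (illustrative sketch; the hypothesis definitions follow the standard positive-ray and interval models assumed here):

```python
from math import comb

def dichotomies_positive_rays(points):
    """Distinct labelings produced by h_a(x) = 1 if x > a else 0, over all thresholds a."""
    dichos = set()
    thresholds = [min(points) - 1] + sorted(points)  # one threshold per gap is enough
    for a in thresholds:
        dichos.add(tuple(1 if x > a else 0 for x in points))
    return dichos

def dichotomies_intervals(points):
    """Distinct labelings produced by h(x) = 1 if a1 < x <= a2 else 0."""
    dichos = set()
    cuts = [min(points) - 1] + sorted(points)        # candidate positions for both endpoints
    for a1 in cuts:
        for a2 in cuts:
            dichos.add(tuple(1 if a1 < x <= a2 else 0 for x in points))
    return dichos

pts = [0.3, 1.7, 2.2, 4.0, 5.1]
N = len(pts)
print(len(dichotomies_positive_rays(pts)), N + 1)            # 6 6
print(len(dichotomies_intervals(pts)), comb(N + 1, 2) + 1)   # 16 16
```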
Goal Continued
Imagine we can replace M with the growth function m_H(N), the maximal number of dichotomies on N points.

The factor e^(−2ε²N) on the RHS drops exponentially fast in N.

So if the growth function is only polynomial in N, the RHS still drops exponentially fast in N.
Generalization Bound
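If the replacement were legitimate, the earlier bound would become (a heuristic sketch; the rigorous VC generalization bound actually uses m_H(2N) and slightly different constants):

```latex
E_{out}(g) \;\le\; E_{in}(g) + \sqrt{\frac{1}{2N}\ln\frac{2\,m_{\mathcal{H}}(N)}{\delta}} \qquad \text{with probability at least } 1 - \delta
```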
Bright idea:
• Prove the growth function is polynomial in N
• Prove we can replace M with the growth function
Bounding Growth Function
• Might be hard to compute
• Instead of computing the exact value
• Prove that it is bounded by a polynomial
Shattering and Break Point
If H produces all 2^N possible dichotomies on (x1, …, xN),

then we say that H shatters (x1, …, xN).

If no data set of size K can be shattered by H, then K is a break point for H.

If K is a break point for H, then so are all numbers larger than K. Why?
2D Linear Classification

3 points on a line: the dichotomy that labels the middle point differently from the two outer points cannot be produced, so these 3 points cannot be shattered.

For the 2D linear classification hypothesis set, 4 is a break point: no set of 4 points can be shattered (the XOR-style labeling of 4 points can never be realized by a line).
Break Points and Growth Function
If H has a break point, then the growth function is polynomial in N (this needs proof).

If H has no break point, then the growth function is not polynomial: by the definition of a break point, m_H(N) = 2^N for all N.
Break Point Game
Suppose H has break point 2.

All 4 dichotomies on 2 points:

x1 x2
 0  0
 0  1
 1  0
 1  1
Candidate dichotomies on 3 points:

Row  x1 x2 x3
 1    0  0  1
 2    0  0  0
 3    0  1  0
 4    0  1  1
 5    1  0  0
 6    1  0  1
 7    1  1  0
 8    1  1  1

With break point 2, no pair of points may be shattered, so the following combinations of rows are impossible (each realizes all 4 patterns on some pair of columns):

Rows 1, 2, 3, 4  (shatter x2, x3)
Rows 6, 5, 2, 1  (shatter x1, x3)
Rows 7, 5, 3, 2  (shatter x1, x2)
Rows 8, 5, 3, 2  (shatter x1, x2)
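A small sketch (my own helper, not from the course code) that plays the game programmatically: it checks whether a chosen set of 3-point dichotomies shatters some pair of points, i.e. whether it violates break point 2:

```python
from itertools import combinations

def shatters_some_subset(dichotomies, subset_size):
    """True if the dichotomies realize all 2^subset_size patterns on some subset of columns."""
    n_points = len(dichotomies[0])
    for cols in combinations(range(n_points), subset_size):
        patterns = {tuple(d[c] for c in cols) for d in dichotomies}
        if len(patterns) == 2 ** subset_size:
            return True
    return False

rows = {1: (0,0,1), 2: (0,0,0), 3: (0,1,0), 4: (0,1,1),
        5: (1,0,0), 6: (1,0,1), 7: (1,1,0), 8: (1,1,1)}

# Each forbidden combination from the slide does shatter a pair of points:
for combo in [(1,2,3,4), (6,5,2,1), (7,5,3,2), (8,5,3,2)]:
    print(combo, shatters_some_subset([rows[i] for i in combo], 2))  # all True

# Brute force: the largest set of rows with break point 2 on 3 points has size 4
best = max(
    (combo for r in range(1, 9) for combo in combinations(rows.values(), r)
     if not shatters_some_subset(list(combo), 2)),
    key=len,
)
print(len(best))  # 4
```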
Definition: B(N, k) is the maximal number of dichotomies possible on N points such that no subset of k points can be shattered by the dichotomies.

(Recall: if no data set of size k can be shattered by H, then k is a break point for H.)

B(N, k) is more general than hypothesis sets: it is a purely combinatorial quantity.

The growth function satisfies m_H(N) ≤ B(N, k) for any H with break point k.
Computing B(N, k) – Boundary Cases

B(N, 1) = 1: with break point 1 we cannot shatter even a set of size 1, i.e. no point may receive both classes across the dichotomies. So every point gets the same class in all dichotomies, and only one dichotomy is possible (a second dichotomy would give a different class to at least one point).

B(1, k) = 2 for k > 1: there is only one point, so only 2 dichotomies are possible.
Recursion
Group the B(N, k) dichotomies by their pattern on the first N−1 points: α dichotomies whose pattern appears only once (the set S1), and 2β dichotomies whose pattern appears twice, once with each value of the last point (the set S2, split into S2+ and S2−). So B(N, k) = α + 2β.

Consider the first N−1 points: there are α + β different patterns (the S2+ and S2− sets are identical here). These can still shatter at most k − 1 points, so α + β ≤ B(N−1, k) is an upper bound.

Consider the first N−1 points of S2. If they could shatter k−1 points, we could extend with the last point, where both values occur for every S2 pattern; this gives k points we can shatter, a contradiction. Hence β ≤ B(N−1, k−1).

Together: B(N, k) ≤ B(N−1, k) + B(N−1, k−1).
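A sketch of the recursion in code (the comparison with the binomial sum is the standard closed form that follows from this recursion by induction; it is included here only as a check):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def B(n, k):
    """Recursive upper bound on the number of dichotomies on n points with break point k."""
    if k == 1:
        return 1          # no single point may be shattered: only one dichotomy
    if n == 1:
        return 2          # one point, two possible dichotomies (k >= 2)
    return B(n - 1, k) + B(n - 1, k - 1)

def binomial_sum(n, k):
    """The polynomial (in n) closed form: sum_{i=0}^{k-1} C(n, i)."""
    return sum(comb(n, i) for i in range(k))

for n in range(1, 8):
    for k in range(1, 5):
        assert B(n, k) == binomial_sum(n, k)
print(B(3, 2))  # 4, matching the break point game above
```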