Date post: | 14-Dec-2015 |
Upload: | elissa-cramer |
Range Spaces
• A range space $S$ is a pair $(X, R)$, where $X$ is a ground set, whose elements are called points, and $R$ is a family of subsets of $X$, whose elements are called ranges.
• For example: $S = (\mathbb{R}^2, R)$, where $R$ is the set of all disks in the plane.
Measure
• Let $S = (X, R)$ be a range space and let $x$ be a finite subset of $X$; then for $r \in R$ the measure is:
$$m(r) = \frac{|r \cap x|}{|x|}$$
Motivation
• For $N \subseteq x$, the estimate of $m(r)$, for $r \in R$, is defined by:
$$s(r) = \frac{|r \cap N|}{|N|}$$
• We want to find a sample $N \subseteq x$ such that for every $r \in R$, $m(r) \approx s(r)$.
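The two quantities above can be computed directly for a small finite example. The following is a minimal sketch, assuming points are integers and ranges are plain Python sets; the names `measure` and `estimate` and the hand-picked sample are illustrative only and do not come from the slides.

```python
from fractions import Fraction

def measure(r, x):
    """m(r) = |r ∩ x| / |x| for a range r and a finite point set x."""
    return Fraction(len(r & x), len(x))

def estimate(r, N):
    """s(r) = |r ∩ N| / |N| for a range r and a sample N of x."""
    return Fraction(len(r & N), len(N))

x = set(range(100))            # finite point set: the integers 0..99
N = set(range(0, 100, 10))     # hypothetical sample: every 10th point

r1 = set(range(20, 60))        # a range covering the points 20..59
r2 = set(range(25, 60))        # a range covering the points 25..59

# Absolute error of the estimate for each range.
err1 = abs(measure(r1, x) - estimate(r1, N))
err2 = abs(measure(r2, x) - estimate(r2, N))
```

Exact rational arithmetic is used so that the measure and the estimate can be compared without floating-point noise.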
More Definitions
• Let $S = (X, R)$ be a range space. For $Y \subseteq X$, let $R_{|Y} = \{\, r \cap Y \mid r \in R \,\}$ be the projection of $R$ on $Y$.
• If $R_{|Y}$ contains all subsets of $Y$, then $Y$ is shattered by $R$.
• $\dim_{VC}(S)$ is the maximum cardinality of a shattered subset of $X$.
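For a small finite range space these definitions can be checked by brute force. The sketch below assumes ranges are given as `frozenset` objects over a finite ground set; the function names are hypothetical, and the running time is exponential, so it is only meant for toy inputs. The example uses intervals over six points, whose VC dimension is 2.

```python
from itertools import combinations

def projection(ranges, Y):
    """R restricted to Y: the distinct sets r ∩ Y over all ranges r."""
    return {frozenset(r & Y) for r in ranges}

def is_shattered(ranges, Y):
    """Y is shattered if the projection contains all 2^|Y| subsets of Y."""
    Y = frozenset(Y)
    return len(projection(ranges, Y)) == 2 ** len(Y)

def vc_dimension(X, ranges):
    """Largest size of a shattered subset of X (brute force)."""
    best = 0
    for size in range(1, len(X) + 1):
        if any(is_shattered(ranges, Y) for Y in combinations(X, size)):
            best = size
        else:
            # Every subset of a shattered set is shattered, so if no set of
            # this size is shattered, no larger set can be.
            break
    return best

X = frozenset(range(6))
# All "intervals" {a, ..., b-1} over the points 0..5, including the empty set.
intervals = [frozenset(range(a, b)) for a in range(7) for b in range(a, 7)]
```

Intervals cannot shatter three points because no interval contains the two outer points while missing the middle one.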
Half-Spaces
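• Claim: let $P = \{p_1, p_2, \ldots, p_{d+2}\}$ be a set in $\mathbb{R}^d$. There are real numbers $\beta_1, \beta_2, \ldots, \beta_{d+2}$, not all of them zero, such that $\sum_i \beta_i p_i = 0$ and $\sum_i \beta_i = 0$.
• Theorem: Let $P = \{p_1, p_2, \ldots, p_{d+2}\}$ be a set in $\mathbb{R}^d$. Then, there exist 2 disjoint subsets $C, D \subseteq P$ such that $CH(C) \cap CH(D) \ne \emptyset$ and $C \cup D = P$.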
Half-Spaces
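• Lemma: let $P \subseteq \mathbb{R}^d$ be a finite set. Let $s \in CH(P)$ and let $h$ be a half-space of $\mathbb{R}^d$ containing $s$. Then there is a point in $P$ contained in $h$.
• Corollary: Let $S = (\mathbb{R}^d, R)$, where $R$ is the set of closed half-spaces. Then $\dim_{VC}(S) = d + 1$. Proof:
– regular simplex: its $d + 1$ vertices are shattered, so $\dim_{VC}(S) \ge d + 1$
– by the theorem and the lemma, no set of $d + 2$ points is shattered, so $\dim_{VC}(S) < d + 2$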
Sauer’s Lemma
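• If $(X, R)$ is a range space with $\dim_{VC}(X, R) = \delta$ and $|X| = n$, then $|R| \le G_\delta(n)$, where $G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i}$.
• Proof: By induction. The claim trivially holds for $\delta = 0$ or $n = 0$.
• Let $x \in X$, and consider the sets:
$$R_x = \{\, r \setminus \{x\} \mid r \cup \{x\} \in R \text{ and } r \setminus \{x\} \in R \,\}, \qquad R \setminus x = \{\, r \setminus \{x\} \mid r \in R \,\}$$
• $|R| = |R_x| + |R \setminus x|$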
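The bound in Sauer's Lemma can be checked numerically on a family where the count is easy to do by hand. The sketch below takes $G_\delta(n) = \sum_{i=0}^{\delta} \binom{n}{i}$ (the usual definition) and uses intervals over $n$ points, a range space of VC dimension 2, as the example; the function names are illustrative.

```python
from math import comb

def G(delta, n):
    """G_delta(n) = sum of C(n, i) for i = 0..delta."""
    return sum(comb(n, i) for i in range(delta + 1))

def count_interval_ranges(n):
    """Number of distinct ranges {a, ..., b-1} over the points 0..n-1."""
    return len({frozenset(range(a, b))
                for a in range(n + 1)
                for b in range(a, n + 1)})

# Compare |R| with the bound G_2(n) for a few ground-set sizes.
table = [(n, count_interval_ranges(n), G(2, n)) for n in range(0, 11)]
```

For intervals the bound is met with equality: there are $1 + n + \binom{n}{2}$ distinct ranges, which is exactly $G_2(n)$.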
Even More Definitions
• Let $S = (X, R)$ be a range space. We define its shatter function $\pi_S(m)$ to be:
$$\pi_S(m) = \max_{B \subseteq X,\ |B| = m} |R_{|B}|$$
• We define the shattering dimension as the smallest $d$ such that $\pi_S(m) = O(m^d)$ for all $m$.
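• A range space $S$ with shattering dimension $d$ has VC dimension $O(d \log d)$.
• Let $N \subseteq X$ be the largest set shattered by $S$ and $\delta = |N|$, so
$$2^\delta = |R_{|N}| \le \pi_S(\delta) \le c\,\delta^d \;\Rightarrow\; \delta \le \lg(c) + d\lg(\delta) \;\Rightarrow\; d \ge \frac{\delta - \lg(c)}{\lg(\delta)}$$
• Assuming $\delta \ge \max(2, 2\lg(c))$:
$$d \ge \frac{\delta}{2\lg(\delta)} \;\Rightarrow\; \frac{\delta}{\ln \delta} \le \frac{2d}{\ln 2} \le 6d \;\Rightarrow\; \delta \le 2(6d)\ln(6d)$$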
Why Do We Need Shattering Dimension?
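• Let $S = (\mathbb{R}^2, R)$, where $R$ is the set of all disks, be a range space. The shattering dimension of $S$ is 3.
• Proof: Consider any set $P$ of $n$ points, then $|R_{|P}| \le 4n^3$:
– $|\{\, r \in R_{|P} : |r| = 1 \,\}| \le n$
– $|\{\, r \in R_{|P} : |r| = 2 \,\}| \le \binom{n}{2}$
– $|\{\, r \in R_{|P} : |r| \ge 3 \,\}| \le 8\binom{n}{3}$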
Mixing Range Spaces
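• Let $S_1 = (X, R^1), \ldots, S_k = (X, R^k)$ be range spaces with finite VC dimensions $\delta_1, \ldots, \delta_k$.
• Let $f(r_1, \ldots, r_k)$ be a function that maps $k$-tuples of sets $r_i \in R^i$ into a subset of $X$ by set operations such as $\cup$, $\cap$, $\setminus$.
• Consider the range set $R' = \{\, f(r_1, \ldots, r_k) \mid r_i \in R^i \,\}$.
• Theorem: The associated range space $(X, R')$ has a VC dimension bounded by $O(k\delta \lg k)$, where $\delta = \max_i \delta_i$.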
Proof
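• Assume $Y \subseteq X$, $|Y| = t$, is being shattered by $R'$.
• With $\delta = \max_i \delta_i$:
$$2^t = |R'_{|Y}| = \left|\{\, f(r_1, \ldots, r_k) \mid r_i \in R^i \,\}_{|Y}\right| \le \prod_{i=1}^{k} |R^i_{|Y}| \le \prod_{i=1}^{k} G_{\delta_i}(t) \le \left(G_\delta(t)\right)^k \le \left(2\left(\frac{te}{\delta}\right)^{\delta}\right)^{k}$$
• Assume $t \ge e$ and $\delta\lg(te/\delta) \ge 1$; then
$$t \le k\left(1 + \delta\lg(te/\delta)\right) \le 3\delta k\lg(t/\delta)$$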
Proof
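• Setting $x = t/\delta$ gives us:
$$t \le 3\delta k\lg(t/\delta) \;\Rightarrow\; x \le 3k\lg(x) \;\Rightarrow\; \frac{x}{\ln x} \le \frac{3k}{\ln 2} \le 6k \;\Rightarrow\; x \le 2 \cdot 6k\ln(6k) \;\Rightarrow\; t \le 12\,\delta k\ln(6k)$$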
Dual Range Spaces
• Let $S = (X, R)$ be a range space.
• Let $p \in X$. We define $R_p = \{\, r \in R \mid p \in r \,\}$.
• The dual range space is $S^\star = (R, X^\star)$, where $X^\star = \{\, R_p \mid p \in X \,\}$.
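For a finite range space the dual can be written down mechanically. The sketch below is an illustration only: it identifies each range by its index in a list, and the function name `dual_ranges` is hypothetical.

```python
def dual_ranges(X, ranges):
    """For each point p, return R_p as the set of indices of ranges containing p.

    The dual range space has the ranges (here: their indices) as its ground
    set and the sets R_p as its ranges.
    """
    return {p: frozenset(i for i, r in enumerate(ranges) if p in r) for p in X}

X = [1, 2, 3]
ranges = [{1, 2}, {2, 3}, {3}]   # ranges number 0, 1 and 2
dual = dual_ranges(X, ranges)
```

In this example the point 2 lies in ranges 0 and 1, so its dual range is {0, 1}.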
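• Let $\delta$ and $\delta^\star$ denote the VC dimensions of $S$ and $S^\star$. Assume $\{r_1, \ldots, r_{\delta^\star}\} \subseteq R$ is shattered by $S^\star$, then there are points $p_1, p_2, \ldots, p_{2^{\delta^\star}}$ creating the following matrix, whose entry in row $r_i$ and column $p_j$ is 1 if $p_j \in r_i$ and 0 otherwise, and whose columns are all the binary vectors of length $\delta^\star$:
$$M : \begin{array}{c|cccc} & p_1 & p_2 & \cdots & p_{2^{\delta^\star}} \\ \hline r_1 & 0 & 1 & \cdots & 1 \\ r_2 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & & \vdots \\ r_{\delta^\star} & 0 & 0 & \cdots & 1 \end{array}$$
• $M$ contains as a submatrix the matrix $M'$ with $\lfloor \lg \delta^\star \rfloor$ columns whose rows are all the binary vectors of that length:
$$M' : \begin{pmatrix} 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & \vdots & \vdots \\ 1 & \cdots & 1 & 1 \end{pmatrix}$$
• The set of points represented by the columns of $M'$ is shattered by $S$, so $\delta \ge \lfloor \lg \delta^\star \rfloor$, which gives $\delta^\star \le 2^{\delta + 1}$.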
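$\varepsilon$-Samples
• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $C \subseteq x$ is an $\varepsilon$-sample for $x$ if:
$$\forall r \in R : \; |m(r) - s(r)| \le \varepsilon$$
where $s(r) = |r \cap C| / |C|$.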
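The $\varepsilon$-Sample Theorem
• There is a positive constant $c$ such that if $(X, R)$ is any range space with VC dimension at most $\delta$, $x \subseteq X$ is a finite subset and $\varepsilon, \varphi > 0$, then a random subset $C \subseteq x$ of cardinality
$$s = \frac{c}{\varepsilon^2}\left(\delta\log\frac{\delta}{\varepsilon} + \log\frac{1}{\varphi}\right)$$
is an $\varepsilon$-sample for $x$ with probability at least $1 - \varphi$.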
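$\varepsilon$-Nets
• Let $S = (X, R)$ be a range space, and let $x$ be a finite subset of points. A subset $N \subseteq x$ is an $\varepsilon$-net for $x$ if:
$$\forall r \in R : \; m(r) \ge \varepsilon \;\Rightarrow\; r \cap N \ne \emptyset$$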
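The $\varepsilon$-Net Theorem
• Let $S = (X, R)$ be a range space of VC dimension $\delta$, and let $x$ be a finite subset of points.
• Suppose $0 < \varepsilon \le 1$ and $\varphi < 1$.
• Let $N$ be a set obtained by $m$ random independent draws from $x$, where:
$$m \ge \max\left(\frac{4}{\varepsilon}\lg\frac{4}{\varphi},\; \frac{8\delta}{\varepsilon}\lg\frac{16}{\varepsilon}\right)$$
• Then $N$ is an $\varepsilon$-net for $x$ with probability at least $1 - \varphi$.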
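To get a feeling for the numbers, the sample size in the theorem can be evaluated directly. The sketch below computes the smallest integer $m$ satisfying the bound and, as a sanity check, tests the $\varepsilon$-net property for the special case of interval ranges over points on a line (a range space of VC dimension 2). Both function names are hypothetical, and the interval check is an illustration chosen here, not part of the slides.

```python
import math

def eps_net_sample_size(eps, phi, delta):
    """Smallest integer m with m >= max((4/eps) lg(4/phi), (8 delta/eps) lg(16/eps))."""
    a = (4 / eps) * math.log2(4 / phi)
    b = (8 * delta / eps) * math.log2(16 / eps)
    return math.ceil(max(a, b))

def is_eps_net_for_intervals(x, N, eps):
    """Check the eps-net property when the ranges are intervals on the line.

    An interval with at least eps*|x| points misses N exactly when there is a
    run of at least eps*|x| consecutive points of x (in sorted order) that
    contains no point of N.
    """
    threshold = eps * len(x)
    members = set(N)
    run = longest = 0
    for p in sorted(x):
        run = 0 if p in members else run + 1
        longest = max(longest, run)
    return longest < threshold

m = eps_net_sample_size(eps=0.1, phi=0.1, delta=2)

x = list(range(100))
N = list(range(0, 100, 10))   # hand-picked subset: every 10th point
```

With every tenth point kept, the longest run of missed points has length 9, so this subset is a 0.1-net for intervals but not a 0.05-net.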
Range Searching
• Consider a very large set of points in the plane.
• We would like to be able to quickly decide how many points are included inside a query rectangle.
• The $\varepsilon$-sample Theorem tells us that there is a subset of constant size, depending only on $\varepsilon$ and not on the number of points, that works for all query rectangles.
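A small experiment illustrates the idea: count on a random sample and scale, instead of counting on the full set. This is only a sketch under assumptions made here: points are uniform in the unit square, the sample size of 2000 is picked arbitrarily (the constant in the theorem is not specified), and the names `count_in_rect` and `rect` are illustrative.

```python
import random

def count_in_rect(points, rect):
    """Number of points inside the axis-parallel rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return sum(1 for (px, py) in points if x0 <= px <= x1 and y0 <= py <= y1)

rng = random.Random(0)                       # fixed seed for reproducibility
points = [(rng.random(), rng.random()) for _ in range(20000)]
sample = rng.sample(points, 2000)            # arbitrary demo sample size

rect = (0.2, 0.1, 0.7, 0.5)                  # query rectangle of area 0.2
exact = count_in_rect(points, rect) / len(points)
approx = count_in_rect(sample, rect) / len(sample)
error = abs(exact - approx)
```

The same sample is reused for every query rectangle, which is the point of the theorem: the guarantee holds for all ranges simultaneously.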
Learning a Concept
• A function $f$ is defined over the plane and returns ‘1’ inside an unknown disk $D_{unknown}$ and ‘0’ outside of it.
• There is some distribution $\mathcal{D}$ and we can pick (roughly) $O((1/\varepsilon)\log(1/\varepsilon))$ random points in a sample $R$ and compute $f$ just for them.
Learning a Concept
• We compute the smallest disk $D'$ that contains only the points in $R$ labeled ‘1’ and define $g$ that returns ‘1’ inside $D'$ and ‘0’ outside it.
• We claim that $g$ classifies correctly all but an $\varepsilon$-fraction of the points:
$$\Pr_{p \in \mathcal{D}}\left[f(p) \ne g(p)\right] \le \varepsilon$$
Learning a Concept
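• Let $S = (\mathbb{R}^2, \mathcal{R})$ be a range space where $\mathcal{R}$ is all the symmetric differences between 2 disks.
• $\mathcal{D} \subseteq \mathbb{R}^2$ is our finite set.
• $r = D_{unknown} \oplus D' \in \mathcal{R}$
• $m(r)$ is the probability of mistake in classification.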
Discrepancy
• Let $\chi : X \to \{-1, 1\}$ be a coloring. We define the discrepancy of $\chi$ over a range $r$ as the amount of imbalance in the coloring:
$$|\chi(r)| = \Big|\sum_{p \in r} \chi(p)\Big|$$
• Let $\Pi$ be a partition of $X$ into pairs. We will refer to $\chi$ as compatible with $\Pi$ if the two points of each pair in $\Pi$ are colored by different colors by $\chi$.
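The definitions above translate into a few lines of code. The sketch below assumes a coloring is stored as a dictionary from points to ±1 and that the partition is given as a list of pairs; the names `discrepancy` and `random_compatible_coloring` are hypothetical.

```python
import random

def discrepancy(chi, r):
    """|chi(r)| = |sum of chi(p) over the points p of the range r|."""
    return abs(sum(chi[p] for p in r))

def random_compatible_coloring(pairs, rng):
    """Color each pair with opposite signs, choosing the orientation at random."""
    chi = {}
    for p, q in pairs:
        c = rng.choice((-1, 1))
        chi[p], chi[q] = c, -c
    return chi

pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]       # a partition of {0..7} into pairs
chi = random_compatible_coloring(pairs, random.Random(0))

r = {0, 1, 2}   # contains the whole pair (0, 1) and half of the pair (2, 3)
```

A pair that lies entirely inside a range contributes 0 to the sum, so only pairs split by the range matter; here exactly one pair is split and the discrepancy of `r` is 1 for every compatible coloring.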
Discrepancy
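• Let $\#_r$ denote the crossing number of $r$, the number of pairs of $\Pi$ with exactly one point in $r$.
• Let $X_i \in \{-1, +1\}$ be the contribution of the $i$'th crossing pair to $\chi(r)$.
• For $r \in R$ and a random coloring $\chi$ compatible with $\Pi$, for the threshold $\Delta_r = \sqrt{2\#_r\ln(4m)}$, where $m = |R|$, we have by Chernoff's inequality:
$$\Pr\left[|\chi(r)| \ge \Delta_r\right] = 2\Pr\left[\chi(r) \ge \Delta_r\right] = 2\Pr\Big[\sum_i X_i \ge \Delta_r\Big] \le 2\exp\left(-\frac{\Delta_r^2}{2\#_r}\right) = \frac{1}{2m}$$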
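Building $\varepsilon$-Sample via Discrepancy
• Lemma: Let $Q \subseteq P$ be an $\varepsilon_1$-sample for $P$, and let $Q' \subseteq Q$ be an $\varepsilon_2$-sample for $Q$. Then $Q'$ is an $(\varepsilon_1 + \varepsilon_2)$-sample for $P$.
• Proof:
$$\left|\frac{|P \cap r|}{|P|} - \frac{|Q' \cap r|}{|Q'|}\right| = \left|\frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|} + \frac{|Q \cap r|}{|Q|} - \frac{|Q' \cap r|}{|Q'|}\right| \le \left|\frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|}\right| + \left|\frac{|Q \cap r|}{|Q|} - \frac{|Q' \cap r|}{|Q'|}\right| \le \varepsilon_1 + \varepsilon_2$$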
Building $\varepsilon$-Sample via Discrepancy
• Let $S = (X, R)$ be a range space with shattering dimension $d$.
• Let $P \subseteq X$ be a set of $n$ points, and consider the induced range space $S_{|P} = (P, R_{|P})$.
• $m = |R_{|P}| = O(n^d)$
• Consider a coloring $\chi$, with the discrepancy bounded as we have seen, and let
$$Q = \{\, p \in P \mid \chi(p) = -1 \,\}, \qquad |Q| = \frac{n}{2}$$
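Building $\varepsilon$-Sample via Discrepancy
• Now, for every range $r \in R$:
$$\Big|\,|(P \setminus Q) \cap r| - |Q \cap r|\,\Big| = |\chi(r)| < \sqrt{n\ln(4m)} = \sqrt{n\ln O(n^d)} \le c\sqrt{n\ln(n^d)}$$
for some absolute constant $c$.
• Hence
$$\Big|\,|P \cap r| - 2|Q \cap r|\,\Big| = \Big|\,|(P \setminus Q) \cap r| - |Q \cap r|\,\Big| \le c\sqrt{n\ln(n^d)}$$
$$\left|\frac{|P \cap r|}{|P|} - \frac{|Q \cap r|}{|Q|}\right| \le \tau(n), \qquad \tau(n) = c\sqrt{\frac{d\ln(n)}{n}}$$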
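Building $\varepsilon$-Sample via Discrepancy
• Let $P_0 = P$, $P_1 = Q$. We will compute a coloring $\chi_{i-1}$ of $P_{i-1}$ with low discrepancy. Let $P_i$ be the points of $P_{i-1}$ colored -1 by $\chi_{i-1}$.
• Let
$$\delta_i = \tau(n_{i-1}) = \tau\left(\frac{n}{2^{i-1}}\right)$$
where $\tau(n) = c\sqrt{d\ln(n)/n}$ is the error bound of a single halving step.
• By the lemma we have that $P_k$ is a $\left(\sum_{i=1}^{k} \delta_i\right)$-sample for $P$.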
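The iteration above can be sketched in a few lines. The version below is a simplification made for illustration: each round pairs the points arbitrarily and keeps one point of every pair at random, which corresponds to keeping the points colored -1 by a random coloring compatible with the pairing. It does not verify that the coloring actually achieves low discrepancy, which the construction in the slides requires. The function names are hypothetical.

```python
import random

def halve(P, rng):
    """One halving round: pair up the points and keep one point of each pair."""
    P = list(P)
    rng.shuffle(P)   # arbitrary partition into consecutive pairs
    return [rng.choice((P[i], P[i + 1])) for i in range(0, len(P) - 1, 2)]

def repeated_halving(P, k, rng):
    """Apply k halving rounds, producing P_k from P_0 = P."""
    for _ in range(k):
        P = halve(P, rng)
    return P

P0 = list(range(1024))
P3 = repeated_halving(P0, 3, random.Random(1))   # 1024 -> 512 -> 256 -> 128
```

Each round halves the set, so after $k$ rounds $n / 2^k$ points remain, while the sampling errors of the rounds add up as in the lemma.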
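Building $\varepsilon$-Sample via Discrepancy
• We look for the maximal $k$ such that $\sum_{i=1}^{k} \delta_i \le \varepsilon$:
$$\sum_{i=1}^{k} \delta_i = \sum_{i=1}^{k} c\sqrt{\frac{d\ln(n/2^{i-1})}{n/2^{i-1}}} \le c_1\sqrt{\frac{d\ln(n/2^{k-1})}{n/2^{k-1}}} = c_1\sqrt{\frac{d\ln(n_{k-1})}{n_{k-1}}} \le \varepsilon$$
$$c_1^2\,\frac{d\ln(n_{k-1})}{n_{k-1}} \le \varepsilon^2 \;\Leftrightarrow\; \frac{n_{k-1}}{\ln(n_{k-1})} \ge \frac{c_1^2 d}{\varepsilon^2}, \quad\text{which holds for}\quad n_{k-1} \ge 2\,\frac{c_1^2 d}{\varepsilon^2}\ln\frac{c_1^2 d}{\varepsilon^2}$$
• So, taking the largest $k$ results in a set $P_k$ of the size $O((d/\varepsilon^2)\ln(d/\varepsilon))$ which is an $\varepsilon$-sample for $P$.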
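Proof of the $\varepsilon$-Net Theorem
• Let $S = (X, R)$ be a range space of VC dimension $\delta$ and let $x$ be a set of points of size $n$.
• Let $N = (x_1, \ldots, x_m)$ be a set of points obtained by $m$ independent samples from $x$.
• Let $\mathcal{E}_1 = \{\, \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset \,\}$.
• We wish to bound $\Pr[\mathcal{E}_1] \le \varphi$.
• Let $T = (y_1, \ldots, y_m)$ be a sample generated like $N$.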
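Proof of the $\varepsilon$-Net Theorem
• Let $\mathcal{E}_2 = \{\, \exists r \in R : |r \cap x| \ge \varepsilon n,\ r \cap N = \emptyset,\ |r \cap T| \ge \varepsilon m/2 \,\}$.
• $\Pr[\mathcal{E}_1] \le 2\Pr[\mathcal{E}_2]$: since $\mathcal{E}_2$ implies $\mathcal{E}_1$,
$$\Pr[\mathcal{E}_2 \mid \mathcal{E}_1] = \frac{\Pr[\mathcal{E}_2 \cap \mathcal{E}_1]}{\Pr[\mathcal{E}_1]} = \frac{\Pr[\mathcal{E}_2]}{\Pr[\mathcal{E}_1]}$$
so it is enough to show $\Pr[\mathcal{E}_2 \mid \mathcal{E}_1] \ge 1/2$.
• Assume $\mathcal{E}_1$ occurs, and $r' \in R$ is a range with $|r' \cap x| \ge \varepsilon n$ and $r' \cap N = \emptyset$.
• Then $\Pr[\mathcal{E}_2 \mid \mathcal{E}_1] \ge \Pr\left[|r' \cap T| \ge \varepsilon m/2\right]$.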
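Proof of the $\varepsilon$-Net Theorem
• Let $p = |r' \cap x| / n$.
• Then $X' = |r' \cap T|$ is a binomial variable with:
$$E[X'] = pm, \qquad V[X'] = p(1 - p)m \le pm$$
• Thus, by Chebychev's inequality:
$$\Pr\left[X' < \frac{\varepsilon m}{2}\right] \le \Pr\left[X' < \frac{pm}{2}\right] \le \Pr\left[|X' - pm| > \frac{pm}{2}\right] = \Pr\left[|X' - pm| > \frac{\sqrt{pm}}{2}\sqrt{pm}\right] \le \Pr\left[|X' - E[X']| > \frac{\sqrt{pm}}{2}\sqrt{V[X']}\right] \le \left(\frac{2}{\sqrt{pm}}\right)^2 \le \frac{1}{2}$$
since $m \ge 8/\varepsilon \ge 8/p$.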
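Proof of the $\varepsilon$-Net Theorem
• Let $\mathcal{E}_2' = \{\, \exists r \in R : r \cap N = \emptyset,\ |r \cap T| \ge \varepsilon m/2 \,\}$, and let $Z = N \cup T$ be the combined sequence of $2m$ draws.
• $\Pr[\mathcal{E}_1] \le 2\Pr[\mathcal{E}_2] \le 2\Pr[\mathcal{E}_2']$, so it is enough to show:
$$\Pr[\mathcal{E}_2'] \le G_\delta(2m)\,2^{-\varepsilon m/2}$$
• Conditioning on the value of $Z$:
$$\Pr[\mathcal{E}_2'] = \sum_{z \in x^{2m}} \Pr\left[\mathcal{E}_2' \cap (Z = z)\right] = \sum_{z \in x^{2m}} \frac{\Pr\left[\mathcal{E}_2' \cap (Z = z)\right]}{\Pr[Z = z]}\Pr[Z = z] = \sum_{z \in x^{2m}} \Pr\left[\mathcal{E}_2' \mid Z = z\right]\Pr[Z = z]$$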
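Proof of the $\varepsilon$-Net Theorem
• Now we fix a set $Z$, and it is enough to consider the range space $(Z, R_{|Z})$; we know $|R_{|Z}| \le G_\delta(2m)$.
• Let us fix any $r \in R_{|Z}$ and consider the event
$$\mathcal{E}_r = \left\{\, r \cap N = \emptyset,\ |r \cap T| \ge \frac{\varepsilon m}{2} \,\right\}$$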
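Proof of the $\varepsilon$-Net Theorem
• $\Pr[\mathcal{E}_r] \le 2^{-\varepsilon m/2}$:
• If $k = |r \cap (N \cup T)| < \varepsilon m/2$ it is trivial.
• Otherwise,
$$\Pr[\mathcal{E}_r] \le \Pr[r \cap N = \emptyset] = \binom{2m - k}{m} \Big/ \binom{2m}{m} = \frac{(2m - k)(2m - k - 1)\cdots(m - k + 1)}{2m(2m - 1)\cdots(m + 1)} = \frac{m(m - 1)\cdots(m - k + 1)}{2m(2m - 1)\cdots(2m - k + 1)} \le 2^{-k} \le 2^{-\varepsilon m/2}$$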
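The last chain of inequalities can be checked numerically. The sketch below evaluates the ratio $\binom{2m-k}{m} / \binom{2m}{m}$ exactly with rational arithmetic and compares it against $2^{-k}$ for small values of $m$ and $k$; the function name `prob_range_misses_N` is illustrative.

```python
from fractions import Fraction
from math import comb

def prob_range_misses_N(m, k):
    """Probability that all k points of r land in T when the 2m points of Z
    are split into N and T of size m each: C(2m - k, m) / C(2m, m)."""
    return Fraction(comb(2 * m - k, m), comb(2 * m, m))

# Collect (m, k, probability, bound) for a grid of small parameters.
checks = [(m, k, prob_range_misses_N(m, k), Fraction(1, 2 ** k))
          for m in range(1, 13)
          for k in range(0, m + 1)]
```

For $k = 1$ the ratio is exactly $1/2$, so the bound $2^{-k}$ is tight in that case.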