linear (hyperplane) classifiers in RdRd
Consider dd = 2. Let nn be the number of training points, it is easy to see that when n=1n=1, let AA be as above.
By using linear classifiers in R2R2, it is easy to see that we can assign 1 to all possible subsets
{{x1},φ}{{x1},φ} and 0 to their complements. Hence SF(1)=2SF(1)=2.
When n=2n=2, we can also assign 1 to all possible subsets
{{x1,x2},{x1},{x2},φ}{{x1,x2},{x1},{x2},φ} and 0 to their complements, and vice versa. Hence SF(2)=4=22SF(2)=4=22.
When n=3n=3, we can arrange arrange the point x1,x2,x3x1,x2,x3(non-colinear) so that the set of linear classifiers shatters the three points, hence SF(3)=8=23SF(3)=8=23
When n=4n=4, no matter where the points x1,x2,x3,x4x1,x2,x3,x4 and what designated binary values y1,y2,y3,y4y1,y2,y3,y4 are. It's clear that AA does not shatter the four points. To see the claim, first observe that the four points will form a 4-gon (if the four points are co-linear, or if the three points are co-linear then clearly linear classifiers cannot shatter the points). The two points that belong to the same diagonal lines form 2 groups and no linear classifier can assign different values to the 2 groups. Hence SF(4)<16=24SF(4)<16=24 and VF=3VF=3.
We state here without proving it that in general the class of linear classifiers in RdRd has VF=d+1VF=d+1.
"A complete course in modern statistical learning theory at the graduate student level."