<< Chapter < Page Chapter >> Page >

Take

Z = | R n ( f ) ^ - R ( f ) | and t = ϵ
P ( | R n ^ ( f ) - R ( f ) | ϵ ) E [ | R n ( f ) ^ - R ( f ) | 2 ] ϵ 2 var ( R ^ n ( f ) ) ϵ 2 = i = 1 n var ( L i n ) ϵ 2 = var ( ( X ) , Y ) n ϵ 2 = σ L 2 n ϵ 2 .

So, the probability goes to zero at a rate of at least n - 1 . However, it turns out that this is an extremely loose bound. Accordingto the Central Limit Theorem

R n ^ ( f ) = 1 n i = 1 n L i N R ( f ) , σ L 2 n as n

in distribution. This suggests that for large values of n,

P ( | R n ^ ( f ) - R ( f ) | ϵ ) O e - n ϵ 2 2 σ L 2 .

That is, the Gaussian tail probability is tending to zero exponentially fast.

Chernoff's bound

Note that for any nonnegative random variable Z and t > 0 ,

P ( Z t ) = P ( e s Z e s t ) E [ e s Z ] e s t , s > 0 by Markov's inequality .

Chernoff's bound is based on finding the value of s that minimizes the upper bound. If Z is a sum of independent random variables. For example, say

Z = i = 1 n ( f ( X i ) , Y i ) - R ( f ) = n R ^ n ( f ) - R ( f )

then the bound becomes

P i = 1 n ( L i - E [ L i ] ) t e - s t E [ e s i = 1 n ( L i - E [ L i ] ) ] e - s t i = 1 n E [ e s ( L i - E [ L i ] ) ] , from independence.

Thus, the problem of finding a tight bound boils down to finding a good bound for E [ s s ( L i - E [ L i ] ) ] . Chernoff ('52), first studied this situation for binary random variables. Then,Hoeffding ('63) derived a more general result for arbitrary bounded random variables.

Hoeffding's indequality

Theorem

Hoeffding's inequality

Let Z 1 , Z 2 , . . . , Z n be independent bounded random variables such that Z i [ a i , b i ] with probability 1. Let S n = i = 1 n Z i . Then for any t > 0 , we have

P ( | S n - E [ S n ] | t ) 2 e - 2 t 2 i = 1 n ( b i - a i ) 2 .

The key to proving Hoeffding's inequality is the following upper bound: if Z is a random variable with E [ Z ] = 0 and a Z b , then

E [ e s Z ] e s 2 ( b - a ) 2 8 .

This upper bound is derived as follows. By the convexity of theexponential function,

e s z z - a b - a e s b + b - z b - a e s a , for a z b .
Convexity of exponential function.

Thus,

E [ e s Z ] E Z - a b - a e s b + E b - Z b - a e s a = b b - a e s a - a b - a e s b , since E [ Z ] = 0 = ( 1 - θ + θ e s ( b - a ) ) e - θ s ( b - a ) , where θ = - a b - a .

Now let

u = s ( b - a ) and define φ ( u ) - θ u + log ( 1 - θ + θ e u ) .

Then we have

E [ e s Z ] ( 1 - θ + θ e s ( b - a ) ) e - θ s ( b - a ) = e φ ( u ) .

To minimize the upper bound let's express φ ( u ) in a Taylor's series with remainder :

φ ( u ) = φ ( 0 ) + u φ ' ( 0 ) + u 2 2 φ ' ' ( v ) for some v [ 0 , u ]
φ ' ( u ) = - θ + θ e u 1 - θ + θ e u φ ' ( u ) = 0 φ ' ' ( u ) = θ e u 1 - θ + θ e u - ( θ e u ) 2 ( 1 - θ + θ e u ) 2 = θ e u 1 - θ + θ e u ( 1 - θ e u 1 - θ + θ e u ) = ρ ( 1 - ρ ) .

Now, φ ' ' ( u ) is maximized by

ρ = θ e u 1 - θ + θ e u = 1 2 φ ' ' ( u ) 1 4 .

So,

φ ( u ) u 2 8 = s 2 ( b - a ) 2 8
E [ e s Z ] e s 2 ( b - a ) 2 8 .

Now, we can apply this upper bound to derive Hoeffding's inequality.

P ( S n - E [ S n ] t ) e - s t i = 1 n E [ e s ( L i - E [ L i ] ) ] e - s t i = 1 n e s 2 ( b i - a i ) 2 8 = e - s t e s 2 i = 1 n ( b i - a i ) 2 8 = e - 2 t 2 i = 1 n ( b i - a i ) 2 by choosing s = 4 t i = 1 n ( b i - a i ) 2

Similarly, P ( E [ S n ] - S n t ) e - 2 t 2 i = 1 n ( b i - a i ) 2 . This completes the proof of the Hoeffding's theorem.

Application

Let Z i = 1 f ( X i ) Y i - R ( f ) , as in the classification problem. Then for a fixed f, it follows fromHoeffding's inequality (i.e., Chernoff's bound in this special case) that

P ( | R n ^ ( f ) - R ( f ) | ϵ ) = P 1 n | S n - E [ S n ] | ϵ = P ( | S n - E [ S n ] | n ϵ ) 2 e - 2 ( n ϵ ) 2 n = 2 e - 2 n ϵ 2 .

Now, we want a bound like this to hold uniformly for all f F . Assume that F is a finite collection of models and let | F | denote its cardinality. We would like to bound the probability that max f F | R n ^ ( f ) - R ( f ) | ϵ . Note that the event

max f F | R n ^ ( f ) - R ( f ) | ϵ f F | R n ^ ( f ) - R ( f ) | ϵ .

Therefore

P max f F | R n ^ ( f ) - R ( f ) | ϵ = P f F | R n ^ ( f ) - R ( f ) | ϵ f F P ( | R n ^ ( f ) - R ( f ) | ϵ ) , the `` union of events '' bound 2 | F | e - 2 n ϵ 2 , by Hoeffding's inequality.

Thus, we have shown that with probability at least 1 - 2 | F | e - 2 n ϵ 2 , f F

| R n ^ ( f ) - R ( f ) | < ϵ .

And accordingly, we can be reasonably confident in selecting f from F based on the empirical risk function R ^ n .

Questions & Answers

what are components of cells
ofosola Reply
twugzfisfjxxkvdsifgfuy7 it
Sami
58214993
Sami
what is a salt
John
the difference between male and female reproduction
John
what is computed
IBRAHIM Reply
what is biology
IBRAHIM
what is the full meaning of biology
IBRAHIM
what is biology
Jeneba
what is cell
Kuot
425844168
Sami
what is cytoplasm
Emmanuel Reply
structure of an animal cell
Arrey Reply
what happens when the eustachian tube is blocked
Puseletso Reply
what's atoms
Achol Reply
discuss how the following factors such as predation risk, competition and habitat structure influence animal's foraging behavior in essay form
Burnet Reply
cell?
Kuot
location of cervical vertebra
KENNEDY Reply
What are acid
Sheriff Reply
define biology infour way
Happiness Reply
What are types of cell
Nansoh Reply
how can I get this book
Gatyin Reply
what is lump
Chineye Reply
what is cell
Maluak Reply
what is biology
Maluak
what is vertibrate
Jeneba
what's cornea?
Majak Reply
what are cell
Achol
Got questions? Join the online conversation and get instant answers!
Jobilize.com Reply

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Statistical learning theory. OpenStax CNX. Apr 10, 2009 Download for free at http://cnx.org/content/col10532/1.3
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Statistical learning theory' conversation and receive update notifications?

Ask