If an inequality of this form holds, then we say that $\widehat{f}_n$ is a Probably Approximately Correct (PAC) decision rule [link] .
To show that the empirical risk minimizer $\widehat{f}_n$ is a PAC decision rule, we first must understand how closely the empirical risk matches the true risk. First, let us consider the empirical and true risk of a fixed decision rule $f \in \mathcal{F}$. Assume that the loss function is bounded between 0 and 1 (possibly after a suitable normalization). Then the empirical risk $\widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n} \ell(f(X_i), Y_i)$ is an average of independent random variables bounded between 0 and 1. Hoeffding's inequality is a bound on the deviations of such random averages from their corresponding mean values [link] . In this case, the mean value is the true risk $R(f) = \mathbb{E}[\ell(f(X), Y)]$ of $f$, and Hoeffding's inequality states that

$$\mathbb{P}\big( |\widehat{R}_n(f) - R(f)| > \varepsilon \big) \le 2 e^{-2n\varepsilon^2}.$$
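Hoeffding's inequality is easy to check by simulation. The following is a minimal sketch, not taken from the text: it assumes a single fixed rule whose per-example 0/1 losses are i.i.d. Bernoulli($p$), so its true risk is $p$, and compares the observed deviation frequency to the bound.

```python
import numpy as np

# Monte Carlo check of Hoeffding's inequality for one fixed decision rule.
# Illustrative assumption: each example incurs an independent 0/1 loss
# distributed Bernoulli(p), so the true risk of the rule is R = p.
rng = np.random.default_rng(0)
n, p, eps = 200, 0.3, 0.1        # training set size, true risk, deviation
trials = 100_000                 # number of simulated training sets

losses = rng.binomial(1, p, size=(trials, n))  # 0/1 losses, one row per trial
emp_risk = losses.mean(axis=1)                 # empirical risk of each trial

freq = np.mean(np.abs(emp_risk - p) > eps)     # observed deviation frequency
bound = 2 * np.exp(-2 * n * eps**2)            # Hoeffding's bound

print(f"P(|R_hat - R| > {eps}) ~= {freq:.4f} <= {bound:.4f}")
```

With these illustrative parameters the observed frequency is far below the bound, as expected: Hoeffding's inequality holds for any bounded loss distribution and so is not tight for any particular one.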
Another equivalent statement is that the inequality $|\widehat{R}_n(f) - R(f)| \le \varepsilon$ holds with probability at least $1 - 2e^{-2n\varepsilon^2}$. Thus, the two risks are probably close together, and the greater the number of training examples $n$, the closer they are.
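As a concrete numerical illustration (the numbers here are ours, not the text's), the bound can be inverted to see how many examples suffice for a given accuracy and confidence:

$$2 e^{-2n\varepsilon^2} \le \delta \quad\Longleftrightarrow\quad n \ge \frac{1}{2\varepsilon^2} \log\frac{2}{\delta},$$

so, for instance, guaranteeing $|\widehat{R}_n(f) - R(f)| \le 0.05$ with probability at least $0.95$ requires $n \ge \log(40)/0.005 \approx 738$ training examples.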
Now we would like a similar condition to hold for all $f \in \mathcal{F}$, since ERM optimizes over the entire collection $\mathcal{F}$. Suppose that $\mathcal{F}$ is a finite collection of decision rules, and let $|\mathcal{F}|$ denote the number of rules in $\mathcal{F}$. The probability that the difference between the true and empirical risks of one or more of the decision rules exceeds $\varepsilon$ is bounded by the sum of the probabilities of the individual events of the form $\{|\widehat{R}_n(f) - R(f)| > \varepsilon\}$, the so-called Union of Events bound. Therefore, with probability at least $1 - 2|\mathcal{F}| e^{-2n\varepsilon^2}$ we have that

$$|\widehat{R}_n(f) - R(f)| \le \varepsilon$$

for all $f \in \mathcal{F}$. Equivalently, setting $\delta = 2|\mathcal{F}| e^{-2n\varepsilon^2}$ and solving for $\varepsilon$, we have that with probability at least $1 - \delta$ and for all $f \in \mathcal{F}$,

$$|\widehat{R}_n(f) - R(f)| \le \sqrt{\frac{\log|\mathcal{F}| + \log(2/\delta)}{2n}}.$$
Notice that the two risks are uniformly close together, and the closeness indicated by the bound improves as the number of training examples $n$ increases and degrades as the number of decision rules in $\mathcal{F}$ increases. In fact, the bound scales with $\log|\mathcal{F}|$, and so it is reasonable to interpret the logarithm of the number of decision rules under consideration as a measure of the complexity of the class.
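The $\log|\mathcal{F}|$ scaling is easy to see numerically. A minimal sketch (the class sizes and parameters are illustrative choices, not from the text):

```python
import numpy as np

# Evaluate the uniform deviation bound
#   eps = sqrt((log|F| + log(2/delta)) / (2n))
# for several class sizes, holding n and delta fixed.
def uniform_deviation(n, num_rules, delta):
    return np.sqrt((np.log(num_rules) + np.log(2 / delta)) / (2 * n))

for m in (10, 1_000, 1_000_000):
    print(f"|F| = {m:>9,}: eps = {uniform_deviation(1000, m, 0.05):.3f}")
```

With $n = 1000$ and $\delta = 0.05$, multiplying the class size by a factor of $10^5$ (from ten rules to one million) grows the bound from about $0.055$ to only about $0.094$, less than a factor of two.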
Now, using this bound, we can show that $\widehat{f}_n$ is a PAC decision rule as follows. Let $f^* = \arg\min_{f \in \mathcal{F}} R(f)$ denote the best rule in the class, and let $\varepsilon$ denote the uniform deviation bound above. Note that with probability at least $1 - \delta$,

$$R(\widehat{f}_n) \;\le\; \widehat{R}_n(\widehat{f}_n) + \varepsilon \;\le\; \widehat{R}_n(f^*) + \varepsilon \;\le\; R(f^*) + 2\varepsilon,$$

where the first inequality follows since the true and empirical risks are close for all $f \in \mathcal{F}$, and in particular for $\widehat{f}_n$; the second inequality holds since, by definition, $\widehat{f}_n$ minimizes the empirical risk; and the third inequality holds again since the empirical risk is close to the true risk for all $f \in \mathcal{F}$, in this case for $f^*$ in particular. So, with probability at least $1 - \delta$, $R(\widehat{f}_n) \le \min_{f \in \mathcal{F}} R(f) + 2\varepsilon$; that is, we have shown that $\widehat{f}_n$ is PAC.
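The full argument can also be checked by simulation. The sketch below is ours, not the text's: it builds an artificial finite class of nine threshold rules on $[0,1]$ under label noise, so the true risks are computable in closed form, runs ERM, and compares $R(\widehat{f}_n)$ against the bound $\min_f R(f) + 2\varepsilon$.

```python
import numpy as np

# Simulation check of the PAC guarantee R(f_hat) <= min_f R(f) + 2*eps
# for ERM over a small finite class. Illustrative setup (not from the
# text): X ~ Uniform[0,1], Y = 1{X > 0.5} with labels flipped w.p. 0.1,
# and F = {f_t(x) = 1{x > t} : t = 0.1, 0.2, ..., 0.9}.
rng = np.random.default_rng(1)
n, delta, flip = 500, 0.05, 0.1
thresholds = np.linspace(0.1, 0.9, 9)

# Closed-form true risk of f_t: its clean-label prediction is wrong for
# x between t and 0.5, which then combines with the label noise.
true_risk = np.abs(thresholds - 0.5) * (1 - 2 * flip) + flip
eps = np.sqrt((np.log(len(thresholds)) + np.log(2 / delta)) / (2 * n))

x = rng.uniform(size=n)
y = (x > 0.5) ^ (rng.uniform(size=n) < flip)   # noisy training labels

emp_risk = np.array([np.mean((x > t) != y) for t in thresholds])
f_hat = np.argmin(emp_risk)                    # empirical risk minimizer

print(f"R(f_hat)  = {true_risk[f_hat]:.3f}")
print(f"min R     = {true_risk.min():.3f}")
print(f"PAC bound = {true_risk.min() + 2 * eps:.3f}")
```

By the argument above, across repeated draws of the training set the first printed value falls below the third in at least a $1 - \delta$ fraction of runs (and, since the bound is loose, typically far more often).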
PAC bounds of this form can be extended in many directions, for example to infinitely large or uncountable classes of decision rules, but the basic ingredients of the theory are essentially like those demonstrated above. The bottom line is that empirical risk minimization is a reasonable approach, provided one has access to a sufficient number of training examples and the number, or more generally the complexity, of the class of decision rules under consideration is not too great.
Excellent treatments of classical decision and estimation theory can be found in a number of textbooks [link] , [link] , [link] , [link] . For further reading on statistical learning theory, outstanding textbooks are also available [link] , [link] , [link] .