
In the case that the $L$ observations are statistically independent according to both models, Stein's Lemma states that the false-alarm probability resulting from using the Neyman-Pearson hypothesis test has the form (Johnson and Orsak)

$$\lim_{L\to\infty} \frac{1}{L}\log P_F = -\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0})$$
or
$$P_F \xrightarrow[L\to\infty]{} f(L)\, e^{-L\,\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0})}$$
where $f(L)$ is a slowly varying function compared to the exponential: $\lim_{L\to\infty}\frac{\log f(L)}{L} = 0$. This function is problem- and $P_M$-dependent, and thus usually not known. What this asymptotic formula says is that as the number of observations increases, the false-alarm probability of a Neyman-Pearson hypothesis test plotted on semilogarithmic coordinates will eventually become a straight line, the slope of which is $-\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0})$. The figure below demonstrates this effect.
Figure: Using the Gaussian and Poisson classification problems as examples, we plot the false-alarm probability (left panels) and average probability of error (right panels) for each as a function of the amount of statistically independent data used by the optimal classifier. The miss-probability criterion was that it be less than or equal to 0.1. The a priori probabilities are 1/2 in the right-column examples. As shown here, the average error probability produced by the minimum-$P_e$ classifier typically decays more slowly than the false-alarm probability for the classifier that fixes the miss probability. The dashed lines depict the behavior of the error probabilities as predicted by the asymptotic theory. In each case, these theoretical lines have been shifted vertically for ease of comparison.
A similar result holds for the miss probability if it is optimized and the false-alarm probability is constrained. Thus, in all model evaluation problems solved using the Neyman-Pearson approach, the optimized probability will always (eventually) decay exponentially in the number of observations. For this reason, $\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0})$ is known as the exponential rate.
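As a concrete illustration of the exponential rate, the Kullback-Leibler distances for the Gaussian and Poisson examples shown in the figure can be evaluated in closed form. The short sketch below is our own illustration and is not part of the original module; the parameter values in it are arbitrary assumptions, not the ones used to generate the figure.

```python
# Exponential rates (Kullback-Leibler distances) for the two example problems.
# The parameter values are illustrative assumptions only.
import numpy as np

def gaussian_rate(m, sigma):
    """D( N(m, sigma^2) || N(0, sigma^2) ) = m^2 / (2 sigma^2)."""
    return m**2 / (2 * sigma**2)

def poisson_rate(lam1, lam0):
    """D( Poisson(lam1) || Poisson(lam0) ) = lam1 log(lam1/lam0) - lam1 + lam0."""
    return lam1 * np.log(lam1 / lam0) - lam1 + lam0

D_gauss = gaussian_rate(m=1.0, sigma=1.0)        # assumed m and sigma
D_poisson = poisson_rate(lam1=2.0, lam0=1.0)     # assumed Poisson intensities

# Stein's Lemma: on semilogarithmic axes, log P_F eventually decreases
# linearly with slope -D per observation.
for L in (10, 20, 40):
    print(f"L={L:2d}  exp(-L*D_gauss)={np.exp(-L*D_gauss):.2e}  "
          f"exp(-L*D_poisson)={np.exp(-L*D_poisson):.2e}")
```

Plotted on semilogarithmic coordinates, these exponential factors are the straight lines whose slopes the dashed lines in the figure depict.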

Showing this result relies on, believe it or not, the law of large numbers. Define the decision region $\Re_1^{(L)}$ according to
$$\Re_1^{(L)} = \left\{\mathbf{r} : \mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) - \delta \le \frac{1}{L}\log\frac{p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)}{p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0)} \le \mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) + \delta\right\}$$
This decision region will vary with the number of observations, as is typical of Neyman-Pearson decision rules. First of all, we must show that the decision rule this region yields does indeed satisfy a criterion on the miss probability.
$$1 - P_M = \Pr\left[\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) - \delta \le \frac{1}{L}\sum_{l} \log\frac{p_{r|\mathcal{M}_1}(r_l|\mathcal{M}_1)}{p_{r|\mathcal{M}_0}(r_l|\mathcal{M}_0)} \le \mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) + \delta \,\middle|\, \mathcal{M}_1\right]$$
This sum is the average of the log-likelihood ratios computed at each observation, assuming that $\mathcal{M}_1$ is true. Because of the strong law of large numbers, as the number of observations increases, this sum converges to its expected value, which is $\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0})$. Therefore, $\lim_{L\to\infty}(1 - P_M) = 1$ for any $\delta$, which means $P_M \to 0$. Thus, this decision region guarantees not only that the miss probability is less than some specified number, but also that it decreases systematically as the number of observations increases.
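To see the law-of-large-numbers step concretely, the following minimal simulation (our own sketch, assuming the Gaussian models $\mathcal{N}(0,\sigma^2)$ and $\mathcal{N}(m,\sigma^2)$ with arbitrary values of $m$ and $\sigma$) generates data under $\mathcal{M}_1$ and shows the per-observation average of the log-likelihood ratio settling onto $\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) = m^2/2\sigma^2$.

```python
# Law of large numbers for the normalized log-likelihood ratio under M1
# (assumed Gaussian models with common variance).
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 1.0, 1.0                       # assumed parameter values
D = m**2 / (2 * sigma**2)                 # Kullback-Leibler distance for this pair

def average_llr(L):
    r = rng.normal(m, sigma, size=L)      # data generated under model M1
    # log[ p(r_l | M1) / p(r_l | M0) ] = (2 m r_l - m^2) / (2 sigma^2)
    llr = (2 * m * r - m**2) / (2 * sigma**2)
    return llr.mean()

for L in (10, 100, 1000, 10000):
    print(f"L={L:6d}  average log-likelihood ratio = {average_llr(L):.4f}  (D = {D:.4f})")
```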

To analyze how the false-alarm probability varies with $L$, we note that in this decision region
$$p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0) \ge p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)\, e^{-L\left(\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) + \delta\right)}$$
$$p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0) \le p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)\, e^{-L\left(\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) - \delta\right)}$$
Integrating these over $\Re_1^{(L)}$ yields upper and lower bounds on $P_F$.
$$e^{-L\left(\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) + \delta\right)}(1 - P_M) \le P_F \le e^{-L\left(\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) - \delta\right)}(1 - P_M)$$
or
$$-\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) - \delta + \frac{1}{L}\log(1 - P_M) \le \frac{1}{L}\log P_F \le -\mathcal{D}(p_{r|\mathcal{M}_1}\|p_{r|\mathcal{M}_0}) + \delta + \frac{1}{L}\log(1 - P_M)$$
Because $\delta$ can be made arbitrarily small as the number of observations increases, we obtain Stein's Lemma.
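For the Gaussian pair used throughout (with assumed values of $m$, $\sigma$, and $\delta$), membership in $\Re_1^{(L)}$ depends on the observations only through their sum, so the false-alarm and miss probabilities of this region can be computed exactly with $Q$ functions and checked against the bounds above. The sketch below is our own illustration of that check, not part of the original derivation.

```python
# Exact check of exp(-L(D+delta))(1-P_M) <= P_F <= exp(-L(D-delta))(1-P_M)
# for the delta-band decision region, assuming Gaussian models N(0,sigma^2) vs N(m,sigma^2).
import numpy as np
from scipy.stats import norm

m, sigma, delta = 1.0, 1.0, 0.05          # assumed values
D = m**2 / (2 * sigma**2)

for L in (25, 100, 400):
    # The region constrains the sum S of the observations to
    # [L*m - L*sigma^2*delta/m, L*m + L*sigma^2*delta/m].
    lo = L * m - L * sigma**2 * delta / m
    hi = L * m + L * sigma**2 * delta / m
    scale = np.sqrt(L) * sigma
    P_F = norm.sf(lo / scale) - norm.sf(hi / scale)          # S ~ N(0, L sigma^2) under M0
    one_minus_PM = norm.sf((lo - L * m) / scale) - norm.sf((hi - L * m) / scale)  # S ~ N(Lm, L sigma^2) under M1
    lower = np.exp(-L * (D + delta)) * one_minus_PM
    upper = np.exp(-L * (D - delta)) * one_minus_PM
    print(f"L={L:3d}  {lower:.3e} <= P_F = {P_F:.3e} <= {upper:.3e}   (1 - P_M = {one_minus_PM:.3f})")
```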

For the average probability of error, the Chernoff distance is its exponential rate.
$$P_e \xrightarrow[L\to\infty]{} f(L)\, e^{-L\,\mathcal{C}(p_{r|\mathcal{M}_1}, p_{r|\mathcal{M}_0})}$$
Showing this result leads to more interesting results. For the minimum-probability-of-error detector, the decision region $\Re_1$ is defined according to $\pi_1 p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1) > \pi_0 p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0)$. Using this fact, we can write the expression for $P_e$ as

$$P_e = \pi_0 \int_{\Re_1} p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0)\,d\mathbf{r} + \pi_1 \int_{\Re_0} p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)\,d\mathbf{r} = \int \min\!\left\{\pi_0\, p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0),\; \pi_1\, p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)\right\}d\mathbf{r}$$
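As a sanity check on this expression (our own illustration, assuming a single Gaussian observation distributed as $\mathcal{N}(0,\sigma^2)$ under $\mathcal{M}_0$ and $\mathcal{N}(m,\sigma^2)$ under $\mathcal{M}_1$, with equal priors), numerically integrating the minimum of the two weighted densities reproduces the minimum probability of error $Q(m/2\sigma)$.

```python
# P_e = integral of min(pi0 * p0, pi1 * p1) versus the closed-form Q(m / (2 sigma))
# for a single Gaussian observation with equal priors (assumed example).
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

m, sigma, pi0, pi1 = 1.0, 1.0, 0.5, 0.5
integrand = lambda x: np.minimum(pi0 * norm.pdf(x, 0, sigma), pi1 * norm.pdf(x, m, sigma))
Pe_integral, _ = quad(integrand, -20, 20, points=[m / 2])   # kink at the decision boundary
Pe_closed_form = norm.sf(m / (2 * sigma))
print(Pe_integral, Pe_closed_form)                          # the two values agree
```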
The minimum function has the property $\min\{a, b\} \le a^{1-s} b^{s}$, $0 \le s \le 1$, for non-negative quantities $a$ and $b$.
To see this, simply plot the minimum function versus one of its arguments with the other one fixed. Plotting $a^{1-s} b^{s}$ the same way shows that it indeed is an upper bound.
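A quick numerical spot-check of the inequality over randomly drawn non-negative $a$, $b$ and $s \in [0,1]$ (an illustration of ours, not part of the module):

```python
# Spot-check of min(a, b) <= a**(1-s) * b**s for non-negative a, b and 0 <= s <= 1.
import numpy as np

rng = np.random.default_rng(1)
a = rng.uniform(0, 10, size=1000)
b = rng.uniform(0, 10, size=1000)
s = rng.uniform(0, 1, size=1000)

print(np.all(np.minimum(a, b) <= a**(1 - s) * b**s))   # True for every sampled triple
```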
With this bound, we can find an upper bound for the average error probability.
$$P_e \le \int \left(\pi_0\, p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0)\right)^{1-s} \left(\pi_1\, p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)\right)^{s}\,d\mathbf{r} \le \int p_{\mathbf{r}|\mathcal{M}_0}(\mathbf{r}|\mathcal{M}_0)^{1-s}\, p_{\mathbf{r}|\mathcal{M}_1}(\mathbf{r}|\mathcal{M}_1)^{s}\,d\mathbf{r}$$
When $\mathbf{r}$ has statistically independent and identically distributed components,
$$P_e \le \int \prod_l p_{r|\mathcal{M}_0}(r_l|\mathcal{M}_0)^{1-s}\, \prod_l p_{r|\mathcal{M}_1}(r_l|\mathcal{M}_1)^{s}\,d\mathbf{r} = \prod_l \int p_{r|\mathcal{M}_0}(r_l|\mathcal{M}_0)^{1-s}\, p_{r|\mathcal{M}_1}(r_l|\mathcal{M}_1)^{s}\,dr_l = \left(\int p_{r|\mathcal{M}_0}(r|\mathcal{M}_0)^{1-s}\, p_{r|\mathcal{M}_1}(r|\mathcal{M}_1)^{s}\,dr\right)^{L}$$
Consequently, we have that for every $s \in [0, 1]$,
$$\frac{1}{L}\log P_e \le \log \int p_{r|\mathcal{M}_0}(r|\mathcal{M}_0)^{1-s}\, p_{r|\mathcal{M}_1}(r|\mathcal{M}_1)^{s}\,dr$$
Asymptotically, this bound is tight (the Gaussian example shows this to be true). Thus, the exponential rate for the average probability of error is the minimum of this bound with respect to $s$. This quantity is the negative of the Chernoff distance, and we arrive at the expression for the average probability of error.
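The minimization over $s$ is easy to carry out numerically. The sketch below (our illustration, assuming the Gaussian models $\mathcal{N}(0,\sigma^2)$ and $\mathcal{N}(m,\sigma^2)$) evaluates $\log\int p_0^{1-s}p_1^{s}\,dr$ by quadrature and minimizes it over $s$, recovering the Gaussian values $s^* = 1/2$ and $\mathcal{C} = m^2/8\sigma^2$ quoted later in this module.

```python
# Chernoff distance by numerical minimization over s for assumed Gaussian models.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

m, sigma = 1.0, 1.0                                   # assumed parameter values

def log_J(s):
    """log of the integral of p0(x)**(1-s) * p1(x)**s."""
    integrand = lambda x: norm.pdf(x, 0, sigma)**(1 - s) * norm.pdf(x, m, sigma)**s
    value, _ = quad(integrand, -20, 20)               # limits wide enough for these parameters
    return np.log(value)

result = minimize_scalar(log_J, bounds=(0, 1), method="bounded")
print(f"s* = {result.x:.3f}  Chernoff distance = {-result.fun:.4f}  "
      f"m^2/(8 sigma^2) = {m**2 / (8 * sigma**2):.4f}")
```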

The Kullback-Leibler and Chernoff distances are related in an important way. Define $p_s$ to be
$$p_s(r) = \frac{p_0(r)^{1-s}\, p_1(r)^{s}}{J(s)}$$
where $J(s)$ is the normalization constant and the quantity optimized in the definition of the Chernoff distance. The Chernoff distance equals the Kullback-Leibler distance between the $p_s$ equidistant from $p_0$ and $p_1$. Call the value of $s$ corresponding to this equidistant distribution $s^*$; this value corresponds to the result of the optimization, and
$$\mathcal{C}(p_0, p_1) = \mathcal{D}(p_{s^*}\|p_0) = \mathcal{D}(p_{s^*}\|p_1)$$
This relation shows that the exponential rates of minimum-$P_e$ and Neyman-Pearson hypothesis tests are not necessarily the same. In other words, the distance to the equidistant distribution need not equal the total distance, which seems to make sense.
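This equidistance property is easy to verify numerically. The following sketch (our own illustration, using an arbitrarily chosen pair of three-point discrete distributions) constructs the tilted family $p_s$, finds $s^*$ by minimizing $\log J(s)$, and confirms that $\mathcal{D}(p_{s^*}\|p_0)$, $\mathcal{D}(p_{s^*}\|p_1)$, and the Chernoff distance coincide.

```python
# Tilted family p_s = p0**(1-s) * p1**s / J(s) and the equidistance property at s*.
import numpy as np
from scipy.optimize import minimize_scalar

p0 = np.array([0.5, 0.3, 0.2])          # arbitrarily chosen example distributions
p1 = np.array([0.1, 0.3, 0.6])

def kl(p, q):
    return np.sum(p * np.log(p / q))

def J(s):                                # normalization constant, the quantity optimized
    return np.sum(p0**(1 - s) * p1**s)

result = minimize_scalar(lambda s: np.log(J(s)), bounds=(0, 1), method="bounded")
s_star = result.x
chernoff = -result.fun
p_star = p0**(1 - s_star) * p1**s_star / J(s_star)

print(f"Chernoff distance = {chernoff:.5f}")
print(f"D(p_s* || p0)     = {kl(p_star, p0):.5f}")
print(f"D(p_s* || p1)     = {kl(p_star, p1):.5f}")
```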

The larger the distance, the greater the rate at which the error probability decreases, which corresponds to an easier problem (smaller error probabilities). However, Stein's Lemma does not allow a precise calculation of error probabilities. The quantity $f(\cdot)$ depends heavily on the underlying probability distributions in ways that are problem dependent. All a distance calculation gives us is the exponential rate.

For the Gaussian example we have been considering, set the miss probability's criterion value to be $\alpha$. Because
$$P_M = Q\!\left(\frac{Lm - \gamma}{\sqrt{L}\,\sigma}\right)$$
we find the threshold value $\gamma$ to be $Lm - \sqrt{L}\,\sigma\, Q^{-1}(\alpha)$, which yields a false-alarm probability of
$$P_F = Q\!\left(\sqrt{L}\,\frac{m}{\sigma} - Q^{-1}(\alpha)\right)$$
The Kullback-Leibler distance between the two Gaussians $p_0 = \mathcal{N}(0, \sigma^2)$ and $p_1 = \mathcal{N}(m, \sigma^2)$ equals

$$\mathcal{D}(p_1\|p_0) = \int p_1(x)\left[\frac{x^2}{2\sigma^2} - \frac{(x-m)^2}{2\sigma^2}\right]dx = \int p_1(x)\,\frac{2mx - m^2}{2\sigma^2}\,dx = \frac{m^2}{2\sigma^2}$$
In this Gaussian case the Kullback-Leibler distance is symmetric: $\mathcal{D}(p_1\|p_0) = \mathcal{D}(p_0\|p_1)$. Verify this atypical (most cases are asymmetric) result for yourself. The Chernoff distance equals $\frac{m^2}{8\sigma^2}$ (the value of $s^*$ is $\frac{1}{2}$). Stein's Lemma predicts that the false-alarm probability has the asymptotic form $f(L)\,e^{-Lm^2/2\sigma^2}$. Thus, does $Q\!\left(\sqrt{L}\,\frac{m}{\sigma} - Q^{-1}(\alpha)\right) = f(L)\,e^{-Lm^2/2\sigma^2}$, where $f(L)$ is slowly varying compared to the exponential?

An asymptotic formula for $Q(x)$ is
$$Q(x) \xrightarrow[x\to\infty]{} \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}$$
As $L$ increases, so does the argument of $Q(\cdot)$ in the expression for the false-alarm probability. Thus

$$P_F \xrightarrow[L\to\infty]{} \frac{1}{\sqrt{2\pi}\left(\sqrt{L}\,\frac{m}{\sigma} - Q^{-1}(\alpha)\right)}\, e^{-\frac{1}{2}\left(\sqrt{L}\,\frac{m}{\sigma} - Q^{-1}(\alpha)\right)^{2}} = \frac{1}{\sqrt{2\pi}\left(\sqrt{L}\,\frac{m}{\sigma} - Q^{-1}(\alpha)\right)}\, e^{\sqrt{L}\,\frac{m}{\sigma}\, Q^{-1}(\alpha) - \frac{Q^{-1}(\alpha)^{2}}{2}}\, e^{-L\frac{m^{2}}{2\sigma^{2}}}$$
The quantity multiplying the final exponential corresponds to $f(L)$, which satisfies the slowly varying property. We have verified the prediction of Stein's Lemma for the asymptotic behavior of the false-alarm probability in the Gaussian case. Note how the criterion value $\alpha$ occurs only in the expression for $f(L)$, and hence does not affect the exponential rate; Stein's Lemma says this situation always occurs.
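The comparison can also be made numerically. The sketch below (our illustration; the values of $m$, $\sigma$, and $\alpha$ are assumptions) evaluates the exact false-alarm probability and the asymptotic form $f(L)\,e^{-Lm^2/2\sigma^2}$ side by side, and shows $\frac{\log f(L)}{L}$ shrinking toward zero, as the slowly varying property requires.

```python
# Exact P_F = Q(sqrt(L) m/sigma - Qinv(alpha)) versus the asymptotic form f(L) exp(-L m^2/(2 sigma^2)).
import numpy as np
from scipy.stats import norm

m, sigma, alpha = 1.0, 1.0, 0.1           # assumed parameter values
Qinv = norm.isf(alpha)                    # Q^{-1}(alpha)
D = m**2 / (2 * sigma**2)                 # exponential rate (Kullback-Leibler distance)

for L in (10, 50, 100, 200):
    arg = np.sqrt(L) * m / sigma - Qinv
    P_F_exact = norm.sf(arg)
    f_L = np.exp(np.sqrt(L) * (m / sigma) * Qinv - Qinv**2 / 2) / (np.sqrt(2 * np.pi) * arg)
    P_F_asymptotic = f_L * np.exp(-L * D)
    print(f"L={L:3d}  exact={P_F_exact:.3e}  asymptotic={P_F_asymptotic:.3e}  "
          f"log f(L)/L = {np.log(f_L) / L:+.4f}")
```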

Source:  OpenStax, Statistical signal processing. OpenStax CNX. Dec 05, 2011 Download for free at http://cnx.org/content/col11382/1.1