The likelihood ratio test is derived as the optimal detector, and ways of simplifying it for easier computation are described.

Detection theory concerns making decisions from data. Decisions are based on presumptive models that may have produced the data. Making a decision involves inspecting the data and determining which model was most likely to have produced them. In this way, we are detecting which model was correct. Decision problems pervade signal processing. In digital communications, for example, determining whether the current bit received in the presence of channel disturbances was a zero or a one is a detection problem.

More concretely, we denote by $\mathcal{M}_i$ the $i$th model that could have generated the data $\mathbf{R}$. A "model" is captured by the conditional probability distribution of the data, which are denoted by the vector $\mathbf{R}$. For example, model $i$ is described by $p_{\mathbf{R}|\mathcal{M}_i}(\mathbf{r})$. Given all the models that can describe the data, we need to choose which model best matches what was observed. The word "best" is key here: what is the optimality criterion, and do the detection processing and the decision rule depend heavily on the criterion used? Surprisingly, the answer to the second question is "No." All of detection theory revolves around the likelihood ratio test, which, as we shall see, emerges as the optimal detector under a wide variety of optimality criteria.

The likelihood ratio test

In a binary detection problem in which we have two models, four possible decision outcomes can result. Model 0 did in fact represent the best model for the data and the decision rule said it was (a correct decision) or said it wasn't (an erroneous decision). The other two outcomes arise when model 1 was in fact true, with either a correct or an incorrect decision made. The decision process operates by segmenting the range of observation values into two disjoint decision regions $Z_0$ and $Z_1$. All values of $\mathbf{R}$ fall into either $Z_0$ or $Z_1$. If a given $\mathbf{R}$ lies in $Z_0$, we will announce our decision "model 0 was true"; if in $Z_1$, model 1 would be proclaimed. To derive a rational method of deciding which model best describes the observations, we need a criterion to assess the quality of the decision process so that optimizing this criterion will specify the decision regions.

The Bayes' decision criterion seeks to minimize a cost function associated with making a decision. Let $C_{ij}$ be the cost of mistaking model $j$ for model $i$ ($i \neq j$) and $C_{ii}$ the presumably smaller cost of correctly choosing model $i$: $C_{ij} > C_{ii}$, $i \neq j$. Let $\pi_i$ be the a priori probability of model $i$. The so-called Bayes' cost $\bar{C}$ is the average cost of making a decision.

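$$\bar{C} = \sum_{i,j} C_{ij} \Pr[\text{say } \mathcal{M}_i \text{ when } \mathcal{M}_j \text{ true}] = \sum_{i,j} C_{ij}\, \pi_j \Pr[\text{say } \mathcal{M}_i \mid \mathcal{M}_j \text{ true}]$$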
The Bayes' cost can be expressed as
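$$\begin{aligned}
\bar{C} &= \sum_{i,j} C_{ij}\, \pi_j \Pr[\mathbf{R} \in Z_i \mid \mathcal{M}_j \text{ true}] \\
&= \sum_{i,j} C_{ij}\, \pi_j \int_{Z_i} p_{\mathbf{R}|\mathcal{M}_j}(\mathbf{r})\, d\mathbf{r} \\
&= \int_{Z_0} \left[ C_{00}\, \pi_0\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r}) + C_{01}\, \pi_1\, p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r}) \right] d\mathbf{r} + \int_{Z_1} \left[ C_{10}\, \pi_0\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r}) + C_{11}\, \pi_1\, p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r}) \right] d\mathbf{r}
\end{aligned}$$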
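To minimize this expression with respect to the decision regions $Z_0$ and $Z_1$, ponder which integral would yield the smallest value if its integration domain included a specific value of the observation vector. To minimize the sum of the two integrals, whichever integrand is smaller should include that value of $\mathbf{r}$ in its integration domain. We conclude that we choose $\mathcal{M}_0$ for those values of $\mathbf{r}$ yielding a smaller value for the first integrand.
$$\pi_0 C_{00}\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r}) + \pi_1 C_{01}\, p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r}) < \pi_0 C_{10}\, p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r}) + \pi_1 C_{11}\, p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})$$
We choose $\mathcal{M}_1$ when the inequality is reversed. This expression is easily manipulated (collect the terms multiplying each density, then divide, noting that $C_{ij} > C_{ii}$ keeps the direction of the inequality) to obtain the crowning result of detection theory: the likelihood ratio test.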
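$$\frac{p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$$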
The comparison relation means selecting model $\mathcal{M}_1$ if the left-hand ratio exceeds the value on the right; otherwise, $\mathcal{M}_0$ is selected. The likelihood ratio $\frac{p_{\mathbf{R}|\mathcal{M}_1}(\mathbf{r})}{p_{\mathbf{R}|\mathcal{M}_0}(\mathbf{r})}$, symbolically represented by $\Lambda(\mathbf{r})$, encapsulates the signal processing performed by the optimal detector on the observations $\mathbf{r}$. The optimal decision rule then compares that scalar-valued result with a threshold $\eta$ equaling $\frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$. The likelihood ratio test can be succinctly expressed as the comparison of the likelihood ratio with a threshold.
$$\Lambda(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \eta$$
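The test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two unit-variance Gaussian models (means 0 and 1), the equal priors, the 0/1 costs, and all function names are hypothetical choices, not part of the text. The sketch computes $\eta$ from the priors and costs, applies the comparison, and evaluates the Bayes' cost of a scalar threshold rule to check that the likelihood ratio threshold does no worse than nearby alternatives.

```python
import math

def bayes_threshold(pi0, pi1, c00, c01, c10, c11):
    """eta = pi0 (C10 - C00) / (pi1 (C01 - C11)), the threshold in the test."""
    return pi0 * (c10 - c00) / (pi1 * (c01 - c11))

def gaussian_pdf(r, mean):
    """Unit-variance Gaussian density (an assumed model, not from the text)."""
    return math.exp(-0.5 * (r - mean) ** 2) / math.sqrt(2.0 * math.pi)

def likelihood_ratio(r, m0=0.0, m1=1.0):
    """Lambda(r) = p(r | model 1) / p(r | model 0)."""
    return gaussian_pdf(r, m1) / gaussian_pdf(r, m0)

def decide(r, eta):
    """Announce model 1 if Lambda(r) exceeds eta, otherwise model 0."""
    return 1 if likelihood_ratio(r) > eta else 0

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_cost(t, pi0, pi1, c00, c01, c10, c11, m0=0.0, m1=1.0):
    """Average cost of the rule 'say model 1 when r > t' for the assumed models."""
    p_say1_given0 = 1.0 - std_normal_cdf(t - m0)
    p_say0_given1 = std_normal_cdf(t - m1)
    return (pi0 * (c00 * (1.0 - p_say1_given0) + c10 * p_say1_given0)
            + pi1 * (c01 * p_say0_given1 + c11 * (1.0 - p_say0_given1)))

params = dict(pi0=0.5, pi1=0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0)
eta = bayes_threshold(**params)
decisions = [decide(r, eta) for r in (-1.0, 0.2, 0.8, 2.0)]

# For these assumed models Lambda(r) = exp(r - 1/2),
# so Lambda(r) > eta is the same as r > 1/2 + ln(eta).
t_star = 0.5 + math.log(eta)
costs = {t: bayes_cost(t, **params) for t in (t_star - 0.3, t_star, t_star + 0.3)}

print(eta, decisions)
print(costs)
```

With these assumed numbers the threshold is $\eta = 1$, the observations below 0.5 are assigned to model 0, and moving the threshold on $r$ away from 0.5 in either direction raises the computed Bayes' cost.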

The data processing operations are captured entirely by the likelihood ratio $\Lambda(\mathbf{r})$. However, the calculations required by the likelihood ratio can be simplified in many cases. Note that only the value of the likelihood ratio relative to the threshold matters. Consequently, we can perform any positively monotonic transformation simultaneously on the likelihood ratio and the threshold without affecting the result of the comparison. For example, we can multiply by a positive constant, add any constant, or apply a monotonically increasing function to reduce the complexity of the expressions. We single out one such function, the logarithm, because it often simplifies likelihood ratios that commonly occur in signal processing applications. The logarithm of the likelihood ratio is known as the log-likelihood, and we explicitly express the likelihood ratio test with it as

$$\ln \Lambda(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \ln \eta$$
Which simplifying transformations are useful is problem-dependent. But, by laying bare what aspect of the observations is essential to the model-testing problem, we reveal the sufficient statistic $\Upsilon(\mathbf{r})$: the scalar quantity which best summarizes the data for detection purposes. The likelihood ratio test is best expressed in terms of the sufficient statistic.
$$\Upsilon(\mathbf{r}) \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \gamma$$
We denote the threshold value for the sufficient statistic by γ or by η when the likelihood ratio is used in the comparison.
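As a hedged illustration of these equivalent forms, the sketch below assumes independent Gaussian observations with standard deviation $\sigma$ whose mean is 0 under model 0 and $m > 0$ under model 1. This model, the sample values, and the function names are assumptions made for the example. Under it, $\ln \Lambda(\mathbf{r}) = \frac{m}{\sigma^2} \sum_l r_l - \frac{N m^2}{2 \sigma^2}$, so the sum of the observations serves as a sufficient statistic with threshold $\gamma = \frac{\sigma^2}{m} \ln \eta + \frac{N m}{2}$.

```python
import math

def log_likelihood_ratio(r, m, sigma):
    """ln Lambda(r) for i.i.d. Gaussian samples: mean 0 (model 0) vs mean m (model 1)."""
    return sum((x ** 2 - (x - m) ** 2) / (2.0 * sigma ** 2) for x in r)

def sufficient_statistic(r):
    """For the assumed model, the sum of the observations is all the test needs."""
    return sum(r)

def gamma_from_eta(eta, n, m, sigma):
    """Threshold on the sum; assumes m > 0 so the inequality keeps its direction."""
    return (sigma ** 2 / m) * math.log(eta) + n * m / 2.0

r = [0.9, -0.2, 1.4, 0.3]           # hypothetical observations
m, sigma, eta = 1.0, 1.0, 1.0 / 3.0  # hypothetical model parameters and threshold

log_lr = log_likelihood_ratio(r, m, sigma)
gamma = gamma_from_eta(eta, len(r), m, sigma)

# Three equivalent forms of the same comparison.
say_lr = 1 if math.exp(log_lr) > eta else 0
say_log = 1 if log_lr > math.log(eta) else 0
say_stat = 1 if sufficient_statistic(r) > gamma else 0

print(say_lr, say_log, say_stat)
```

All three comparisons give the same decision because each is obtained from the previous one by a monotonically increasing transformation applied to both sides.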

The likelihood ratio is composed of the quantities $p_{\mathbf{R}|\mathcal{M}_i}(\mathbf{r})$, which are known as likelihood functions and play an important role in estimation theory. It is the likelihood function that portrays the probabilistic model describing data generation. The likelihood function completely characterizes the kind of "world" assumed by each model. For each model, we must specify the likelihood function so that we can solve the hypothesis testing problem.

A complication, which arises in some cases, is that the sufficient statistic may not be monotonic. If it is monotonic, the decision regions $Z_0$ and $Z_1$ are simply connected: all portions of a region can be reached without crossing into the other region. If not, the regions are not simply connected and decision region islands are created. Disconnected regions usually complicate calculations of decision performance. Monotonic or not, the decision rule proceeds as described: the sufficient statistic is computed for each observation vector and compared to a threshold.
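A small sketch can make the "islands" concrete. It assumes a scalar observation that is zero-mean Gaussian under both models, with standard deviation 1 under model 0 and 2 under model 1. These models and the helper names are assumptions chosen for illustration. The log-likelihood ratio then depends on $r^2$, which is not monotonic in $r$, and the region where model 1 is announced splits into two pieces.

```python
import math

def log_likelihood_ratio(r, sigma0=1.0, sigma1=2.0):
    """ln Lambda(r) for zero-mean Gaussians with standard deviations sigma0, sigma1."""
    return (math.log(sigma0 / sigma1)
            + 0.5 * r * r * (1.0 / sigma0 ** 2 - 1.0 / sigma1 ** 2))

def decision_runs(grid, eta):
    """Collapse the decisions along an increasing grid into runs of equal labels."""
    runs = []
    for r in grid:
        label = 1 if log_likelihood_ratio(r) > math.log(eta) else 0
        if not runs or runs[-1] != label:
            runs.append(label)
    return runs

# Scan r from -4 to 4 and record how the announced model changes.
grid = [i / 10.0 for i in range(-40, 41)]
runs = decision_runs(grid, eta=1.0)
print(runs)
```

Scanning from left to right, the announced model goes 1, then 0, then 1: the region for model 0 is a single interval around zero, while the region for model 1 consists of two disjoint half-lines.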

The coach of a soccer team suspects his goalie has been less than attentive to his training regimen. The coach focuses on the kicks the goalie makes to send the ball down the field. The data $r$ he observes is the length of a kick. The coach defines the models as

  • $\mathcal{M}_0$: not maintaining a training regimen
  • $\mathcal{M}_1$: is maintaining a training regimen
The conditional densities (the models) of the kick length are shown in the figure.
Figure: Conditional densities for the distribution of the lengths of soccer kicks assuming that the goalie has not attended to his training ($\mathcal{M}_0$) or did ($\mathcal{M}_1$) are shown in the top row. The lower portion depicts the likelihood ratio formed from these densities.
Based on knowledge of soccer player behavior, the coach assigns a priori probabilities of $\pi_0 = 1/4$ and $\pi_1 = 3/4$. The costs $C_{ij}$ are chosen to reflect the coach's sensitivity to the goalie's feelings: $C_{01} = 1 = C_{10}$ (an erroneous decision either way is given the same cost) and $C_{00} = 0 = C_{11}$. The likelihood ratio is plotted in the figure, and the threshold value $\eta$, which is computed from the a priori probabilities and the costs to be $1/3$, is indicated. The calculations of this comparison can be simplified in an obvious way.
$$\frac{r}{50} \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{1}{3} \quad \text{or} \quad r \underset{\mathcal{M}_0}{\overset{\mathcal{M}_1}{\gtrless}} \frac{50}{3} \approx 16.7$$
The multiplication by the factor of 50 is a simple illustration of the reduction of the likelihood ratio to a sufficient statistic. Based on the assigned costs and a priori probabilities, the optimum decision rule says the coach must assume that the goalie did not train if a kick is less than 16.7; if greater, the goalie is assumed to have trained despite producing an abysmally short kick such as 20. Note that the densities given by each model overlap entirely: the possibility of making the wrong interpretation always haunts the coach. However, no other procedure will be better (produce a smaller Bayes' cost)!
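The arithmetic of this example can be checked with a short sketch. It assumes, as the comparison in the example implies, that the likelihood ratio is $\Lambda(r) = r/50$; the densities themselves appear only in the figure and are not reproduced here. The name `coach_decision` is hypothetical.

```python
from fractions import Fraction

# Priors and costs as stated in the example.
pi0, pi1 = Fraction(1, 4), Fraction(3, 4)
c00, c01, c10, c11 = 0, 1, 1, 0

# Threshold on the likelihood ratio: eta = pi0 (C10 - C00) / (pi1 (C01 - C11)).
eta = pi0 * (c10 - c00) / (pi1 * (c01 - c11))

# Assuming Lambda(r) = r / 50, multiplying both sides by 50 gives
# the threshold on the sufficient statistic r itself.
gamma = 50 * eta

def coach_decision(kick_length):
    """Return 1 (goalie trained) if the kick exceeds gamma, otherwise 0."""
    return 1 if kick_length > gamma else 0

print(eta, gamma, float(gamma))
print(coach_decision(15), coach_decision(20))
```

Exact fractions are used so that $\eta = 1/3$ and $\gamma = 50/3$ are reproduced without rounding; a kick of 15 falls below the threshold and a kick of 20 falls above it.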

Source:  OpenStax, Statistical signal processing. OpenStax CNX. Dec 05, 2011 Download for free at http://cnx.org/content/col11382/1.1