\[
\begin{aligned}
\sum_i \log p(x^{(i)}; \theta) &= \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta) \\
&= \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \\
&\geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}
\end{aligned}
\]

The last step of this derivation used Jensen's inequality. Specifically, $f(x) = \log x$ is a concave function, since $f''(x) = -1/x^2 < 0$ over its domain $x \in \mathbb{R}^+$. Also, the term

\[
\sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}
\]

in the summation is just an expectation of the quantity $p(x^{(i)}, z^{(i)}; \theta)/Q_i(z^{(i)})$ with respect to $z^{(i)}$ drawn according to the distribution given by $Q_i$. By Jensen's inequality, we have

\[
f\left( \mathbb{E}_{z^{(i)} \sim Q_i}\left[ \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right] \right) \geq \mathbb{E}_{z^{(i)} \sim Q_i}\left[ f\left( \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right) \right],
\]

where the “$z^{(i)} \sim Q_i$” subscripts above indicate that the expectations are with respect to $z^{(i)}$ drawn from $Q_i$. This allowed us to go from [link] to [link].
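Jensen's inequality for concave $f$ is easy to verify numerically. The sketch below uses a made-up three-outcome distribution `Q` standing in for $Q_i$, and positive values `w` standing in for the ratios $p(x^{(i)}, z^{(i)}; \theta)/Q_i(z^{(i)})$ (illustrative numbers, not from the text):

```python
import numpy as np

Q = np.array([0.2, 0.5, 0.3])   # a distribution over three values of z
w = np.array([1.0, 4.0, 2.5])   # positive ratios p(x, z; theta) / Q(z)

lhs = np.log(np.sum(Q * w))     # f(E[w]) with f = log
rhs = np.sum(Q * np.log(w))     # E[f(w)]

# Since log is concave, Jensen gives f(E[w]) >= E[f(w)].
print(lhs, rhs)
```

Here the left-hand side is $\log(0.2 \cdot 1 + 0.5 \cdot 4 + 0.3 \cdot 2.5) = \log 2.95 \approx 1.082$, while the right-hand side is about $0.968$, so the inequality holds strictly (it is tight only when `w` is constant, which is exactly the condition exploited below).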

Now, for any set of distributions $Q_i$, the formula [link] gives a lower bound on $\ell(\theta)$. There are many possible choices for the $Q_i$'s. Which should we choose? Well, if we have some current guess $\theta$ of the parameters, it seems natural to try to make the lower bound tight at that value of $\theta$. I.e., we'll make the inequality above hold with equality at our particular value of $\theta$. (We'll see later how this enables us to prove that $\ell(\theta)$ increases monotonically with successive iterations of EM.)

To make the bound tight for a particular value of $\theta$, we need the step involving Jensen's inequality in our derivation above to hold with equality. For this to be true, we know it is sufficient that the expectation be taken over a “constant”-valued random variable. I.e., we require that

\[
\frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c
\]

for some constant $c$ that does not depend on $z^{(i)}$. This is easily accomplished by choosing

\[
Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta).
\]

Actually, since we know $\sum_z Q_i(z) = 1$ (because it is a distribution), this further tells us that

\[
Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_z p(x^{(i)}, z; \theta)} = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)} = p(z^{(i)} \mid x^{(i)}; \theta)
\]

Thus, we simply set the $Q_i$'s to be the posterior distribution of the $z^{(i)}$'s given $x^{(i)}$ and the setting of the parameters $\theta$.
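As a concrete sketch of this posterior computation, assume a two-component 1-D Gaussian mixture with made-up parameter values (the mixture model, parameter values, and data points here are illustrative assumptions, not taken from the text). The posterior $Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)}; \theta)$ is obtained by normalizing the joint $p(x^{(i)}, z^{(i)}; \theta)$ over $z$, exactly as in the equation above:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Current parameter guess theta = (phi, mu, sigma): illustrative values.
phi = np.array([0.5, 0.5])                 # mixing weights p(z = j)
mu = np.array([-2.0, 2.0])                 # component means
sigma = np.array([1.0, 1.0])               # component standard deviations

x = np.array([-2.1, -1.8, 0.1, 1.9, 2.3])  # observed data points

# Joint p(x_i, z = j; theta) = p(z = j) * p(x_i | z = j) for each i, j.
joint = phi * gaussian_pdf(x[:, None], mu, sigma)

# Posterior Q_i(z = j) = p(z = j | x_i; theta): normalize over z.
Q = joint / joint.sum(axis=1, keepdims=True)

print(Q)  # each row is a distribution over z and sums to 1
```

Points near $-2$ get responsibilities concentrated on the first component, points near $2$ on the second, and the point near $0$ is split between the two.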

Now, for this choice of the $Q_i$'s, Equation [link] gives a lower bound on the log-likelihood that we're trying to maximize. This is the E-step. In the M-step of the algorithm, we then maximize our formula in Equation [link] with respect to the parameters to obtain a new setting of the $\theta$'s. Repeatedly carrying out these two steps gives us the EM algorithm, which is as follows:

  • Repeat until convergence {
    • (E-step) For each $i$, set
      \[ Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta). \]
    • (M-step) Set
      \[ \theta := \arg\max_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}. \]
  • }
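The two steps above can be sketched end to end for a two-component 1-D Gaussian mixture (an illustrative choice of model, not prescribed by the text). The E-step computes the posterior responsibilities; for this model the M-step maximization has the standard closed-form solution of weighted averages, which is what the update formulas below implement:

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture: a standard instance
    of the E-step / M-step loop, with the usual closed-form M-step."""
    # Initial guess for theta = (phi, mu, var).
    phi = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])

    for _ in range(n_iter):
        # E-step: Q_i(z = j) = p(z = j | x_i; theta), via Bayes' rule.
        joint = phi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
            / np.sqrt(2 * np.pi * var)
        Q = joint / joint.sum(axis=1, keepdims=True)

        # M-step: maximize the lower bound over theta; for a Gaussian
        # mixture this yields responsibility-weighted averages.
        Nj = Q.sum(axis=0)                              # effective counts
        phi = Nj / len(x)
        mu = (Q * x[:, None]).sum(axis=0) / Nj
        var = (Q * (x[:, None] - mu) ** 2).sum(axis=0) / Nj
    return phi, mu, var

# Synthetic data from two well-separated Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
phi, mu, var = em_gmm_1d(x)
print(phi, mu, var)  # recovers roughly (0.5, 0.5), (-3, 3), (1, 1)
```

Initializing the means at the data extremes is a convenient heuristic here; in general EM converges only to a local optimum, so the initialization matters.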

How do we know if this algorithm will converge? Well, suppose $\theta^{(t)}$ and $\theta^{(t+1)}$ are the parameters from two successive iterations of EM. We will now prove that $\ell(\theta^{(t)}) \leq \ell(\theta^{(t+1)})$, which shows EM always monotonically improves the log-likelihood. The key to showing this result lies in our choice of the $Q_i$'s. Specifically, on the iteration of EM in which the parameters had started out as $\theta^{(t)}$, we would have chosen $Q_i^{(t)}(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta^{(t)})$. We saw earlier that this choice ensures that Jensen's inequality, as applied to get [link], holds with equality, and hence

\[
\ell(\theta^{(t)}) = \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta^{(t)})}{Q_i^{(t)}(z^{(i)})}.
\]

The parameters $\theta^{(t+1)}$ are then obtained by maximizing the right-hand side of the equation above. Thus,
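The monotonic-improvement guarantee $\ell(\theta^{(t)}) \leq \ell(\theta^{(t+1)})$ can also be checked empirically. The sketch below (again assuming an illustrative two-component 1-D Gaussian mixture, not a model from the text) records $\ell(\theta)$ after every M-step and verifies that it never decreases:

```python
import numpy as np

# Synthetic data from two Gaussians at 0 and 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 150), rng.normal(5, 1, 150)])

# Initial parameter guess theta = (phi, mu, var).
phi = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
var = np.array([1.0, 1.0])

ll_history = []
for _ in range(30):
    # E-step: posterior responsibilities Q_i(z) under the current theta.
    joint = phi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
        / np.sqrt(2 * np.pi * var)
    Q = joint / joint.sum(axis=1, keepdims=True)

    # M-step: closed-form Gaussian-mixture updates (weighted averages).
    Nj = Q.sum(axis=0)
    phi = Nj / len(x)
    mu = (Q * x[:, None]).sum(axis=0) / Nj
    var = (Q * (x[:, None] - mu) ** 2).sum(axis=0) / Nj

    # Record l(theta^{(t+1)}) = sum_i log sum_z p(x_i, z; theta).
    joint = phi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
        / np.sqrt(2 * np.pi * var)
    ll_history.append(np.log(joint.sum(axis=1)).sum())

# l(theta) never decreases across EM iterations (up to float tolerance).
assert all(b >= a - 1e-9 for a, b in zip(ll_history, ll_history[1:]))
print(ll_history[0], ll_history[-1])
```

Plotting `ll_history` would show the characteristic EM curve: rapid improvement in early iterations, then a plateau at a local optimum.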

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4