Okay, so that's equal to – I'm going to expand now this quadratic function. So this is going to be, okay, and this is just sort of taking a quadratic function and expanding it out by multiplying the [inaudible]. And again, work through the steps later yourself if you're not quite sure how I did that.
So this thing, this vector, vector product, right, this quantity here, this is just J of theta and so it's just a real number, and the trace of a real number is just itself.
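For reference, the expansion being carried out on the board, written in symbols with X as the design matrix and y as the vector of target values (as in the rest of the lecture), comes out to roughly:

\[
J(\theta) \;=\; \tfrac{1}{2}\,(X\theta - y)^\top (X\theta - y)
\;=\; \tfrac{1}{2}\left(\theta^\top X^\top X \theta \;-\; \theta^\top X^\top y \;-\; y^\top X \theta \;+\; y^\top y\right).
\]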
Student: [Inaudible].
Instructor (Andrew Ng) :Oh, thanks, Dan. Cool, great. So this quantity in parentheses, this is J of theta and it's just a real number. And so the trace of a real number is just the same real number. And so you can sort of take a trace operator without changing anything. And this is equal to one-half derivative with respect to theta of the trace of – by the second permutation property of trace. You can take this theta at the end and move it to the front.
So this is going to be trace of theta times theta transposed, X transpose X minus derivative with respect to theta of the trace of – I'm going to take that and bring it to the – oh, sorry. Actually, this thing here is also a real number and the transpose of a real number is just itself. Right, so take the transpose of a real number without changing anything.
So let me go ahead and just take the transpose of this. A real number transposed is just the same real number. So this is minus the trace of Y transpose X theta, and then minus the trace of Y transpose X theta again. Okay, and this last quantity, Y transpose Y, doesn't actually depend on theta, so when I take the derivative of this last term with respect to theta, it's just zero, and we can just drop that term.
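Putting those steps together, namely taking the trace of the scalar J(theta), cyclically permuting inside the first trace, transposing the scalar theta transpose X transpose y into y transpose X theta, and dropping the y transpose y term whose gradient is zero, the quantity being differentiated should look something like:

\[
\nabla_\theta J(\theta)
\;=\; \tfrac{1}{2}\,\nabla_\theta \operatorname{tr}\!\left(\theta^\top X^\top X \theta - \theta^\top X^\top y - y^\top X \theta + y^\top y\right)
\;=\; \tfrac{1}{2}\,\nabla_\theta\!\left[\operatorname{tr}(\theta\theta^\top X^\top X) \;-\; 2\operatorname{tr}(y^\top X \theta)\right].
\]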
And lastly, well, the derivative with respect to theta of the trace of theta, theta transposed, X transpose X. I'm going to use one of the facts I wrote down earlier without proof, and I'm going to let this theta be A. There's an identity matrix in there as B, so this has the form A B A transpose C, and using a rule that I've written down previously, which you'll find in the lecture notes, because it's still on one of the boards from earlier, this is just equal to X transpose X theta.
So this is C A B, where B is just the identity matrix, which you can ignore, plus X transpose X theta, where this is now C transpose times A, again times B transposed, which is the identity, which we're going to ignore. And the matrix X transpose X is symmetric, so C transpose is equal to C.
Similarly, the derivative with respect to theta of the trace of Y transpose X theta, this is the derivative with respect to matrix A of the trace of B A, and this is just B transposed, which is X transpose Y, again by one of the rules that I wrote down earlier. And so if you plug this back in, we find, therefore, that the derivative – wow, this board's really bad.
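Written out, the two facts being used here are the ones from the lecture notes, ∇_A tr(A B Aᵀ C) = C A B + Cᵀ A Bᵀ and ∇_A tr(A B) = Bᵀ. With A = theta, B = I, and C = X transpose X for the first term, and writing tr(yᵀXθ) as tr(θ · yᵀX) for the second, they give roughly:

\[
\nabla_\theta \operatorname{tr}(\theta\theta^\top X^\top X) \;=\; X^\top X\theta + X^\top X\theta \;=\; 2\,X^\top X\theta,
\qquad
\nabla_\theta \operatorname{tr}(y^\top X \theta) \;=\; X^\top y,
\]

where the first result uses B = I and the symmetry of X transpose X.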
So if you plug this back into our formula for the derivative of J, you find that the derivative with respect to theta of J of theta is equal to one-half times X transpose X theta, plus X transpose X theta, minus X transpose Y, minus X transpose Y, which is just X transpose X theta minus X transpose Y.
Okay, so we set this to zero and we get X transpose X theta equals X transpose Y, which is called the normal equation, and we can now solve this equation for theta in closed form. That's theta equals X transpose X, inverse, times X transpose Y. And so this gives us a way of solving for the least-squares fit to the parameters in closed form, without needing to use an iterative algorithm like gradient descent.
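As a quick numerical sketch of this result (not part of the lecture, and with made-up data), the following NumPy snippet fits theta via the normal equation and checks that the gradient X transpose X theta minus X transpose Y is essentially zero at the solution:

    import numpy as np

    # Made-up data: design matrix X (m examples, intercept column included)
    # and target vector y, just to illustrate the closed-form fit.
    rng = np.random.default_rng(0)
    m = 50
    X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=m)

    # Normal equation: theta = (X^T X)^{-1} X^T y.
    # Solving the linear system is equivalent to multiplying by the inverse,
    # but numerically more stable.
    theta = np.linalg.solve(X.T @ X, X.T @ y)

    # At the minimum, the gradient X^T X theta - X^T y should be (numerically) zero.
    grad = X.T @ X @ theta - X.T @ y
    print(theta)
    print(np.abs(grad).max())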
Okay, and using this matrix vector notation, I think, I don't know, I think we did this whole thing in about ten minutes, which we couldn't have done if I was writing out reams of algebra. Okay, some of you look a little bit dazed, but this is our first learning algorithm. Aren't you excited? Any quick questions about this before we close for today?
Student: [Inaudible].
Instructor (Andrew Ng) :Say that again.
Student: What you derived, wasn't that just [inaudible] of X?
Instructor (Andrew Ng) :What inverse?
Student: Pseudo inverse.
Instructor (Andrew Ng) :Pseudo inverse?
Student: Pseudo inverse.
Instructor (Andrew Ng) :Yeah, it turns out that in cases where X transpose X is not invertible, you use the pseudo inverse to solve this. But it turns out that if X transpose X is not invertible, that usually means your features were linearly dependent. It usually means you did something like repeat the same feature twice in your training set. So if this is not invertible, it turns out the minimum is obtained by using the pseudo inverse in place of the inverse.
If you don't know what I just said, don't worry about it. It usually won't be a problem. Anything else?
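As a small sketch of that remark (again not from the lecture, with made-up data), repeating a feature makes X transpose X singular, but NumPy's pseudo inverse still returns a minimizer of the least-squares cost:

    import numpy as np

    rng = np.random.default_rng(1)
    m = 40
    x1 = rng.normal(size=m)
    # Repeat the same feature twice, so the columns of X are linearly dependent
    # and X^T X is not invertible.
    X = np.column_stack([np.ones(m), x1, x1])
    y = 3.0 * x1 + 0.1 * rng.normal(size=m)

    # The pseudo inverse gives the minimum-norm least-squares solution.
    theta = np.linalg.pinv(X) @ y
    print(theta)
    print(np.linalg.norm(X @ theta - y))  # residual of the fit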
Student: On the second board [inaudible]?
Instructor (Andrew Ng) :Let me take that off. We're running over. Let's close for today, and if there are other questions, I'll take them after.
[End of Audio]
Duration: 79 minutes