
So X times theta is just going to be x^(1) transpose theta, dot, dot, dot, down to x^(m) transpose theta. And this is, of course, just the predictions of your hypothesis on each of your m training examples. Then we also defined the y vector to be the vector of all the target values y^(1) through y^(m) in my training set. Okay, so the y vector is an m-dimensional vector.
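In symbols, what's on the board here is:

```latex
X\theta =
\begin{bmatrix} (x^{(1)})^T \theta \\ \vdots \\ (x^{(m)})^T \theta \end{bmatrix}
=
\begin{bmatrix} h_\theta(x^{(1)}) \\ \vdots \\ h_\theta(x^{(m)}) \end{bmatrix},
\qquad
\vec{y} =
\begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix}.
```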

So X theta minus y, continuing the math from the previous board, this is going to be, right, a vector. This is an m-dimensional vector when there are m training examples, and so I'm actually going to take this vector and take the inner product of it with itself.

Okay, so recall that if z is a vector, then z transpose z is just the sum over i of z_i squared. Right, that's how you take the inner product of a vector with itself. So I want to take this vector, X theta minus y, and take the inner product of this vector with itself, and that gives me the sum from i equals one to m of h of x^(i) minus y^(i), squared. Okay, since it's just the sum of the squares of the elements of this vector.
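In symbols, that inner-product step is:

```latex
z^T z = \sum_i z_i^2,
\qquad\text{so}\qquad
(X\theta - \vec{y})^T (X\theta - \vec{y})
= \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 .
```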

And I put a one-half in front, so this is our previous definition of J of theta. Okay, yeah?
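As a quick numerical sanity check, here is a small NumPy sketch (the data and variable names are made up for illustration, not part of the lecture) confirming that the matrix-vector form of J of theta matches the original sum-of-squares form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 2                      # 5 training examples, 2 features

# Design matrix with an intercept column x_0 = 1, targets y, and parameters theta.
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # shape (m, n+1)
y = rng.normal(size=m)                                       # shape (m,)
theta = rng.normal(size=n + 1)                               # shape (n+1,)

# J(theta) written as (1/2) (X theta - y)^T (X theta - y) ...
residual = X @ theta - y
J_matrix = 0.5 * residual @ residual

# ... and as (1/2) * sum_i (h_theta(x^(i)) - y^(i))^2.
J_sum = 0.5 * sum((X[i] @ theta - y[i]) ** 2 for i in range(m))

assert np.isclose(J_matrix, J_sum)   # the two forms agree
print(J_matrix, J_sum)
```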

Student: [Inaudible]?

Instructor (Andrew Ng): Yeah, I threw a lot of notation at you today. So m is the number of training examples, and the training examples are indexed from one through m, and then it's the feature vector that's indexed from zero through n. Does that make sense?

So this is the sum from one through m. And theta transpose x, that's equal to the sum from j equals zero through n of theta_j x_j. Does that make sense? It's the features that are indexed from zero through n, where x_0 is equal to one, whereas the training examples are indexed from one through m.
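Putting the two indexing conventions side by side:

```latex
h_\theta(x) = \theta^T x = \sum_{j=0}^{n} \theta_j x_j
\quad (x_0 = 1,\ \text{features indexed by } j),
\qquad
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\quad (\text{training examples indexed by } i).
```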

So let me clean a few more boards and take another look at this, make sure it all makes sense. Okay, yeah?

Student: [Inaudible] the Y inside the parentheses, shouldn't that be [inaudible]?

Instructor (Andrew Ng): Oh, yes, thank you. Oh, is that what you meant? Yes, thank you. Right, y^(i), the i-th training example. Anything else? Cool. So we're actually nearly done with this derivation. We would like to minimize J of theta with respect to theta, and we've written J of theta fairly compactly using this matrix-vector notation.

So in order to minimize J of theta with respect to theta, what we're going to do is take the derivative with respect to theta of J of theta, set this to zero, and solve for theta. Okay, so we have the derivative with respect to theta of that, which is equal to – I should mention there will be some steps here that I'm just going to do fairly quickly without proof.

So is it really true that the derivative of one-half of that is one-half of the derivative? I've already exchanged the derivative and the one-half here. In terms of the answer, yes, but later on you should go home and look through the lecture notes and make sure you understand and believe why every step is correct. I'm going to do things relatively quickly here, and you can work through every step yourself more slowly by referring to the lecture notes.
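To preview where setting this derivative to zero ends up, here is a short NumPy sketch of the resulting closed-form solution, the normal equations X^T X theta = X^T y. The function name and the tiny dataset are made up for illustration; this is a sketch of the standard least-squares result, not the step-by-step derivation from the lecture.

```python
import numpy as np

def fit_least_squares(X, y):
    """Solve for theta by setting the gradient of J(theta) to zero.

    For J(theta) = (1/2) (X theta - y)^T (X theta - y), the gradient is
    X^T (X theta - y); setting it to zero gives the normal equations
    X^T X theta = X^T y, which we solve directly here.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny made-up example: fit y = theta_0 + theta_1 * x_1 to three points.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])        # first column is the intercept term x_0 = 1
y = np.array([1.0, 3.0, 5.0])

theta = fit_least_squares(X, y)
print(theta)                      # approximately [1.0, 2.0]
```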

Source:  OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013 Download for free at http://cnx.org/content/col11500/1.4