So here are some facts about the trace operator and about derivatives, and I'm just going to write these without proof. You can also have the TAs prove some of them in the discussion section, or you can actually go home and verify the proofs of all of these.
It turns out that given two matrices, A and B, the trace of the matrix A times B is equal to the trace of B, A. Okay, I'm not going to prove this, but you should be able to go home and prove this yourself without too much difficulty. And similarly, for the trace of a product of three matrices, you can take the matrix at the end and cyclically permute it to the front.
So trace of A times B, times C, is equal to the trace of C, A, B. So take the matrix C at the back and move it to the front, and this is also equal to the trace of B, C, A. Take the matrix B and move it to the front.
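A quick numerical check of these trace identities might look like the following. This is only a sketch and not part of the lecture: the helper names (`matmul`, `trace`, `random_matrix`), the matrix shapes, and the random values are all assumptions chosen for illustration.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows-by-cols matrix of values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)

# Two-matrix case: tr(AD) == tr(DA), even though AD is 2x2 and DA is 3x3.
A = random_matrix(2, 3, rng)
D = random_matrix(3, 2, rng)
tr_ad = trace(matmul(A, D))
tr_da = trace(matmul(D, A))

# Three-matrix case: tr(ABC) == tr(CAB) == tr(BCA).
B = random_matrix(3, 4, rng)
C = random_matrix(4, 2, rng)
tr_abc = trace(matmul(matmul(A, B), C))
tr_cab = trace(matmul(matmul(C, A), B))
tr_bca = trace(matmul(matmul(B, C), A))

print(abs(tr_ad - tr_da), abs(tr_abc - tr_cab), abs(tr_abc - tr_bca))
```

The shapes are deliberately not square, so the products being compared have different sizes and only their traces agree. The printed differences should be zero up to floating-point rounding.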
Okay, also, suppose you have a function F of A which is defined as the trace of A, B. Okay, so this is, right, the trace is a real number. So the trace of A, B is a function that takes as input a matrix A and outputs a real number. So then the derivative with respect to the matrix A of this function, the trace of A, B, is going to be B transpose. And this is just another fact that you can prove by yourself by going back and referring to the definitions of traces and matrix derivatives. I'm not going to prove it. You should work it out.
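One way to convince yourself of this fact without a proof is to compare B transpose against a finite-difference estimate of the matrix derivative. The sketch below is an illustration under assumptions: the helper names, the 2-by-3 and 3-by-2 shapes, the step size `eps`, and the random entries are not from the lecture.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]

def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of df/dA, one entry of A at a time."""
    grad = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            plus = [r[:] for r in A]
            minus = [r[:] for r in A]
            plus[i][j] += eps
            minus[i][j] -= eps
            row.append((f(plus) - f(minus)) / (2 * eps))
        grad.append(row)
    return grad

rng = random.Random(1)
A = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]  # 2x3
B = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(3)]  # 3x2

def f(M):
    """The function from the lecture: f(A) = tr(AB), with B held fixed."""
    return trace(matmul(M, B))

estimated = numerical_gradient(f, A)
claimed = transpose(B)  # the stated fact: gradient of tr(AB) w.r.t. A is B^T

max_error = max(abs(estimated[i][j] - claimed[i][j])
                for i in range(2) for j in range(3))
print(max_error)
```

Note that the gradient has the same shape as A (2 by 3 here), which is why the answer is B transpose rather than B.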
Okay, and lastly a couple of easy ones. The trace of A is equal to the trace of A transpose because the trace is just the sum of the diagonal elements. And so if you transpose the matrix, the diagonal doesn't change. And if lower case A is a real number, then the trace of a real number is just itself. So think of a real number as a one by one matrix. So the trace of a one by one matrix is just whatever that real number is.
And lastly, this is a somewhat tricky one. The derivative with respect to the matrix A of the trace of A, B, A transpose, C is equal to C, A, B plus C transpose, A, B transpose. And I won't prove that either. This is sort of just algebra. Work it out yourself.
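The trickier identity can be spot-checked numerically in the same spirit. This is a sketch under assumptions, not a proof: the helper names, the shapes (A is 2 by 3, B is 3 by 3, C is 2 by 2), the step size, and the random entries are all chosen here for illustration.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]

def add(P, Q):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def numerical_gradient(f, A, eps=1e-6):
    """Central-difference estimate of df/dA, one entry of A at a time."""
    grad = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            plus = [r[:] for r in A]
            minus = [r[:] for r in A]
            plus[i][j] += eps
            minus[i][j] -= eps
            row.append((f(plus) - f(minus)) / (2 * eps))
        grad.append(row)
    return grad

rng = random.Random(2)
A = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]  # 2x3
B = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]  # 3x3
C = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]  # 2x2

def f(M):
    """f(A) = tr(A B A^T C), with B and C held fixed."""
    return trace(matmul(matmul(matmul(M, B), transpose(M)), C))

estimated = numerical_gradient(f, A)

# The stated fact: gradient = C A B + C^T A B^T
claimed = add(matmul(matmul(C, A), B),
              matmul(matmul(transpose(C), A), transpose(B)))

max_error = max(abs(estimated[i][j] - claimed[i][j])
                for i in range(2) for j in range(3))
print(max_error)
```

Neither B nor C is symmetric in this check, so both terms of the formula are exercised; with symmetric matrices the two terms would coincide and a mistake in one of the transposes could go unnoticed.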
Okay, and so the key facts I'm going to use again about traces and matrix derivatives, I'll use five. Ten minutes. Okay, so armed with these things I'm going to figure out – let's try to come up with a quick derivation for how to minimize J of theta as a function of theta in closed form, and without needing to use an iterative algorithm.
So to work this out, let me define the matrix X, which is called the design matrix, to be a matrix containing all the inputs from my training set. So X 1 was the vector of inputs, the vector of features, for my first training example. So I'm going to set X 1 to be the first row of this matrix X, set the inputs of my second training example to be the second row, and so on.
And I have M training examples, and so that's going to be my design matrix X. Okay, so the matrix capital X is defined as follows, and so now, let me take this matrix X and multiply it by my parameter vector theta. This derivation will just take two or three steps. So X times theta, remember how matrix vector multiplication goes. You take this vector and you multiply it by each of the rows of the matrix.
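The row-by-row picture of X times theta can be sketched in a few lines. The numbers below are made up purely for illustration (three training examples with two features each, the first feature being a constant 1), and the function name `design_times_theta` is hypothetical.

```python
# Hypothetical design matrix: each row holds the features of one training example.
X = [
    [1.0, 2.0],   # features of training example 1
    [1.0, 3.0],   # features of training example 2
    [1.0, 5.0],   # features of training example 3
]

# Hypothetical parameter vector, one entry per feature.
theta = [0.5, 2.0]

def design_times_theta(X, theta):
    """Multiply the design matrix by theta: one inner product per row."""
    return [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]

products = design_times_theta(X, theta)
print(products)
```

Each entry of the result is the inner product of one training example's feature vector with theta, so the output has one number per training example.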