
Oh, and as usual, you update all the parameters simultaneously: you perform this update for all values of i, where i indexes into the parameter vector, so you update all of your parameters at the same time. And the advantage of this algorithm is that in order to start learning, in order to start modifying the parameters, you only need to look at your first training example.

You just look at your first training example and perform an update using the derivative of the error with respect to just that first training example, and then you look at your second training example and perform another update. And so you sort of keep adapting your parameters much more quickly, without needing to scan over your entire U.S. Census database before you can even start adapting the parameters.
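As a rough sketch of the update just described, assuming the linear-regression hypothesis h_theta(x) = theta-transpose x and the squared-error cost from this lecture (the function name, learning rate alpha, and epoch count here are illustrative choices, not from the lecture):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=10):
    # X: (m, n) design matrix, y: (m,) targets.
    # Each update uses a single training example, so the parameters
    # start changing after seeing just the first example.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for j in range(m):
            error = X[j] @ theta - y[j]           # h_theta(x^(j)) - y^(j)
            theta = theta - alpha * error * X[j]  # update every theta_i simultaneously
    return theta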

So let's see, for large data sets, stochastic gradient descent is often much faster. And what happens with stochastic gradient descent is that it won't actually converge to the global minimum exactly, but if these are the contours of your cost function, then as you run stochastic gradient descent you sort of tend to wander around.

And you may actually end up going uphill occasionally, but your parameters will sort of tend to wander to the region closest to the global minimum, and then keep wandering around a little bit near the region of the global minimum. And often that's just fine, to have a parameter that wanders around a little bit near the global minimum. And in practice, this often works much faster than batch gradient descent, especially if you have a large training set.
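For contrast, here is an equally rough sketch of the batch version (again with illustrative names, and a 1/m scaling on the gradient that the lecture's notation may or may not include): every single step sums over all m training examples, which is why stochastic gradient descent can start making progress so much sooner on a large data set.

import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=100):
    # Each step computes the gradient over the entire training set,
    # so the whole data set is scanned before even one update is made.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        gradient = X.T @ (X @ theta - y) / m  # uses all m examples
        theta = theta - alpha * gradient
    return theta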

Okay, I'm going to clean a couple of boards. While I do that, why don't you take a look at the equations, and after I'm done cleaning the boards, I'll ask what questions you have.

Okay, so what questions do you have about all of this?

Student: [Inaudible] is it true – are you just sort of rearranging the order that you do the computation? So do you just use the first training example and update all of the theta Is and then step, and then update with the second training example, and update all the theta Is, and then step? And is that why you get sort of this really – ?

Instructor (Andrew Ng): Let's see, right. So I'm going to look at my first training example and then I'm going to take a step, and then I'm going to perform the second gradient descent update using my new parameter vector that has already been modified using my first training example. And then I keep going.

Make sense? Yeah?

Student: So in each update of all the theta Is, you're only using –

Instructor (Andrew Ng): One training example.

Student: One training example.

Student: [Inaudible]?

Instructor (Andrew Ng): Let's see, it's definitely a [inaudible]. I believe there's theory that sort of supports that as well. Yeah, the theory that supports that, the [inaudible] theorem is – I don't remember.

Okay, cool. So in what I've done so far, I've talked about an iterative algorithm for performing this minimization of J of theta. And it turns out that for this specific problem of least squares regression, of ordinary least squares, there's another way to perform this minimization of J of theta that allows you to solve for the parameters theta in closed form, without needing to run an iterative algorithm.
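The closed-form solution for ordinary least squares is given by the normal equations, theta = (X^T X)^(-1) X^T y, which is what the derivation the lecture is about to set up arrives at. A minimal sketch, assuming X^T X is invertible (the function name is illustrative):

import numpy as np

def least_squares_closed_form(X, y):
    # Solves the normal equations (X^T X) theta = X^T y directly,
    # with no learning rate and no iteration.
    return np.linalg.solve(X.T @ X, X.T @ y)

Solving the linear system with np.linalg.solve, rather than forming the explicit inverse of X^T X, is a standard numerical choice; either way it expresses the same closed-form answer.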
