
Student: What is the [inaudible]?

Instructor (Andrew Ng): It is actually defined the same way as in finite dimensional spaces. So, you know, suppose you have – actually, these are countably infinite dimensional vectors, not [inaudible] to the infinite dimensional vectors. Normally, the norm of x squared is equal to the sum from i equals 1 to n of xi squared, so if x is infinite dimensional, you just define it like that. [Inaudible]. [Crosstalk]
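
Written out, the definition being described is just the usual squared norm carried over to countably many coordinates; the assumption, confirmed just below, that the norm is bounded by R is what makes the sum converge:

```latex
% Squared norm of a countably infinite dimensional vector x, defined
% exactly as in the finite dimensional case.  The bound \|x\| \le R
% (mentioned in the next exchange) ensures the series converges.
\[
  \|x\|_2^2 \;=\; \sum_{i=1}^{\infty} x_i^2 \;\le\; R^2
\]
```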

Student: [Inaudible].

Instructor (Andrew Ng): Now, say that again.

Student: [Inaudible].

Instructor (Andrew Ng): Yes, although I assume that this is bounded by R.

Student: Oh.

Instructor (Andrew Ng): It's a – yeah – so this ensures convergence. So that's just something people sometimes wonder about. And lastly, let me – actually – tie empirical risk minimization back a little more strongly to the sort of algorithms we've talked about. It turns out that the theory so far was really for empirical risk minimization. So let's focus on just one training example. Let me draw a function, you know, that's zero here, jumps to one, and looks like that. And so, for this one training example, this is the indicator that h subscript theta of x is not equal to y, plotted as a function of z equals theta transpose x; okay? But one training example – your training example will be positive or negative. And depending on what the value of theta transpose x is, you either get it right or wrong. And so, you know, if you have a positive example, then when z is positive, you get it right.

Suppose you have a negative example, so y equals 0; right? Then if z, which is theta transpose x – if this is positive, then you will get this example wrong; whereas, if z is negative, then you'd get this example right. And so this is a plot of the indicator that h subscript theta of x is not equal to y; okay? Where, you know, h of x is equal to g of theta transpose x; okay? And so it turns out that what you'd really like to do is choose parameters theta so as to minimize this step function; right? You'd like to choose parameters theta so that you end up with a correct classification on this training example, and so you'd like the indicator that h of x is not equal to y – you'd like this indicator function – to be 0. It turns out this step function is clearly a non-convex function.
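
As a rough sketch of the step function being described here (the names zero_one_loss and theta below are just illustrative, not from the lecture):

```python
import numpy as np

def zero_one_loss(theta, x, y):
    """Indicator that h_theta misclassifies this one training example.

    The prediction thresholds z = theta^T x at zero: predict 1 when
    z >= 0 and predict 0 when z < 0.  Returns 1 if the prediction
    differs from the label y, and 0 otherwise.
    """
    z = float(np.dot(theta, x))
    prediction = 1 if z >= 0 else 0
    return int(prediction != y)

# Viewed as a function of z for a positive example (y = 1), this loss is a
# step: 1 for z < 0, dropping to 0 once z >= 0.  For a negative example
# (y = 0) it is the mirror image.
```

Training error is then just the average of this indicator over the training set, which is exactly the step-shaped, non-convex objective discussed next.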

And so it turns out that, even just for linear classifiers, minimizing the training error is an NP-hard problem. It turns out that both logistic regression and support vector machines can be viewed as using a convex approximation for this problem. And in particular – let me draw a function like that – it turns out that logistic regression is trying to maximize likelihood, and so it's trying to minimize the minus of the log likelihood. And if you plot the minus of the log likelihood, it actually turns out it'll be a function that looks like this. And this curve that I just drew, you can think of it as a rough approximation to the step function, which is maybe what you're really trying to minimize when you want to minimize training error.
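
A minimal sketch of the convex surrogate idea for logistic regression, using the lecture's convention that h(x) = g(theta transpose x) with g the sigmoid; the comparison loop at the end is purely illustrative:

```python
import numpy as np

def zero_one(z, y):
    """Step-function loss: 1 if thresholding z at zero disagrees with y."""
    prediction = 1 if z >= 0 else 0
    return float(prediction != y)

def neg_log_likelihood(z, y):
    """Minus the log likelihood of one example under g(z) = 1 / (1 + e^-z).

    For y = 1 this is log(1 + e^-z); for y = 0 it is log(1 + e^z).
    Both are smooth, convex functions of z that roughly track the
    step function and can be minimized efficiently.
    """
    g = 1.0 / (1.0 + np.exp(-z))
    return -np.log(g) if y == 1 else -np.log(1.0 - g)

# Compare the two losses for a positive example (y = 1) across a range of z.
for z in np.linspace(-3.0, 3.0, 7):
    print(f"z = {z:+.1f}   0-1 loss = {zero_one(z, 1):.0f}   "
          f"-log likelihood = {neg_log_likelihood(z, 1):.3f}")
```

Support vector machines can be viewed the same way, just with a different convex surrogate (the hinge loss) in place of the negative log likelihood.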

Source: OpenStax, Machine learning. OpenStax CNX. Oct 14, 2013. Download for free at http://cnx.org/content/col11500/1.4