So this is the class of all linear classifiers, and so I’m going to define, or maybe redefine, empirical risk minimization: instead of writing this as choosing a set of parameters, I want to think of it as choosing a function in the hypothesis class script H that minimizes my training error. Okay? So, actually, can you raise your hand if it makes sense to you why this is equivalent to the previous formulation? Okay, cool. Thanks. So for the development that follows, it’s useful to think of learning algorithms as choosing a function from the class instead, because in a more general case this set, script H, can be some other class of functions. Maybe it’s the class of all functions represented by a neural network, or some other class of functions the learning algorithm wants to choose from. And this definition of empirical risk minimization will still apply. Okay? So what we’d like to do is understand whether empirical risk minimization is a reasonable algorithm. Alex?
Student: [Inaudible] a function that’s defined by G of theta transpose X, or is it now more general?
Instructor (Andrew Ng): I see, right, so let’s see. I guess the question is: is H theta still defined by G of theta transpose X, or is this more general? So –
Student: [Inaudible]
Instructor (Andrew Ng): Oh, yeah, so two answers to that. One is, this framework is general, so for the purpose of this lecture it may be useful to you to keep in mind the example where script H is the class of all linear classifiers H subscript theta, such as those used by, like, the perceptron algorithm or logistic regression. Everything on this board, however, is actually more general. H can be any set of functions mapping from the input domain to the set of class labels, zero and one, and then you can perform empirical risk minimization over any hypothesis class.
For the purpose of today’s lecture, I am going to restrict myself to talking about binary classification, but it turns out everything I say generalizes to regression and other problems as well. Does that answer your question?
Student: Yes.
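The point of the exchange above, that empirical risk minimization only needs a set of functions mapping inputs to the labels zero and one, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the lecture: the one-dimensional threshold classifiers, the helper names `training_error`, `erm`, and `make_threshold`, and the noise-free toy data.

```python
import random

def training_error(h, data):
    """Fraction of training examples (x, y) that hypothesis h mislabels."""
    return sum(1 for x, y in data if h(x) != y) / len(data)

def erm(hypotheses, data):
    """Empirical risk minimization: pick the hypothesis in the class
    with the smallest training error (ties go to the first one found)."""
    return min(hypotheses, key=lambda h: training_error(h, data))

def make_threshold(t):
    """Hypothetical 1-D classifier: predict 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

# Toy training set (an assumption): the label is 1 exactly when x >= 0.5.
random.seed(0)
train = [(x, 1 if x >= 0.5 else 0) for x in (random.random() for _ in range(50))]

# A small finite hypothesis class: thresholds 0.0, 0.1, ..., 1.0.
H = [make_threshold(t / 10) for t in range(11)]

h_hat = erm(H, train)
print("training error of h_hat:", training_error(h_hat, train))
```

Nothing in `erm` depends on the hypotheses being thresholds or linear classifiers; any list of functions returning 0 or 1 could be passed in. Because this toy class happens to contain the threshold that generated the labels, the selected hypothesis has zero training error, which says nothing yet about its error on new examples.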
Instructor (Andrew Ng): Cool. All right. So I wanna understand if empirical risk minimization is a reasonable algorithm. In particular, what are the things we can prove about it? So clearly we don’t actually care about training error, we don’t really care about making accurate predictions on the training set, or at least that’s not the ultimate goal. The ultimate goal is generalization: how well it makes predictions on examples that we haven’t seen before. How well it predicts prices, or sale or no-sale outcomes, of houses you haven’t seen before.
So what we really care about is generalization error, which I write as epsilon of H. And this is defined as the probability that if I sample a new example, X comma Y, from that distribution script D, my hypothesis mislabels that example. And in terms of notational convention, if I place a hat on top of something, it usually means, not always, but it usually means, that it is an attempt to estimate the quantity without the hat. So for example, epsilon hat here: think of epsilon hat, the training error, as an attempt to approximate generalization error. Okay, so the notational convention is usually that the things with the hats on top are things we’re using to estimate other quantities. And H hat is the hypothesis output by the learning algorithm to try to estimate the function from X to Y.
So let’s actually prove some things about when empirical risk minimization will do well in the sense of giving us low generalization error, which is what we really care about. In order to prove our first learning theory result, I’m going to have to state two lemmas. The first is the union bound, which is the following: let A1 through AK be K events. And when I say events, I mean events in the sense of a probabilistic event that either happens or not. And these are not necessarily independent.
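For reference, the three quantities named in this passage can be written compactly as follows. This is a sketch in commonly used notation rather than a copy of the board: the training-set size $m$, the superscript indexing of examples, and the indicator $1\{\cdot\}$ are assumptions, since the excerpt does not spell them out.

```latex
% Generalization error: probability of mislabeling a fresh example drawn from D
\varepsilon(h) = P_{(x,y)\sim\mathcal{D}}\bigl(h(x) \neq y\bigr)

% Training error: fraction of the m training examples that h mislabels
\hat{\varepsilon}(h) = \frac{1}{m}\sum_{i=1}^{m} 1\bigl\{h(x^{(i)}) \neq y^{(i)}\bigr\}

% Empirical risk minimization: the hypothesis in H with the lowest training error
\hat{h} = \arg\min_{h \in \mathcal{H}} \hat{\varepsilon}(h)
```

Read this way, the hat convention is visible directly: $\hat{\varepsilon}$ is computed from the training set as a stand-in for $\varepsilon$, and $\hat{h}$ is the hypothesis chosen on the basis of that stand-in.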