So in supervised learning, this is what we're going to do. We're given a training set, and we're going to feed our training set, comprising our m training examples, 47 training examples in this case, into a learning algorithm. Okay, and our learning algorithm then outputs a function that, by tradition and for historical reasons, is usually denoted by the lowercase letter h, and is called a hypothesis. Don't worry too much about whether the term hypothesis has a deep meaning; it's more a term that's used for historical reasons.
And the hypothesis's job is to take an input. So there's some new house: what the hypothesis does is take this input, a new living area in square feet, say, and output an estimate of the price of that house. So the hypothesis h maps from inputs x to outputs y. So in order to design a learning algorithm, the first thing we have to decide is how we want to represent the hypothesis.
And just for the purposes of this lecture, for the purposes of our first learning algorithm, I'm going to use a linear representation for the hypothesis. So I'm going to represent my hypothesis as h(x) = θ₀ + θ₁x, where x here is an input feature, namely the size of the house we're considering.
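To make this concrete, here is a minimal Python sketch of the one-feature linear hypothesis. The parameter values are made up purely for illustration; in practice a learning algorithm would fit them to the training data.

```python
# A minimal sketch of the one-feature linear hypothesis
# h(x) = theta_0 + theta_1 * x. The parameter values below are
# hypothetical; a learning algorithm would choose them from data.

def hypothesis(x, theta0, theta1):
    """Predicted price for a house of size x (square feet)."""
    return theta0 + theta1 * x

# Hypothetical parameters, with prices in thousands of dollars.
print(hypothesis(2100, theta0=50.0, theta1=0.1))  # 260.0
```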
And more generally, and we'll come back to this, for many regression problems we may have more than one input feature. So for example, if instead of just knowing the size of the houses we also know the number of bedrooms in these houses, let's say, then our training set has a second feature, the number of bedrooms in the house. In that case, let x₁ denote the size in square feet, let x₂ denote the number of bedrooms, and then I would write the hypothesis h(x) as θ₀ + θ₁x₁ + θ₂x₂.
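Extending the earlier sketch to this two-feature case, again with hypothetical parameter values:

```python
# Two-feature version: x1 = size in square feet, x2 = bedrooms,
# so h(x) = theta_0 + theta_1 * x1 + theta_2 * x2.
# Parameter values are again hypothetical.

def hypothesis(x1, x2, theta0, theta1, theta2):
    """Predicted price given size (x1) and number of bedrooms (x2)."""
    return theta0 + theta1 * x1 + theta2 * x2

print(hypothesis(2100, 3, theta0=50.0, theta1=0.1, theta2=20.0))  # 320.0
```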
Okay, and sometimes when I want to take the hypothesis h and make its dependence on the thetas explicit, I'll write it as h subscript theta of x, that is, h_θ(x). And so this is the price that my hypothesis predicts a house with features x costs. So given a new house with features x, a certain size and a certain number of bedrooms, this is the price that my hypothesis predicts this house is going to cost.
One more piece of notation: for conciseness, just to write this a bit more compactly, I'm going to adopt the convention of defining x₀ to be equal to one, and so I can now write h(x) as the sum from i = 0 to 2 of θᵢxᵢ. And if you think of θ and x as vectors, then this is just θ transpose x, written θᵀx.
And the very final piece of notation: I'm also going to let lowercase n be the number of features in my learning problem. And so this actually becomes a sum from i = 0 to n, where in this example, with two features, n would be equal to two.
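Putting the x₀ = 1 convention together with the vector view, here is a sketch of the compact form h_θ(x) = θᵀx in NumPy, reusing the same hypothetical values as above:

```python
import numpy as np

# With the convention x0 = 1, the hypothesis is the inner product
# h_theta(x) = theta^T x = sum over i = 0..n of theta_i * x_i.
# Here n = 2 features; all values are hypothetical.

theta = np.array([50.0, 0.1, 20.0])  # [theta_0, theta_1, theta_2]
x = np.array([1.0, 2100.0, 3.0])     # [x_0 = 1, size, bedrooms]

h = theta @ x  # equivalent to np.dot(theta, x)
print(h)  # 320.0
```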
All right, I realize that was a fair amount of notation, and as I proceed through the rest of today's lecture, or in future weeks as well, if someday you're looking at me write a symbol and wondering, gee, what was that symbol lowercase n again? Or what was that lowercase x again? Please raise your hand and I'll answer. This is a fair amount of notation, but we'll probably all get used to it in a few days, and standardizing notation will make a lot of our descriptions of learning algorithms much easier.