Suppose that a pair of random variables $\{X, Y\}$ has a joint distribution. A value $X(\omega)$ is observed. It is desired to estimate the corresponding value $Y(\omega)$. Obviously there is no rule for determining $Y(\omega)$ unless $Y$ is a function of $X$. The best that can be hoped for is some estimate based on an average of the errors, or on the average of some function of the errors.
Suppose $X(\omega) = t$ is observed, and by some rule an estimate $u(t)$ is returned. The error of the estimate is $Y(\omega) - u(t)$. The most common measure of error is the mean of the square of the error

$$E\{[Y - u(X)]^2\}$$
The choice of the mean square has two important properties: it treats positive and negative errors alike, and it weights large errors more heavily than smaller ones. In general, we seek a rule (function) $r$ such that the estimate is $r(X(\omega))$. That is, we seek a function $r$ such that

$$E\{[Y - r(X)]^2\} \text{ is a minimum}$$
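The mean squared error of any candidate rule can be approximated by simulation. The following Python sketch is not part of the text; the particular joint distribution, sample size, and candidate rules are illustrative assumptions, chosen only to show how $E\{[Y - r(X)]^2\}$ can be estimated by averaging squared errors over a large sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a joint distribution for (X, Y); this particular model
# (Y = 2X + 1 + noise) is only an illustration, not from the text.
n = 100_000
X = rng.normal(loc=1.0, scale=2.0, size=n)
Y = 2.0 * X + 1.0 + rng.normal(scale=1.5, size=n)

def mse(r, X, Y):
    """Estimate E[(Y - r(X))^2] by averaging the squared errors."""
    return np.mean((Y - r(X)) ** 2)

# Compare two candidate rules: the (here known) true mean function
# and a deliberately poorer guess.
print(mse(lambda t: 2.0 * t + 1.0, X, Y))   # close to the noise variance, 1.5**2
print(mse(lambda t: t, X, Y))               # noticeably larger mean squared error
```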
The problem of determining such a function is known as the regression problem. In the unit on Regression, we show that this problem is solved by the conditional expectation of $Y$, given $X$. At this point, we seek an important partial solution.
The regression line of Y on X
We seek the best straight line function for minimizing the mean squared error. That is, we seek a function $r$ of the form $u = r(t) = at + b$. The problem is to determine the coefficients $a$, $b$ such that

$$E\{[Y - aX - b]^2\} \text{ is a minimum}$$
We write the error in a special form, then square and take the expectation.

$$Y - aX - b = (Y - \mu_Y) - a(X - \mu_X) + (\mu_Y - a\mu_X - b)$$

$$E\{[Y - aX - b]^2\} = \sigma_Y^2 + a^2 \sigma_X^2 + (\mu_Y - a\mu_X - b)^2 - 2a\,\mathrm{Cov}[X, Y]$$

The cross terms involving the constant $\mu_Y - a\mu_X - b$ vanish, since $E[Y - \mu_Y] = E[X - \mu_X] = 0$.
Standard procedures for determining a minimum (with respect to $a$ and $b$) show that this occurs for

$$a = \frac{\mathrm{Cov}[X, Y]}{\mathrm{Var}[X]} \qquad b = \mu_Y - a\mu_X$$
Thus the optimum line, called the regression line of Y on X, is

$$u = \frac{\mathrm{Cov}[X, Y]}{\mathrm{Var}[X]}(t - \mu_X) + \mu_Y = \rho \frac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y$$
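As a numerical check, the coefficients can be computed directly from sample moments. The sketch below is illustrative only: the simulated joint distribution is an assumption, and numpy's polyfit (which minimizes the same least-squares criterion for a degree-one fit) is used merely to confirm the slope and intercept.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative joint sample; this distribution is an assumption for the demo.
n = 100_000
X = rng.gamma(shape=3.0, scale=1.0, size=n)
Y = 0.5 * X + rng.normal(scale=1.0, size=n)

# Regression line u = a*t + b with a = Cov[X, Y] / Var[X], b = mu_Y - a*mu_X
cov_xy = np.cov(X, Y)[0, 1]
var_x = np.var(X, ddof=1)
a = cov_xy / var_x
b = Y.mean() - a * X.mean()
print(a, b)                       # approximately 0.5 and 0.0 for this sample

# A degree-one least-squares fit minimizes the same criterion, so it agrees.
print(np.polyfit(X, Y, deg=1))    # [slope, intercept]
```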
The second form is commonly used to define the regression line. For certain theoretical purposes, this is the preferred form. But for calculation, the first form is usually the more convenient. Only the covariance (which requires both means) and the variance of $X$ are needed. There is no need to determine $\sigma_Y$ or $\rho$.
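The equivalence of the two forms of the slope, $\mathrm{Cov}[X, Y]/\mathrm{Var}[X] = \rho\,\sigma_Y/\sigma_X$, can also be checked numerically. Again, the simulated data below are only an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=50_000)
Y = -1.0 * X + rng.normal(scale=2.0, size=50_000)   # illustrative data

# First form: Cov[X, Y] / Var[X]
cov_xy = np.cov(X, Y)[0, 1]
slope_first_form = cov_xy / np.var(X, ddof=1)

# Second form: rho * sigma_Y / sigma_X
rho = np.corrcoef(X, Y)[0, 1]
slope_second_form = rho * np.std(Y, ddof=1) / np.std(X, ddof=1)

print(slope_first_form, slope_second_form)   # the two slopes agree
```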