
The simple pair of Example 3 from "variance"

jdemo1
jcalc
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
Use array operations on matrices X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX = 0.6420
EY = total(u.*P)
EY = 0.0783
VX = total(t.^2.*P) - EX^2
VX = 3.3016
CV = total(t.*u.*P) - EX*EY
CV = -0.1633
a = CV/VX
a = -0.0495
b = EY - a*EX
b = 0.1100       % The regression line is u = -0.0495t + 0.11
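The same computation can be sketched in Python with NumPy. The joint probability matrix P and the value vectors X and Y below are hypothetical stand-ins (the jdemo1 data are not reproduced here); the steps mirror the jcalc session above.

```python
import numpy as np

# Hypothetical joint distribution: rows index values of Y, columns values of X.
X = np.array([-1.0, 0.0, 1.0, 2.0])        # values of X
Y = np.array([0.0, 1.0, 2.0])              # values of Y
P = np.array([[0.10, 0.05, 0.05, 0.10],
              [0.05, 0.15, 0.10, 0.05],
              [0.10, 0.05, 0.10, 0.10]])   # P(Y = y_i, X = x_j); sums to 1

t, u = np.meshgrid(X, Y)                   # analog of MATLAB's t, u grids
EX = np.sum(t * P)                         # E[X]
EY = np.sum(u * P)                         # E[Y]
VX = np.sum(t**2 * P) - EX**2              # Var[X]
CV = np.sum(t * u * P) - EX * EY           # Cov[X, Y]
a = CV / VX                                # slope of the regression line
b = EY - a * EX                            # intercept
print(a, b)                                # a = 0.02, b = 1.04 for this hypothetical P
```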

The pair in Example 6 from "variance"

Suppose the pair {X, Y} has joint density f_XY(t, u) = 3u on the triangular region bounded by u = 0, u = 1 + t, u = 1 - t. Determine the regression line of Y on X.

ANALYTIC SOLUTION

By symmetry, E[X] = E[XY] = 0, so Cov[X, Y] = 0. The regression curve is

$$u = E[Y] = 3\int_0^1 u^2 \int_{u-1}^{1-u} dt\,du = 6\int_0^1 u^2(1-u)\,du = 1/2$$

Note that the pair is uncorrelated but, by the rectangle test, not independent. With zero values of E[X] and E[XY], the approximation procedure is not very satisfactory unless a very large number of approximation points is employed.
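The analytic value E[Y] = 1/2 can be checked numerically. The grid size below is an arbitrary choice, and the accuracy is limited by how the triangle's boundary falls on the grid.

```python
import numpy as np

# Numerical check of E[Y] = 1/2 for the density f_XY(t, u) = 3u on the
# triangle bounded by u = 0, u = 1 + t, u = 1 - t.
n = 1000
t1 = np.linspace(-1, 1, n)
u1 = np.linspace(0, 1, n)
T, U = np.meshgrid(t1, u1)
# Density times the indicator of the triangular region
f = 3 * U * ((U <= 1 + T) & (U <= 1 - T))
dA = (t1[1] - t1[0]) * (u1[1] - u1[0])     # area of one grid cell
total = np.sum(f) * dA                     # total mass, close to 1
EY = np.sum(U * f) * dA                    # close to 0.5
print(total, EY)
```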


Distribution of Example 5 from "random vectors and matlab" and Example 12 from "function of random vectors"

The pair {X, Y} has joint density f_XY(t, u) = (6/37)(t + 2u) on the region 0 ≤ t ≤ 2, 0 ≤ u ≤ max{1, t} (see Figure [link]). Determine the regression line of Y on X. If the value X(ω) = 1.7 is observed, what is the best mean-square linear estimate of Y(ω)?

[Figure: the region of f_XY(t, u) = (6/37)(t + 2u) in the (t, u) plane, bounded above by u = 1 for 0 ≤ t ≤ 1 and by u = t for 1 ≤ t ≤ 2, with t = 2 on the right, together with the regression line u = 0.3382t + 0.4011.]
Regression line for [link].

ANALYTIC SOLUTION

$$E[X] = \frac{6}{37}\int_0^1\!\!\int_0^1 (t^2 + 2tu)\,du\,dt + \frac{6}{37}\int_1^2\!\!\int_0^t (t^2 + 2tu)\,du\,dt = 50/37$$

The other quantities involve integrals over the same regions with appropriate integrands, as follows:

Quantity    Integrand          Value
E[X^2]      t^3 + 2t^2 u       779/370
E[Y]        tu + 2u^2          127/148
E[XY]       t^2 u + 2tu^2      232/185

Then

$$\mathrm{Var}[X] = \frac{779}{370} - \left(\frac{50}{37}\right)^2 = \frac{3823}{13690} \qquad \mathrm{Cov}[X, Y] = \frac{232}{185} - \frac{50}{37}\cdot\frac{127}{148} = \frac{1293}{13690}$$

and

$$a = \mathrm{Cov}[X, Y]/\mathrm{Var}[X] = \frac{1293}{3823} \approx 0.3382, \qquad b = E[Y] - aE[X] = \frac{6133}{15292} \approx 0.4011$$

The regression line is u = at + b. If X(ω) = 1.7, the best linear estimate (in the mean-square sense) is Ŷ(ω) = 1.7a + b = 0.9760 (see [link] for an approximate plot).
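The fractions above can be verified with exact rational arithmetic; this is simply a check of the arithmetic, using the moment values already computed analytically.

```python
from fractions import Fraction as F

# Exact moments from the analytic solution
EX  = F(50, 37)     # E[X]
EX2 = F(779, 370)   # E[X^2]
EY  = F(127, 148)   # E[Y]
EXY = F(232, 185)   # E[XY]

VX = EX2 - EX**2          # Var[X] = 3823/13690
CV = EXY - EX * EY        # Cov[X, Y] = 1293/13690
a = CV / VX               # slope = 1293/3823
b = EY - a * EX           # intercept = 6133/15292
est = F(17, 10) * a + b   # best linear estimate at X = 1.7
print(float(a), float(b), float(est))
```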

APPROXIMATION

tuappr
Enter matrix [a b] of X-range endpoints  [0 2]
Enter matrix [c d] of Y-range endpoints  [0 2]
Enter number of X approximation points  400
Enter number of Y approximation points  400
Enter expression for joint density  (6/37)*(t+2*u).*(u<=max(t,1))
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX = 1.3517      % Theoretical = 1.3514
EY = total(u.*P)
EY = 0.8594      % Theoretical = 0.8581
VX = total(t.^2.*P) - EX^2
VX = 0.2790      % Theoretical = 0.2793
CV = total(t.*u.*P) - EX*EY
CV = 0.0947      % Theoretical = 0.0944
a = CV/VX
a = 0.3394       % Theoretical = 0.3382
b = EY - a*EX
b = 0.4006       % Theoretical = 0.4011
y = 1.7*a + b
y = 0.9776       % Theoretical = 0.9760
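A Python analog of the tuappr approximation follows. The grid construction here is a simple sketch and does not reproduce tuappr's internals exactly, so the fourth decimal may differ slightly from the transcript above.

```python
import numpy as np

# Discretize the density on a 400 x 400 grid and normalize to a
# discrete joint distribution, then compute the moments as array sums.
n = 400
t1 = np.linspace(0, 2, n)
u1 = np.linspace(0, 2, n)
t, u = np.meshgrid(t1, u1)
P = (6/37) * (t + 2*u) * (u <= np.maximum(t, 1))
P = P / P.sum()                       # normalize

EX = np.sum(t * P)
EY = np.sum(u * P)
VX = np.sum(t**2 * P) - EX**2
CV = np.sum(t * u * P) - EX * EY
a = CV / VX                           # approximately 0.338
b = EY - a * EX                       # approximately 0.401
print(a, b, 1.7*a + b)
```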

An interpretation of ρ 2

The analysis above shows the minimum mean squared error is given by

$$E[(Y - \hat{Y})^2] = E\left[\left(Y - \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X) - \mu_Y\right)^2\right] = \sigma_Y^2 E[(Y^* - \rho X^*)^2]$$
$$= \sigma_Y^2 E[(Y^*)^2 - 2\rho X^* Y^* + \rho^2 (X^*)^2] = \sigma_Y^2(1 - 2\rho^2 + \rho^2) = \sigma_Y^2(1 - \rho^2)$$

If ρ = 0, then E[(Y − Ŷ)²] = σ_Y², the mean squared error in the case of zero linear correlation. Thus ρ² is interpreted as the fraction of uncertainty removed by the linear rule and X. This interpretation should not be pushed too far, but it is a common one, often found in discussions of observations or experimental results.
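The identity E[(Y − Ŷ)²] = σ_Y²(1 − ρ²) can be illustrated on simulated data; the model y = 0.6x + noise below is synthetic, chosen only for illustration. With a and b computed from sample moments, the identity holds exactly for the corresponding sample quantities.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)    # linearly related plus noise

rho = np.corrcoef(x, y)[0, 1]             # sample correlation
a = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # sample regression slope
b = y.mean() - a * x.mean()               # sample regression intercept
mse = np.mean((y - (a * x + b))**2)       # mean squared error of the linear rule
print(mse, np.var(y) * (1 - rho**2))      # the two agree
```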

More general linear regression

Consider a jointly distributed class {Y, X_1, X_2, ..., X_n}. We wish to determine a function U of the form

$$U = \sum_{i=0}^{n} a_i X_i, \quad \text{with } X_0 = 1, \text{ such that } E[(Y - U)^2] \text{ is a minimum}$$

If U satisfies this minimum condition, then E [ ( Y - U ) V ] = 0 , or, equivalently

$$E[YV] = E[UV] \quad \text{for all } V \text{ of the form } V = \sum_{i=0}^{n} c_i X_i$$

To see this, set W = Y - U and let d 2 = E [ W 2 ] . Now, for any α

$$d^2 \le E[(W + \alpha V)^2] = d^2 + 2\alpha E[WV] + \alpha^2 E[V^2]$$

If we select the special

$$\alpha = -\frac{E[WV]}{E[V^2]} \quad \text{then} \quad 0 \le -2\frac{E[WV]^2}{E[V^2]} + \frac{E[WV]^2 E[V^2]}{E[V^2]^2}$$

This implies $E[WV]^2 \le 0$, which can only be satisfied by $E[WV] = 0$, so that

E [ Y V ] = E [ U V ]

On the other hand, if E [ ( Y - U ) V ] = 0 for all V of the form above, then E [ ( Y - U ) 2 ] is a minimum. Consider

$$E[(Y - V)^2] = E[(Y - U + U - V)^2] = E[(Y - U)^2] + E[(U - V)^2] + 2E[(Y - U)(U - V)]$$

Since U − V is of the same form as V, the last term is zero. The first term is fixed. The second term is nonnegative, with zero value iff U − V = 0 a.s. Hence, E[(Y − V)²] is a minimum when V = U.

If we take V to be 1 , X 1 , X 2 , , X n , successively, we obtain n + 1 linear equations in the n + 1 unknowns a 0 , a 1 , , a n , as follows.

  1. $E[Y] = a_0 + a_1 E[X_1] + \cdots + a_n E[X_n]$
  2. $E[YX_i] = a_0 E[X_i] + a_1 E[X_1 X_i] + \cdots + a_n E[X_n X_i]$, for $1 \le i \le n$

For each i = 1 , 2 , , n , we take ( 2 ) - E [ X i ] · ( 1 ) and use the calculating expressions for variance and covariance to get

$$\mathrm{Cov}[Y, X_i] = a_1 \mathrm{Cov}[X_1, X_i] + a_2 \mathrm{Cov}[X_2, X_i] + \cdots + a_n \mathrm{Cov}[X_n, X_i]$$

These n equations plus equation (1) may be solved algebraically for the a_i.

In the important special case that the X_i are uncorrelated (i.e., Cov[X_i, X_j] = 0 for i ≠ j), we have

$$a_i = \frac{\mathrm{Cov}[Y, X_i]}{\mathrm{Var}[X_i]}, \quad 1 \le i \le n$$

and

$$a_0 = E[Y] - a_1 E[X_1] - a_2 E[X_2] - \cdots - a_n E[X_n]$$

In particular, this condition holds if the class {X_i : 1 ≤ i ≤ n} is iid, as in the case of a simple random sample (see the section on "Simple Random Samples and Statistics").

Examination shows that for n = 1 , with X 1 = X , a 0 = b , and a 1 = a , the result agrees with that obtained in the treatment of the regression line, above.

Linear regression with two variables.

Suppose E[Y] = 3, E[X_1] = 2, E[X_2] = 3, Var[X_1] = 3, Var[X_2] = 8, Cov[Y, X_1] = 5, Cov[Y, X_2] = 7, and Cov[X_1, X_2] = 1. Then the three equations are

$$a_0 + 2a_1 + 3a_2 = 3 \qquad 3a_1 + a_2 = 5 \qquad a_1 + 8a_2 = 7$$

Solution of these simultaneous linear equations with MATLAB gives the results

a_0 = −1.9565, a_1 = 1.4348, and a_2 = 0.6957.
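The same system can be solved in Python with NumPy, mirroring the MATLAB solution:

```python
import numpy as np

# Coefficient matrix of the three normal equations
A = np.array([[1.0, 2.0, 3.0],    # a0 + 2 a1 + 3 a2 = 3
              [0.0, 3.0, 1.0],    #      3 a1 +   a2 = 5
              [0.0, 1.0, 8.0]])   #        a1 + 8 a2 = 7
rhs = np.array([3.0, 5.0, 7.0])
a0, a1, a2 = np.linalg.solve(A, rhs)
print(round(a0, 4), round(a1, 4), round(a2, 4))
# -1.9565 1.4348 0.6957
```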






Source:  OpenStax, Applied probability. OpenStax CNX. Aug 31, 2009 Download for free at http://cnx.org/content/col10708/1.6
