In the preceding considerations (Confidence Intervals I), the confidence interval for the mean of a normal distribution was found, assuming that the value of the standard deviation is known. However, in most applications, the value of the standard deviation is unknown, although in some cases one might have a very good idea about its value.
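Suppose that the underlying distribution is normal and that $\sigma^2$ is unknown. It can be shown that, given a random sample $X_1, X_2, \ldots, X_n$ from a normal distribution, the statistic
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with $r = n-1$ degrees of freedom, where $S^2$ is the usual unbiased estimator of $\sigma^2$ (see t distribution).
Select $t_{\alpha/2}(n-1)$ so that $P\left[T \ge t_{\alpha/2}(n-1)\right] = \alpha/2$. Then
$$1-\alpha = P\left[-t_{\alpha/2}(n-1) \le \frac{\bar{X}-\mu}{S/\sqrt{n}} \le t_{\alpha/2}(n-1)\right] = P\left[\bar{X} - t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}(n-1)\frac{S}{\sqrt{n}}\right].$$
Thus the observations of a random sample provide $\bar{x}$ and $s^2$, and
$$\left[\bar{x} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right]$$
is a $100(1-\alpha)\%$ confidence interval for $\mu$.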
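The interval above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `t_interval` and the summary statistics in the usage lines are hypothetical, and the critical value $t_{\alpha/2}(n-1)$ is assumed to be read from a t table and passed in by hand.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Two-sided confidence interval for the mean when sigma is unknown.

    t_crit is the table value t_{alpha/2}(n-1); it is supplied by the
    caller because the standard library has no t quantile function.
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical summary statistics (not from the text): n = 16, xbar = 50.0, s = 4.0.
# t_{0.025}(15) = 2.131 is the usual table value for a 95% interval.
low, high = t_interval(50.0, 4.0, 16, 2.131)
print(f"95% CI for mu: [{low:.3f}, {high:.3f}]")
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies and mirrors the table lookup used in the text.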
Let X equal the amount of butterfat in pounds produced by a typical cow during a 305-day milk production period between her first and second calves. Assume the distribution of X is $N(\mu, \sigma^2)$. To estimate $\mu$, a farmer measures the butterfat production for n = 20 cows, yielding the following data:
481 | 537 | 513 | 583 | 453 | 510 | 570 |
500 | 457 | 555 | 618 | 327 | 350 | 643 |
499 | 421 | 505 | 637 | 599 | 392 |
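For these data, $\bar{x} = 507.50$ and $s = 89.75$. Thus a point estimate of $\mu$ is $\bar{x} = 507.50$. Since $t_{0.05}(19) = 1.729$, a 90% confidence interval for $\mu$ is $507.50 \pm 1.729\left(\frac{89.75}{\sqrt{20}}\right)$, or equivalently, [472.80, 542.20].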
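The arithmetic of this example can be checked with a short script. It is a sketch under two stated assumptions: the ninth observation is taken to be 457, the value consistent with the reported interval [472.80, 542.20], and the critical value 1.729 is the table value $t_{0.05}(19)$ entered by hand.

```python
import math
import statistics

# Butterfat production (pounds) for n = 20 cows.
# Assumption: the ninth value is 457, which reproduces the reported interval.
butterfat = [481, 537, 513, 583, 453, 510, 570,
             500, 457, 555, 618, 327, 350, 643,
             499, 421, 505, 637, 599, 392]

n = len(butterfat)
xbar = statistics.mean(butterfat)   # point estimate of mu
s = statistics.stdev(butterfat)     # sample standard deviation (divisor n - 1)

t_crit = 1.729                      # t_{0.05}(19), read from a t table
half_width = t_crit * s / math.sqrt(n)
low, high = xbar - half_width, xbar + half_width

print(f"xbar = {xbar:.2f}, s = {s:.2f}")
print(f"90% CI for mu: [{low:.2f}, {high:.2f}]")
```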
Let T have a t distribution with n-1 degrees of freedom. Then $t_{\alpha/2}(n-1) > z_{\alpha/2}$. Consequently, the interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is expected to be shorter than the interval $\bar{x} \pm t_{\alpha/2}(n-1)\,s/\sqrt{n}$. After all, more information, namely the value of $\sigma$, is used in constructing the first interval. However, the length of the second interval depends very much on the value of s. If the observed s is smaller than $\sigma$, a shorter confidence interval could result from the second scheme. But on the average, $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is the shorter of the two confidence intervals.
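To make the comparison concrete, the following sketch compares the two critical values for a 90% interval with n = 20. The normal quantile comes from the standard library; the t value 1.729 is the table value $t_{0.05}(19)$ entered by hand, and the variable names are illustrative only.

```python
from statistics import NormalDist

# z_{0.05}: upper 5% point of the standard normal distribution.
z_crit = NormalDist().inv_cdf(0.95)

# t_{0.05}(19): upper 5% point of the t distribution with 19 degrees
# of freedom, taken from a t table.
t_crit = 1.729

# If s happened to equal sigma, the t interval would be this many
# times as long as the z interval.
length_ratio = t_crit / z_crit

print(f"z critical value: {z_crit:.3f}")
print(f"t critical value: {t_crit:.3f}")
print(f"length ratio when s == sigma: {length_ratio:.3f}")
```

The ratio shrinks toward 1 as n grows, because the t distribution approaches the standard normal distribution.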
If it is not possible to assume that the underlying distribution is normal, and $\mu$ and $\sigma$ are both unknown, approximate confidence intervals for $\mu$ can still be constructed using $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$, which now has only an approximate t distribution.
Generally, this approximation is quite good for many nonnormal distributions, in particular if the underlying distribution is symmetric, unimodal, and of the continuous type. However, if the distribution is highly skewed, there is great danger in using this approximation. In such a situation, it would be safer to use a nonparametric method for finding a confidence interval for the median of the distribution.
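A small simulation can illustrate the warning about skewed distributions. The sketch below is only an illustration under assumed settings that do not come from the text: samples of size n = 20, a nominal 90% level with the table value $t_{0.05}(19) = 1.729$, a standard normal population, and a lognormal population chosen as an example of a highly skewed distribution. The helper `coverage` is hypothetical.

```python
import math
import random

def coverage(draw, true_mean, n=20, t_crit=1.729, reps=10000, seed=1):
    """Estimate how often the t interval covers the true mean.

    draw(rng) returns one observation; t_crit is the table value
    t_{0.05}(n-1) for a nominal 90% interval.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        half_width = t_crit * s / math.sqrt(n)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / reps

# Normal population: the t interval is exact, so coverage should be near 0.90.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), 0.0)

# Lognormal population (highly skewed); its mean is exp(mu + sigma^2 / 2).
skewed_cov = coverage(lambda rng: rng.lognormvariate(0.0, 1.0), math.exp(0.5))

print(f"estimated coverage, normal population:    {normal_cov:.3f}")
print(f"estimated coverage, lognormal population: {skewed_cov:.3f}")
```

With a skewed population the estimated coverage tends to fall below the nominal level, which is the danger the text describes.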
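The confidence interval for the variance $\sigma^2$ is based on the sample variance
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$
In order to find a confidence interval for $\sigma^2$, we use the fact that the distribution of $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$. The constants a and b should be selected from the tabulated chi-square distribution with n-1 degrees of freedom such that
$$P\left(a \le \frac{(n-1)S^2}{\sigma^2} \le b\right) = 1-\alpha.$$
That is, select a and b so that the probabilities in the two tails are equal: $a = \chi^2_{1-\alpha/2}(n-1)$ and $b = \chi^2_{\alpha/2}(n-1)$. Then, solving the inequalities, we have
$$1-\alpha = P\left(\frac{a}{(n-1)S^2} \le \frac{1}{\sigma^2} \le \frac{b}{(n-1)S^2}\right) = P\left(\frac{(n-1)S^2}{b} \le \sigma^2 \le \frac{(n-1)S^2}{a}\right).$$
Thus the probability that the random interval $\left[(n-1)S^2/b,\ (n-1)S^2/a\right]$ contains the unknown $\sigma^2$ is $1-\alpha$. Once the values of $X_1, X_2, \ldots, X_n$ are observed to be $x_1, x_2, \ldots, x_n$ and $s^2$ is computed, the interval $\left[(n-1)s^2/b,\ (n-1)s^2/a\right]$ is a $100(1-\alpha)\%$ confidence interval for $\sigma^2$.
It follows that $\left[\sqrt{(n-1)/b}\;s,\ \sqrt{(n-1)/a}\;s\right]$ is a $100(1-\alpha)\%$ confidence interval for $\sigma$, the standard deviation.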
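Assume that the time in days required for maturation of seeds of a species of a flowering plant found in Mexico is $N(\mu, \sigma^2)$. A random sample of n = 13 seeds, both parents having narrow leaves, yielded $\bar{x} = 18.97$ days and an observed sample variance $s^2$.
A 90% confidence interval for $\sigma^2$ is $\left[12s^2/21.03,\ 12s^2/5.226\right]$, because $a = \chi^2_{0.95}(12) = 5.226$ and $b = \chi^2_{0.05}(12) = 21.03$, which can be read from the tabulated chi-square distribution. The corresponding 90% confidence interval for $\sigma$ is $\left[\sqrt{12/21.03}\;s,\ \sqrt{12/5.226}\;s\right]$.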
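The equal-tails interval for the variance can be sketched as follows. The function names are hypothetical, the constants a = 5.226 and b = 21.03 are the chi-square table values for 12 degrees of freedom at the 90% level, and the sum of squared deviations used in the usage lines (120.0) is a made-up value for illustration only, not the value from the seed data.

```python
import math

def variance_interval(sum_sq_dev, a, b):
    """Confidence interval for sigma^2.

    sum_sq_dev is (n-1)*s^2, the sum of squared deviations from the
    sample mean; a and b are the lower and upper chi-square table
    values with n-1 degrees of freedom.
    """
    return sum_sq_dev / b, sum_sq_dev / a

def sd_interval(sum_sq_dev, a, b):
    """Confidence interval for sigma: square roots of the variance limits."""
    low, high = variance_interval(sum_sq_dev, a, b)
    return math.sqrt(low), math.sqrt(high)

# Table values for 12 degrees of freedom (n = 13), 90% level.
a = 5.226   # chi-square_{0.95}(12)
b = 21.03   # chi-square_{0.05}(12)

# Hypothetical sum of squared deviations, chosen only for illustration.
sum_sq_dev = 120.0

var_low, var_high = variance_interval(sum_sq_dev, a, b)
sd_low, sd_high = sd_interval(sum_sq_dev, a, b)
print(f"90% CI for sigma^2: [{var_low:.3f}, {var_high:.3f}]")
print(f"90% CI for sigma:   [{sd_low:.3f}, {sd_high:.3f}]")
```

Note that the larger table value b gives the lower limit and the smaller value a gives the upper limit, because the inequalities are inverted when solving for $\sigma^2$.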
Although a and b are generally selected so that the probabilities in the two tails are equal, the resulting confidence interval is not the shortest that can be formed using the available data. The tables and appendixes give solutions for a and b that yield the confidence interval of minimum length for the standard deviation.