A very frequently asked question in statistical consulting is: how large should the sample size be to estimate a mean?
The answer depends on the variation associated with the random variable under observation. The statistician could correctly respond that only one item is needed, provided that the standard deviation of the distribution is zero. That is, if $\sigma$ equals zero, then the value of that one item would necessarily equal the unknown mean of the distribution. This is the extreme case and one that is not met in practice. However, the smaller the variance, the smaller the sample size needed to achieve a given degree of accuracy.
A mathematics department wishes to evaluate a new method of teaching calculus that does mathematics using a computer. At the end of the course, the evaluation will be made on the basis of the scores of the participating students on a standard test. Because there is an interest in estimating the mean score $\mu$ for students taking calculus using a computer, there is a desire to determine the number of students, $n$, who are to be selected at random from a larger group. So, let us find the sample size $n$ such that we are fairly confident that $\bar{x} \pm 1$ contains the unknown test mean $\mu$. From past experience it is believed that the standard deviation associated with this type of test is 15. Accordingly, using the fact that the sample mean of the test scores, $\bar{X}$, is approximately $N(\mu, \sigma^2/n)$, it is seen that the interval given by $\bar{x} \pm 1.96\,(15/\sqrt{n})$ will serve as an approximate 95% confidence interval for $\mu$.
That is, $1.96\,(15/\sqrt{n}) = 1$, or equivalently $\sqrt{n} = 29.4$, and thus $n \approx 864.36$, or $n = 865$ because $n$ must be an integer. It is quite likely that it had not been anticipated that as many as 865 students would be needed in this study. If that is the case, the statistician must discuss with those involved in the experiment whether the accuracy and the confidence level could be relaxed somewhat. For illustration, rather than requiring $\bar{x} \pm 1$ to be a 95% confidence interval for $\mu$, possibly $\bar{x} \pm 2$ would be satisfactory as an 80% one. If this modification is acceptable, we now have $1.282\,(15/\sqrt{n}) = 2$, or equivalently $\sqrt{n} = 9.615$, and thus $n \approx 92.4$. Since $n$ must be an integer, $n = 93$ is used in practice.
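The arithmetic above can be checked with a short Python sketch. The function name `sample_size_for_mean` is an illustrative choice, not something from the text, and the normal quantiles are computed with the standard library's `statistics.NormalDist` rather than taken from a table, so they carry a few more digits than 1.96 and 1.282.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, epsilon, confidence):
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= epsilon.

    Assumes sigma is known and the sample mean is approximately normal.
    """
    alpha = 1.0 - confidence
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Solve epsilon = z * sigma / sqrt(n) for n, then round up.
    return math.ceil((z * sigma / epsilon) ** 2)


# The two scenarios from the calculus example (sigma believed to be 15).
n_95 = sample_size_for_mean(sigma=15, epsilon=1, confidence=0.95)
n_80 = sample_size_for_mean(sigma=15, epsilon=2, confidence=0.80)
print(n_95, n_80)  # prints: 865 93
```

Rounding up rather than to the nearest integer ensures that the maximum error does not exceed the requested value.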
Most likely, the people involved in this project would find this a more reasonable sample size. Of course, any sample size greater than 93 could be used. Then either the length of the confidence interval could be decreased from that of $\bar{x} \pm 2$, or the confidence coefficient could be increased from 80%, or a combination of both. Also, since there might be some question of whether the standard deviation $\sigma$ actually equals 15, the sample standard deviation $s$ would no doubt be used in the construction of the interval.
For example, suppose that a sample of $n = 93$ students yields a sample mean $\bar{x}$ and a sample standard deviation $s$; then $\bar{x} \pm 1.282\,(s/\sqrt{93})$ provides an approximate 80% confidence interval for $\mu$.
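In general, if we want the $100(1-\alpha)\%$ confidence interval for $\mu$, $\bar{x} \pm z_{\alpha/2}\,(\sigma/\sqrt{n})$, to be no longer than that given by $\bar{x} \pm \varepsilon$, the sample size $n$ is the solution of $\varepsilon = z_{\alpha/2}\,\sigma/\sqrt{n}$, where $\Phi(z_{\alpha/2}) = 1 - \alpha/2$.
That is, $n = z_{\alpha/2}^2\,\sigma^2/\varepsilon^2$, where it is assumed that $\sigma^2$ is known.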
Sometimes $\varepsilon = z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the maximum error of the estimate. If the experimenter has no idea about the value of $\sigma^2$, it may be necessary to first take a preliminary sample to estimate $\sigma^2$.
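When $\sigma^2$ is unknown, the preliminary-sample idea might be sketched as follows. The pilot scores below are hypothetical values made up for illustration, and the names `max_error` and `sample_size_from_pilot` are assumptions of this sketch. It simply plugs the pilot standard deviation $s$ into the formula in place of $\sigma$, so the result is only as reliable as that estimate.

```python
import math
from statistics import NormalDist, stdev


def max_error(sigma, n, confidence):
    """Maximum error of the estimate: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma / math.sqrt(n)


def sample_size_from_pilot(pilot, epsilon, confidence):
    """Estimate sigma by the pilot sample standard deviation, then solve for n."""
    s = stdev(pilot)  # sample standard deviation (divisor n - 1)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * s / epsilon) ** 2)


# Hypothetical preliminary test scores (not from the text).
pilot_scores = [62, 75, 81, 90, 68, 77, 85, 59, 94, 73]

n_pilot = sample_size_from_pilot(pilot_scores, epsilon=2, confidence=0.80)
s_pilot = stdev(pilot_scores)
print(n_pilot, round(max_error(s_pilot, n_pilot, 0.80), 3))
```

The second printed value is the maximum error that the chosen $n$ would give if $\sigma$ really equalled the pilot estimate; it should not exceed the requested $\varepsilon$.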
The type of statistic we see most often in newspapers and magazines is an estimate of a proportion $p$. We might, for example, want to know the percentage of the labor force that is unemployed or the percentage of voters favoring a certain candidate. Sometimes extremely important decisions are made on the basis of these estimates. If this is the case, we would most certainly desire short confidence intervals for $p$ with large confidence coefficients. We recognize that these conditions will require a large sample size. On the other hand, if the fraction $p$ being estimated is not too important, an estimate associated with a longer confidence interval and a smaller confidence coefficient is satisfactory, and thus a smaller sample size can be used.
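In general, to find the required sample size to estimate $p$, recall that the point estimate of $p$ is $\hat{p} = y/n$, where $y$ is the number of successes observed in $n$ trials, and that an approximate $100(1-\alpha)\%$ confidence interval for $p$ is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.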
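Suppose we want an estimate of $p$ that is within $\varepsilon$ of the unknown $p$ with $100(1-\alpha)\%$ confidence, where $\varepsilon = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is the maximum error of the point estimate $\hat{p} = y/n$. Since $\hat{p}$ is unknown before the experiment is run, we cannot use the value of $\hat{p}$ in our determination of $n$. However, if it is known that $p$ is about equal to $p^*$, the necessary sample size $n$ is the solution of $\varepsilon = z_{\alpha/2}\sqrt{p^*(1-p^*)/n}$. That is, $n = z_{\alpha/2}^2\,p^*(1-p^*)/\varepsilon^2$.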
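A minimal Python sketch of this last formula follows. The prior guess $p^* = 0.30$, the maximum error $\varepsilon = 0.03$ and the 95% confidence level are hypothetical inputs chosen only for illustration, and `sample_size_for_proportion` is an assumed name.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_star, epsilon, confidence):
    """Smallest integer n with z_{alpha/2} * sqrt(p*(1 - p*) / n) <= epsilon.

    p_star is a prior guess at the unknown proportion p.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # n = z^2 * p*(1 - p*) / epsilon^2, rounded up to an integer.
    return math.ceil(z ** 2 * p_star * (1.0 - p_star) / epsilon ** 2)


# Hypothetical inputs: p believed to be near 0.30, error at most 0.03.
n_prop = sample_size_for_proportion(p_star=0.30, epsilon=0.03, confidence=0.95)
print(n_prop)  # prints: 897
```

Halving $\varepsilon$ roughly quadruples the required $n$, since $\varepsilon$ enters the formula as $1/\varepsilon^2$.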