As before, the Central Limit Theorem provides the standard deviation of the sampling distribution. It also tells us that the expected value of the distribution of differences in sample means equals the difference in the population means. Mathematically, this can be stated:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$
Because we do not know the population standard deviations, we estimate them using the two sample standard deviations, $s_1$ and $s_2$, from our independent samples. For the hypothesis test, we calculate the estimated standard deviation, or standard error, of the difference in sample means, $\bar{x}_1 - \bar{x}_2$:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
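The test statistic ($t$-score) is calculated as follows, where $\delta_0$ is the hypothesized difference between the two population means:
$$t_c = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$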
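The standard error and test statistic can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `welch_t_statistic` and the summary statistics passed to it are invented for the example and do not come from the text.

```python
import math

def welch_t_statistic(mean1, mean2, s1, s2, n1, n2, delta0=0.0):
    """Return (standard_error, t_score) for a difference in two sample means.

    s1 and s2 are sample standard deviations; delta0 is the hypothesized
    difference between the population means (0 for "no difference").
    """
    # Estimated standard error of (mean1 - mean2).
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    # How far the observed difference is from delta0, in standard-error units.
    t_score = ((mean1 - mean2) - delta0) / standard_error
    return standard_error, t_score

# Hypothetical summary statistics, chosen only to keep the arithmetic simple.
se, t = welch_t_statistic(mean1=10.0, mean2=8.0, s1=3.0, s2=4.0, n1=9, n2=16)
print(f"standard error = {se:.4f}, t = {t:.4f}")
```

With these made-up inputs each variance term equals 1, so the standard error is the square root of 2 and the observed difference of 2 is about 1.41 standard errors from zero.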
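The number of degrees of freedom ($df$) requires a somewhat complicated calculation, and the result is not always a whole number. The test statistic calculated previously is approximated by the Student's $t$-distribution with $df$ as follows:
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$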
When both sample sizes $n_1$ and $n_2$ are five or larger, the Student's $t$ approximation is very good. If each sample has more than 30 observations, the degrees of freedom can be approximated as $n_1 + n_2 - 2$.
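The degrees-of-freedom calculation is easier to follow when written out step by step. The sketch below assumes the standard Welch-Satterthwaite approximation for two independent samples with unknown, possibly unequal variances; the function name and the input values are hypothetical.

```python
def welch_degrees_of_freedom(s1, s2, n1, n2):
    """Approximate degrees of freedom for the two-sample t statistic."""
    # Each sample's contribution to the variance of the difference in means.
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    # The result is generally not a whole number.
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical inputs, for illustration only.
df = welch_degrees_of_freedom(s1=3.0, s2=4.0, n1=9, n2=16)
simple_df = 9 + 16 - 2  # the n1 + n2 - 2 rule mentioned for larger samples

print(f"approximate df = {df:.2f}, n1 + n2 - 2 = {simple_df}")
```

For these inputs the approximate value is about 20.87, somewhat below the 23 given by $n_1 + n_2 - 2$. The approximate value never exceeds $n_1 + n_2 - 2$, so using it is the more conservative choice for small samples.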
The form of the sampling distribution, a difference in sample means, dictates the form of the null and alternative hypotheses:
$$H_0: \mu_1 - \mu_2 = \delta_0$$
$$H_a: \mu_1 - \mu_2 \neq \delta_0$$
where $\delta_0$ is the hypothesized difference between the two means. If the question is simply “is there any difference between the means?” then $\delta_0 = 0$ and the null and alternative hypotheses become:
$$H_0: \mu_1 = \mu_2$$
$$H_a: \mu_1 \neq \mu_2$$
An example of when $\delta_0$ might not be zero is when the comparison of the two groups requires a specific difference for the decision to be meaningful. Imagine that you are making a capital investment and are considering replacing your current model of machine with another. You measure the productivity of your machines by the speed at which they produce the product. A contender to replace the old model may be faster in terms of product throughput, but also more expensive. The new machine may also have higher maintenance costs, setup costs, and so on. The null hypothesis would be set up so that the new machine would have to outperform the old one by enough, in terms of speed of production, to cover these extra costs. This form of the null and alternative hypotheses shows how valuable this particular hypothesis test can be. For most of our work, we will be testing simple hypotheses that ask whether there is any difference between the two distribution means.
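The machine-replacement scenario can be sketched with a nonzero hypothesized difference. Everything specific here is an assumption made for illustration: the throughput figures, the required gain of 15 units per hour, and the function name `t_for_required_gain` are invented, and the code only computes the test statistic, leaving the comparison against a critical value to the reader.

```python
import math
import statistics

def t_for_required_gain(new_sample, old_sample, delta0):
    """t statistic for H0: mu_new - mu_old = delta0, from raw observations."""
    n_new, n_old = len(new_sample), len(old_sample)
    observed_diff = statistics.mean(new_sample) - statistics.mean(old_sample)
    # statistics.variance returns the sample variance (n - 1 denominator).
    standard_error = math.sqrt(
        statistics.variance(new_sample) / n_new
        + statistics.variance(old_sample) / n_old
    )
    return (observed_diff - delta0) / standard_error

# Hypothetical throughput measurements (units per hour).
new_machine = [118, 122, 120, 125, 119, 121]
old_machine = [100, 104, 98, 101, 103, 99, 102]

# Hypothetical gain the new machine must deliver to cover its extra costs.
required_gain = 15

t_score = t_for_required_gain(new_machine, old_machine, required_gain)
print(f"t = {t_score:.3f}")
```

A positive statistic means the observed gain exceeds the required gain. Whether it exceeds it by enough to reject the null hypothesis depends on the degrees of freedom and the chosen significance level.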
The average amount of time boys and girls aged seven to 11 spend playing sports each day is believed to be the same. A study is done and data are collected, resulting in the data in the table below.
Group | Sample Size | Average Number of Hours Playing Sports Per Day | Sample Standard Deviation |
---|---|---|---|
Girls | 9 | 2 | |
Boys | 16 | 3.2 | 1.00 |