When constructing confidence intervals, the assumptions and conditions of the Central Limit Theorem must be met in order to use the normal model.
Randomization Condition: The data must be sampled randomly. Is one of the good sampling methodologies discussed in the Sampling and Data chapter being used?
Independence Assumption: The sample values must be independent of each other. This means that the occurrence of one event has no influence on the next event. Usually, if we know that people or items were selected randomly, we can assume that the independence assumption is met.
10% Condition: When the sample is drawn without replacement (usually the case), the sample size, n, should be no more than 10% of the population.
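The 10% condition is a single comparison. The Python sketch below assumes you know the population size; the function name meets_ten_percent_condition is hypothetical and not part of any library. It compares 10·n with the population size using integers, so a sample of exactly 10% is not misjudged by floating-point rounding.

```python
def meets_ten_percent_condition(n: int, population_size: int) -> bool:
    """Hypothetical helper: True if a sample of size n drawn without
    replacement is no more than 10% of the population."""
    if n <= 0 or population_size <= 0:
        raise ValueError("sample and population sizes must be positive")
    # n <= 0.10 * N, rewritten as 10 * n <= N to stay in integer arithmetic
    return 10 * n <= population_size


# Example: students sampled from a school of 1,500 (10% is 150 students).
print(meets_ten_percent_condition(120, 1500))
print(meets_ten_percent_condition(200, 1500))
```

With these numbers, a sample of 120 satisfies the condition and a sample of 200 does not.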
Sample Size Condition: The sample size must be sufficiently large. Although the Central Limit Theorem tells us that we can use a normal model to describe the behavior of sample means when the sample size is large enough, it does not tell us how large that is. If the population is very skewed, you will need a fairly large sample to use the CLT; however, if the population is unimodal and symmetric, even small samples are acceptable. So consider your sample size in light of what you know about the population, and decide whether the sample is large enough. As a general rule of thumb, a sample size of 30 is considered sufficient.
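To see why a skewed population calls for a larger sample, you can simulate the sampling distribution of the mean. The sketch below assumes a right-skewed exponential population with mean 1 (an illustrative choice, not from the text); the helper names are hypothetical. For this population the skewness of the sample mean is known to be 2/√n, so the simulated skewness should shrink toward 0 (a symmetric, normal-like shape) as n grows.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed
    population standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=m)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def skewness_of_sample_means(n, reps=10_000, seed=1):
    """Draw `reps` samples of size n from an exponential population
    (assumed for illustration, mean 1) and return the skewness of the
    resulting sample means."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)


for n in (2, 5, 30):
    # Compare the simulated skewness with the theoretical value 2 / sqrt(n).
    print(n, round(skewness_of_sample_means(n), 2), round(2 / n ** 0.5, 2))
```

Even at n = 30 the sampling distribution of this very skewed population is still somewhat right-skewed, which is why the rule of thumb should be weighed against what you know about the population.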
When working with numerical data and σ is unknown, the randomization and independence assumptions and the 10% condition must be met. In addition, with small sample sizes we cannot assume that the data follow a normal distribution, so we need to check the nearly normal condition. To check the nearly normal condition, start by making a histogram or stemplot of the data; it is a good idea to make an outlier boxplot, too. If the sample is small (fewer than 15 observations), the data must be normally distributed. If the sample size is moderate (between 15 and 40), a little skewness in the data can be tolerated. With large sample sizes (more than 40), we are mainly concerned about multiple peaks (modes) and outliers in the data. Either of these can keep the data from being approximately normal. When there are outliers, you may want to run the test both with and without them to determine the extent of their effect. If there are multiple modes, the data may contain two groups that need to be separated.
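The sample-size bands and the outlier check can be sketched in Python. This is a minimal illustration, assuming the common 1.5 × IQR fences used for outlier boxplots; the helper names and the sample data are hypothetical. Quartiles here come from statistics.quantiles with method="inclusive", and other software may compute quartiles slightly differently, so borderline points can be flagged differently.

```python
import statistics


def nearly_normal_guidance(n):
    """Hypothetical helper: the check the text recommends for size n."""
    if n < 15:
        return "small: data must be normally distributed"
    if n <= 40:
        return "moderate: a little skewness can be tolerated"
    return "large: look mainly for multiple modes and outliers"


def boxplot_outliers(data):
    """Return points beyond the 1.5 * IQR fences of an outlier boxplot."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


# Made-up sample of 17 observations, for illustration only.
sample = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20, 21, 22, 23, 24, 25, 26, 45]
outliers = boxplot_outliers(sample)
without = [x for x in sample if x not in outliers]

print(nearly_normal_guidance(len(sample)))
print("outliers:", outliers)
# Compare a summary with and without the outliers to gauge their effect.
print("mean with:", round(statistics.fmean(sample), 2),
      "without:", round(statistics.fmean(without), 2))
```

In this made-up sample the value 45 lies above the upper fence, and removing it lowers the mean noticeably, which is the kind of effect the with-and-without comparison is meant to reveal.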
When working with binomial or categorical data, the randomization and independence assumptions and the 10% condition must be met. In addition, a new assumption, the success/failure condition, must be checked. When working with proportions, we need to be especially concerned about sample size when the proportion is close to zero or one. To check that the sample size is large enough, calculate the number of successes by multiplying the sample proportion p̂ (written as a decimal) by the sample size n, and calculate the number of failures by multiplying one minus the sample proportion by the sample size. If both of these products, n·p̂ and n·(1 − p̂), are larger than ten, the condition is met.
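The success/failure check is a short calculation. The Python sketch below follows the rule as stated here (both products strictly larger than ten); the function name and the example numbers are hypothetical. Note that p_hat must be a decimal proportion such as 0.04, not a percentage such as 4.

```python
def success_failure_condition(p_hat, n, threshold=10):
    """Hypothetical helper: return (successes, failures, condition_met),
    where successes = n * p_hat and failures = n * (1 - p_hat)."""
    if not 0 <= p_hat <= 1:
        raise ValueError("p_hat must be a proportion between 0 and 1")
    successes = n * p_hat
    failures = n * (1 - p_hat)
    # Both products must exceed the threshold (ten, per the text).
    return successes, failures, successes > threshold and failures > threshold


# A proportion near zero: about 8 successes out of 200, so the check fails.
print(success_failure_condition(0.04, 200))
# A moderate proportion: about 30 successes and 70 failures, so it passes.
print(success_failure_condition(0.30, 100))
```

The first example shows why proportions close to zero (or one) need extra care: even with 200 observations, there are too few successes.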