This module describes how the chi-square distribution can be used to test for independence.

Tests of independence involve using a contingency table of observed (data) values. You first saw a contingency table when you studied bivariate descriptive statistics in the Bivariate Descriptive Statistics chapter.

The test statistic for a test of independence is:

$$\chi^2 = \sum_{(i \cdot j)} \frac{(O - E)^2}{E}$$

where:

  • O = observed values
  • E = expected values
  • i = the number of rows in the table
  • j = the number of columns in the table

There are $i \cdot j$ terms of the form $\frac{(O - E)^2}{E}$.
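As a minimal sketch, the sum can be written in Python as follows, assuming the expected counts are already known. The function name `chi_square_statistic` and the counts in the table are made up for illustration and do not come from the text.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 3 table (i = 2 rows, j = 3 columns), so the sum has 6 terms.
observed = [[20, 30, 50], [30, 30, 40]]
expected = [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # 3.111
```

Cells where the observed count equals the expected count contribute nothing to the sum, so the statistic grows only as the observed counts move away from the expected ones.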

The chi-square test of independence determines whether there is a relationship between two categorical variables. Remember that in the chapter on bivariate data we examined two-way tables (pivot tables) for a relationship by examining either the row or column percentages. We will now test this relationship between categorical variables by calculating a test statistic and determining a p-value.

The null hypothesis for a chi-square test of independence is that there is no relationship between the two categorical variables or that they are independent.

The alternative hypothesis is that there is some kind of relationship between the two categorical variables or that they are dependent.

Before we test the hypothesis we need to check the assumptions and conditions for the chi-square test of independence.

The expected value for each cell needs to be at least 5 in order to use this test.

    Assumptions:

  1. Counted Data Condition
  2. Independence Assumption
  3. Random Sample
  4. 10% Condition
  5. Expected Cell Frequency Condition

The new assumptions for this test are the counted data condition and the expected cell frequency condition. The counted data condition is checking to see if we have counts of respondents categorized on two categorical variables. The numbers in each cell of the two-way table should be whole numbers showing how many people gave that combination of responses to the two categorical questions.

The other new condition, the expected cell frequency condition, asks us to find how many people we would expect to be in each cell if the null hypothesis is true and there is no relationship between the two variables. To do this we will need to calculate the expected value for each cell in the two-way table. All of the expected values must be at least 5.

To find the expected values we will need the row and column totals and the overall sample size. The formula for the expected value of a cell is:

$$\text{Expected Value} = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$$
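The calculation can be sketched in Python as follows. The function name `expected_values` and the table of counts are hypothetical, chosen only to show the arithmetic.

```python
def expected_values(observed):
    """Return the expected count for each cell: (row total)(column total) / (sample size)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    sample_size = sum(row_totals)
    return [[r * c / sample_size for c in col_totals] for r in row_totals]


# Hypothetical two-way table of counts (made up for illustration).
observed = [
    [20, 30, 50],  # row total 100
    [30, 30, 40],  # row total 100
]
# Column totals are 50, 60, 90; the sample size is 200.

expected = expected_values(observed)
print(expected)  # [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
```

For example, the first cell is (100)(50) / 200 = 25.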

Once all the expected values are calculated, check to make sure each one is at least 5.
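One way to automate this check is sketched below. The helper name `cells_below_minimum` and both tables of expected values are hypothetical. The sketch uses the "at least 5" threshold stated earlier for the expected cell frequency condition.

```python
def cells_below_minimum(expected, minimum=5):
    """List (row index, column index, value) for every expected count below the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical expected values (made up for illustration).
passing_table = [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
failing_table = [[3.2, 6.8], [12.8, 27.2]]

print(cells_below_minimum(passing_table))  # []
print(cells_below_minimum(failing_table))  # [(0, 0, 3.2)]
```

An empty list means the condition is met. Any entries in the list identify the cells that fail it.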

Your next step before calculating the test statistic is to calculate the row or column percentages, whichever is most appropriate. We discussed this in the bivariate data chapter. Remember to determine the dominant variable (look at the research question) and then base the percentages on whether the dominant variable forms the rows or the columns.
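Both kinds of percentages can be sketched in Python as follows. The function names and the counts are hypothetical and serve only to show the two calculations side by side.

```python
def row_percentages(observed):
    """Express each cell as a percentage of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in observed]


def column_percentages(observed):
    """Express each cell as a percentage of its column total."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [
        [100 * cell / total for cell, total in zip(row, col_totals)]
        for row in observed
    ]


# Hypothetical two-way table of counts (made up for illustration).
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

row_pct = row_percentages(observed)
col_pct = column_percentages(observed)
print(row_pct)  # [[20.0, 30.0, 50.0], [30.0, 30.0, 40.0]]
```

Row percentages add to 100 across each row, and column percentages add to 100 down each column.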

Like the other hypothesis tests we have looked at in this text, we will calculate a test statistic. The formula for the chi-square test for independence is:

$$\chi^2 = \sum_{(i \cdot j)} \frac{(O - E)^2}{E}$$

Source:  OpenStax, Collaborative statistics using spreadsheets. OpenStax CNX. Jan 05, 2016 Download for free at http://legacy.cnx.org/content/col11521/1.23