
Pearson's chi-square test for goodness-of-fit

One of Pearson's most significant achievements occurred in 1900, when he developed a statistical test called Pearson's chi-square (χ²) test, also known as the chi-square test for goodness-of-fit (Pearson, 1900). Pearson's chi-square test is used to examine the role of chance in producing deviations between observed and expected values. The test depends on an extrinsic hypothesis, because the expected values must be calculated from a theoretical model rather than from the data themselves. The test indicates the probability that chance alone would produce a deviation at least as large as the one seen between the expected and the observed values (Pierce, 2005). When this probability is high, the deviation is attributed to chance alone. Conversely, when the probability is low, it is concluded that a significant factor other than chance produced the deviation.
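As a rough illustration of where that probability comes from, the sketch below converts a chi-square value and its degrees of freedom (df, which this article explains later) into the upper-tail probability of the chi-square distribution. It is a minimal, standard-library-only sketch that assumes an integer df of at least 1; the function name `chi_square_p_value` is hypothetical, and in practice a statistics package or a printed chi-square table would normally be used instead.

```python
import math


def chi_square_p_value(chi2: float, df: int) -> float:
    """Probability of a chi-square value at least this large arising by chance.

    Hypothetical helper: evaluates the regularized upper incomplete gamma
    function Q(df/2, chi2/2) with its closed-form series for integer df >= 1.
    """
    if df < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if chi2 <= 0:
        return 1.0
    y = chi2 / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        m = df // 2
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(m))
    # Odd df = 2m + 1:
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y^(i - 1/2) / Gamma(i + 1/2)
    m = (df - 1) // 2
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


# A larger deviation (larger chi-square) gives a smaller probability.
print(chi_square_p_value(3.841, 1))
```

For example, `chi_square_p_value(3.841, 1)` returns approximately 0.05, which matches the familiar 5% cutoff for 1 degree of freedom found in chi-square tables.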

In 1912, J. Arthur Harris applied Pearson's chi-square test to examine Mendelian ratios (Harris, 1912). It is important to note that when Gregor Mendel studied inheritance, he did not use statistics, and neither did Bateson, Saunders, Punnett, and Morgan in the experiments that led to the discovery of genetic linkage. Thus, until Pearson's statistical tests were applied to biological data, scientists judged the goodness of fit between theoretical and observed experimental results simply by inspecting the data and drawing conclusions (Harris, 1912). Although this method works perfectly well when one's data match one's predictions exactly, scientific experiments often involve variability, which makes statistical tests very useful.

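The chi-square value is calculated using the following formula, where O is the observed frequency and E is the expected frequency for each outcome category:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$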

Using this formula, the difference between the observed and expected frequencies is calculated for each outcome category. Each difference is then squared and divided by the expected frequency for that category. Finally, the values for all categories are summed, as represented by the summation sign (Σ).
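The three steps just described (subtract, square and divide, then sum) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `chi_square_statistic` is hypothetical, and the counts in the usage line are invented for demonstration rather than taken from any experiment.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square: the sum over categories of (O - E)^2 / E.

    Hypothetical helper for illustration; observed and expected are
    per-category frequencies listed in the same order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        diff = o - e            # difference between observed and expected
        total += diff ** 2 / e  # square it and divide by the expected frequency
    return total                # summed over all outcome categories (the sigma)


# Invented example: 400 offspring, expected in a 3:1 ratio.
n = 400
expected = [n * 3 / 4, n * 1 / 4]   # 300 and 100
observed = [290, 110]
print(chi_square_statistic(observed, expected))
```

With these invented counts, the result is (290 − 300)²/300 + (110 − 100)²/100, or about 1.333. Working directly from counts, rather than from proportions, keeps the calculation consistent with the rules for using the test.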

Pearson's chi-square test works well with genetic data as long as the expected number in each category is large enough. In the case of small samples (fewer than 10 in any category) that have 1 degree of freedom, the test is not reliable. (Degrees of freedom, or df, will be explained in full later in this article.) In such cases, however, the test can be corrected by using the Yates correction for continuity, which reduces the absolute value of each difference between observed and expected frequencies by 0.5 before squaring. Additionally, it is important to remember that the chi-square test can be applied only to counts of progeny, not to proportions or percentages.
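The Yates correction described above changes only one step of the calculation: each absolute difference |O − E| is reduced by 0.5 before squaring. The sketch below assumes that rule; the function name `yates_chi_square` is hypothetical, and clipping a corrected difference at zero (so that differences smaller than 0.5 are not inflated) is a design choice of this sketch rather than something stated in the text. It also rejects non-integer observed values, reflecting the rule that the test applies only to counts of progeny.

```python
def yates_chi_square(observed, expected):
    """Chi-square with Yates' continuity correction: sum of (|O - E| - 0.5)^2 / E.

    Hypothetical helper for illustration. Observed values must be counts of
    progeny (whole numbers), not proportions or percentages.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    total = 0.0
    for o, e in zip(observed, expected):
        if o < 0 or not float(o).is_integer():
            raise ValueError("observed values must be whole-number counts of progeny")
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        # Reduce |O - E| by 0.5 before squaring; clipping at zero is an
        # assumption of this sketch so that tiny differences are not inflated.
        corrected = max(abs(o - e) - 0.5, 0.0)
        total += corrected ** 2 / e
    return total


# Invented small sample: 6 tall and 4 short out of 10,
# with expected counts 7.5 and 2.5 under a 3:1 ratio.
print(yates_chi_square([6, 4], [7.5, 2.5]))
```

For these invented counts, the corrected value is 1.0²/7.5 + 1.0²/2.5, or about 0.533, compared with 1.2 without the correction. The correction therefore makes the test more conservative for small samples.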

Now that you know the rules for using the test, it's time to consider an example of how to calculate Pearson's chi-square. Recall that when Mendel crossed his pea plants, he learned that tall (T) was dominant to short (t). You want to confirm that this is correct, so you start by formulating the following null hypothesis: In a cross between two heterozygous (Tt) plants, the offspring should occur in a 3:1 ratio of tall plants to short plants. Next, you cross the plants, and after the cross, you measure the

Source:  OpenStax, Open genetics. OpenStax CNX. Jan 08, 2015 Download for free at https://legacy.cnx.org/content/col11744/1.3