Statistic of the Week

Bi-Variate Statistics

Contingency Tables & the Chi Square Statistic

7400.685.080 Research Methods in HE/FE
Inst: D. Witt

Up to now we have been looking at univariate statistics (one variable at a time). This is fine for t-tests and for the assessment of skewness and kurtosis in all our variables of interest.

Now we can begin to take a look at real analysis with two variables. This is bivariate statistical analysis. (By the end of the semester, we will be doing multivariate statistics!)

There are several statistics that operate under the bivariate label. The first one is the Chi Square statistic. But first ... we have to have an understanding of Contingency Tables (also known as crosstabulations, or crosstabs for short).

A crosstabulation shows the joint distribution of two variables: how often each combination of their values occurs.

For example: Suppose we have a theory that hypothesizes a relationship between:
Gender (1=male/2=female) and Attitudes Toward Gun control (1=favor/2=don't favor).
That is, we hypothesize that men are more in favor of gun control than women.

So --- Hyp. 1: Men are more in favor of gun control than women.

We construct a questionnaire that includes these two concepts, distribute it, and code our data into the computer.

When we ask the computer for a Frequency Distribution of the variables, we can fill in a table that reflects the following:

for Gender: 30% of the sample are men - 70% of the sample are women

for Gun Control: 60% of the sample favor control - 40% of the sample don't favor control

If we were to ask the computer to spit out a Contingency table using the variables Gender and Gun Control we'd get this:

Contingency Table

                                 Gender
Favor Gun Control    Male              Female            Marginals
Yes                  Cell1: Obs. 8%    Cell2: Obs. 52%   60%
No                   Cell3: Obs. 22%   Cell4: Obs. 18%   40%
Marginals            30%               70%               100%

NOTICE: The computer gives us the Observed cell percentages, and:
-as you add down the Male column of cells, 8% + 22% = 30%
-as you add across the Yes row of cells, 8% + 52% = 60%
-as you add down the marginals column (the row totals), 60% + 40% = 100%
-as you add across the marginals row (the column totals), 30% + 70% = 100%
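
The same checks can be sketched in Python. This is only an illustration: it assumes 100 hypothetical respondents, so that each percentage equals a count, and the records below are constructed to match the table rather than taken from real data. The names records, cells, gender_totals and control_totals are made up for this sketch.

```python
from collections import Counter

# Hypothetical coded records, built to match the observed cell percentages.
# Assumes N = 100 respondents so that each percentage equals a count.
# gender: 1 = male, 2 = female; gun_control: 1 = favor, 2 = don't favor
records = [(1, 1)] * 8 + [(2, 1)] * 52 + [(1, 2)] * 22 + [(2, 2)] * 18

cells = Counter(records)                          # the four observed cells
gender_totals = Counter(g for g, _ in records)    # marginals for Gender
control_totals = Counter(c for _, c in records)   # marginals for Gun Control

# Print the 2x2 table, one row per answer, followed by the marginals row.
for answer, label in ((1, "yes"), (2, "no")):
    print(label, cells[(1, answer)], cells[(2, answer)], control_totals[answer])
print("marginals", gender_totals[1], gender_totals[2], len(records))
```

Each printed row ends with its row total, and the last row holds the column totals and the grand total, mirroring the layout of the contingency table.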

This, children, is a two by two (2x2) contingency table, the simplest form of one. It tells us what we really observed, but doesn't say much about whether or not our observations are different from what we would normally expect to see ====> We have to put Expected percentages alongside the Observed ones on our Contingency Table to do that:

Knowing the Observed Scores (Fo) from the table above, we calculate the Expected Scores (Fe) for each cell (k) in the contingency table using this formula:

Fek = (row total x column total)/grand total

We have four cells, so we calculate four Fe's. The Fo's we already know:

Calculated Fe                  Already known Fo
Fe1 = (60 x 30)/100 = 18%      Fo1 = 8%
Fe2 = (60 x 70)/100 = 42%      Fo2 = 52%
Fe3 = (30 x 40)/100 = 12%      Fo3 = 22%
Fe4 = (70 x 40)/100 = 28%      Fo4 = 18%
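
The expected-value formula can be sketched as a small Python function. This is a minimal illustration, not a required method; the function name expected_frequencies and the list-of-rows layout are assumptions made for the example.

```python
def expected_frequencies(observed):
    """Return the table of expected values for a table of observed values.

    observed is a list of rows. Each expected cell is
    (row total x column total) / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed percentages from the gun control example
# (rows: yes, no; columns: male, female).
observed = [[8, 52], [22, 18]]
expected = expected_frequencies(observed)
print(expected)  # [[18.0, 42.0], [12.0, 28.0]]
```

Because the function works from row and column totals, it is not limited to 2x2 tables.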

Now we can insert Expected frequencies in the contingency table:

Contingency Table

                                 Gender
Favor Gun Control    Male                   Female                 Marginals
Yes                  Cell1: Exp. Fe1 18%    Cell2: Exp. Fe2 42%    60%
                            Obs. Fo1  8%           Obs. Fo2 52%
No                   Cell3: Exp. Fe3 12%    Cell4: Exp. Fe4 28%    40%
                            Obs. Fo3 22%           Obs. Fo4 18%
Marginals            30%                    70%                    100%

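To calculate the Chi2 statistic, use the formula:

Chi2 = sum over all cells k of (Fok - Fek)^2 / Fek

and by substitution (treating the percentages as counts, that is, as if the sample had 100 respondents):

((8-18)^2 / 18) + ((52-42)^2 / 42) + ((22-12)^2 / 12) + ((18-28)^2 / 28) = 19.84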

So the Obtained (calculated) Chi2 Value is 19.84
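
The arithmetic can be checked with a short Python sketch. It assumes, as the worked example does, that the percentages are treated as counts (a sample of 100); the function name chi_square is made up for this illustration.

```python
def chi_square(observed, expected):
    """Sum (Fo - Fe)^2 / Fe over every cell of the table."""
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Observed and expected values from the gun control example
# (rows: yes, no; columns: male, female).
observed = [[8, 52], [22, 18]]
expected = [[18, 42], [12, 28]]

chi2 = chi_square(observed, expected)
print(round(chi2, 2))  # 19.84
```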

But is 19.84 a statistically significant chi square value?
We need to look in the chi square distribution table (below) ...
and ... we need to know the Degrees of Freedom (df's)
For a 2x2 table the df is 1 because, with the marginals fixed, as soon as one cell is filled, all the others are determined. In general, df = (number of rows - 1) x (number of columns - 1).
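
For tables of other sizes, the standard rule is df = (rows - 1) x (columns - 1). A minimal Python sketch of that rule follows; the function name degrees_of_freedom is chosen for this illustration only.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a contingency table with fixed marginals."""
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(2, 2))  # 1, the 2x2 case in the example
print(degrees_of_freedom(3, 3))  # 4
```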

Look in the Distribution of Chi Square Table (below) and find the place where df = 1.
Follow from left to right until you find a listed "critical" value of Chi2 that is bigger than your calculated one or run out of values.

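For this example, df = 1 has a critical Chi2 value of 10.827 at the 99.9th percentile, that is, at a probability level of p = .001 (100% - 99.9% = 0.1%).
Our obtained value of 19.84 is bigger than 10.827, so the result is statistically significant at p<.001.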
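
The table lookup can be cross-checked in Python. This sketch leans on a fact that holds only for df = 1: the upper-tail probability of a chi square value x equals erfc(sqrt(x / 2)), because a chi square variable with one degree of freedom is the square of a standard normal variable. The function name chi2_p_value_df1 is made up for this illustration, and the shortcut does not apply to larger tables.

```python
import math


def chi2_p_value_df1(chi2_value):
    """Upper-tail probability of a chi square value, valid for df = 1 only."""
    return math.erfc(math.sqrt(chi2_value / 2))


CRITICAL_001 = 10.827  # critical value for df = 1 at p = .001, from the text
obtained = 19.84       # calculated Chi2 value from the example

exceeds_critical = obtained > CRITICAL_001
p = chi2_p_value_df1(obtained)

print(exceeds_critical)  # True
print(p < 0.001)         # True
```

Comparing against the critical value and computing the probability directly are two routes to the same decision.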

Interpreting these data, we can say either:

-men differ significantly from women when it comes to favoring gun control.

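or -people who favor gun control are more likely to be women than men.

Either way, the observed direction runs against Hyp. 1: only 8 of the 30 percentage points of men (about 27%) favor control, against 52 of the 70 for women (about 74%).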


Name ______________________________________________________

Homework Assignment: Contingency Tables and the Chi Square Statistic

1. You are given the following data concerning the relationship between education and the type of community where subjects were raised.

Community in which respondent lived most of the time from age 13 to 19.


                     Community in which respondent
                     lived from age 13 to 19
Education            Rural                Urban                Row Marginals
12 years or less     Cell1: Exp. _____    Cell2: Exp. _____    55%
                            Obs. 35%             Obs. 20%
Over 12 years        Cell3: Exp. _____    Cell4: Exp. _____    45%
                            Obs. 15%             Obs. 30%
Column Marginals     50%                  50%                  100%

a. Fill in the table above with Fek values (expected values). Show your work!

b. What is the Calculated Chi2 value?

c. Look up the expected, or "critical", Chi2 value in the table of chi square distribution.

d. Summarize the nature of the relationship in a few sentences.

e. Generalize from the data.


2. Let us suppose you are interested in studying the relationship between Intelligence and Memory.

You design a study that measures IQ for intelligence, and Cognitive Retention of Beatles Lyrics (e.g., fill in the blank: "I'll buy you a ________ ________ my friend if it makes you feel alright.").

Your hypothesis:
The more intelligent the respondent, the more Beatles lyrics he or she can remember.

To test this hypothesis, you select a random sample of dormitory residents at the UofA and provide them with a cassette tape with 30 songs on it. You ask respondents to listen to the tape once each day for seven days. At the end of the week you administer the test and measure their intelligence.

Here are the results in crosstab form:


                     Number of Beatles Lyrics Remembered
IQ Scores            0-15 lyrics          Over 15 lyrics       Row Marginals
Under 100            Cell1: Exp. ____     Cell2: Exp. ____     35%
                            Obs. 25%             Obs. 10%
100-129              Cell3: Exp. ____     Cell4: Exp. ____     35%
                            Obs. 15%             Obs. 20%
Over 129             Cell5: Exp. ____     Cell6: Exp. ____     30%
                            Obs. 5%              Obs. 25%
Column Marginals     45%                  55%                  100%

a. Fill in the table above with Expected Values. Show your work!

b. What is the calculated/obtained Chi2 value?

c. Look up the expected/critical Chi2 value.

d. Summarize the nature of the relationship in a couple of sentences.

e. Generalize the findings.