Name ____________________________

Activity: Association
 

Consider the following research questions.
 

RQ#1: Are students who wear glasses more likely to make A’s than those without glasses?
 

RQ#2: Is there a link between one’s preferred hot dog condiment and whether one has an aggressive or passive personality?
 

RQ#3: Is there a tendency for students who are left-eye dominant to sit on the left side of the classroom relatively more often than those who are right-eye dominant?
 

RQ#4: Do males and females differ with respect to their toilet paper rolling preferences – from the top or from the bottom?
 

1.  In each case there are two variables of interest.  Identify the variables, say whether they are quantitative or categorical, and give the possible values/categories.


 
        Variables            Type            Values/Categories
RQ#1    1.
        2.

RQ#2    1.
        2.

RQ#3    1.
        2.

RQ#4    1.
        2.


 
 
 

2. Next we will begin to think about some issues that will form the basis for the statistical method commonly used to address research questions such as those listed above. When gathering data for two categorical variables, it is useful to summarize the data with a two-way table or cross-tabulation. For example, suppose 100 randomly selected UA students were asked whether they prefer regular or diet soft drinks. The results could be summarized by gender as follows.


 
          Regular    Diet
Female      36        24
Male        34         6

 
 
 

(a)  From the table above we see that more females than males chose regular in this data set. Would it be reasonable to report the findings with the headline, “Females are more likely to choose regular cola than males”? Discuss.


 

(b)  Is there a better way to summarize the findings?
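One possible way to explore question (b) is to convert each row of counts to percentages of that row's total, so that groups of different sizes can be compared directly. The sketch below uses the counts from the table in #2; the function name `row_percentages` and the dictionary layout are illustrative choices, not part of the activity.

```python
# Observed counts from the soft drink table in #2.
observed = {
    "Female": {"Regular": 36, "Diet": 24},
    "Male": {"Regular": 34, "Diet": 6},
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {
            choice: 100 * count / total for choice, count in counts.items()
        }
    return result


percentages = row_percentages(observed)
for group, pcts in percentages.items():
    # Round only for display; the stored values keep full precision.
    print(group, {choice: round(p, 1) for choice, p in pcts.items()})
```

Comparing percentages within each gender, rather than raw counts, removes the effect of having 60 females but only 40 males in the sample.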
 

3.  Now consider another example. In general, about 10% of people are left-handed and 90% are right-handed. Do you think one’s gender is associated with one’s dominant hand? I don’t think so either. Let’s suppose we take a random sample of 100 people, 60 of whom are females and 40 are males. If we assume that gender and dominant hand are not associated, how many females out of the 60 would we expect to be left-handed? ____  How many of the females would we expect to be right-handed? ____   Of the 40 males, how many should be lefty ____ and righty ____?  Put these figures into the following table.  These are called the “Expected cell counts.”


 
          Lefty     Righty
Female    ____      ____
Male      ____      ____
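The reasoning in #3 can be checked with a short Python sketch: if gender and dominant hand are not associated, the 10% left-handed rate applies equally to each gender, so each expected count is the group size times the rate. The names `LEFT_RATE`, `group_sizes`, and `expected_counts` are illustrative assumptions.

```python
# Share of left-handers in the general population, as stated in #3.
LEFT_RATE = 0.10

# Sample composition from #3: 60 females and 40 males.
group_sizes = {"Female": 60, "Male": 40}


def expected_counts(sizes, left_rate):
    """Expected cell counts when the same left-handed rate applies to every group."""
    return {
        group: {"Lefty": n * left_rate, "Righty": n * (1 - left_rate)}
        for group, n in sizes.items()
    }


expected = expected_counts(group_sizes, LEFT_RATE)
for group, cells in expected.items():
    # Round only for display, to hide floating-point noise.
    print(group, {hand: round(count, 1) for hand, count in cells.items()})
```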

 
 

4.  Now suppose that you actually collected data for 100 randomly selected people and summarized the observations as follows. These values are called the “Observed cell counts.”

 
 
          Lefty     Righty
Female      7         53
Male        3         37

 
 

(a)  Did these observed counts match exactly with the counts you were expecting? ______
 

(b)  In the table in #3, write the discrepancy between the observed and expected counts in parentheses next to each expected count.
 

(c)  Do you think that these discrepancies are just due to sampling variation or do they indicate that there actually is an association between gender and dominant hand? 
 
 
 

5.  Next, let’s revisit the Gender & Soft drink preference question. Suppose, for the sake of argument, that there is no association between Gender and preference for Regular vs. Diet soft drinks. That is, suppose that the same percentage of males and of females prefer Regular.

(a)  In the data set you considered in #2, what was the overall percentage who said they preferred Regular? __________
 

(b)  If the same percentage holds for males and females then we would expect ______ of the 60 females to prefer Regular, and ______ to prefer Diet. Also, we would expect _______ of the 40 males to prefer Regular and _____ to prefer Diet. Put these numbers into the following table of Expected cell counts.
 


 
          Regular    Diet
Female    ____       ____
Male      ____       ____
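When no population rate is given in advance, the overall percentage has to come from the data itself. The following Python sketch applies that idea to the observed counts from #2: each expected count is the row total times the column total, divided by the grand total. This is the same as applying the overall Regular and Diet percentages to each gender. The function name `expected_from_margins` is an illustrative assumption.

```python
# Observed counts from #2. Rows: Female, Male. Columns: Regular, Diet.
observed = [
    [36, 24],
    [34, 6],
]


def expected_from_margins(table):
    """Expected counts under no association: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_from_margins(observed)
for label, row in zip(["Female", "Male"], expected):
    print(label, row)
```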

 
 

(c)  Now go back to #2 and compare the Observed cell counts shown there to the Expected counts you have listed above. Mark the amount of the discrepancy for each cell next to the expected count in the table in (b).
 

(d)  Do you think that these discrepancies are just due to sampling variation or do they indicate that there actually is an association between gender and soft drink preference? 
 
 

(e)  Compare the data for the Cola preferences to the data for dominant hand. Which of these data sets provides greater evidence that there is an association between the two categorical variables? Why?
 
 
 
 

6.  The statistical method that is used to investigate whether two categorical variables are associated is called the Chi-Square test. (Chi is pronounced “kye”.) In class we will learn how to do the calculations. (See what you have to look forward to!) For now, let’s make Minitab carry out the test and give us the P-value. The null hypothesis for such tests states that the variables are not associated. The alternative says that they are. Start Minitab and type the observed counts from Problem #4 into the worksheet.  (Leave out the genders. See below.)  Then go to Stat > Tables > Chi-Square test. Put Lefty and Righty into the “columns containing the table” box. Click OK. 
 


 
Lefty     Righty
  7         53
  3         37
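For readers who want to see what the software is doing, here is a minimal Python sketch of the calculation for a 2-by-2 table. It assumes the Pearson chi-square statistic with no continuity correction, so its result may differ slightly from software that applies one. The function name `chi_square_2x2` is an illustrative assumption, and the P-value formula used here is valid only for 1 degree of freedom, which is the case for any 2-by-2 table.

```python
import math


def chi_square_2x2(observed):
    """Pearson chi-square statistic and P-value for a 2-by-2 table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables are not associated.
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp

    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Observed handedness counts. Rows: Female, Male. Columns: Lefty, Righty.
handedness = [[7, 53], [3, 37]]
stat, p = chi_square_2x2(handedness)
print(f"chi-square = {stat:.3f}, P-value = {p:.3f}")
```

The same function can be called with the soft drink counts, `[[36, 24], [34, 6]]`, to compare against the Minitab output in #7.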

 

(a)  Give the P-value for the test. _______
 

(b)  State a conclusion in plain language.
 

7.  Following the pattern established in #6, make Minitab do the Chi-Square test to see if Gender and Soft drink preference are associated.
 

(a)  Give the P-value for the test. _______
 

(b)  State a conclusion in plain language.