Activity: Association
Consider the following research questions.
RQ#1: Are students who wear glasses more likely to make A’s than those without glasses?
RQ#2: Is there a link between one’s preferred hot dog condiment and whether one has an aggressive or passive personality?
RQ#3: Is there a tendency for students who are left-eye dominant to sit on the left side of the classroom relatively more often than those who are right-eye dominant?
RQ#4: Do males and females differ with respect to their toilet paper rolling preferences (from the top or from the bottom)?
1. In each case there are two variables of interest. Identify the variables, say whether they are quantitative or categorical, and give the possible values/categories.
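RQ#1
   1. Variable: ________  Type: ________  Values: ________
   2. Variable: ________  Type: ________  Values: ________
RQ#2
   1. Variable: ________  Type: ________  Values: ________
   2. Variable: ________  Type: ________  Values: ________
RQ#3
   1. Variable: ________  Type: ________  Values: ________
   2. Variable: ________  Type: ________  Values: ________
RQ#4
   1. Variable: ________  Type: ________  Values: ________
   2. Variable: ________  Type: ________  Values: ________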
2. Next we will begin to think about some issues that form the basis for the statistical method commonly used to address research questions such as those listed above. When gathering data for two categorical variables, it is useful to summarize the data with a two-way table or cross-tabulation. For example, suppose 100 randomly selected UA students were asked whether they prefer regular or diet soft drinks. The results could be summarized by gender as follows.
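[Two-way table of Observed cell counts: soft drink preference (Regular, Diet) by gender (60 females, 40 males). The cell counts are missing from this copy.]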
(a) From the above table we see that more females than males chose regular in this data set. Would it be reasonable to report the findings with the headline, “Females are more likely to choose regular cola than males”? Discuss.
(b) Is there a better way to summarize the findings?
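One way to explore (b) is to compare percentages within each group instead of raw counts. The sketch below uses made-up counts and placeholder labels ("Group A", "Option X"), not the data from this activity. It only illustrates how a group with the larger raw count can still have the smaller percentage when the groups differ in size.

```python
# Hypothetical counts, NOT the worksheet's data.
# Rows are groups of different sizes; columns are the two choices.
counts = {
    "Group A": {"Option X": 30, "Option Y": 45},
    "Group B": {"Option X": 20, "Option Y": 5},
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {choice: 100 * n / total for choice, n in row.items()}
    return result


percentages = row_percentages(counts)
for group, row in percentages.items():
    print(group, {choice: round(p, 1) for choice, p in row.items()})
```

In these made-up numbers, Group A has more people choosing Option X, yet a smaller share of Group A chooses it.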
3. Now consider another example. In general, about 10% of people are left-handed and 90% are right-handed. Do you think one’s gender is associated with one’s dominant hand? I don’t think so either. Let’s suppose we take a random sample of 100 people, 60 of whom are females and 40 of whom are males. If we assume that gender and dominant hand are not associated, how many of the 60 females would we expect to be left-handed? ____ How many of the females would we expect to be right-handed? ____ Of the 40 males, how many should be lefty ____ and righty ____? Put these figures into the following table. These are called the “Expected cell counts.”
            Lefty    Righty    Total
Female      ____     ____      60
Male        ____     ____      40
Total       ____     ____      100
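The reasoning behind expected cell counts can be written as a general rule: if the two variables are not associated, each cell's expected count is (row total × column total) / grand total. The sketch below is a minimal illustration with made-up totals, not the figures from this problem; the function name `expected_counts` is our own.

```python
def expected_counts(row_totals, col_totals):
    """Expected count per cell under no association:
    row total * column total / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical margins, NOT the worksheet's numbers:
# two groups of 150 and 50 people; two categories holding 40 and 160 people.
table = expected_counts([150, 50], [40, 160])
for row in table:
    print(row)
```

Each row of expected counts adds back up to its row total, and each column to its column total, which is a quick way to check your own table.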
4. Now suppose that you actually collected data for 100 randomly selected people and summarized the observations as follows. These values are called the “Observed cell counts.”
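[Two-way table of Observed cell counts: dominant hand (Lefty, Righty) by gender (Female, Male). The cell counts are missing from this copy.]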
(a) Did these observed counts match exactly with the counts you were expecting? ______
(b) List the discrepancies between the observed and expected counts in the table in #3, in parentheses, next to the expected counts.
(c) Do you think that these discrepancies are just due to sampling variation or do they indicate that there actually is an association between gender and dominant hand?
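The discrepancy for a cell is simply observed minus expected. As a small illustration, the sketch below uses made-up observed and expected counts (not the tables in this activity) stored as lists of rows.

```python
# Hypothetical counts, NOT the worksheet's data.
observed = [[36, 114], [4, 46]]
expected = [[30.0, 120.0], [10.0, 40.0]]

# Discrepancy for each cell: observed count minus expected count.
discrepancies = [
    [o - e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for row in discrepancies:
    print(row)
```

Because the row and column totals are the same in both tables, the four discrepancies in a 2-by-2 table all have the same size and differ only in sign.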
5. Next, let’s revisit the Gender & Soft drink preference question. Suppose, for the sake of argument, that there is no association between Gender and preference for Regular vs. Diet soft drinks. That is, suppose that the same percentage of males and females prefer Regular.
(a) In the data set you considered in #2, what was the overall percentage who said they preferred Regular? __________
(b) If the same percentage holds for males and females, then we would expect ______ of the 60 females to prefer Regular and ______ to prefer Diet. Also, we would expect _______ of the 40 males to prefer Regular and _____ to prefer Diet. Put these numbers into the following table of Expected cell counts.
            Regular  Diet      Total
Female      ____     ____      60
Male        ____     ____      40
Total       ____     ____      100
(c) Now go back to #2 and compare the Observed cell counts shown there to the Expected counts you have listed above. Mark the amount of the discrepancy for each cell next to the expected count in the table in (b).
(d) Do you think that these discrepancies are just due to sampling variation or do they indicate that there actually is an association between gender and soft drink preference?
(e) Compare the data for the soft drink preferences to the data for dominant hand. Which of these data sets provides greater evidence that there is an association between the two categorical variables? Why?
6. The statistical method that is used to investigate whether two categorical variables are associated is called the Chi-Square test. (Chi is pronounced “kye”.) In class we will learn how to do the calculations. (See what you have to look forward to!) For now, let’s make Minitab carry out the test and give us the P-value. The null hypothesis for such tests states that the variables are not associated. The alternative says that they are. Start Minitab and type the observed counts from Problem #4 into the worksheet. (Leave out the genders. See below.) Then go to Stat > Tables > Chi-Square test. Put Lefty and Righty into the “columns containing the table” box. Click OK.
[Minitab worksheet illustration: columns Lefty and Righty holding the observed counts, with no gender column.]
(a) Give the P-value for the test. _______
(b) State a conclusion in plain language.
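For readers curious about what the software is doing, here is a minimal sketch of a Chi-Square test for a 2-by-2 table in plain Python. It is not Minitab's implementation. It assumes the uncorrected Pearson statistic (the sum of (observed − expected)² / expected over the four cells) with 1 degree of freedom, and it uses made-up counts rather than the data from this activity. The function name `chi_square_2x2` is our own.

```python
import math


def chi_square_2x2(observed):
    """Pearson Chi-Square statistic and P-value for a 2x2 table of counts.

    Assumes 1 degree of freedom and no continuity correction.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under no association
            statistic += (o - e) ** 2 / e

    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical observed counts, NOT the worksheet's data.
statistic, p_value = chi_square_2x2([[36, 114], [4, 46]])
print(round(statistic, 3), round(p_value, 4))
```

A small P-value means discrepancies this large would be unusual if the variables were truly not associated. The P-value shortcut used here only applies to 2-by-2 tables; larger tables have more degrees of freedom.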
7. Following the pattern established in #6, make Minitab do the Chi-Square test to see if Gender and Soft drink preference are associated.
(a) Give the P-value for the test. _______
(b) State a conclusion in plain language.