**Statistic of the Week**

**Analysis of Variance - ANOVA and the F-Test**

**7400.685.080 Research Methods in HE/FE - Inst: David D. Witt, Ph.D.**

Think back to the t-test discussion. Remember that a t-test is a test for significant differences between two means -- that is, for testing whether a difference between two means is statistically significant. For example, we might ask whether the divorce rate is higher for poor people than for rich people.

The average number of divorces for poor people and the average number of divorces for rich people are the TWO MEANS. The t-test tells us whether the difference between them is significant or not.

**The next logical question is:** *what do we do if we have three
or more means*??

What if we added **additional economic groups** to our study of
**divorce rates**:

Group 1 = Rich / Group 2 = Middle Class / Group 3 = Working Class / Group
4 = Poor

We could do a series of t-tests between all the possible pairs of groups
(i.e., 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4).

There are two reasons why this is **not a good idea**.

1. If we run a large number of tests of significance, we can expect
*as a matter of chance* that a certain proportion of the tests will
be significant. At an alpha of .05, out of a hundred tests on groups that do
not really differ, the laws of chance suggest about 5 will be erroneously
significant.

2. In research designs with more than one independent variable, we **have
to acknowledge** that the **independent variables DO influence each
other, as well as the Dependent variable**.

The t-test series wouldn't account for, or "partial out" these
interaction effects between independent variables.
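To put a rough number on reason #1, here is a minimal Python sketch of how fast the chance of a false alarm grows when we run all the pairwise t-tests on our four income groups. It assumes the tests are independent of each other, which pairwise tests on shared groups are not, so read the result as an approximation only. The variable names are just for illustration.

```python
from math import comb

alpha = 0.05   # significance level used for each single t-test
groups = 4     # Rich, Middle Class, Working Class, Poor

# Number of pairwise comparisons: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4
pairs = comb(groups, 2)

# Probability that at least one test is significant by chance alone,
# ASSUMING the tests are independent (an approximation here).
familywise_error = 1 - (1 - alpha) ** pairs

print("pairwise tests:", pairs)
print("chance of at least one false positive:", round(familywise_error, 3))
```

Under that assumption, six tests at .05 each carry roughly a one-in-four chance of at least one spurious "significant" result, far more than the 5% we thought we were risking.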

The statistical procedure known as **Analysis of Variance**, used
in conjunction with an **F-Test**, can be employed to solve the dilemmas
of #1 and #2 above when we wish to make comparisons among three or more
means (that is, three or more levels of an "Independent" variable, such as
three or more income groups!).

Here's an example of analysis of variance using schools and kids. We draw a sample of n = 6 kids from each of k = 4 schools. Each school (the independent variable) is in a different neighborhood and uses different teaching techniques. Each student is measured on the S.A.T. (the dependent variable), and the results are:

| School 1 | School 2 | School 3 | School 4 | Mean of X |
|---|---|---|---|---|
| 15 | 17 | 17 | 15 | School 1 = 15.00 |
| 14 | 17 | 15 | 17 | School 2 = 18.00 |
| 15 | 19 | 13 | 19 | School 3 = 16.50 |
| 16 | 19 | 18 | 18 | School 4 = 17.00 |
| 17 | 20 | 18 | 16 | Sum of Means = 66.50 |
| 13 | 16 | 18 | 17 | Grand Mean = 66.50 / 4 = 16.625 |

The only thing new here is the Grand Mean idea.
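The school means and the Grand Mean can be checked with a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative choices, and averaging the four school means gives the Grand Mean only because every school has the same number of kids.

```python
from statistics import mean

# S.A.T. scores for the n = 6 kids sampled from each of the k = 4 schools
scores = {
    "School 1": [15, 14, 15, 16, 17, 13],
    "School 2": [17, 17, 19, 19, 20, 16],
    "School 3": [17, 15, 13, 18, 18, 18],
    "School 4": [15, 17, 19, 18, 16, 17],
}

# Mean of each school's scores
school_means = {school: mean(values) for school, values in scores.items()}

# Grand Mean = mean of the school means (valid here because group sizes are equal)
grand_mean = mean(list(school_means.values()))

for school, m in school_means.items():
    print(school, "mean =", m)
print("Grand Mean =", grand_mean)
```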

Now - here comes the Analysis of Variance Part!!!

First, we calculate the **WITHIN GROUPS VARIANCE**:

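w = (s^{2}_{1} + s^{2}_{2} + s^{2}_{3} + s^{2}_{4}) / k

where each s^{2} is the variance of the scores within one school (the sum of squared deviations from that school's mean, divided by n - 1) and k is the number of groups.

By substitution we have: **w** = (2.0 + 2.4 + 4.3 + 2.0) / 4 = 10.7 / 4 = **2.675**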
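The four variances that go into this step can be reproduced from the raw scores. The sketch below uses Python's `statistics.variance`, which computes the sample variance (dividing by n - 1). The list layout and names are illustrative, and the simple averaging of the variances relies on all groups being the same size.

```python
from statistics import variance

# Scores for School 1 through School 4, six kids each
scores = [
    [15, 14, 15, 16, 17, 13],
    [17, 17, 19, 19, 20, 16],
    [17, 15, 13, 18, 18, 18],
    [15, 17, 19, 18, 16, 17],
]

# Sample variance within each school (divides by n - 1)
group_variances = [variance(group) for group in scores]

# Within-groups variance: the average of the group variances
# (a plain average is appropriate only because group sizes are equal)
k = len(scores)
w = sum(group_variances) / k

print("variances:", group_variances)
print("within-groups variance w =", round(w, 3))
```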

Now we calculate the BETWEEN GROUPS VARIANCE:

| School | x = Sample mean - Grand mean | x^{2} | n | n · x^{2} |
|---|---|---|---|---|
| #1 | 15.00 - 16.625 = -1.625 | 2.64 | 6 | 15.84 |
| #2 | 18.00 - 16.625 = 1.375 | 1.89 | 6 | 11.34 |
| #3 | 16.50 - 16.625 = -0.125 | 0.02 | 6 | 0.12 |
| #4 | 17.00 - 16.625 = 0.375 | 0.14 | 6 | 0.84 |
| N = 24 | | | | Sum of (n · x^{2}) = 28.14 |

From here we can calculate b = (Sum of n · x^{2}) / (k - 1)

b = 28.14 / (4 - 1) = **9.38**
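As a check on the between-groups arithmetic, here is a short Python sketch that starts from the four school means. The names are illustrative. Because it carries full precision instead of rounding each squared deviation to two decimals, it gives 28.125 and 9.375 rather than the hand-rounded 28.14 and 9.38.

```python
# School means taken from the data table; n kids per school
school_means = [15.00, 18.00, 16.50, 17.00]
n = 6
k = len(school_means)

grand_mean = sum(school_means) / k

# Each school's squared deviation from the Grand Mean, weighted by n
weighted_squares = [n * (m - grand_mean) ** 2 for m in school_means]
ss_between = sum(weighted_squares)

# Between-groups variance: divide by k - 1
b = ss_between / (k - 1)

print("sum of n * x^2 =", ss_between)
print("between-groups variance b =", b)
```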

Now we'll calculate the **F-ratio**: F = b / w = 9.38 / 2.675 = **3.51**

Just like the t-test, we are going to have to decide whether this particular obtained F-ratio is significantly different from an F-ratio that could have occurred by chance.

To do this, we need one more piece of information: the degrees of freedom (df) associated with each variance in the problem.

Between Groups degrees of freedom: df_{b} = (k - 1) = (4 - 1) = 3

where "k" is the number of groups.

Within Groups degrees of freedom: df_{w} = (N - k) = (24 - 4) = 20

where "N" is the total number of subjects

Find the **Table of Critical Values of F** (below, like the t-values
table)

and look up **df = 3** (between groups, the numerator) **and df = 20** (within groups, the denominator):

the **Critical Value of F with alpha = .05 is 3.10**

the **Obtained Value of F (calculated) was 3.51**
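Putting the whole procedure together, the sketch below computes the F-ratio and both degrees of freedom from the raw scores and compares the result with the tabled critical value. The function name `one_way_anova` is made up for this illustration, the sketch assumes equal group sizes, and the critical value 3.10 is simply copied from the F table rather than computed. Without the hand rounding, F comes out at about 3.50 rather than 3.51.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return (F, df_between, df_within) for equal-sized groups.

    Illustrative helper only; it follows the hand calculation in the text.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    total_n = k * n

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)

    # Between-groups variance: n-weighted squared deviations over k - 1
    b = sum(n * (m - grand_mean) ** 2 for m in group_means) / (k - 1)
    # Within-groups variance: average of the sample variances
    w = sum(variance(g) for g in groups) / k

    return b / w, k - 1, total_n - k


schools = [
    [15, 14, 15, 16, 17, 13],
    [17, 17, 19, 19, 20, 16],
    [17, 15, 13, 18, 18, 18],
    [15, 17, 19, 18, 16, 17],
]

f_ratio, df_b, df_w = one_way_anova(schools)

critical_f = 3.10  # copied from the F table for df = 3 and 20, alpha = .05
significant = f_ratio > critical_f

print("F =", round(f_ratio, 2), "with df =", df_b, "and", df_w)
print("significant at .05:", significant)
```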

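**Interpretation:** Since our Obtained Value (3.51) is bigger than the Critical Value (3.10), we have to
say that there is a true difference somewhere among the sample means: at least one school's mean differs from the others, and a difference
this large is unlikely to have happened by chance (p < .05). The F-test by itself does not tell us which schools differ.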

Click here for F-Table