Statistic of the Week
Correlation Coefficients and the F-Test
7400.685.080 Research Methods in HE/FE Inst: David D. Witt, Ph.D.
So far, we have examined:
UNIVARIATE (single variable) statistics (i.e., measures of central tendency and measures of dispersion).
and
BIVARIATE (2 variable) statistics (i.e., means comparisons & t-tests, variance comparisons & F-tests).
This week's statistic, the CORRELATION COEFFICIENT, allows a researcher to estimate both the direction (positive or negative) and the strength of a relationship between two variables, rather than only comparing group means or variances.
The correlation is designed to be used with more precise measures, such as Ordinal and Interval Level Measures (the product-moment correlation computed below assumes interval-level data).
More precise measures allow for more precise arithmetic, so we can compute correlations for such measures as:
- Education level (none, 1 year, 2 years, 3 years ... 25 years of formal education)
- Annual Family Income in actual dollars
- Life Satisfaction scales, and so on.
For example, it is our culture's Agreement Reality (that is, we all generally agree in principle) that Education is positively related to Income Level. That is to say (and this is important):
"As the number of years of education increases, the number of dollars accrued in a year increases."
To achieve Experiential Reality (actually have evidence) about this relationship, we must measure the concepts and express the relationship empirically (i.e., statistically, numerically).
With correlations, we answer two kinds of questions:
Operationalization of Education and Income has been discussed already. Each variable is expressed on the Cartesian plane below (Education is plotted on the x-axis, Income is plotted on the y-axis).
This kind of graph is known as a scattergram. As Education goes up, so does Income!
(NOTE THE ELLIPTICAL NATURE OF THE SCORES AS THEY GATHER AROUND THE REGRESSION LINE.) The narrower the ellipse, the stronger the correlation; a wide, nearly circular cloud of points indicates a weak correlation.
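To see the ellipse idea in numbers, here is a minimal Python sketch using made-up data (not data from this course): the same upward trend, Y = 10 + 10X, with first a small and then a large amount of scatter added around it. The helper name pearson_r is hypothetical; it uses Σxy / sqrt(Σx² · Σy²), which is algebraically the same as the product-moment formula given below. The tighter cloud (narrower ellipse) yields the larger coefficient.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation from deviation scores (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations from the mean of X
    dy = [y - mean_y for y in ys]  # deviations from the mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Made-up illustration data: one trend line, two amounts of scatter around it.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
small_noise = [1, -1, 1, -1, 1, -1, 1, -1]          # points hug the line
large_noise = [15, -15, 15, -15, 15, -15, 15, -15]  # points spread out

narrow = [10 + 10 * x + e for x, e in zip(xs, small_noise)]
wide = [10 + 10 * x + e for x, e in zip(xs, large_noise)]

print(round(pearson_r(xs, narrow), 3))  # 0.999 (narrow ellipse)
print(round(pearson_r(xs, wide), 3))    # 0.802 (wide ellipse)
```

Both data sets have the same slope; only the spread around the line differs, and that spread alone is what lowers r.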
More Graphs - Here are four distinct types of correlation, illustrated graphically in scatterplots:
Computing the Product-Moment Correlation
Remember the idea of the Distribution Table? Well, here it comes again.
You need a distribution, or summary, table because you will hand-calculate sums of scores, means, sums of deviations from the means, standard deviations, variances, and the products of the deviations.
First, the equations. Correlation coefficients are denoted by a lowercase "r".
The correlation between X (Education) and Y (Income) is calculated using the formula:
$$r_{xy} = \frac{\sum xy}{N \sigma_x \sigma_y}$$
Where: r = correlation coefficient
x = (X - Mean of X)
y = (Y - Mean of Y)
Remember that $\sigma_x = \sqrt{\frac{\sum x^2}{N}}$ and that $\sigma_y = \sqrt{\frac{\sum y^2}{N}}$
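The formula can be transcribed almost line for line. A minimal Python sketch, assuming the population standard deviations defined above (divide by N, not N - 1); the function name correlation and the four-case example data are hypothetical.

```python
import math


def correlation(X, Y):
    """r_xy = sum(xy) / (N * sigma_x * sigma_y), x and y being deviation scores.

    Hypothetical helper; sigma divides by N, as in the definitions above.
    """
    n = len(X)
    if n == 0 or n != len(Y):
        raise ValueError("X and Y must be non-empty and of equal length")
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    x = [X_i - mean_x for X_i in X]  # x = X - Mean of X
    y = [Y_i - mean_y for Y_i in Y]  # y = Y - Mean of Y
    sigma_x = math.sqrt(sum(v * v for v in x) / n)
    sigma_y = math.sqrt(sum(v * v for v in y) / n)
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy / (n * sigma_x * sigma_y)


# Made-up example: four cases.
print(round(correlation([1, 2, 3, 4], [2, 4, 5, 9]), 3))  # 0.965
```

If you prefer sample standard deviations (dividing by N - 1), the denominator becomes (N - 1) times the two sample standard deviations and r comes out the same; the N version is used here only to match the text.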
So, the Distribution Table for Correlations looks like this:
DISTRIBUTION TABLE FOR EDUCATION AND INCOME

Education X | Income Y | x = X - Mean of X | y = Y - Mean of Y | xy | $x^2$ | $y^2$
0 | 10 | -2.5 | -25 | 62.5 | 6.25 | 625
1 | 20 | -1.5 | -15 | 22.5 | 2.25 | 225
2 | 30 | -0.5 | -5 | 2.5 | 0.25 | 25
3 | 40 | 0.5 | 5 | 2.5 | 0.25 | 25
4 | 50 | 1.5 | 15 | 22.5 | 2.25 | 225
5 | 60 | 2.5 | 25 | 62.5 | 6.25 | 625
$\sum X = 15$ | $\sum Y = 210$ | $\sum x = 0$ | $\sum y = 0$ | $\sum xy = 175$ | $\sum x^2 = 17.5$ | $\sum y^2 = 1750$

N = 6 | Mean of X = 2.5 | Mean of Y = 35
$\sigma^2_x = 17.5 / 6 = 2.92$ | $\sigma^2_y = 1750 / 6 = 291.67$
$\sigma_x = 1.71$ | $\sigma_y = 17.08$
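The distribution table can also be generated by machine, which is a handy way to check hand arithmetic. A minimal Python sketch, assuming the six Education/Income pairs from the table above; distribution_table is a hypothetical helper that returns the rows (X, Y, x, y, xy, x², y²), the column sums, and the two means.

```python
import math


def distribution_table(X, Y):
    """Rows of (X, Y, x, y, xy, x^2, y^2) plus column sums (hypothetical helper)."""
    n = len(X)
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    rows = []
    for X_i, Y_i in zip(X, Y):
        x, y = X_i - mean_x, Y_i - mean_y  # deviation scores
        rows.append((X_i, Y_i, x, y, x * y, x * x, y * y))
    sums = [sum(column) for column in zip(*rows)]
    return rows, sums, mean_x, mean_y


education = [0, 1, 2, 3, 4, 5]
income = [10, 20, 30, 40, 50, 60]
rows, sums, mean_x, mean_y = distribution_table(education, income)
n = len(education)

print("X\tY\tx\ty\txy\tx^2\ty^2")
for row in rows:
    print("\t".join(str(value) for value in row))
print("sums:", sums)

var_x, var_y = sums[5] / n, sums[6] / n  # variance = sum of squared deviations / N
print(f"N = {n}, Mean of X = {mean_x}, Mean of Y = {mean_y}")
print(f"var_x = {var_x:.2f}, var_y = {var_y:.2f}")                       # 2.92, 291.67
print(f"sd_x = {math.sqrt(var_x):.2f}, sd_y = {math.sqrt(var_y):.2f}")  # 1.71, 17.08
```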
With the distribution table completed, we can calculate the correlation coefficient:
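$$r_{xy} = \frac{\sum xy}{N \sigma_x \sigma_y}$$ and, by substitution,
$$r_{xy} = \frac{175}{6(1.708)(17.078)} = \frac{175}{175.0} = 1.00$$
For the data we collected, we have a correlation coefficient of r = 1.00, a perfect positive correlation: every case falls exactly on the line Y = 10 + 10X. (If the standard deviations are rounded to two decimal places before multiplying, the denominator comes out slightly above 175 and r just under 1, about 0.999.)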
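Because $N\sigma_x\sigma_y = N\sqrt{\sum x^2/N}\sqrt{\sum y^2/N} = \sqrt{\sum x^2 \sum y^2}$, the denominator here is $\sqrt{17.5 \times 1750} = 175$ exactly. A short Python check, taking the column sums from the distribution table, compares r at full precision with r computed from standard deviations rounded to two decimals first; the variable names are illustrative only.

```python
import math

# Column sums and N taken from the Education/Income distribution table.
n, sum_xy, sum_x2, sum_y2 = 6, 175.0, 17.5, 1750.0

sigma_x = math.sqrt(sum_x2 / n)  # population SD, about 1.7078
sigma_y = math.sqrt(sum_y2 / n)  # population SD, about 17.0783

r_full = sum_xy / (n * sigma_x * sigma_y)                         # full precision
r_exact = sum_xy / math.sqrt(sum_x2 * sum_y2)                     # same formula, N cancelled
r_rounded = sum_xy / (n * round(sigma_x, 2) * round(sigma_y, 2))  # SDs rounded first

print(f"{r_full:.4f} {r_exact:.4f} {r_rounded:.4f}")  # 1.0000 1.0000 0.9986
```

The lesson for hand calculation: carry extra decimal places in the standard deviations, or compute the denominator as the square root of Σx² · Σy², to avoid rounding error creeping into r.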
To estimate the significance of the coefficient, perform this F-test:
1. Compute the obtained F value: $F = \frac{r^2 (N - 2)}{1 - r^2}$.
2. Calculate degrees of freedom: $df_1 = 1$ (numerator) and $df_2 = N - 2 = 6 - 2 = 4$ (denominator).
3. Locate the critical value of F by using the two degrees of freedom (see an F-table).
At df 1:4 the critical F at the .05 level is 7.71.
Compare the obtained F value to the critical F value. Because r is at (or, after rounding, just under) 1, $1 - r^2$ is essentially zero, and the obtained F is far larger than 7.71.
So the correlation is strong and significant at the .05 level,
because the obtained F is larger than the critical F.
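The F-ratio for a correlation, F = r²(N - 2) / (1 - r²) with df 1 and N - 2, is the square of the familiar t-statistic t = r·sqrt(N - 2) / sqrt(1 - r²). A minimal Python sketch follows; f_for_r and critical_r are hypothetical helpers, and 7.71 is typed in from a standard F-table (alpha = .05, df 1 and 4) rather than computed. Solving the F formula for r also gives the smallest r that reaches significance with N = 6.

```python
import math

F_CRITICAL_DF_1_4 = 7.71  # from a standard F-table: alpha = .05, df = 1 and 4


def f_for_r(r, n):
    """Obtained F for a correlation r based on n cases (df = 1 and n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 cases")
    if abs(r) >= 1:
        return math.inf  # all points on a line: 1 - r^2 is zero
    return r * r * (n - 2) / (1 - r * r)


def critical_r(f_critical, n):
    """Smallest |r| whose F reaches f_critical (the F formula solved for r)."""
    return math.sqrt(f_critical / (f_critical + n - 2))


n = 6
for r in (0.5, 0.9, 0.999, 1.0):
    f = f_for_r(r, n)
    verdict = "significant" if f > F_CRITICAL_DF_1_4 else "not significant"
    print(f"r = {r}: F = {f:.2f} ({verdict} at .05)")

print(f"smallest significant r with N = {n}: {critical_r(F_CRITICAL_DF_1_4, n):.3f}")  # 0.811
```

With only six cases, r must reach about 0.811 before F exceeds the critical value, which is why small samples need strong correlations to be significant.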
Name:
Homework: Correlations
Suppose you have collected Aptitude test scores and productivity levels for graduate students. Your theory states that the higher the aptitude for scholarly study, the higher one's productivity, measured by the average length of term papers in pages.
Aptitude scores range from 8 (low) to 25 (high)
Grad. School Productivity ranges from 14 (low) to 43 (high)
Here is the raw data:

Aptitude X | Productivity Y
11 | 22
12 | 32
20 | 29
16 | 33
19 | 33
25 | 43
8 | 14
10 | 21
12 | 25
15 | 24
21 | 42
20 | 40