Statistic of the Week

Correlation Coefficients and the F-Test

7400.685.080 Research Methods in HE/FE Inst: David D. Witt, Ph.D.

So far, we have examined:

UNIVARIATE (single variable) statistics (i.e., measures of central tendency and measures of dispersion).

and

BIVARIATE (2 variable) statistics (i.e., means comparisons & t-tests, variance comparisons & F-tests).

This week's statistic, the CORRELATION COEFFICIENT, allows a researcher to estimate both the direction and the strength of a relationship between two variables.

The correlation is designed to be used with more precise measures,
such as Ordinal and Interval Level Measures.

More precise measures allow for more precise arithmetic, so we can compute correlations for such measures as:
- Education level (none, 1 year, 2 years, 3 years ... 25 years of formal education).
- Annual Family Income in actual dollars.
- Life Satisfaction scales, and so on.

For example, it is our culture's Agreement Reality (that is, we all generally agree in principle) that Education is positively related to Income Level. That is to say (and this is important),

"As the number of years of education increases, the number of dollars accrued in a year increases".

To achieve Experiential Reality (actually have proof) on this relationship, we must measure the concepts and empirically express the relationship (i.e., statistically, numerically).

With correlations, we answer two kinds of questions:

Operationalization of Education and Income has been discussed already. Each variable is expressed on the Cartesian plane below (Education is plotted on the x-axis, Income is plotted on the y-axis).

The figure graphed is known as a scattergram. As Education goes up, so does Income!

(NOTE THE ELLIPTICAL NATURE OF THE SCORES AS THEY GATHER AROUND THE REGRESSION LINE). The narrower the ellipse, the stronger the correlation.
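The text names the regression line without defining it. Assuming it means the usual least-squares line, here is a minimal Python sketch of how that line could be found for the six Education/Income pairs used in this section's distribution table. The function name `regression_line` is illustrative only and is not part of the course material.

```python
# Six (Education, Income) pairs from the worked distribution table in this section.
education = [0, 1, 2, 3, 4, 5]
income = [10, 20, 30, 40, 50, 60]

def regression_line(xs, ys):
    """Least-squares line Y = intercept + slope * X (an assumed definition)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of the deviations, and sum of squared X deviations.
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sum_xx = sum((a - mean_x) ** 2 for a in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = regression_line(education, income)
print(slope, intercept)
```

For these scores the slope is 175 / 17.5 = 10 and the intercept is 35 - 10 × 2.5 = 10, so every point sits on the line and the "ellipse" collapses to the line itself.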


More Graphs - Here are four distinct types of correlation, illustrated graphically in scatterplots:

Computing the Product-Moment Correlation

Remember the idea of the Distribution Table? Well, here it comes again.

You need a distribution, or summary, table because you will hand calculate sums of scores, means, sums of deviations from means, standard deviations, variances, and the products of the deviations.

First the Equations: Correlation coefficients are expressed using a lower case "r".

The correlation between X (Education) and Y (Income) is calculated using the formula:

$r_{xy} = \dfrac{\sum xy}{N \sigma_x \sigma_y}$

Where: r = correlation coefficient
x = (X - Mean of X)
y = (Y - Mean of Y)
Remember that: $\sigma_x = \sqrt{\sum x^2 / N}$ and that $\sigma_y = \sqrt{\sum y^2 / N}$
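A short derivation may help when checking hand calculations. Here x and y are the deviations from the means, N is the number of score pairs, and each standard deviation is assumed to be the square root of the summed squared deviations divided by N (the population form, which is what the worked table in this section uses). Under that assumption the N in the denominator cancels:

```latex
N \sigma_x \sigma_y
  = N \sqrt{\frac{\sum x^2}{N}} \sqrt{\frac{\sum y^2}{N}}
  = \sqrt{\sum x^2 \, \sum y^2}
\qquad\Longrightarrow\qquad
r_{xy} = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}
```

The second form needs only the three column sums from the distribution table, so it avoids rounding the standard deviations before dividing.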

So, the Distribution Table for Correlations looks like this:

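DISTRIBUTION TABLE FOR EDUCATION AND INCOME

| Education (X) | Income (Y) | x = (X - Mean of X) | y = (Y - Mean of Y) | xy | $x^2$ | $y^2$ |
|---|---|---|---|---|---|---|
| 0 | 10 | -2.5 | -25 | 62.5 | 6.25 | 625 |
| 1 | 20 | -1.5 | -15 | 22.5 | 2.25 | 225 |
| 2 | 30 | -0.5 | -5 | 2.5 | 0.25 | 25 |
| 3 | 40 | 0.5 | 5 | 2.5 | 0.25 | 25 |
| 4 | 50 | 1.5 | 15 | 22.5 | 2.25 | 225 |
| 5 | 60 | 2.5 | 25 | 62.5 | 6.25 | 625 |
| $\sum X = 15$ | $\sum Y = 210$ | $\sum x = 0$ | $\sum y = 0$ | $\sum xy = 175$ | $\sum x^2 = 17.5$ | $\sum y^2 = 1750$ |

N = 6

Mean of X = 15 / 6 = 2.5

Mean of Y = 210 / 6 = 35

$\sigma^2_x = 17.5 / 6 = 2.92$

$\sigma^2_y = 1750 / 6 = 291.67$

$\sigma_x = \sqrt{2.92} = 1.71$

$\sigma_y = \sqrt{291.67} = 17.08$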
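The hand calculations in the table can be cross-checked with a few lines of code. This is only a sketch in Python: the function name `distribution_table` and the tuple layout are choices made for this illustration, and the scores are the six Education/Income pairs from the table.

```python
# Scores copied from the distribution table.
education = [0, 1, 2, 3, 4, 5]          # X
income = [10, 20, 30, 40, 50, 60]       # Y

def distribution_table(xs, ys):
    """Return rows of (X, Y, x, y, xy, x^2, y^2) and a tuple of column sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    rows = []
    for big_x, big_y in zip(xs, ys):
        x = big_x - mean_x              # deviation of X from its mean
        y = big_y - mean_y              # deviation of Y from its mean
        rows.append((big_x, big_y, x, y, x * y, x * x, y * y))
    sums = tuple(sum(column) for column in zip(*rows))
    return rows, sums

rows, sums = distribution_table(education, income)
for row in rows:
    print(row)
print("sums:", sums)
```

The last line printed holds the seven column sums in table order: 15, 210, 0, 0, 175, 17.5 and 1750.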

With the Distribution Table Completed we can calculate the correlation coefficient:

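$r_{xy} = \dfrac{\sum xy}{N \sigma_x \sigma_y}$ and by substitution

$r_{xy} = \dfrac{175}{6 \times 1.71 \times 17.08} = \dfrac{175}{175.24} \approx 1.00$

(Here $\sigma_y = \sqrt{291.67} = 17.08$. With unrounded standard deviations the denominator is exactly 175 and r = 1.)

For the data we collected, we have a correlation coefficient of r = 1.00, which is as strong as a correlation can be: every Income score equals 10 + 10 × Education, so the points fall exactly on a straight line.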
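To check the substitution without rounding the standard deviations by hand, the formula can be sketched in Python. The function name `pearson_r` is an illustrative choice. It assumes standard deviations that divide by N, as the distribution table does, and it makes no attempt to handle a variable with zero spread.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation: sum(xy) / (N * sd_x * sd_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [v - mean_x for v in xs]    # x = X - mean of X
    dev_y = [v - mean_y for v in ys]    # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sd_x = math.sqrt(sum(a * a for a in dev_x) / n)
    sd_y = math.sqrt(sum(b * b for b in dev_y) / n)
    return sum_xy / (n * sd_x * sd_y)

education = [0, 1, 2, 3, 4, 5]
income = [10, 20, 30, 40, 50, 60]
r = pearson_r(education, income)
print(round(r, 3))  # prints 1.0
```

Carrying full precision through the division gives r = 1 for these scores, which is what a perfectly straight-line relationship should produce.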

To estimate the significance of the coefficient, just perform this F-test:

1. Calculate the obtained F value: $F = \sigma_x / \sigma_y = 1.71 / 17.08 = 0.10$.

2. Calculate degrees of freedom: N - 1 = 6 - 1 = 5 for each variable.

3. Locate the critical value of F by using the two degrees of freedom (see an F-table).

At df 5:5 the Critical F is 5.05.

4. Compare the Obtained F value to the Critical F value. The Obtained F value is 0.10.

So - the correlation is strong and significant at the .05 level because the obtained F (0.10) is smaller than the critical F (5.05).
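The steps of this section's check can be traced in code. The Python sketch below simply mirrors the procedure as the text lists it. The degrees of freedom are assumed to be N - 1 for each variable (which matches "df 5:5" for N = 6), and the critical value 5.05 is copied from the text, not computed. The helper name `population_sd` is illustrative.

```python
import math

def population_sd(scores):
    """Standard deviation that divides by N, as in the distribution table."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / n)

education = [0, 1, 2, 3, 4, 5]
income = [10, 20, 30, 40, 50, 60]

# Step 1: obtained F, taken here as the ratio of the two standard deviations.
obtained_f = population_sd(education) / population_sd(income)

# Step 2: degrees of freedom, assumed to be N - 1 for each variable.
df = (len(education) - 1, len(income) - 1)

# Step 3: critical F at the .05 level for df 5:5, as quoted in the text.
CRITICAL_F_05 = 5.05

# Step 4: compare the obtained value with the critical value.
print(round(obtained_f, 2), df, obtained_f < CRITICAL_F_05)
```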


Name:

Homework: Correlations

Suppose you have collected Aptitude test scores and productivity levels in graduate school. Your theory states that the higher the aptitude for scholarly study, the higher one's productivity measured by average length of term paper in pages.

Aptitude scores range from 8 (low) to 25 (high)
Grad. School Productivity ranges from 14 (low) to 43 (high)

  1. Calculate the correlation (coefficient and significance level) between these two measures.
  2. Is there a relationship between the two?
  3. Draw a scattergram (plot each pair of scores)

Here is the raw data:

| Aptitude (X) | Productivity (Y) |
|---|---|
| 11 | 22 |
| 12 | 32 |
| 20 | 29 |
| 16 | 33 |
| 19 | 33 |
| 25 | 43 |
| 8 | 14 |
| 10 | 21 |
| 12 | 25 |
| 15 | 24 |
| 21 | 42 |
| 20 | 40 |