7400.685-080 - Research Methods in FCS
School of Family and Consumer Sciences
Spring Semesters - Tuesday Evenings 5:20-7:55pm in 209 Schrank Hall South
Instructor: David D. Witt, Ph.D.
Secondary Analysis of Quantitative Data Sets

    The straightforward, start-to-finish pathway to doing primary research, as it is normally conceived, is to begin at the top of Wallace's Wheel and move around back to theory. This requires some type of data collection, usually at great expense to the researcher or some funding agent. Because proper data collection is expensive in money, time, and effort, researchers will often attempt to make their datasets more useful through careful planning. Any time a researcher revisits an existing dataset with new research problems in mind, that researcher is engaging in secondary research.

Using Secondary Analysis means approaching the research project a little bit backwards, or sort of sideways.
Experience working with large datasets in general and particular datasets ... specifically allows the researcher to acquire dataset knowledge as a set of skills, much like Piaget's idea of schema. You know in a general way where you want to go, and you expect to find help along the way (questions that will serve as measures of the concepts coming from your literature review).

A story - I first had this experience in graduate school, working with census data files and the General Social Surveys, among others. I would be working for the Texas State Data Center in the afternoons, the Secondary Analysis Research Center in the late evenings, going to class usually in the early evenings, and studying in the early mornings. Everything started mushing together - work-class-study-work had me thinking about my discipline most of the time. I even began thinking about potential research ideas while installing carpet and tile at my day job.

One day, while listening to Prof. Lowe lecture on the mental health study of the Yoruba tribes (the one that ultimately led to the finding that 1 out of every 3 people you meet is mentally impaired enough to warrant therapy), I realized the study he was talking about could be replicated using the GSS. As he talked about the variables used in that study, I ran through the GSS Codebook in my mind and started charting out the replication study. At the break I showed Prof. Lowe the idea and he was, for the first time since I met him, speechless.

There are many data sets available to researchers who are associated with universities. Because the University of Akron is a member of the Inter-University Consortium for Political and Social Research, its students have access to thousands of data sets that were collected using federal and state grant funds. Acquisition of codebooks and data sets is easy through our librarian Prof. LaRose.

One of the most useful, and most often used, sources for secondary research is the General Social Surveys:
GSS Study Description as stated by the National Opinion Research Center
This study, begun in 1972, was supported in its first year by grants from the Russell Sage Foundation and the National Science Foundation. NSF provided support for 1973 through 1991, with surveys in 1973-1978, 1980, 1982, 1983-1993, 1994, 1996, 1998, 2000, 2002, and 2004.
The National Data Program for the Social Sciences (General Social Survey) is both a data diffusion project and a program of social indicator research. Its data collection instrument, the General Social Survey (GSS), was fielded for the 24th time in 2002. Previously an annual survey, the GSS became biennial in 1994. The questionnaire contains a standard core of demographic and attitudinal variables, plus certain topics of special interest selected for rotation (called "topical modules"). Items that appeared on national surveys between 1973 and 1975 are replicated. The exact wording of these questions is retained to facilitate time trend studies as well as replications of earlier findings.
Items include national spending priorities, drinking behavior, marijuana use, crime and punishment, race relations, quality of life, confidence in institutions, and membership in voluntary associations. Since 1985, the GSS has taken part in the International Social Survey Program, a consortium of social scientists from 39 countries around the world. The ISSP asks an identical battery of questions in all countries; the U.S. version of these questions is incorporated into the GSS. GSS Principal Investigator Tom W. Smith served as the ISSP Secretary General of 1997-2003.
The basic purposes of the GSS are to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting. Since 1988, the GSS has also collected data on number of sex partners, frequency of intercourse, extramarital relationships, and sex with prostitutes.
The GSS is the largest sociology project funded by NSF and has been described as a national resource. In use by sociologists it is second only to the Census.

These truly national-level, stratified random samples enable researchers to tap the responses of about 1500 respondents in each survey year from 1972 to 2004. With many questions repeated each round, especially demographic background questions, researchers are able to identify specific subgroups for comparison on a wide array of topic variables. We'll be working with the GSS in a moment.

Some examples of master's theses using the General Social Surveys:
Christine Fruhauf (2002) - Methods and Findings Chapters - The Relationship of Age to Subjective Well-Being
        Note Christine's use of tables to organize the Review of Literature, and see Christine's complete thesis here
Katherine Peace (1991) - Methods and Findings Chapters - A Statistical Profile of Adult Child of Divorce
Michael Lombardo (1999) - Methods and Findings Chapters - The Antecedents and Consequences of Cohabitation

Now let's turn to an actual exercise in Secondary Analysis using the General Social Surveys.
Remember that a variable here is a questionnaire item asked of a sample of respondents. Among the variables available on the GSS are age, education (measured several ways), income (measured several ways), and satisfaction measures (with neighborhood, income, social class, gender, occupational prestige). In all, 1410 questions were asked of some or all respondents. So we can start thinking about doing research with the GSS always in the back of our minds, ready to be called upon to provide data for us.

Suppose a review of the literature suggested the research question, "In terms of attainment of social status, is college worth it?" We decide the literature is telling us that:
    -people go to college to get a job that requires a college degree.
    -over the decades, the need for a college degree to get a good job has increased.
    -people who have a college degree appear to be more satisfied with their income than those without a degree.

Now we can test some research questions using the data in the GSS. The variables we need are:
    -the education variable "What is the highest degree you have earned?" with response categories:
  • Less than high school - high school - junior college - bachelor - graduate - don't know - missing data
    -the status attainment questions:
  • occupational prestige - all occupations are coded and ranked from lowest to highest prestige, which serves as a proxy for the increasing need for a college degree.
  • income of respondent - all income levels are coded into categories
We'll want to run a frequency count to get a first look at our variables and a feel for their means and standard deviations. See the frequency table here.
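For anyone who wants to try the frequency-count step outside a statistics package, here is a minimal sketch in Python. The records, the field names (degree, prestige), and every value are invented for illustration and are not real GSS data. The helper frequency_table is likewise a hypothetical name, not part of any GSS tool.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy records standing in for GSS respondents.
# Field names and values are invented for illustration only.
respondents = [
    {"degree": "less than high school", "prestige": 22},
    {"degree": "high school", "prestige": 36},
    {"degree": "high school", "prestige": 40},
    {"degree": "high school", "prestige": 32},
    {"degree": "junior college", "prestige": 45},
    {"degree": "bachelor", "prestige": 52},
    {"degree": "bachelor", "prestige": 60},
    {"degree": "graduate", "prestige": 65},
]


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


# Frequency count for the categorical education variable.
degree_table = frequency_table(r["degree"] for r in respondents)
for category, count, percent in degree_table:
    print(f"{category:<24}{count:>4}{percent:>8}%")

# Mean and sample standard deviation for the numeric prestige variable.
prestige_scores = [r["prestige"] for r in respondents]
print("mean prestige:", mean(prestige_scores))
print("std. deviation:", round(stdev(prestige_scores), 2))
```

With a real GSS extract, the "don't know" and missing-data codes would need to be set aside before computing means and standard deviations.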

We would want to compare, both in statistical terms and practical terms, the differences between the non-college and college groups using a t-test.  See the t-test results here.
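The comparison of the two groups can be sketched by hand in Python. This version assumes Welch's form of the t-test, which does not require equal variances in the two groups. The course output may well use a different form, so treat that choice as an assumption. The prestige scores are invented for illustration, and welch_t is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Each group's sample variance divided by its size.
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical occupational prestige scores (not real GSS data).
college_prestige = [52, 60, 65, 48]
no_college_prestige = [22, 36, 40, 32]

t_stat, deg_freedom = welch_t(college_prestige, no_college_prestige)

# The raw difference in means speaks to practical significance.
mean_difference = mean(college_prestige) - mean(no_college_prestige)

print("difference in means:", mean_difference)
print("t =", round(t_stat, 2), "df =", round(deg_freedom, 1))
```

The t statistic and degrees of freedom would then be checked against a t table or a statistics package to get a p-value. The difference in means is what tells us whether the gap matters in practical terms.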

By collapsing the income and occupational prestige responses into five categories by recoding them electronically, we could run a chi-square test. First we'd want to see the frequency counts of our recodes (see that here), then run the chi-square tests we want. See those here - these are called crosstabulations, or crosstabs. We'll add the life satisfaction questions just for fun.
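The recode-then-crosstab idea can be sketched in a few lines of Python. The cut points, group labels, and scores below are all invented for illustration and are not the recodes used for the course output. The helpers recode and chi_square are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter


def recode(value, cut_points):
    """Collapse a numeric score into an ordinal category, 1 .. len(cut_points) + 1."""
    return bisect_right(cut_points, value) + 1


def chi_square(pairs):
    """Return (chi-square statistic, degrees of freedom) for (row, column) pairs."""
    cells = Counter(pairs)
    rows = Counter(r for r, _ in pairs)
    cols = Counter(c for _, c in pairs)
    n = len(pairs)
    statistic = 0.0
    for r in rows:
        for c in cols:
            # Expected count if row and column were independent.
            expected = rows[r] * cols[c] / n
            statistic += (cells[(r, c)] - expected) ** 2 / expected
    return statistic, (len(rows) - 1) * (len(cols) - 1)


# Hypothetical cut points giving five prestige categories.
prestige_cuts = [30, 40, 50, 60]

# Hypothetical (education group, prestige score) observations.
observations = [
    ("no college", 22), ("no college", 36), ("no college", 40),
    ("no college", 32), ("college", 52), ("college", 60),
    ("college", 65), ("college", 48),
]

# Recode the scores, then crosstabulate education group by prestige category.
recoded = [(group, recode(score, prestige_cuts)) for group, score in observations]
crosstab = Counter(recoded)
for (group, category), count in sorted(crosstab.items()):
    print(f"{group:<12} prestige category {category}: {count}")

statistic, df = chi_square(recoded)
print("chi-square =", round(statistic, 2), "df =", df)
```

A toy sample this small leaves expected cell counts far too low for the chi-square test to be trusted. The sketch only shows the mechanics that a statistics package carries out on the full data set.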
So what'd we learn today?
With the computing power available on most desktops or laptops, researchers are able to re-examine previously collected data sets. Some of the best research today is Secondary Analysis Research.