MINITAB Exercise:  Cross-Validation

We will analyze the Albuquerque Home Prices data from DASL, building a model to predict Price.  We will be including an interaction term in our list of candidate regressors.  Variable selection and model fitting will be done with half of the data and the remaining half will be used for validation.

1.  Open the Albuquerque Home Prices data found in the the folder, R:/Minitab13/Regression, or by going to the course website, and copying and pasting the data into a Mintab worksheet.

2.  Create the interaction variable SQFTbyNE using Calc > Calculator.

3.  Follow the steps below to divide the data into two sets – one set for model building and one for validation.
 (a)  Create a variable Home which numbers the homes from 1 to 117:
  Calc > Make Patterned Data > Simple Set of Numbers
    From 1  To 117

(b)  Create an indicator variable that is 0 for odd homes and 1 for evens:
  Calc > Calculator
   Store result in:  Build
   Expression:  0.5 * (-1)**Home + 0.5

(c)  Put the Build data into one worksheet and the Validation data into another:
  Manip > Split Worksheet
   By variables:  Build
  (Build = 1 will be the model building data set, and Build = 0 will be the     validation data set.)
 

4.  Determine the two best models and fit them using the build data set.  Here's how:

(a)  Make sure the Build=1 worksheet is active.  Run all possible regressions and find the two models with lowest Cp.
  Stat > Regression > Best subsets
   Free predictors:  SQFT FEATS NE CUST COR TAX SQFTbyNE

(b)  Obtain the prediction equation / parameter estimates for the two "best" models.
  Stat > Regression > Regression

 Write down your  results

Model #1
Cp =
S =
Y-hat =

Model #2
Cp =
S =
Y-hat =

Now see how well these models work for the validation data.

5.  Use the prediction equations you found previously to predict prices for odd numbered homes.  Be sure to activate the Build = 0 worksheet by clicking on it.
(a)  Calc > Calculator
  Store Result in:  Yhat1
  Expression:  Type in your prediction equation for Model 1.
 (b) Calc > Calculator
  Store Result in:  Yhat2
  Expression:  Type in your prediction equation for Model 2.

6.  Obtain prediction errors.
(a)  Calc > Calculator
  Store result in:  Error1
  Expression:  Price-Yhat1
 (b) Calc > Calculator
  Store result in:  Error2
  Expression:  Price-Yhat2

7.  Summarize the prediction errors.
 Calc > Column Statistics > check the box for sum of squares
  Variable:  Error1

Repeat for Error2

Compare the sums of squares for Error1 and Error2 to each other.  Which model did a better job of prediction?
 

8. Use the sums of squares for the prediction errors you found in #7 to obtain a prediction-based R-squared using the two models with the validation data.  For each model, compare the prediction-based R-squared with the R-squared for the build data set.