We will analyze the Albuquerque Home Prices data from DASL, building a model to predict Price. We will be including an interaction term in our list of candidate regressors. Variable selection and model fitting will be done with half of the data and the remaining half will be used for validation.
1. Open the Albuquerque Home Prices data found in the the folder, R:/Minitab13/Regression, or by going to the course website, and copying and pasting the data into a Mintab worksheet.
2. Create the interaction variable SQFTbyNE using Calc > Calculator.
3. Follow the steps below to divide the data into two sets – one
set for model building and one for validation.
(a) Create a variable Home which numbers the homes from
1 to 117:
Calc > Make Patterned Data > Simple Set of Numbers
From 1 To 117
(b) Create an indicator variable that is 0 for odd homes and 1
for evens:
Calc > Calculator
Store result in: Build
Expression: 0.5 * (-1)**Home + 0.5
(c) Put the Build data into one worksheet and the Validation data
into another:
Manip > Split Worksheet
By variables: Build
(Build = 1 will be the model building data set, and Build =
0 will be the validation data set.)
4. Determine the two best models and fit them using the build data set. Here's how:
(a) Make sure the Build=1 worksheet is active. Run all possible
regressions and find the two models with lowest Cp.
Stat > Regression > Best subsets
Free predictors: SQFT FEATS NE CUST COR TAX SQFTbyNE
(b) Obtain the prediction equation / parameter estimates for the
two "best" models.
Stat > Regression > Regression
Write down your results
Model #1
Cp =
S =
Y-hat =
Model #2
Cp =
S =
Y-hat =
Now see how well these models work for the validation data.
5. Use the prediction equations you found previously to predict
prices for odd numbered homes. Be sure to activate the Build = 0
worksheet by clicking on it.
(a) Calc > Calculator
Store Result in: Yhat1
Expression: Type in your prediction equation for Model
1.
(b) Calc > Calculator
Store Result in: Yhat2
Expression: Type in your prediction equation for Model
2.
6. Obtain prediction errors.
(a) Calc > Calculator
Store result in: Error1
Expression: Price-Yhat1
(b) Calc > Calculator
Store result in: Error2
Expression: Price-Yhat2
7. Summarize the prediction errors.
Calc > Column Statistics > check the box for sum of squares
Variable: Error1
Repeat for Error2
Compare the sums of squares for Error1 and Error2 to each other.
Which model did a better job of prediction?
8. Use the sums of squares for the prediction errors you found in #7
to obtain a prediction-based R-squared using the two models with the validation
data. For each model, compare the prediction-based R-squared with
the R-squared for the build data set.