Chapter 9

Acknowledgement

These notes use content from OpenIntro Statistics Slides by

Mine Cetinkaya-Rundel.

Model selection

Beauty in the classroom

Data: Student evaluations of instructors’ beauty and teaching quality for 463 courses at the University of Texas.
Evaluations conducted at the end of semester, and the beauty judgements were made later, by six students who had not attended the classes and were not aware of the course evaluations (2 upper level females, 2 upper level males, one lower level female, one lower level male).

Professor rating vs. beauty

Professor evaluation score (higher score means better) vs. beauty score (a score of 0 means average, negative score means below average, and a positive score above average):

Which of the below is \(\underline{correct}\) based on the model output?

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline (Intercept) & 4.19 & 0.03 & 167.24 & 0.00 \\ beauty & 0.13 & 0.03 & 4.00 & 0.00 \\ \hline R^2 = 0.0336 \end{eqnarray*} \]

Model predicts 3.36% of professor ratings correctly.
Beauty is not a significant predictor of professor evaluation.
Professors who score 1 point above average in their beauty score are tend to also score 0.13 points higher in their evaluation.
3.36% of variability in beauty scores can be explained by professor evaluation.
The correlation coefficient could be \(\sqrt{0.0336} = 0.18\) or \(-0.18\), we can’t tell which is correct.

Which of the below is \(\underline{correct}\) based on the model output?

Model predicts 3.36% of professor ratings correctly.
Beauty is not a significant predictor of professor evaluation.
Professors who score 1 point above average in their beauty score are tend to also score 0.13 points higher in their evaluation.
3.36% of variability in beauty scores can be explained by professor evaluation.
The correlation coefficient could be \(\sqrt{0.0336} = 0.18\) or \(-0.18\), we can’t tell which is correct.

Exploratory analysis

\(\color{red}{\text{Any interesting features?}}\)

Few females with very low beauty scores.

For a given beauty score, are male professors evaluated higher, lower, or about the same as female professors?

Difficult to tell from this plot only.

Professor rating vs. beauty + gender

For a given beauty score, are male professors evaluated higher, lower, or about the same as female professors?

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline (Intercept) & 4.09 & 0.04 & 107.85 & 0.00 \\ beauty & 0.14 & 0.03 & 4.44 & 0.00 \\ gender.male & 0.17 & 0.05 & 3.38 & 0.00 \\ \hline R^2_{adj} = 0.057 \end{eqnarray*} \]

higher
lower
about the same

Professor rating vs. beauty + gender

For a given beauty score, are male professors evaluated higher, lower, or about the same as female professors?

higher\(\rightarrow\) Beauty held constant, male professors are rated 0.17 points higher on average than female professors.
lower
about the same

Full model

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline (Intercept) & 4.6282 & 0.1720 & 26.90 & 0.00 \\ beauty & 0.1080 & 0.0329 & 3.28 & 0.00 \\ gender.male & 0.2040 & 0.0528 & 3.87 & 0.00 \\ age & -0.0089 & 0.0032 & -2.75 & 0.01 \\ formal.yes \tt{formal}: picture wearing tie\&jacket/blouse, levels: \tt{yes}, \tt{no} & 0.1511 & 0.0749 & 2.02 & 0.04 \\ lower.yes \tt{lower}: lower division course, levels: \tt{yes}, \tt{no} & 0.0582 & 0.0553 & 1.05 & 0.29 \\ native.non english & -0.2158 & 0.1147 & -1.88 & 0.06 \\ minority.yes & -0.0707 & 0.0763 & -0.93 & 0.35 \\ students \tt{students}: number of students & -0.0004 & 0.0004 & -1.03 & 0.30 \\ tenure.tenure track \tt{tenure}: tenure status, levels: \tt{non-tenure track}, \tt{tenure track}, \tt{tenured} & -0.1933 & 0.0847 & -2.28 & 0.02 \\ tenure.tenured & -0.1574 & 0.0656 & -2.40 & 0.02 \\ \hline \end{eqnarray*} \]

Hypotheses

Just as the interpretation of the slope parameters take into account all other variables in the model, the hypotheses for testing for significance of a predictor also takes into account all other variables.

[\(H_0:\)] \(B_i = 0\) when other explanatory variables are included in the model.
[\(H_A:\)] \(B_i \ne 0\) when other explanatory variables are included in the model.

Assessing significance: numerical variables

The p-value for age is 0.01. What does this indicate?

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline ...\\ age & -0.0089 & 0.0032 & -2.75 & 0.01 \\ ...\\ \hline \end{eqnarray*} \] - Since p-value is positive, higher the professor’s age, the higher we would expect them to be rated.

If we keep all other variables in the model, there is strong evidence that professor’s age is associated with their rating.
Probability that the true slope parameter for age is 0 is 0.01.
There is about 1% chance that the true slope parameter for age is -0.0089.

Assessing significance: numerical variables

The p-value for age is 0.01. What does this indicate?

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline ...\\ age & -0.0089 & 0.0032 & -2.75 & 0.01 \\ ...\\ \hline \end{eqnarray*} \]

Since p-value is positive, higher the professor’s age, the higher we would expect them to be rated.
If we keep all other variables in the model, there is strong evidence that professor’s age is associated with their rating.

- Probability that the true slope parameter for age is 0 is 0.01.
- There is about 1% chance that the true slope parameter for age is -0.0089.

Assessing significance: categorical variables

Tenure is a categorical variable with 3 levels: non tenure track, tenure track, tenured. Based on the model output given, which of the below is \(\underline{false}\)?

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline ... \\ tenure.tenure track & -0.1933 & 0.0847 & -2.28 & 0.02 \\ tenure.tenured & -0.1574 & 0.0656 & -2.40 & 0.02 \\ \hline \end{eqnarray*} \]

Reference level is non tenure track.
All else being equal, tenure track professors are rated, on average, 0.19 points lower than non-tenure track professors.
All else being equal, tenured professors are rated, on average, 0.16 points lower than non-tenure track professors.
All else being equal, there is a significant difference between the average ratings of tenure track and tenured professors.

Assessing significance: categorical variables

Tenure is a categorical variable with 3 levels: non tenure track, tenure track, tenured. Based on the model output given, which of the below is \(\underline{false}\)?

Reference level is non tenure track.
All else being equal, tenure track professors are rated, on average, 0.19 points lower than non-tenure track professors.
All else being equal, tenured professors are rated, on average, 0.16 points lower than non-tenure track professors.
All else being equal, there is a significant difference between the average ratings of tenure track and tenured professors.

Assessing significance

Which predictors do not seem to meaningfully contribute to the model, i.e. may not be significant predictors of professor’s rating score?

\[ \begin{eqnarray*} \hline & Estimate & Std. Error & t value & Pr(>|t|) \\ \hline (Intercept) & 4.6282 & 0.1720 & 26.90 & 0.00 \\ beauty & 0.1080 & 0.0329 & 3.28 & 0.00 \\ gender.male & 0.2040 & 0.0528 & 3.87 & 0.00 \\ age & -0.0089 & 0.0032 & -2.75 & 0.01 \\ formal.yes & 0.1511 & 0.0749 & 2.02 & 0.04 \\ \hline \color{red}{lower.yes} & \color{red}{0.0582} & \color{red}{0.0553} & \color{red}{1.05} & \color{red}{0.29} \\ \color{red}{native.non english} & \color{red}{-0.2158} & \color{red}{0.1147} & \color{red}{-1.88} & \color{red}{0.06} \\ \color{red}{minority.yes} & \color{red}{-0.0707} & \color{red}{0.0763} & \color{red}{-0.93} & \color{red}{0.35} \\ \color{red}{students} & \color{red}{-0.0004} & \color{red}{0.0004} & \color{red}{-1.03} & \color{red}{0.30} \\ \hline tenure.tenure track & -0.1933 & 0.0847 & -2.28 & 0.02 \\ tenure.tenured & -0.1574 & 0.0656 & -2.40 & 0.02 \\ \hline \end{eqnarray*} \]

Model selection strategies

Based on what we’ve learned so far, what are some ways you can think of that can be used to determine which variables to keep in the model and which to leave out?

Backward-elimination

Start with the full model
Drop one variable at a time and record \(R^2_{adj}\) of each smaller model
Pick the model with the highest increase in \(R^2_{adj}\)
Repeat until none of the models yield an increase in \(R^2_{adj}\)

Backward-elimination

\[ \begin{eqnarray*} &gender + age + formal + lower + native + minority + students + tenure & \color{red}{0.0839} \\ \hline Step 1 & gender + age + formal + lower + native + minority + students + tenure & 0.0642 \\ & beauty + age + formal + lower + native + minority + students + tenure & 0.0557 \\ & beauty + gender + formal + lower + native + minority + students + tenure & 0.0706 \\ & beauty + gender + age + lower + native + minority + students + tenure & 0.0777 \\ & beauty + gender + age + formal + native + minority + students + tenure & 0.0837 \\ & beauty + gender + age + formal + lower + minority + students + tenure & 0.0788 \\ & beauty + gender + age + formal + lower + native + students + tenure & \color{red}{0.0842} \\ & beauty + gender + age + formal + lower + native + minority + tenure & 0.0838 \\ & beauty + gender + age + formal + lower + native + minority + students & 0.0733 \\ \hline \end{eqnarray*} \]

Backward-elimination

\[ \begin{eqnarray*} \hline Step 2 & gender + age + formal + lower + native + students + tenure & 0.0647 \\ & beauty + age + formal + lower + native + students + tenure & 0.0543 \\ & beauty + gender + formal + lower + native + students + tenure & 0.0708 \\ & beauty + gender + age + lower + native + students + tenure &0.0776 \\ & beauty + gender + age + formal + native + students + tenure & \color {red}{0.0846} \\ & beauty + gender + age + formal + lower + native + tenure & 0.0844 \\ & beauty + gender + age + formal + lower + native + students & 0.0725 \\ \hline \end{eqnarray*} \]

Backward-elimination

\[ \begin{eqnarray*} \hline Step 3 & gender + age + formal + native + students + tenure & 0.0653 \\ & beauty + age + formal + native + students + tenure & 0.0534 \\ & beauty + gender + formal + native + students + tenure & 0.0707 \\ & beauty + gender + age + native + students + tenure & 0.0786 \\ & beauty + gender + age + formal + students + tenure & 0.0756 \\ & beauty + gender + age + formal + native + tenure & \color{red}{0.0855} \\ & beauty + gender + age + formal + native + students & 0.0713 \\ \hline Step 4 & gender + age + formal + native + tenure & 0.0667 \\ & beauty + age + formal + native + tenure & 0.0553 \\ & beauty + gender + formal + native + tenure & 0.0723 \\ & beauty + gender + age + native + tenure & 0.0806 \\ & beauty + gender + age + formal + tenure & 0.0773 \\ & beauty + gender + age + formal + native & 0.0713 \\ \end{eqnarray*} \]

\(\tt{step}\) function in R

Best model: beauty + gender + age + formal + native + tenure

## 
## Call:
## lm(formula = profevaluation ~ beauty + gender + age + formal + 
##     lower + native + minority + students + tenure, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.79845 -0.37270  0.09849  0.39052  0.93273 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         4.6282155  0.1720227  26.905  < 2e-16 ***
## beauty              0.1079530  0.0329357   3.278 0.001127 ** 
## gendermale          0.2040127  0.0527509   3.867 0.000126 ***
## age                -0.0089405  0.0032458  -2.755 0.006115 ** 
## formalyes           0.1511348  0.0749453   2.017 0.044328 *  
## loweryes            0.0581603  0.0553270   1.051 0.293723    
## nativenon english  -0.2157998  0.1146764  -1.882 0.060503 .  
## minorityyes        -0.0706677  0.0762621  -0.927 0.354607    
## students           -0.0003726  0.0003603  -1.034 0.301536    
## tenuretenure track -0.1932547  0.0846549  -2.283 0.022903 *  
## tenuretenured      -0.1574315  0.0655919  -2.400 0.016791 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5206 on 452 degrees of freedom
## Multiple R-squared:  0.1037, Adjusted R-squared:  0.0839 
## F-statistic: 5.231 on 10 and 452 DF,  p-value: 2.748e-07

Forward selection

Start with regressions of response vs. each explanatory variable
Pick the model with the highest \(R^2_{adj}\)
Add the remaining variables one at a time to the existing model, and once again pick the model with the highest \(R^2_{adj}\)
Repeat until the addition of any of the remaining variables does not result in a higher \(R^2_{adj}\)

Backward elimination with the p-value approach:
- Start with the full model
- Drop the variable with the highest p-value and refit a smaller model
- Repeat until all variables left in the model are significant
Forward selection with the p-value approach:
- Start with regressions of response vs. each explanatory variable
- Pick the variable with the lowest significant p-value
- Add the remaining variables one at a time to the existing model, and pick the variable with the lowest significant p-value
- Repeat until any of the remaining variables does not have a significant p-value

Backward-elimination: \(p-value\) approach

\[ \begin{eqnarray*} \hline Full & beauty & gender & age & formal & lower & native & minority & students & tenure & tenure \\ & & male & & yes & yes & nonenglish & yes & & tenure track & tenured \\ & 0.00 & 0.00 & 0.01 & 0.04 & 0.29 & 0.06 & \color{red}{0.35} & 0.30 & 0.02 & 0.02 \\ \hline Step 1 & beauty & gender & age & formal & lower & native & & students & tenure & tenure \\ & & male & & yes & yes & nonenglish & & & tenure track & tenured \\ & 0.00 & 0.00 & 0.01 & 0.04 & \color{red}{0.38} & 0.03 & & 0.34 & 0.02 & 0.01 \\ \hline Step 2 & beauty & gender & age & formal & & native & & students & tenure & tenure \\ & & male & & yes & & nonenglish & & & tenure track & tenured \\ & 0.00 & 0.00 & 0.01 & 0.05 & & 0.02 & & \color{red}{0.44} & 0.01 & 0.01\\ \hline Step 3 & beauty & gender & age & formal & & native & & & tenure & tenure \\ & & male & & yes & & nonenglish & & & tenure track & tenured \\ & 0.00 & 0.00 & 0.01 & \color{red}{0.06} & & 0.02 & & & 0.01 & 0.01 \\ \hline Step 4 & beauty & gender & age & & & native & & & tenure & tenure \\ & & male & & & & nonenglish & & & tenure track & tenured \\ & 0.00 & 0.00 & 0.01 & & & \color{red}{0.06} & & & 0.01 & 0.01 \\ \hline Step 5 & beauty & gender & age & & & & & & tenure & tenure \\ & & male & & & & & & & tenure track & tenured \\ & 0.00 & 0.00 & 0.01 & & & & & & 0.01 & 0.01 \\ \end{eqnarray*} \]

Backward-elimination: \(p-value\) approach

Adjusted \(R^2\) vs. p-value approaches

The two approaches are similar, but they sometimes lead to different models, with the adjusted \(R^2\) approach tending to include more predictors in the final model.
When the sole goal is to improve prediction accuracy, use \(R^2\) . This is commonly the case in machine learning applications.
When we care about understanding which variables are statistically significant predictors of the response, or if there is interest in producing a simpler model at the potential cost of a little prediction accuracy, then the p-value approach is preferred.
Regardless of the approach we use, our job is not done after variable selection – we must still verify the model conditions are reasonable.