Lesson 9: Multiple Regression Variable Selection Procedures
February 14, 2018
Review:
- Measuring Forecast Error
- Exam 1 Remodel
Presentation:
- Variable Selection Procedures
- Automated Selection
- Forward Selection
- Backward Selection
- Stepwise Selection
- Backward then Forward Selection
- Modeling Objective:
- Achieve highest R^2 possible
- For some data the best achievable R^2 may be below 0.50
- Don’t sacrifice too much significance for small gains in R^2
- Achieve high F-statistic
- Fewer variables are generally preferable (keep it simple)
- Significance of F (p-value) should be < 0.05; lower is better
- Individual variables should have p-value < 0.05
- Occasionally it’s appropriate to make exceptions
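
Forward selection can be sketched in a few lines of code. The example below is only an illustration under stated assumptions, not the procedure used in class: the data are synthetic (not MLB statistics), the names solve, sse, forward_select and f_to_enter are hypothetical, and the entry rule is a partial F-statistic of at least 4.0, a rule of thumb that is roughly comparable to the 0.05 p-value cutoff for a single added variable.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def sse(columns, y):
    """Residual sum of squares for OLS of y on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y[t] for t, row in enumerate(design)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def forward_select(data, y, f_to_enter=4.0):
    """Add the candidate with the largest partial F until none reaches f_to_enter."""
    n = len(y)
    sst = sse([], y)  # intercept-only model: total sum of squares
    chosen, current = [], sst
    while len(chosen) < len(data):
        best = None
        for name in data:
            if name in chosen:
                continue
            trial = sse([data[c] for c in chosen + [name]], y)
            df_resid = n - (len(chosen) + 1) - 1
            f_stat = (current - trial) / (trial / df_resid)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, trial)
        if best[1] < f_to_enter:
            break  # no remaining variable is worth adding
        chosen.append(best[0])
        current = best[2]
    return chosen, 1.0 - current / sst

# Synthetic data (assumption): one informative predictor, two pure-noise columns.
random.seed(0)
n_obs = 30
data = {
    "signal": [random.gauss(0, 100) for _ in range(n_obs)],
    "noise_a": [random.gauss(0, 1) for _ in range(n_obs)],
    "noise_b": [random.gauss(0, 1) for _ in range(n_obs)],
}
y = [0.5 + 0.0006 * s + random.gauss(0, 0.02) for s in data["signal"]]

chosen, r2 = forward_select(data, y)
print("selected:", chosen, "R^2:", round(r2, 3))
```

Backward selection reverses the loop: start with every variable in the model and drop the one with the smallest partial F until all remaining variables clear the threshold. Stepwise selection alternates the two checks after each change.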
Assignment:
- Produce a model to estimate MLB winning percentage
- Use 2013 MLB Team statistics
- Test all variables using one or more of the selection procedures above
- Produce a Final Model, aiming to maximize R^2 and F-statistic
- Submit 1-page model summary in class on Monday, Feb 19
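
For reference, the two quantities the final model aims to maximize can be written out as follows. The notation is an assumption introduced here, not taken from the lesson: n is the number of observations, k the number of predictor variables, SSE the residual sum of squares and SST the total sum of squares.

```latex
R^2 = 1 - \frac{SSE}{SST},
\qquad
F = \frac{(SST - SSE)/k}{SSE/(n - k - 1)}
  = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}
```

Adding a variable never lowers R^2, but it raises k, so F can fall when the added variable explains little. This is the trade-off behind preferring fewer variables.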