Lesson 5: Multiple Regression Variable Selection
September 29, 2025
Review:
- Intro to Multiple Regression
Presentation:
- Variable Selection Procedures
- Variable Selection
- Backward Elimination then Forward Selection
- Demonstrate with East Pueblo real estate (RE) sales, 2014-2016
- Modeling Objective:
- Achieve highest R^2 possible
- For some data sets, the best achievable R^2 may be below 0.50
- Don’t give up much statistical significance (F-statistic, p-values) for small gains in R^2
- Achieve high F-statistic
- Fewer variables are generally preferable (keep it simple)
- Significance F (the p-value of the overall F-test) should be < 0.05; lower is better
- Each individual variable should have a p-value < 0.05
- Occasionally it’s appropriate to make exceptions
- Find a good balance between F and R^2
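The backward-elimination step and the objectives above can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure used in class: the data are synthetic (not the East Pueblo sales), the column names `sqft_k`, `age`, and `day_of_month` are invented, and the rule "drop the variable whose |t| is below 2" is only a rough stand-in for the p-value < 0.05 guideline.

```python
import math
import random

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def fit_ols(columns, y):
    """Regress y on the named columns plus an intercept.
    Returns (R^2, F-statistic, {name: t-statistic}). Assumes a non-perfect fit."""
    names = list(columns)
    n, k = len(y), len(names)
    p = k + 1
    X = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    df_resid = n - k - 1
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)
    s2 = sse / df_resid  # residual variance estimate
    t_stats = {name: beta[i + 1] / math.sqrt(s2 * inv[i + 1][i + 1])
               for i, name in enumerate(names)}
    return r2, f_stat, t_stats

def backward_eliminate(columns, y, t_cutoff=2.0):
    """Start with all variables; repeatedly drop the one with the smallest |t|
    until every remaining |t| >= t_cutoff (or only one variable is left)."""
    kept = dict(columns)
    history = []
    while kept:
        r2, f_stat, t_stats = fit_ols(kept, y)
        history.append((list(kept), r2, f_stat))
        weakest = min(t_stats, key=lambda name: abs(t_stats[name]))
        if abs(t_stats[weakest]) >= t_cutoff or len(kept) == 1:
            break
        del kept[weakest]
    return list(kept), history

# Synthetic data: price (in $1000s) depends on size and age, not on day_of_month.
random.seed(0)
n_sales = 40
columns = {
    "sqft_k": [random.uniform(0.8, 3.0) for _ in range(n_sales)],
    "age": [random.uniform(0.0, 50.0) for _ in range(n_sales)],
    "day_of_month": [float(random.randint(1, 28)) for _ in range(n_sales)],
}
price = [40.0 + 90.0 * s - 0.8 * a + random.gauss(0.0, 8.0)
         for s, a in zip(columns["sqft_k"], columns["age"])]

kept, history = backward_eliminate(columns, price)
for variables, r2, f_stat in history:
    print(f"{variables}: R^2 = {r2:.3f}, F = {f_stat:.1f}")
print("Final model variables:", kept)
```

Each printed line is one elimination step, so the trade-off is visible directly: R^2 can only fall (or stay level) when a variable is removed, while F often rises. A statistics package would report exact p-values in place of the |t| cutoff used here.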
Activity:
- Produce a model to estimate MLB winning percentage
- Produce a final model, aiming to maximize both R^2 and the F-statistic
- Compile 2025 MLB Team data and build a new model
- Here’s a head start with all MLB teams and 2025 Wins, Losses and Payroll
- Use Baseball Reference to add other variables (e.g., ERA, RBIs)
- Produce a final model, aiming to maximize both R^2 and the F-statistic
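For the activity, two small helpers may be useful: one computes the response variable (winning percentage) from wins and losses, and one recovers the F-statistic from R^2 so candidate models can be compared. This is a sketch only. The function names are invented, and the R^2 values and win-loss record below are hypothetical placeholders, not results from real 2025 data. The only fixed input is n = 30 teams.

```python
def win_pct(wins, losses):
    """Winning percentage: wins divided by games played."""
    return wins / (wins + losses)

def f_from_r2(r2, n, k):
    """Overall F-statistic for a model with k predictors fit to n observations:
    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

N_TEAMS = 30  # one observation per MLB team

# Hypothetical record, for illustration only.
print(f"90-72 record -> winning pct = {win_pct(90, 72):.3f}")

# Hypothetical candidate models: (R^2, number of predictors).
candidates = {
    "2 predictors": (0.60, 2),
    "5 predictors": (0.62, 5),
}
for label, (r2, k) in candidates.items():
    print(f"{label}: R^2 = {r2:.2f}, F = {f_from_r2(r2, N_TEAMS, k):.2f}")
```

With these placeholder numbers, three extra predictors raise R^2 by only 0.02 while F falls from about 20.3 to about 7.8, which is the kind of trade the modeling objective warns against.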