Lesson 6: Intro to Multiple Regression
February 20, 2024
Review:
- Exam 1
Presentation:
- Intro to Multiple Regression
- y-hat = B0 + B1*x1 + B2*x2 + … + Bn*xn
- Multiple Regression with XL Miner
- Or in Excel
- Regression Output Interpretation
- t-statistics
- p-values
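- R² measures “goodness of fit” (the share of the variance in y that the model explains)
- p-value measures “significance” (the probability of seeing a result at least this strong by chance if the variable had no real effect)
- Rank variables by significance (lowest p-value first)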
- Variable Selection Procedures
- Demonstrate with East Pueblo RE Sales 2014-2016
- Modeling Objective:
- Achieve highest R^2 possible
- Get R^2 as high (as close to 1) as possible without over-fitting
- For some data the best possible R^2 may seem low
- Adding independent variables never lowers R^2 (it almost always rises), but isn’t always helpful
- Don’t sacrifice statistical significance (measured by F) for small gains in R^2
- Achieve a high F-statistic
- Fewer variables = generally preferable (keep it simple)
- Significance of F (p-value) should be <0.05, lower is better
- Individual variables should have p-value < 0.05, lower is better
- Occasionally it’s appropriate to make exceptions
- Find a good balance between highly significant F and strong fitting R^2
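The trade-off between R^2 and F described above can be seen in a small experiment. The sketch below is not the XL Miner or Excel procedure. It is a minimal least-squares fit written with only the Python standard library, run on synthetic data. The names `solve`, `fit`, `x1`, and `noise_var` are made up for illustration. It fits y on one real predictor, then adds a second variable that is unrelated to y, and prints R^2 and F for both models.

```python
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(xs, y):
    """Least-squares fit of y-hat = B0 + B1*x1 + ... + Bk*xk.

    xs is a list of k predictor columns. Returns (coefficients, R^2, F).
    """
    n, k = len(y), len(xs)
    rows = [[1.0] + [col[i] for col in xs] for i in range(n)]  # leading 1 = intercept
    # Normal equations: (X'X) B = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)] for a in range(k + 1)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k + 1)]
    coefs = solve(xtx, xty)
    y_hat = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    r2 = 1 - sse / sst
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return coefs, r2, f_stat

# Synthetic data (an assumption for illustration, not course data)
random.seed(0)
n = 40
x1 = [random.uniform(0, 10) for _ in range(n)]
noise_var = [random.uniform(0, 10) for _ in range(n)]  # unrelated to y
y = [3.0 + 2.0 * a + random.gauss(0, 2) for a in x1]

coefs_small, r2_small, f_small = fit([x1], y)
coefs_big, r2_big, f_big = fit([x1, noise_var], y)
print(f"x1 only:        R^2 = {r2_small:.4f}, F = {f_small:.1f}")
print(f"x1 + noise_var: R^2 = {r2_big:.4f}, F = {f_big:.1f}")
```

The second R^2 cannot be lower than the first, because the larger model can always set the extra coefficient to zero. F, on the other hand, divides the explained variation by the number of variables, so an unhelpful variable will usually pull it down.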
Activity:
- Produce a model to estimate winning percentage for Major League Baseball (MLB)
- Evaluate each variable using t-statistics with their associated p-value
- Produce a Final Model, aiming to maximize and balance both R^2 and F-Statistic
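One common variable selection procedure that fits this activity is backward elimination: fit every candidate variable, drop the least significant one, refit, and repeat. The sketch below is a rough illustration only, under several assumptions. The data are synthetic, not real MLB statistics. The column names (`runs_scored`, `runs_allowed`, `noise`) and the function names are hypothetical. Because the Python standard library has no t-distribution, the code uses |t| >= 2 as an approximate stand-in for p-value < 0.05, whereas XL Miner and Excel report exact p-values.

```python
import random

def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def t_stats(columns, y):
    """Return {name: t-statistic} for each named column in a fit with an intercept."""
    names = list(columns)
    n, k = len(y), len(names)
    rows = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k + 1)] for a in range(k + 1)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k + 1)]
    inv = invert(xtx)
    coefs = [sum(inv[a][b] * xty[b] for b in range(k + 1)) for a in range(k + 1)]
    sse = sum((y[i] - sum(c * v for c, v in zip(coefs, r))) ** 2 for i, r in enumerate(rows))
    s2 = sse / (n - k - 1)  # residual variance
    # t = coefficient / standard error; index 0 is the intercept, so skip it
    return {name: coefs[j + 1] / (s2 * inv[j + 1][j + 1]) ** 0.5
            for j, name in enumerate(names)}

def backward_eliminate(columns, y, t_cutoff=2.0):
    """Drop the weakest variable until every remaining |t| is at least t_cutoff."""
    kept = dict(columns)
    while kept:
        stats = t_stats(kept, y)
        weakest = min(stats, key=lambda name: abs(stats[name]))
        if abs(stats[weakest]) >= t_cutoff:
            break
        del kept[weakest]
    return list(kept)

# Synthetic "teams" (made-up numbers, per game, for illustration only)
random.seed(1)
n_teams = 30
data = {
    "runs_scored": [random.uniform(3.5, 5.5) for _ in range(n_teams)],
    "runs_allowed": [random.uniform(3.5, 5.5) for _ in range(n_teams)],
    "noise": [random.uniform(0, 1) for _ in range(n_teams)],  # unrelated to winning
}
win_pct = [0.5 + 0.1 * (rs - ra) + random.gauss(0, 0.02)
           for rs, ra in zip(data["runs_scored"], data["runs_allowed"])]

print("t-statistics, full model:", t_stats(data, win_pct))
final_vars = backward_eliminate(data, win_pct)
print("variables kept:", final_vars)
```

In the activity itself, the same loop is carried out by hand: read the p-values from the regression output, remove the weakest variable, rerun the regression, and compare R^2 and F before and after.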
Assignment:
- Repeat the MLB modeling activity in Microsoft Excel; it’s good to be familiar with both XL Miner and Excel
- Watching this video tutorial may be helpful