Lesson 6: Intro to Multiple Regression

February 20, 2024

Review:

Exam 1

Presentation:

Intro to Multiple Regression
- y-hat = B0 + B1*x1 + B2*x2 + … + Bn*xn
- Multiple Regression with XL Miner
- Or in Excel
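For readers who want to see what XL Miner or Excel is doing behind the scenes, here is a minimal sketch in plain Python that fits y-hat = B0 + B1*x1 + B2*x2 by least squares (solving the normal equations). The function names and the five data rows are made up for illustration; the y values were built from B0 = 10, B1 = 2, B2 = 3 with no noise, so the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment each row with its right-hand side so row operations hit both.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Return [B0, B1, ..., Bn] that minimize the sum of squared errors."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 is the intercept column
    k = len(X[0])
    # Normal equations: (X'X) B = X'y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Evaluate y-hat = B0 + B1*x1 + ... + Bn*xn for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Made-up data: y = 10 + 2*x1 + 3*x2 exactly
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [18, 17, 28, 27, 38]

coefs = fit_multiple_regression(rows, y)
print("B0, B1, B2:", [round(b, 6) for b in coefs])
print("y-hat for x1=6, x2=5:", round(predict(coefs, (6, 5)), 6))
```

Real data will include noise, so the fitted coefficients will only approximate the underlying relationship.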
Regression Output Interpretation
- t-statistics
- p-values
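- R² measures “goodness of fit” (the share of the variance in y explained by the model)
- p-value measures “significance” (the probability of seeing a result at least this strong by chance if the variable had no real effect)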
- rank by significance
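The output items above can be reproduced by hand. The sketch below, using made-up data and hypothetical function names, computes R² and a t-statistic for each coefficient (coefficient divided by its standard error), then ranks the variables by the size of the t-statistic. Turning a t-statistic into a p-value requires the t distribution, which XL Miner and Excel handle for you and which is left out here; as a rough guide, with a reasonably large sample a t-statistic beyond about ±2 corresponds to a p-value below 0.05. In this data x1 drives y and x2 is unrelated, so x1 should rank first.

```python
import math


def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def regression_summary(rows, y):
    """Return (coefficients, t-statistics, R^2) for a fit with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])  # k counts the intercept as well
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)

    fitted = [sum(b * x for b, x in zip(coefs, r)) for r in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    r_squared = 1 - sse / sst

    resid_var = sse / (n - k)  # residual variance estimate
    t_stats = []
    for j in range(k):
        # j-th diagonal entry of (X'X)^-1, found by solving against a unit vector
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(xtx, unit)[j]
        std_err = math.sqrt(resid_var * inv_jj)
        t_stats.append(coefs[j] / std_err)
    return coefs, t_stats, r_squared


# Made-up data: y is roughly 5 + 2*x1 plus small noise; x2 is unrelated
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [7.3, 8.6, 11.2, 12.9, 15.4, 16.7, 19.1, 20.8]

coefs, t_stats, r_squared = regression_summary(list(zip(x1, x2)), y)

# Rank the independent variables by |t| (skip the intercept at index 0)
ranked = sorted(zip(["x1", "x2"], t_stats[1:]),
                key=lambda pair: abs(pair[1]), reverse=True)
print("R^2:", round(r_squared, 4))
for name, t in ranked:
    print(name, "t-statistic:", round(t, 3))
```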
Variable Selection Procedures
- Demonstrate with East Pueblo RE Sales 2014-2016
Modeling Objective:
- Achieve the highest R^2 possible
  - Get R^2 as high (as close to 1) as possible without over-fitting
  - For some data the best possible R^2 may seem low
  - Adding independent variables never decreases R^2 (and almost always increases it), but isn’t always helpful
  - Don’t sacrifice statistical significance (measured by F) for small gains in R^2
- Achieve a high F-statistic
  - Fewer variables = generally preferable (keep it simple)
  - Significance of F (p-value) should be <0.05, lower is better
  - Individual variables should have p-value < 0.05, lower is better
  - Occasionally it’s appropriate to make exceptions
- Find a good balance between highly significant F and strong fitting R^2
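To make the trade-off between R^2 and F concrete, here is a small sketch that fits two models to the same made-up data: one with a single useful variable (x1) and one that also adds an unrelated variable (x2). The data and function names are hypothetical. The F-statistic is computed as (explained variation / number of variables) divided by (unexplained variation / (n - number of variables - 1)). The larger model should show a tiny gain in R^2 and a much lower F, which is the situation the guidelines above warn against.

```python
def solve(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_stats(rows, y):
    """Return (R^2, F-statistic) for a least-squares fit with an intercept."""
    X = [[1.0] + list(r) for r in rows]
    n, p = len(X), len(X[0]) - 1  # p = number of independent variables
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)

    fitted = [sum(b * x for b, x in zip(coefs, r)) for r in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    r2 = 1 - sse / sst
    f_stat = ((sst - sse) / p) / (sse / (n - p - 1))
    return r2, f_stat


# Made-up data: y is roughly 5 + 2*x1 plus small noise; x2 is unrelated
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [7.3, 8.6, 11.2, 12.9, 15.4, 16.7, 19.1, 20.8]

r2_small, f_small = fit_stats([(a,) for a in x1], y)  # x1 only
r2_large, f_large = fit_stats(list(zip(x1, x2)), y)   # x1 and x2

print("x1 only:   R^2 =", round(r2_small, 6), " F =", round(f_small, 1))
print("x1 and x2: R^2 =", round(r2_large, 6), " F =", round(f_large, 1))
```

When the extra variable buys almost no R^2 and costs this much F, the simpler model is the better choice.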

Activity:

Produce a model to estimate winning percentage for Major League Baseball (MLB)
- Use 2013 MLB Team statistics
- Evaluate each variable using t-statistics with their associated p-value
- Produce a final model, aiming to maximize and balance both R^2 and the F-statistic

Assignment:

Repeat the MLB modeling activity in Microsoft Excel; it’s good to be familiar with both XL Miner and Excel
Watching this video tutorial may be helpful

Justin

Justin Holman is CEO of Aftermarket Analytics, where he leads efforts to develop cutting edge sales forecasting and inventory optimization technology for the Automotive Aftermarket. Prior to joining Aftermarket Analytics, Justin managed corporate consulting for the Strategy & Analytics division at MapInfo Corporation, leading major projects for retail clients including The Home Depot, Darden Restaurants, Bridgestone-Firestone, Sainsbury’s and New York & Company. Before that, Justin served as Vice President of Software Development at LogicTools, now part of IBM's supply chain application software group. Justin holds a B.A. from Claremont McKenna College, a Ph.D. from the University of Oregon and an Executive Management certificate from Northwestern University's Kellogg School of Management.
