Lesson 7: Categorical Data and Indicator Variables
October 13, 2025
Review:
- Multicollinearity
- Interaction Variables
Presentation:
- Pivot tables to aggregate by category
- Indicator Variables (aka “dummy” variables)
  - let a regression model incorporate categorical variables
  - binary – value is either 1 or 0
    - if the indicator is 0, its coefficient has no effect on y-hat
    - if the indicator is 1, its coefficient is added to the estimate (y-hat)
- Demonstration with Sample Real Estate data
  - Use a Pivot table to summarize price by neighborhood
  - Create a dummy variable for a neighborhood
  - Create a dummy variable for multiple neighborhoods
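The demonstration steps above can be sketched outside a spreadsheet as well. The following is a minimal Python illustration, not the in-class Excel/Sheets workflow: the neighborhood names and prices are made up for the example, and it reads "multiple neighborhoods" as one 0/1 column per neighborhood. One neighborhood is left out as the baseline, since a full set of dummies plus an intercept would be perfectly collinear.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows (not the class dataset): (neighborhood, sale price in dollars)
sales = [
    ("Northside", 250000),
    ("Northside", 270000),
    ("Downtown", 180000),
    ("Downtown", 200000),
    ("Eastgate", 150000),
]

# Pivot-table equivalent: group prices by neighborhood, then average each group
prices_by_hood = defaultdict(list)
for hood, price in sales:
    prices_by_hood[hood].append(price)
avg_price = {hood: mean(prices) for hood, prices in prices_by_hood.items()}

# Dummy variable for a single neighborhood: 1 if the row is in it, else 0
is_northside = [1 if hood == "Northside" else 0 for hood, _ in sales]

# Dummy variables for multiple neighborhoods: one column per neighborhood,
# skipping a baseline (an arbitrary choice here) to avoid perfect collinearity
baseline = "Eastgate"
dummy_columns = {
    hood: [1 if h == hood else 0 for h, _ in sales]
    for hood in sorted(prices_by_hood)
    if hood != baseline
}

print(avg_price)
print(is_northside)
print(dummy_columns)
```

With a baseline left out, each dummy coefficient in a regression is read as the estimated price difference between that neighborhood and the baseline neighborhood, holding the other variables fixed.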
Activity:
- Use the Pueblo Real Estate model (from last lesson)
- Incorporate one or more Indicator variables
- Can you improve R^2 and/or F with an indicator variable?
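One way to check the question above is to fit the model twice, with and without the indicator, and compare R^2 and the overall F statistic. The sketch below uses a small invented dataset (not the Pueblo data) and a hand-rolled least-squares fit in plain Python; the helper names `solve`, `fit`, and `f_stat` are illustrative only. Note that in-sample R^2 can never go down when a variable is added, so F is worth checking alongside it.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(columns, y):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns (coefficients, R^2); coefficients[0] is the intercept.
    """
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


def f_stat(r2, n, k):
    """Overall F statistic for a model with k predictors fitted to n rows."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical data: size in thousands of sq ft, a 0/1 neighborhood indicator,
# and sale price in thousands of dollars
sqft = [1.0, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 3.0]
in_hood = [0, 1, 0, 1, 0, 1, 0, 1]
price = [153, 208, 196, 271, 252, 307, 304, 389]

coefs_without, r2_without = fit([sqft], price)
coefs_with, r2_with = fit([sqft, in_hood], price)
f_without = f_stat(r2_without, len(price), 1)
f_with = f_stat(r2_with, len(price), 2)

print(f"without indicator: R^2={r2_without:.4f}, F={f_without:.1f}")
print(f"with indicator:    R^2={r2_with:.4f}, F={f_with:.1f}")
print(f"indicator coefficient: {coefs_with[2]:.1f}")
```

The indicator coefficient is the amount added to y-hat when the indicator is 1; when it is 0, that term drops out of the estimate.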
Assignment:
- Download the Used Cars data and use Excel (the dataset is large enough to slow down Sheets)
- Create a correlation matrix for all numeric variables.
  - Do you see evidence of multicollinearity?
  - If so, incorporate one or more interaction variables
- Incorporate one or more Indicator variables to include categorical data elements (e.g., Make, Model, Trim or Type)
- After experimenting with Interaction and Indicator variables, build a model to estimate used vehicle sale prices, using whichever variables you think work best.
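The correlation-matrix and interaction steps of the assignment can be sketched as follows. This is a rough Python illustration under stated assumptions, not the Excel procedure: the column names (`age_years`, `mileage_k`, `engine_l`, `price_k`) and values are invented and may not match the Used Cars file, and the 0.8 cutoff for flagging a predictor pair is only a common rule of thumb.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical numeric columns (names and values are assumptions for illustration)
cars = {
    "age_years": [1, 2, 3, 5, 7, 9],
    "mileage_k": [12, 25, 33, 60, 82, 110],
    "engine_l": [2.0, 1.6, 3.5, 2.0, 2.4, 1.8],
    "price_k": [28, 24, 27, 16, 13, 8],
}

# Correlation matrix: one entry for every pair of numeric variables
corr = {a: {b: pearson(cars[a], cars[b]) for b in cars} for a in cars}

# Evidence of multicollinearity: predictors (not the target) that are
# strongly correlated with each other
THRESHOLD = 0.8  # rule-of-thumb cutoff, an assumption
predictors = [name for name in cars if name != "price_k"]
flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(corr[a][b]) > THRESHOLD
]

# Interaction variable: the row-by-row product of two predictors
cars["age_x_mileage"] = [
    a * m for a, m in zip(cars["age_years"], cars["mileage_k"])
]

for name, row in corr.items():
    print(name, [round(v, 2) for v in row.values()])
print("flagged pairs:", flagged)
print("interaction column:", cars["age_x_mileage"])
```

In a spreadsheet, the interaction column is simply one predictor column multiplied by another, added as a new column before rerunning the regression.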