Lesson 10: Cautions about Correlation and Regression
October 1, 2018
Review:
- Pearson Correlation Coefficient
- R-Squared
- Exam 2 on Mon, Oct 8
- Sheets assignments due this Friday (10/5) by 5pm
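As a refresher on the two review formulas, here is a minimal Python sketch. The helper names `pearson_r` and `r_squared` are hypothetical and not part of the course materials. It computes r from deviations about the means and squares r to get R-squared, which holds for a least-squares line with a single explanatory variable. Its results should agree with the CORREL and RSQ functions in Sheets on the same data. Python 3.10+ also offers `statistics.correlation`, but the formula is written out here so each step is visible.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired (x, y) data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """For a least-squares line with one explanatory variable, R-squared = r ** 2."""
    return pearson_r(xs, ys) ** 2


# Pueblo midterm data used later in this lesson: time period vs. ballots cast
periods = [1, 2, 3, 4]
ballots = [49304, 52957, 54471, 60543]
print(round(pearson_r(periods, ballots), 4), round(r_squared(periods, ballots), 4))
```

A perfectly increasing straight-line pattern gives r = 1, and a perfectly decreasing one gives r = -1. In both cases R-squared is 1.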
Presentation:
- Cautions about Correlation and Regression
- Influential Observations
- Outliers
- Diabetes and blood sugar (pp. 129-130)
- Lurking Variables
- Variables that are not among the explanatory or response variables in a study but may still influence the relationship between them
- High school math and success in college (pp. 132-133)
- The Question of Causation
- Correlation does not imply causation
- Spurious Correlations
- Retrospective study = looking back in time to find possible causes of an outcome already observed in a sample population
- Prospective study = following a sample population over time and studying behaviors possibly linked to the likelihood of an outcome
- Video
- Extrapolation
- Using a regression line to predict at values far outside the range of the explanatory variable in the data
- Predictions outside that range are unreliable
- The farther outside the range, the less reliable the prediction
- Sensitivity Analysis Demonstration
- Remove influential observations and recalculate regression equation and R-Squared
- Use Pueblo Voter Turnout data
Midterm Election   Time Period   Ballots Cast
2002               1             49,304
2006               2             52,957
2010               3             54,471
2014               4             60,543
- Produce a scatter plot with Time Period on the x axis and Ballots Cast on the y axis
- Find the linear regression equation and the corresponding R-Squared value
- Remove the 2002 data
- Recalculate the linear regression equation and the corresponding R-Squared value
- Remove the 2014 data
- Recalculate the linear regression equation and the corresponding R-Squared value
- Compare the three fits to identify the influential observations
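The sensitivity analysis demonstration can be scripted as a minimal sketch, assuming plain Python with no spreadsheet involved. The names `fit_line`, `fit_without`, and `predict` are hypothetical helpers. The script refits the least-squares line on all four elections, then without 2002, then without 2014, and reports the slope, intercept, and R-squared each time. It also predicts time period 5, on the assumption that period 5 stands for the next midterm (2018) under the same four-year spacing. That point lies outside every fitted range, so each prediction is an extrapolation, and the script reports how many periods beyond the data it falls.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Pueblo midterm turnout from the table: (year, time period, ballots cast)
DATA = [(2002, 1, 49304), (2006, 2, 52957), (2010, 3, 54471), (2014, 4, 60543)]


def fit_without(rows, drop_year=None):
    """Refit after leaving out one election year (None keeps every row)."""
    kept = [row for row in rows if row[0] != drop_year]
    xs = [period for _, period, _ in kept]
    ys = [ballots for _, _, ballots in kept]
    return fit_line(xs, ys), min(xs), max(xs)


def predict(slope, intercept, x_new, x_min, x_max):
    """Predicted y plus how far x_new lies outside [x_min, x_max] (0 = inside)."""
    distance = max(x_min - x_new, x_new - x_max, 0)
    return intercept + slope * x_new, distance


results = {}
for label, drop in [("all years", None), ("without 2002", 2002), ("without 2014", 2014)]:
    (slope, intercept, r_sq), x_min, x_max = fit_without(DATA, drop)
    y5, dist = predict(slope, intercept, 5, x_min, x_max)
    results[label] = (slope, intercept, r_sq, y5)
    print(f"{label}: y = {slope:.1f}x + {intercept:.1f}, R^2 = {r_sq:.4f}, "
          f"period-5 prediction = {y5:,.0f} ({dist} period(s) outside the data)")
```

With these four rows, dropping 2014 moves the slope the most, from about 3,523 to about 2,584 ballots per period. It also pulls the period-5 prediction down by about 3,100 ballots. Dropping 2002 changes the slope less but lowers R-squared from about 0.94 to about 0.89.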
Activity:
- CSU-Pueblo finished 5th in a recent women’s golf tournament; the individual results are shown below.
Position   Player (seed)              Round 1   Round 2   Total
T8         Jolene Kam (3)             72        73        145
T23        Courtney Ewing (1)         75        75        150
T34        Rachanok Rahulpan (4)      74        78        152
T45        Orakorn Thirayatorn (2)    78        77        155
T72        Amy Sutheran (5)           78        83        161
- Produce a scatter plot with Round 1 scores on the x axis and Round 2 scores on the y axis
- Find the linear regression equation and the corresponding R-Squared value
- Identify the most influential observation
- Omit the observation you think is most influential
- Recalculate the linear regression equation and the corresponding R-Squared value
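To make "most influential" concrete for the activity, this sketch refits the line with each player left out in turn and ranks players by how far the slope moves. Leave-one-out slope change is one simple criterion; using it here is an assumption, since the activity leaves the judgment to you. The helpers `fit_line` and `fit_rows` are hypothetical names. Ranking by change in R-squared instead can point to a different player, so the printout shows both quantities.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)


# Individual results from the table: (player, Round 1, Round 2)
SCORES = [
    ("Jolene Kam", 72, 73),
    ("Courtney Ewing", 75, 75),
    ("Rachanok Rahulpan", 74, 78),
    ("Orakorn Thirayatorn", 78, 77),
    ("Amy Sutheran", 78, 83),
]


def fit_rows(rows):
    """Fit Round 2 (y) on Round 1 (x) for the given rows."""
    return fit_line([r1 for _, r1, _ in rows], [r2 for _, _, r2 in rows])


full_slope, full_intercept, full_r_sq = fit_rows(SCORES)
print(f"all players: y = {full_slope:.3f}x + {full_intercept:.3f}, R^2 = {full_r_sq:.3f}")

# Leave each player out in turn and record how far the slope moves
changes = []
for i, (name, _, _) in enumerate(SCORES):
    slope, intercept, r_sq = fit_rows(SCORES[:i] + SCORES[i + 1:])
    changes.append((abs(slope - full_slope), name, slope, r_sq))
    print(f"without {name}: slope = {slope:.3f}, R^2 = {r_sq:.3f}")

largest_change = max(changes)  # tuples compare on the slope change first
print("largest slope change when omitting:", largest_change[1])
```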
Assignment:
- Use Sheets and the results from a recent women’s golf tournament
- Go to the “Team Results” worksheet
- Plot the data with Round 1 scores on the x axis and Round 2 scores on the y axis
- Find the linear regression equation and the corresponding R-Squared value
- Identify the most influential observation(s)
- Omit the observation(s) you think are most influential
- Recalculate the linear regression equation and the corresponding R-Squared value
- Go to the “Individual Results” worksheet
- Plot the data with Round 1 scores on the x axis and Round 2 scores on the y axis
- Find the linear regression equation and the corresponding R-Squared value
- Identify the most influential observation(s)
- Omit the observation(s) you think are most influential
- Recalculate the linear regression equation and the corresponding R-Squared value
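If you want to check your Sheets results for the assignment, here is a minimal sketch that reads a worksheet exported as CSV text and refits after omitting one or more observations. The column names "Round 1" and "Round 2" are assumptions taken from the activity table; change them to match the worksheet headers. The helpers `load_rounds` and `refit_without` are hypothetical names. The sample CSV reuses the activity's individual scores only as stand-in data.

```python
import csv
import io


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)


def load_rounds(csv_text, x_col="Round 1", y_col="Round 2"):
    """Read Round 1 (x) and Round 2 (y) scores from CSV text (column names assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = [(float(row[x_col]), float(row[y_col])) for row in reader]
    return [x for x, _ in pairs], [y for _, y in pairs]


def refit_without(xs, ys, omit):
    """Refit after dropping the rows whose 0-based indices are in `omit`."""
    keep = [i for i in range(len(xs)) if i not in omit]
    return fit_line([xs[i] for i in keep], [ys[i] for i in keep])


# Stand-in export: the activity's individual scores in CSV form
sample = """Player,Round 1,Round 2
Jolene Kam,72,73
Courtney Ewing,75,75
Rachanok Rahulpan,74,78
Orakorn Thirayatorn,78,77
Amy Sutheran,78,83
"""
xs, ys = load_rounds(sample)
print("all rows:", fit_line(xs, ys))
print("without rows 3 and 4:", refit_without(xs, ys, {3, 4}))
```

Passing a set of indices to `omit` lets you drop several observations at once, which matches the "observation(s)" wording in the assignment.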