Lesson 10: Cautions about Correlation and Regression
October 1, 2018
Review:
 Pearson Correlation Coefficient
 R-Squared
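As a refresher, both review quantities can be computed directly from their definitions. The sketch below is a minimal Python illustration; the function names `pearson_r` and `r_squared` are our own, not from any library. It relies on the fact that for a least-squares line with a single explanatory variable, R-Squared equals the square of r.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """R-Squared of the least-squares line; equals r**2 with one explanatory variable."""
    return pearson_r(xs, ys) ** 2


# Pueblo voter turnout data used later in this lesson (time period, ballots cast)
periods = [1, 2, 3, 4]
ballots = [49304, 52957, 54471, 60543]
print(round(pearson_r(periods, ballots), 4))  # about 0.9714
print(round(r_squared(periods, ballots), 4))  # about 0.9437
```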
Presentation:
 Cautions about Correlation and Regression
 Influential Observations
 Outliers
 Diabetes and blood sugar (pp. 129-130)
 Lurking Variables
 Variables that are not among the explanatory or response variables studied but may still influence the relationship between them
 High school math and success in college (pp. 132-133)
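The lurking-variable idea can be illustrated with simulated (not real) data. In this minimal sketch, a hypothetical variable `z` drives both `x` and `y`, so `x` and `y` are strongly correlated even though `x` has no direct effect on `y`. Removing the straight-line effect of `z` from both (regressing each on `z` and correlating the residuals) makes the association essentially disappear. The sample size, noise levels, and seed are arbitrary choices for illustration.

```python
import random


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(xs, ys):
    """Part of ys left over after removing the straight-line effect of xs."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


rng = random.Random(2018)  # fixed seed so the run is repeatable
n = 2000
# z is a hypothetical lurking variable; x and y each depend on z plus
# independent noise, and x has NO direct effect on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = correlation(x, y)                                   # strong, roughly 0.8
r_given_z = correlation(residuals(z, x), residuals(z, y))  # close to 0
print(f"r(x, y) = {r_xy:.2f}")
print(f"r(x, y) after removing z = {r_given_z:.2f}")
```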
 The Question of Causation
 Correlation does not imply causation
 Spurious Correlations
 Retrospective study: looks back in time to find possible causes of an outcome already observed in a sample
 Prospective study: follows a sample over time, recording behaviors that may be linked to the likelihood of an outcome
 Video
 Extrapolation
 Using a regression line to predict the response for values outside the range of the explanatory variable used to fit it
 Predictions outside the range are unreliable
 The farther outside the range, the less reliable the prediction
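A minimal sketch of guarding against extrapolation, assuming a simple rule: flag any prediction whose x value falls outside the smallest and largest x values used to fit the line. The helper names `fit_line` and `predict` are hypothetical. The flag only says the line was never checked there; it cannot say how wrong the prediction is. Period 2.5 is used only to show a value inside the range; period 5 would be the 2018 midterm and period 10 the 2038 midterm.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(xs, ys, x_new):
    """Return (predicted y, is_extrapolation) for the line fitted to (xs, ys).

    is_extrapolation is True when x_new lies outside [min(xs), max(xs)],
    i.e. where the line was never checked against any data.
    """
    slope, intercept = fit_line(xs, ys)
    is_extrapolation = x_new < min(xs) or x_new > max(xs)
    return slope * x_new + intercept, is_extrapolation


periods = [1, 2, 3, 4]                      # 2002, 2006, 2010, 2014
ballots = [49304, 52957, 54471, 60543]
for t in (2.5, 5, 10):                      # 5 -> 2018, 10 -> 2038
    y_hat, extrapolated = predict(periods, ballots, t)
    note = "EXTRAPOLATION, treat with caution" if extrapolated else "inside data range"
    print(f"period {t}: about {y_hat:,.0f} ballots ({note})")
```

The fitted line keeps rising by the same amount every period forever, which is exactly why predictions far beyond 2014 deserve little trust.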
 Sensitivity Analysis Demonstration
 Remove influential observations, then recalculate the regression equation and R-Squared
 Use the Pueblo voter turnout data

Midterm Election   Time Period   Ballots Cast
2002               1             49,304
2006               2             52,957
2010               3             54,471
2014               4             60,543
 Produce a scatter plot with Time Period on the x axis and Ballots Cast on the y axis
 Find the linear regression equation and the corresponding R-Squared value
 Remove the 2002 data
 Recalculate the linear regression equation and the corresponding R-Squared value
 Remove the 2014 data
 Recalculate the linear regression equation and the corresponding R-Squared value
 Identify the most influential observation
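The sensitivity analysis can be checked with a short script. This is a minimal sketch (the helper names `fit` and `without` are hypothetical) that fits the least-squares line to all four midterms, then refits without 2002 and without 2014. If you are working in Sheets, the built-in SLOPE, INTERCEPT and RSQ functions (each taking the y range first, then the x range) should give the same values.

```python
def fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def without(rows, year):
    """Copy of rows with the given midterm year omitted."""
    return [r for r in rows if r[0] != year]


# (midterm year, time period, ballots cast) from the Pueblo table
rows = [(2002, 1, 49304), (2006, 2, 52957), (2010, 3, 54471), (2014, 4, 60543)]

results = {}
for label, data in [("all years", rows),
                    ("without 2002", without(rows, 2002)),
                    ("without 2014", without(rows, 2014))]:
    slope, intercept, r2 = fit([d[1] for d in data], [d[2] for d in data])
    results[label] = (slope, intercept, r2)
    print(f"{label:>13}: y = {slope:.1f}x + {intercept:.1f}, R-Squared = {r2:.4f}")
```

For these data the script reports y = 3523.1x + 45511.0 (R-Squared 0.9437) for all years, y = 3793.0x + 44611.3 (R-Squared 0.8926) without 2002, and y = 2583.5x + 47077.0 (R-Squared 0.9460) without 2014. Dropping 2014 moves the slope the most, so it is the more influential point for the slope even though R-Squared barely changes. With only four points, any single removal shifts the results noticeably.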
Activity:
 CSU-Pueblo finished 5th in a recent women’s golf tournament; individual results are shown below.
Position   Player (seed)              Round 1   Round 2   Total
T8         Jolene Kam (3)             72        73        145
T23        Courtney Ewing (1)         75        75        150
T34        Rachanok Rahulpan (4)      74        78        152
T45        Orakorn Thirayatorn (2)    78        77        155
T72        Amy Sutheran (5)           78        83        161
 Produce a scatter plot with Round 1 scores on the x axis and Round 2 scores on the y axis
 Find the linear regression equation and the corresponding R-Squared value
 Identify the most influential observation
 Omit the observation you think is most influential
 Recalculate the linear regression equation and the corresponding R-Squared value
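One way to make "most influential" concrete is a leave-one-out check: refit the line with each player omitted in turn and see how much the slope moves. This minimal sketch uses the absolute change in slope as its criterion; that is an assumed, simple choice, and other measures (for example Cook's distance) are common alternatives. The helper names `fit` and `leave_one_out` are hypothetical.

```python
def fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# (player, Round 1, Round 2) from the table above
players = [
    ("Jolene Kam", 72, 73),
    ("Courtney Ewing", 75, 75),
    ("Rachanok Rahulpan", 74, 78),
    ("Orakorn Thirayatorn", 78, 77),
    ("Amy Sutheran", 78, 83),
]


def leave_one_out(rows):
    """Full fit, plus (name, slope, r_squared, slope change) for each omitted row."""
    full = fit([r[1] for r in rows], [r[2] for r in rows])
    changes = []
    for i, (name, _, _) in enumerate(rows):
        rest = rows[:i] + rows[i + 1:]
        slope, _, r2 = fit([r[1] for r in rest], [r[2] for r in rest])
        changes.append((name, slope, r2, abs(slope - full[0])))
    return full, changes


full, changes = leave_one_out(players)
print(f"All players: y = {full[0]:.3f}x {full[1]:+.3f}, R-Squared = {full[2]:.3f}")
for name, slope, r2, delta in changes:
    print(f"without {name:<20} slope = {slope:.3f}, R-Squared = {r2:.3f}, change = {delta:.3f}")
most = max(changes, key=lambda c: c[3])
print("Largest slope change when omitting:", most[0])
```

With these five scores, omitting Amy Sutheran (78, 83) changes the slope the most (from about 1.051 to 0.520), with Orakorn Thirayatorn (78, 77) a close second (slope 1.560, and R-Squared rises to about 0.804). Both share the largest Round 1 score, which gives them the most leverage on the line.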
Assignment:
 Use Sheets and the results from a recent women’s golf tournament
 Go to the “Team Results” worksheet
 Plot the data with Round 1 scores on the x axis and Round 2 scores on the y axis
 Find the linear regression equation and the corresponding R-Squared value
 Identify the most influential observation(s)
 Omit the observation(s) you think are most influential
 Recalculate the linear regression equation and the corresponding R-Squared value
 Go to the “Individual Results” worksheet
 Plot the data with Round 1 scores on the x axis and Round 2 scores on the y axis
 Find the linear regression equation and the corresponding R-Squared value
 Identify the most influential observation(s)
 Omit the observation(s) you think are most influential
 Recalculate the linear regression equation and the corresponding R-Squared value
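If a worksheet is exported to CSV, its rows can be ranked by influence with a short script. This is a sketch under stated assumptions: the column headers `Round 1` and `Round 2` and the label column name are guesses, so adjust them to match the actual "Team Results" or "Individual Results" header row; cells are assumed to hold plain numeric scores; and there must be at least three rows with differing Round 1 scores. Ranking uses the change in slope when each row is omitted. To omit more than one observation, remove the top-ranked row and rank again.

```python
import csv
import io


def fit_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def load_rounds(csv_text, label_col, x_col="Round 1", y_col="Round 2"):
    """Read (label, x, y) tuples from CSV text; column names are assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[label_col], float(row[x_col]), float(row[y_col])) for row in reader]


def rank_by_influence(rows):
    """Sort rows by how much omitting each one changes the slope (largest first)."""
    full = fit_slope([r[1] for r in rows], [r[2] for r in rows])
    scored = []
    for i, row in enumerate(rows):
        rest = rows[:i] + rows[i + 1:]
        change = abs(fit_slope([r[1] for r in rest], [r[2] for r in rest]) - full)
        scored.append((row[0], change))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Example with the activity data, typed in the assumed CSV layout
sample = """Player,Round 1,Round 2
Jolene Kam,72,73
Courtney Ewing,75,75
Rachanok Rahulpan,74,78
Orakorn Thirayatorn,78,77
Amy Sutheran,78,83
"""
for name, change in rank_by_influence(load_rounds(sample, "Player")):
    print(f"{name:<20} slope change if omitted: {change:.3f}")
```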
