Lesson 18: Cautions about Correlation and Regression
October 31, 2024
Review:
 Tuesday activity data
 Pearson Correlation Coefficient
 R-Squared
 Schedule
 Week of Nov 4 – Residuals (and maybe some election data)
 Week of Nov 11 – Review and Exam 3 (Thu Nov 14)
 Week of Nov 18 – Return Exam 3, portfolio check and Make Up Exam (Thu Nov 21)
 Week of Nov 25 – Thanksgiving
 Week of Dec 2 – Review for Final Exam (Tue Dec 10 at 10:30)
Presentation:
 Cautions about Correlation and Regression
 Influential Observations
 Outliers
 Diabetes and blood sugar (pp. 129-130)
 Use Tuesday activity data
 Lurking Variables
 Explanatory variables not included
 Beyond scope
 The Question of Causation
 Correlation does not imply causation
 Spurious Correlations
 Retrospective study = looking back to find possible causes for an established outcome among a sample population
 Prospective study = following a sample population over time and studying behaviors possibly linked to the likelihood of an outcome
 Video
 Extrapolation
 Use of regression for prediction far outside the range of the explanatory variable
 Predictions outside the range are unreliable
 Further outside the range = less reliable predictions
 Sensitivity Analysis Demonstration
 Remove influential observations and recalculate the regression equation and R-Squared
 Use glucose level data
Presidential Election Year   Time Period   Ballots Cast (thousands)
2004                         1             68.4
2008                         2             73.9
2012                         3             77.7
2016                         4             78.7
 Produce a scatter plot with Time Period on the x-axis and Ballots Cast on the y-axis
 Find the linear regression equation and the corresponding R-Squared value
 Remove the 2004 data
 Recalculate the linear regression equation and the corresponding R-Squared value
 Remove the 2016 data
 Recalculate the linear regression equation and the corresponding R-Squared value
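The recalculation steps for the election data can be sketched in Python. This is a minimal sketch, not necessarily the tool used in class: `fit_line` is a hypothetical helper that applies the standard least-squares formulas, and all variable names are assumptions. The data are the four rows of the table above. It is assumed here that each removal starts again from the full data set, so the 2004 row is restored before the 2016 row is dropped.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line
    y = intercept + slope * x. Hypothetical helper, not from the lesson."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # square of the Pearson correlation
    return slope, intercept, r_squared


# Time Period (x) and Ballots Cast in thousands (y), from the table
periods = [1, 2, 3, 4]
ballots = [68.4, 73.9, 77.7, 78.7]

all_years = fit_line(periods, ballots)
without_2004 = fit_line(periods[1:], ballots[1:])    # drop the first row
without_2016 = fit_line(periods[:-1], ballots[:-1])  # drop the last row

results = [
    ("All years", all_years),
    ("Without 2004", without_2004),
    ("Without 2016", without_2016),
]
for label, (slope, intercept, r2) in results:
    print(f"{label}: y = {intercept:.2f} + {slope:.2f}x, R-Squared = {r2:.3f}")
```

With only four observations, dropping a single year moves the slope noticeably (roughly 3.47 with all years, 2.40 without 2004, and 4.65 without 2016), which is the point of the sensitivity check.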
Activity:
Price of Coffee ($ per pound)   Deforestation (%)
0.29                            0.49
0.40                            1.59
0.54                            1.69
0.55                            1.82
0.72                            3.10
 Produce a scatter plot with Price of Coffee on the x-axis and Deforestation on the y-axis
 Find the linear regression equation and the corresponding R-Squared value
 Estimate Deforestation % assuming Price of Coffee is $0.90 per pound.
 Sensitivity Analysis
 Identify and remove the most influential observation (use scatter plot)
 Recalculate the linear regression equation and the corresponding R-Squared value
 Recalculate Deforestation % assuming Price of Coffee is $0.90 per pound.
 Go back to Lesson 15 and use the data from the Activity section
 Follow the same process, i.e., look for outliers, remove and recalculate.
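For the coffee price data in this activity, the fit, the estimate at $0.90, and the sensitivity analysis might look like the sketch below. The names `fit_line`, `predict`, and `slope_without` are hypothetical helpers, not part of the lesson. The activity asks you to pick the most influential observation from the scatter plot; as a stand-in, this sketch assumes "most influential" means the point whose removal changes the slope the most, which is only one possible measure.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


def predict(slope, intercept, x):
    """Value of the fitted line at x."""
    return intercept + slope * x


# Price of Coffee ($ per pound) and Deforestation (%), from the table
prices = [0.29, 0.40, 0.54, 0.55, 0.72]
deforestation = [0.49, 1.59, 1.69, 1.82, 3.10]

slope, intercept, r2 = fit_line(prices, deforestation)
estimate_all = predict(slope, intercept, 0.90)


def slope_without(i):
    """Slope of the line refitted with observation i left out."""
    xs = prices[:i] + prices[i + 1:]
    ys = deforestation[:i] + deforestation[i + 1:]
    return fit_line(xs, ys)[0]


# Assumption: influence is measured by the change in slope when a point is dropped
most_influential = max(
    range(len(prices)), key=lambda i: abs(slope_without(i) - slope)
)

reduced_prices = prices[:most_influential] + prices[most_influential + 1:]
reduced_defor = deforestation[:most_influential] + deforestation[most_influential + 1:]
slope_r, intercept_r, r2_r = fit_line(reduced_prices, reduced_defor)
estimate_reduced = predict(slope_r, intercept_r, 0.90)

print(f"All data: y = {intercept:.3f} + {slope:.3f}x, R-Squared = {r2:.3f}")
print(f"Estimate at $0.90: {estimate_all:.2f}%")
print(f"Removed: ({prices[most_influential]}, {deforestation[most_influential]})")
print(f"Reduced: y = {intercept_r:.3f} + {slope_r:.3f}x, R-Squared = {r2_r:.3f}")
print(f"Estimate at $0.90: {estimate_reduced:.2f}%")
```

Note that $0.90 lies above the highest observed price ($0.72), so both estimates are extrapolations, and the gap between them illustrates how much a single observation can move a prediction outside the data range.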