## Lesson 19: Cautions about Correlation and Regression

November 13, 2019

Review:

- Pearson Correlation Coefficient
- R-Squared

**Exam 3 on Wed, Nov 20**

- Review for Exam 3 on Mon, Nov 18
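
As a refresher on the two review topics, here is a minimal plain-Python sketch of the Pearson correlation coefficient r and R-Squared. It assumes simple linear regression (one explanatory variable), where R-Squared equals r squared. The function names and the small example data set are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """For simple linear regression, R-Squared is the square of r."""
    return pearson_r(xs, ys) ** 2


# Small made-up example, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"r = {pearson_r(xs, ys):.4f}, R-Squared = {r_squared(xs, ys):.4f}")
```

r keeps the sign (direction) of the relationship, while R-Squared is the fraction of the variation in y explained by the regression line.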

Presentation:

- Cautions about Correlation and Regression
  - Influential Observations
    - Outliers
    - Diabetes and blood sugar (p. 129-130)
  - Lurking Variables
    - Explanatory variables that are not included in the analysis
    - High school math and success in college (p. 132-133)
  - The Question of Causation
    - Correlation does not imply causation
    - Spurious correlations
    - Retrospective study = looking back to find possible causes of an outcome that has already occurred in a sample population
    - Prospective study = following a sample population forward in time and recording behaviors that may be linked to the likelihood of an outcome
    - Video
  - Extrapolation
    - Using a regression line to predict far outside the range of the observed values of the explanatory variable
    - Predictions outside the range are unreliable
    - The farther outside the range, the less reliable the prediction
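
The extrapolation caution can be made concrete with a small sketch: fit a least-squares line, then report how far a new x value lies outside the observed range before trusting the prediction. It borrows the Pueblo voter turnout numbers used later in this lesson; the function names and the "distance outside the range" measure are hypothetical helpers, not a standard method, and the predictions for periods 5 and 10 are model outputs, not real data.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_with_range_check(a, b, x_new, xs):
    """Predict y at x_new and report how far x_new lies outside [min(xs), max(xs)].

    A distance of 0 means interpolation; a positive distance means extrapolation,
    and larger distances mean the prediction deserves less trust.
    """
    lo, hi = min(xs), max(xs)
    if x_new < lo:
        distance_outside = lo - x_new
    elif x_new > hi:
        distance_outside = x_new - hi
    else:
        distance_outside = 0.0
    return a + b * x_new, distance_outside


# Pueblo voter turnout: time periods 1-4 correspond to 2004-2016
periods = [1, 2, 3, 4]
ballots = [68.4, 73.9, 77.7, 78.7]  # thousands
a, b = fit_line(periods, ballots)

for x_new in (2.5, 5, 10):
    y_hat, dist = predict_with_range_check(a, b, x_new, periods)
    kind = "interpolation" if dist == 0 else f"extrapolation, {dist} period(s) outside the data"
    print(f"period {x_new}: predicted {y_hat:.1f} thousand ballots ({kind})")
```

The line keeps rising without limit, which is exactly why predictions far beyond period 4 should not be trusted.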

**Sensitivity Analysis**

Demonstration:

- Remove influential observations and recalculate the regression equation and R-Squared
- Use Pueblo Voter Turnout data
- Influential Observations

| Presidential Election Year | Time Period | Ballots Cast (thousands) |
|---|---|---|
| 2004 | 1 | 68.4 |
| 2008 | 2 | 73.9 |
| 2012 | 3 | 77.7 |
| 2016 | 4 | 78.7 |

- Produce a scatter plot with Time Period on the x axis and Ballots Cast on the y axis
- Find the linear regression equation and the corresponding R-Squared value
- Remove the 2004 data
- Recalculate the linear regression equation and the corresponding R-Squared value
- Remove the 2016 data
- Recalculate the linear regression equation and the corresponding R-Squared value
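
The demonstration steps can be sketched in plain Python. This assumes each removal starts from the full four-point data set (so "remove the 2016 data" is done with 2004 restored); the helper names `fit` and `fit_without` are hypothetical. Comparing the three fits shows how much a single observation moves the slope, intercept, and R-Squared.

```python
def fit(xs, ys):
    """Least-squares slope, intercept, and R-Squared for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Pueblo voter turnout: (election year, time period, ballots cast in thousands)
DATA = [(2004, 1, 68.4), (2008, 2, 73.9), (2012, 3, 77.7), (2016, 4, 78.7)]


def fit_without(year=None):
    """Refit using the full data set minus one election year (None keeps every row)."""
    rows = [row for row in DATA if row[0] != year]
    periods = [period for _, period, _ in rows]
    ballots = [count for _, _, count in rows]
    return fit(periods, ballots)


for label, year in [("All data", None), ("Without 2004", 2004), ("Without 2016", 2016)]:
    slope, intercept, r2 = fit_without(year)
    print(f"{label:<13} y = {intercept:.2f} + {slope:.2f}x   R-Squared = {r2:.4f}")
```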

Activity:

| Price of Coffee ($ per pound) | Deforestation (%) |
|---|---|
| 0.29 | 0.49 |
| 0.40 | 1.59 |
| 0.54 | 1.69 |
| 0.55 | 1.82 |
| 0.72 | 3.10 |

- Produce a scatter plot with Price of Coffee on the x axis and Deforestation on the y axis
- Find the linear regression equation and the corresponding R-Squared value
- Estimate Deforestation % assuming Price of Coffee is $0.90 per pound.
- Sensitivity Analysis
  - Identify and remove the most influential observation (use the scatter plot)
  - Recalculate the linear regression equation and the corresponding R-Squared value
  - Recalculate Deforestation % assuming Price of Coffee is $0.90 per pound.
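
A plain-Python sketch of the activity follows, under two stated assumptions. First, it uses the same least-squares formulas as the demonstration. Second, in place of reading the scatter plot, it swaps in a numeric stand-in for "most influential": the observation whose removal changes the slope the most (a leave-one-out check). That criterion and the helper names are hypothetical, so compare its pick against the scatter plot. Note that $0.90 is above the largest observed price ($0.72), so both estimates are extrapolations.

```python
def fit(xs, ys):
    """Least-squares slope, intercept, and R-Squared for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


coffee_price = [0.29, 0.40, 0.54, 0.55, 0.72]   # $ per pound
deforestation = [0.49, 1.59, 1.69, 1.82, 3.10]  # %

slope_all, intercept_all, r2_all = fit(coffee_price, deforestation)
# $0.90 lies outside the observed price range (0.29 to 0.72): extrapolation
estimate_all = intercept_all + slope_all * 0.90


def slope_change_without(i):
    """Assumed influence measure: absolute change in slope when point i is left out."""
    xs = coffee_price[:i] + coffee_price[i + 1:]
    ys = deforestation[:i] + deforestation[i + 1:]
    return abs(fit(xs, ys)[0] - slope_all)


most_influential = max(range(len(coffee_price)), key=slope_change_without)
xs_reduced = [x for j, x in enumerate(coffee_price) if j != most_influential]
ys_reduced = [y for j, y in enumerate(deforestation) if j != most_influential]
slope_reduced, intercept_reduced, r2_reduced = fit(xs_reduced, ys_reduced)
estimate_reduced = intercept_reduced + slope_reduced * 0.90

print(f"All data: y = {intercept_all:.3f} + {slope_all:.3f}x, "
      f"R-Squared = {r2_all:.4f}, estimate at $0.90 = {estimate_all:.2f}%")
print(f"Removed ({coffee_price[most_influential]}, {deforestation[most_influential]})")
print(f"Reduced:  y = {intercept_reduced:.3f} + {slope_reduced:.3f}x, "
      f"R-Squared = {r2_reduced:.4f}, estimate at $0.90 = {estimate_reduced:.2f}%")
```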