Lesson 21: Cautions about Correlation and Regression
November 11, 2025
Review:
- Pearson Correlation Coefficient
- R-Squared
- Schedule
- Finish Simple Regression this week
- Review and Exam 3 next week
- Thanksgiving
Presentation:
- Cautions about Correlation and Regression
- Influential Observations
- Outliers
- Lurking Variables
- Explanatory variables not included
- Beyond scope
- Extrapolation
- Use of regression for prediction far outside the range of the explanatory variable
- Predictions outside the range are unreliable
- Further outside the range = less reliable predictions
- The Question of Causation
- Correlation does not imply causation
- Spurious Correlations
- Retrospective study = looking back to find possible causes of an established outcome in a sample population
- Prospective study = following a sample population over time and studying behaviors possibly linked to the likelihood of an outcome
- Video
- Influential Observations
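To make the idea of an influential observation concrete, here is a minimal Python sketch that fits the least-squares line twice: once with every point, and once with the most extreme x value left out. It uses the snowpack and streamflow figures from the demonstration table later in these notes. The helper name `fit_line` and the choice to drop the single largest-x point are assumptions made for illustration, not a formal influence test.

```python
# Snowpack (x, inches of snow-water equivalent) and streamflow (y, ft^3/s),
# copied from the demonstration table in these notes, keyed by water year.
data = {
    2018: (20.4, 577.5),
    2019: (28.1, 1599.3),
    2020: (20.2, 901.0),
    2021: (15.5, 659.4),
    2022: (17.6, 730.4),
    2023: (17.1, 1016.0),
    2024: (22.2, 1455.7),
}


def fit_line(points):
    """Return (slope, intercept, r_squared) for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Illustrative choice: treat the year with the largest snowpack as the
# candidate influential point, since it sits farthest from the other x values.
extreme_year = max(data, key=lambda year: data[year][0])

full_fit = fit_line(list(data.values()))
reduced_fit = fit_line([pt for year, pt in data.items() if year != extreme_year])

for label, (slope, intercept, r_squared) in [
    ("all years", full_fit),
    (f"without {extreme_year}", reduced_fit),
]:
    print(f"{label}: y = {intercept:.1f} + {slope:.2f}x, R^2 = {r_squared:.3f}")
```

With this data, leaving out the one high-snowpack year lowers the slope somewhat (from roughly 74.5 to roughly 69.6) and cuts R^2 by about half (from roughly 0.62 to roughly 0.30). That is the kind of sensitivity to a single point that this caution is about.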
Demonstration in Sheets:
Now, here’s some real data, gathered from the USDA National Water and Climate Center and the USGS Water Data Center. The snowpack data (snow-water equivalent, in inches) is for the Fremont Pass location. The streamflow data (ft^3/s) is for the Arkansas River station in Pueblo.
| Water Year | Snowpack (x, in) | Streamflow (y, ft^3/s) |
|---|---|---|
| 2018 | 20.4 | 577.5 |
| 2019 | 28.1 | 1599.3 |
| 2020 | 20.2 | 901.0 |
| 2021 | 15.5 | 659.4 |
| 2022 | 17.6 | 730.4 |
| 2023 | 17.1 | 1016.0 |
| 2024 | 22.2 | 1455.7 |
- We did this manually last time
- Calculate Sum of Squares, the Linear Equation, the Pearson Correlation Coefficient and R^2.
- Using the regression equation, estimate Streamflow for 2025 assuming Snowpack x=18.2.
- This time…
- Use Sheets to produce the regression equation and point estimate for 2025.
- Check your answer from last week.
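As a cross-check on both the manual work and the Sheets result, the same steps can be sketched in plain Python. The sketch computes the sums of squares, the regression line, the Pearson correlation coefficient, and R^2 from the table above, then estimates streamflow at x = 18.2. The variable names are illustrative. The `in_range` flag is an added check that ties back to the extrapolation caution: it only reports whether 18.2 falls within the observed snowpack values.

```python
import math

# Data copied from the snowpack/streamflow table above (water years 2018-2024).
snowpack = [20.4, 28.1, 20.2, 15.5, 17.6, 17.1, 22.2]                 # x, inches
streamflow = [577.5, 1599.3, 901.0, 659.4, 730.4, 1016.0, 1455.7]    # y, ft^3/s

n = len(snowpack)
x_bar = sum(snowpack) / n
y_bar = sum(streamflow) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in snowpack)
syy = sum((y - y_bar) ** 2 for y in streamflow)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(snowpack, streamflow))

# Least-squares line y-hat = intercept + slope * x.
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Pearson correlation coefficient and R^2.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# Point estimate for 2025, assuming snowpack x = 18.2 as stated in the notes.
x_2025 = 18.2
estimate_2025 = intercept + slope * x_2025

# True when x_2025 lies inside the observed x range (interpolation);
# False would mean the estimate is an extrapolation.
in_range = min(snowpack) <= x_2025 <= max(snowpack)

print(f"Sxx = {sxx:.3f}, Syy = {syy:.1f}, Sxy = {sxy:.2f}")
print(f"y-hat = {intercept:.2f} + {slope:.3f}x")
print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
print(f"Estimate at x = {x_2025}: {estimate_2025:.1f} ft^3/s (within range: {in_range})")
```

The printed slope, intercept, correlation, and estimate should agree, up to rounding, with what the Sheets functions SLOPE, INTERCEPT, CORREL, RSQ, and FORECAST return for the same two columns.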
Activity/Assignment:
- Use Sheets (or Excel) to solve the problems from last week (see below)
- Use Sheets to find the regression equation for this larger real estate data set
Problem 1. A small coffee shop tracks daily high temperature (°F) and the number of iced coffees sold. Calculate Sum of Squares and the equation of the regression line.
| Day | Temp (x) | Iced Coffees Sold (y) |
|---|---|---|
| 1 | 60 | 42 |
| 2 | 65 | 48 |
| 3 | 70 | 52 |
| 4 | 75 | 61 |
| 5 | 80 | 65 |
Problem 2. A local brewery tracks weekly social media ad spending and kegs sold. Calculate Sum of Squares and the equation of the regression line.
| Week | Ad Spend (x, $100s) | Kegs Sold (y) |
|---|---|---|
| 1 | 2 | 11 |
| 2 | 3 | 15 |
| 3 | 4 | 17 |
| 4 | 5 | 20 |
| 5 | 7 | 26 |
| 6 | 9 | 30 |
Problem 3. A realtor wants to estimate the relationship between square footage (x) and listing price (y in $1000s). Calculate Sum of Squares and the equation of the regression line.
| House | Sq Ft (x) | Listing Price (y, $1000s) |
|---|---|---|
| A | 1,600 | 315 |
| B | 1,850 | 344 |
| C | 2,000 | 365 |
| D | 2,300 | 399 |
| E | 2,500 | 425 |