## Lesson 22: Bootstrap confidence intervals for linear regression

April 11, 2018

Review:

- Bootstrap confidence intervals

**Exam 2 Redo** on Monday

- Remember, it’s optional if you’re content with your current score/grade
- Covers all Python and DataCamp material
- Format similar to original Exam 2 but perhaps less technical and more conceptual
- Multiple choice plus some short answer

**Code snippets you should know how to write**

- For loops (for i in range(100):)
- If statements (+ if-else and if-elif-else)
- Random number generators
    - np.random.randint(min, max)
    - np.random.normal(mean, std, n)
    - np.random.choice(1d_variable, n)

- Summary statistics (e.g., np.mean(), np.median(), np.std())
- Regression with np.polyfit(x, y, 1)
- Read csv files with pandas (e.g., pd_variable = pd.read_csv('filename.csv'))
- List operations
    - initialize, e.g., list_variable = []
    - append to list, e.g., list_variable.append(new_variable)
    - slicing, e.g., list_subset = list_variable[2:4]
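
The snippets listed above can be combined into one short practice script. This is only a sketch: the die-roll and height numbers are made up for illustration, and names such as `rolls`, `verdict`, and `heights` are placeholders, not anything from the exam.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch gives the same numbers each run

rolls = []                          # initialize an empty list
for i in range(100):                # for loop: 100 simulated die rolls
    roll = np.random.randint(1, 7)  # integers 1 to 6 (the upper bound is excluded)
    rolls.append(roll)              # append to the list

# if-elif-else applied to a summary statistic
mean_roll = np.mean(rolls)
if mean_roll > 3.5:
    verdict = "above expected"
elif mean_roll < 3.5:
    verdict = "below expected"
else:
    verdict = "exactly expected"

first_slice = rolls[2:4]            # slicing: the items at index 2 and 3

heights = np.random.normal(170, 10, 50)   # 50 draws with mean 170 and std 10
resample = np.random.choice(heights, 50)  # 50 draws with replacement

# Regression: for degree 1, np.polyfit returns the slope, then the intercept
x = np.arange(10)
y = 2 * x + 1
slope, intercept = np.polyfit(x, y, 1)

print(verdict, first_slice, np.median(heights), np.std(heights), slope, intercept)
```

Because `y` is built as an exact line here, the fitted slope and intercept come back as 2 and 1 up to rounding error, which is a quick way to check that the two return values are being read in the right order.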

**Conceptual material you should understand**

- Monte Carlo simulation (repeated random sampling to generate a probability distribution)
- Bootstrapping (simulation by repeated sampling, with replacement, from the original data set)
- Confidence Intervals
- Graphics for Exploratory Data Analysis (EDA)
    - histograms, stripplots, boxplots, swarmplots (univariate)
    - scatterplots, trendlines (bivariate)
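
To connect the first three concepts above, here is a minimal sketch of a bootstrap confidence interval for a mean. The data are synthetic (generated on the spot, not course data), and the choices of 5000 resamples and a 95% interval are assumptions made for illustration.

```python
import numpy as np

np.random.seed(1)

# Hypothetical sample: 40 made-up measurements
data = np.random.normal(50, 8, 40)

boot_means = []
for i in range(5000):
    # Resample the original data with replacement, keeping the same sample size
    resample = np.random.choice(data, len(data))
    boot_means.append(np.mean(resample))

# 95% confidence interval: the middle 95% of the bootstrap distribution
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(ci_low, np.mean(data), ci_high)
```

The loop is itself a Monte Carlo simulation: the repeated random resampling builds up a distribution of means, and the percentiles of that distribution give the interval.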

Presentation:

- Pairs bootstrap for linear regression
    - repeated sampling, with replacement, of paired rows of 2 (or more) values, so each x stays with its y
    - calculation of model output (y-hat)
    - use np.percentile() to calculate confidence intervals
    - plot multiple trendlines to visualize confidence intervals
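
The steps above might look like the following sketch. It assumes synthetic (x, y) data, 2000 resamples, and a 95% interval. The variable names (`idx`, `x_grid`, `band`) are illustrative and are not necessarily the ones used in class.

```python
import numpy as np

np.random.seed(2)

# Synthetic (x, y) data for illustration only
x = np.random.normal(10, 2, 60)
y = 3.0 * x + 5.0 + np.random.normal(0, 2, 60)

n = len(x)
x_grid = np.linspace(x.min(), x.max(), 5)  # x values at which to compute y-hat

slopes = []
intercepts = []
lines = []
for i in range(2000):
    # Sample row indices with replacement; using the same indices for
    # x and y keeps each (x, y) pair together
    idx = np.random.randint(0, n, n)
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    slopes.append(slope)
    intercepts.append(intercept)
    lines.append(slope * x_grid + intercept)  # y-hat along the grid

# 95% confidence intervals for the slope and the intercept
slope_ci = np.percentile(slopes, [2.5, 97.5])
intercept_ci = np.percentile(intercepts, [2.5, 97.5])

# 95% band for y-hat at each grid point (row 0 = lower, row 1 = upper)
band = np.percentile(lines, [2.5, 97.5], axis=0)
print(slope_ci, intercept_ci)
print(band)
```

Each entry of `lines` is one resampled trendline, so drawing a subset of them over the scatterplot gives the graphical view of the interval, while `band` is the numeric version of the same idea.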

Activity:

- Use real estate data to generate selling price estimates with confidence intervals
    - Use Total SqFt and Selling Price
    - Produce numeric estimates
    - Produce graphical representation
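
One possible shape for the numeric part of the activity is sketched below. The real data file is not reproduced here, so the sketch builds a stand-in DataFrame from synthetic numbers. The column names "Total SqFt" and "Selling Price" are assumed to match the file, and `target_sqft` is a hypothetical house size.

```python
import numpy as np
import pandas as pd

np.random.seed(3)

# Stand-in for the real estate file: synthetic values, assumed column names
sqft = np.random.normal(2000, 400, 80)
price = 100 * sqft + 50000 + np.random.normal(0, 20000, 80)
homes = pd.DataFrame({"Total SqFt": sqft, "Selling Price": price})

x = homes["Total SqFt"].values
y = homes["Selling Price"].values
n = len(homes)

target_sqft = 2200  # hypothetical house size to price
estimates = []
for i in range(2000):
    idx = np.random.randint(0, n, n)                  # resample rows as pairs
    slope, intercept = np.polyfit(x[idx], y[idx], 1)  # refit the trendline
    estimates.append(slope * target_sqft + intercept) # predicted price

point_estimate = np.mean(estimates)
low, high = np.percentile(estimates, [2.5, 97.5])
print("Estimated price: %.0f (95%% CI %.0f to %.0f)" % (point_estimate, low, high))
```

With the actual file, the stand-in DataFrame would be replaced by a `pd.read_csv(...)` call, and the rest of the sketch should carry over if the column names match.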

Assignment:

- Study for Exam 2 Redo