Lesson 22: Bootstrap confidence intervals for linear regression
April 11, 2018
Review:
- Bootstrap confidence intervals
- Exam 2 Redo on Monday
- Remember, it’s optional if you’re content with your current score/grade
- Covers all Python and DataCamp material
- Format similar to original Exam 2 but perhaps less technical and more conceptual
- Multiple choice plus some short answer
- Code snippets you should know how to write
- For loops (for i in range(100):)
- If statements (+ if-else and if-elif-else)
- Random number generators
- np.random.randint(min, max) (returns integers from min up to, but not including, max)
- np.random.normal(mean, std, n)
- np.random.choice(1d_variable, n)
- Summary statistics (e.g., np.mean(), np.median(), np.std())
- Regression with np.polyfit(x, y, 1)
- Read csv files with pandas (e.g., pd_variable = pd.read_csv('filename.csv'))
- List operations
- initialize e.g., list_variable = []
- append to list e.g., list_variable.append(new_variable)
- slicing e.g., list_subset = list_variable[2:4]
- Conceptual material you should understand
- Monte Carlo simulation (repeated random sampling to generate a probability distribution)
- Bootstrapping (simulation by repeated sampling, with replacement, of the original data set)
- Confidence Intervals
- Graphics for Exploratory Data Analysis (EDA)
- histograms, stripplots, boxplots, swarmplots (univariate)
- scatterplots, trendlines (bivariate)
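Several of the review snippets can be combined into one short bootstrap confidence interval for a mean. The following is a minimal sketch, not an exam answer key: the data are made up with np.random.normal, and the names data, boot_means, ci_low, and ci_high are illustrative assumptions.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Made-up sample standing in for a real data set (assumption)
data = np.random.normal(50, 10, 40)

boot_means = []                      # initialize an empty list
for i in range(1000):                # repeat the resampling 1000 times
    # resample with replacement, same size as the original data
    resample = np.random.choice(data, len(data))
    boot_means.append(np.mean(resample))

# the middle 95% of the bootstrap means gives a 95% confidence interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(np.mean(data), ci_low, ci_high)
```

np.random.choice samples with replacement by default, which is what makes this a bootstrap rather than a reshuffle of the same values.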
Presentation:
- Pairs bootstrap for linear regression
- repeated sampling, with replacement, of 2 (or more) values together, so each x stays paired with its own y
- calculation of model output (y-hat)
- use np.percentile() to calculate confidence intervals
- plot multiple trendlines to visualize confidence intervals
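The steps above can be sketched roughly as follows. This is one possible version under stated assumptions, not necessarily the code shown in class: the (x, y) data are made up, x_new is a hypothetical value at which to calculate y-hat, and all variable names are illustrative.

```python
import numpy as np

np.random.seed(1)  # fixed seed so the sketch is repeatable

# Made-up (x, y) data with a roughly linear relationship (assumption)
x = np.random.normal(10, 2, 60)
y = 3.0 * x + 5.0 + np.random.normal(0, 2, 60)

n = len(x)
x_new = 12.0         # hypothetical x value at which to calculate y-hat
slopes = []
y_hats = []
for i in range(1000):
    # draw row indices with replacement so each x stays with its own y
    idx = np.random.randint(0, n, n)
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    slopes.append(slope)
    y_hats.append(slope * x_new + intercept)   # model output for this resample

# 95% confidence intervals from the bootstrap distributions
slope_ci = np.percentile(slopes, [2.5, 97.5])
y_hat_ci = np.percentile(y_hats, [2.5, 97.5])
print(slope_ci, y_hat_ci)
```

Sampling indices rather than sampling x and y separately is the design choice that keeps the pairs intact; resampling the two arrays independently would destroy the relationship being modeled.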
Activity:
- Use real estate data to generate selling price estimates with confidence intervals
- Use Total SqFt and Selling Price
- Produce numeric estimates
- Produce graphical representation
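One way the activity might be approached is sketched below. The real estate file is not reproduced here, so the sketch uses synthetic numbers in place of Total SqFt and Selling Price; the variable names sqft and price, the 2000 SqFt target, and the grid of SqFt values are all assumptions for illustration.

```python
import numpy as np

np.random.seed(2)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the real estate data (assumption); the real values
# would be read from the class CSV file with pd.read_csv
sqft = np.random.uniform(1000, 3500, 80)
price = 50000 + 120 * sqft + np.random.normal(0, 30000, 80)

n = len(sqft)
grid = np.linspace(1000, 3500, 6)   # SqFt values at which each trendline is evaluated
target = 2000                       # hypothetical house size for the numeric estimate

lines = []
estimates = []
for i in range(1000):
    idx = np.random.randint(0, n, n)             # resample (SqFt, price) pairs
    slope, intercept = np.polyfit(sqft[idx], price[idx], 1)
    lines.append(slope * grid + intercept)       # one bootstrap trendline
    estimates.append(slope * target + intercept)

# numeric estimate: 95% interval for the selling price at the target size
est_low, est_high = np.percentile(estimates, [2.5, 97.5])

# graphical basis: lower and upper edge of the trendline band at each grid value
band_low, band_high = np.percentile(lines, [2.5, 97.5], axis=0)
print(est_low, est_high)
```

For the graphical representation, each entry of lines could be drawn against grid as a faint trendline (for example with matplotlib), and band_low and band_high trace the edges of the resulting 95% band.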
Assignment:
- Study for Exam 2 Redo