Lesson 21: Bootstrap Confidence Intervals
April 9, 2018
Review:
- Seaborn scatter plots with “lmplot”
- Schedule for this week and next
  - Mon Apr 9 (today) – Bootstrap Confidence Intervals
  - Wed Apr 11 – Pairs bootstrap for linear regression and review for Exam 2 Redo
  - Mon Apr 16 – Exam 2 Redo
  - Wed Apr 18 – Term Project Assignment
Presentation:
- What is bootstrapping?
  - Random sampling from the original data set with replacement
- Generating bootstrap samples (resampled data sets) and bootstrap replicates (a statistic, such as the mean, computed from each sample)
  - np.random.choice()
- Calculating bootstrap confidence intervals
  - repeated sampling and statistical calculation with “for” loops
  - use np.percentile() to calculate confidence intervals
- Example: https://repl.it/@justinholman/SellingPriceBootstrapConfidenceIntervals
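The presentation steps (resample with replacement, compute a statistic for each resample in a for loop, then take percentiles) can be sketched in NumPy as below. This is a minimal sketch, not the code in the repl.it example; the helper names bootstrap_replicate_1d and draw_bs_reps and the price values are hypothetical, chosen only for illustration. Each bootstrap sample here has the same size as the original data, which is the standard bootstrap setup.

```python
import numpy as np

def bootstrap_replicate_1d(data, func):
    """Draw one bootstrap sample (same size as data, with replacement) and apply func to it."""
    bs_sample = np.random.choice(data, size=len(data), replace=True)
    return func(bs_sample)

def draw_bs_reps(data, func, size=1):
    """Generate `size` bootstrap replicates of func(data) with a for loop."""
    bs_replicates = np.empty(size)
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)
    return bs_replicates

# Hypothetical selling prices in dollars, for illustration only
prices = np.array([245000, 310000, 189000, 420000, 275000, 330000, 215000, 298000])

np.random.seed(42)  # make the random draws reproducible
bs_means = draw_bs_reps(prices, np.mean, size=10000)

# 95% confidence interval: the 2.5th and 97.5th percentiles of the replicates
conf_int = np.percentile(bs_means, [2.5, 97.5])
print("95% CI for mean selling price:", conf_int)
```

Passing the statistic as func means the same loop works for np.median, np.std, or any other one-dimensional summary.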
Activity:
- Write a program in repl.it to generate bootstrap confidence intervals of “Selling Price per SqFt” (PPSF)
- Use your real estate data
- Use np.random.choice() to draw a sample of 100 properties with replacement; repeat 1,000 times
- For each sample, calculate the mean PPSF, where PPSF is “Selling Price” divided by “Total SqFt”
- Save each mean PPSF in a list (this is just like the dice game…remember?)
- Produce a histogram of the 1,000 resulting mean PPSF values
- Calculate a 95% confidence interval of mean PPSF (i.e., find the 2.5 percentile and the 97.5 percentile)
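One way the activity steps could fit together is sketched below. This is a minimal sketch under stated assumptions: the data are made-up stand-ins (500 hypothetical properties), not your real estate data, and the variable names are hypothetical. With your own data you might load a CSV with pandas and compute the PPSF column from “Selling Price” and “Total SqFt” instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in data: 500 made-up properties (replace with your real estate data,
# e.g. df["Selling Price"] / df["Total SqFt"] after loading a CSV with pandas)
np.random.seed(0)
total_sqft = np.random.uniform(900, 3500, size=500)
selling_price = total_sqft * np.random.normal(150, 25, size=500)

# Selling price per square foot for every property
ppsf = selling_price / total_sqft

mean_ppsf_list = []  # one mean PPSF per bootstrap sample
for _ in range(1000):
    # Sample 100 properties with replacement, as the activity specifies
    sample = np.random.choice(ppsf, size=100, replace=True)
    mean_ppsf_list.append(np.mean(sample))

# Histogram of the 1,000 mean PPSF values
plt.hist(mean_ppsf_list, bins=30)
plt.xlabel("Mean selling price per SqFt ($)")
plt.ylabel("Number of bootstrap samples")
plt.show()

# 95% confidence interval: 2.5th and 97.5th percentiles
lower, upper = np.percentile(mean_ppsf_list, [2.5, 97.5])
print(f"95% confidence interval of mean PPSF: ${lower:.2f} to ${upper:.2f}")
```

Drawing 100 properties follows the activity. A textbook bootstrap would draw len(ppsf) properties per sample, which gives a narrower interval whenever the data set has more than 100 rows.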
Assignment:
- Complete the Statistical Thinking in Python (Part 2) course.