Lesson 6: Percentiles and Boxplots
September 6, 2016
Wed, Sep 7
Review:
- Measuring Center with Mean and Median
- Problems with Median: remember to SORT the data first!
- Tutoring available Thursdays 4-6pm in GCB 201 with David Mould
Presentation:
- Percentiles
- What are Percentiles?
- How to calculate Percentiles
- To find the xth percentile
- calculate x/100 * count (rounded to the nearest integer)
- result is the position of the value in an ordered (smallest to largest) data set
- 25th percentile = 25/100 * count
- 50th percentile = 50/100 * count ≈ Median (with an even count, the median averages the two middle values, so the two can differ)
- 75th percentile = 75/100 * count
- Example
- Boxplots
- Quartiles and Interquartile Range (IQR = Q3 - Q1)
- Example: Kick/punt return yards by Charles Nelson during the Oregon vs. UC Davis game, Sat, Sep 3
- {-2, 16, 27, 31, 33, 38, 46, 62}
- n = 8
- mean = 31.4
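- median = 31 (position 4 by the percentile rule; the usual median for an even count averages positions 4 and 5: (31 + 33)/2 = 32)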
- 25th percentile = 16 (position 2)
- 75th percentile = 38 (position 6)
- IQR = 38 - 16 = 22
- Min = -2
- Max = 62
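The percentile rule from the Presentation notes can be sketched in Python. This is a minimal sketch, assuming "rounded to the nearest integer" means round half up, and clamping the position to the range 1 to count (an assumption: the lesson does not say what to do with very small or very large percentiles). It reproduces the Nelson example; Python's `statistics.median` is printed for comparison because it averages the two middle values when the count is even. Spreadsheet functions such as PERCENTILE and QUARTILE in Google Sheets interpolate between values, so their answers may differ somewhat from this hand rule.

```python
import math
import statistics

def percentile_position(x, count):
    """1-based position of the xth percentile, using the lesson's rule."""
    # x/100 * count, rounded half up (Python's round() sends 2.5 to 2, not 3)
    position = math.floor(x / 100 * count + 0.5)
    # Assumption: clamp so every percentile lands on an actual data value
    return min(max(position, 1), count)

def percentile(data, x):
    ordered = sorted(data)  # the rule only works on sorted data
    return ordered[percentile_position(x, len(ordered)) - 1]

def five_number_summary(data):
    return {
        "min": min(data),
        "q1": percentile(data, 25),
        "median": percentile(data, 50),
        "q3": percentile(data, 75),
        "max": max(data),
    }

# Charles Nelson's kick/punt returns vs UC Davis, Sat, Sep 3
returns = [-2, 16, 27, 31, 33, 38, 46, 62]
summary = five_number_summary(returns)
mean = sum(returns) / len(returns)
iqr = summary["q3"] - summary["q1"]

print(summary)  # {'min': -2, 'q1': 16, 'median': 31, 'q3': 38, 'max': 62}
print(f"mean = {mean:.1f}, IQR = {iqr}")  # mean = 31.4, IQR = 22
print(statistics.median(returns))  # 32.0 (averages positions 4 and 5)
```

Rounding half up is spelled out with `math.floor(... + 0.5)` because Python's built-in `round` uses round-half-to-even, which would give position 2 instead of 3 for a result of 2.5.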
Activity:
- Produce Boxplots (by hand)
- Find the five-number summary for Female height and Male height from the Student Survey data
- Draw 2 boxplots, side by side, for comparison purposes.
- Use the same scale for both boxplots
- Repeat using the World Population estimate data
- Draw 4 boxplots, one per time period, side by side
- Use the same scale for all 4 boxplots
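The key idea in the Activity is that every boxplot shares one scale. A minimal text-only sketch (standard library only, no charting package assumed) finds the overall min and max across all groups and places each group's five numbers on that common axis: `|` marks the min and max, `[` and `]` mark Q1 and Q3, and `M` marks the median. The function name `draw_boxplots` and its `width` parameter are illustrative, not from the lesson; the percentiles follow the same sorted-position rule as the lesson.

```python
import math

def percentile(data, x):
    # Lesson's rule: sort, then take position round(x/100 * count), 1-based
    ordered = sorted(data)
    position = min(max(math.floor(x / 100 * len(ordered) + 0.5), 1), len(ordered))
    return ordered[position - 1]

def five_number_summary(data):
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))

def draw_boxplots(groups, width=50):
    """Draw one text boxplot per group, all on ONE shared scale."""
    summaries = {name: five_number_summary(v) for name, v in groups.items()}
    # The shared scale runs from the smallest min to the largest max overall
    low = min(s[0] for s in summaries.values())
    high = max(s[4] for s in summaries.values())
    span = (high - low) or 1  # avoid dividing by zero if all values are equal

    def column(value):
        return round((value - low) / span * (width - 1))

    label_width = max(len(name) for name in summaries)
    lines = []
    for name, (mn, q1, med, q3, mx) in summaries.items():
        row = [" "] * width
        for c in range(column(mn), column(mx) + 1):
            row[c] = "-"  # whiskers from min to max
        for c in range(column(q1), column(q3) + 1):
            row[c] = "="  # the box from Q1 to Q3
        # Later marks overwrite earlier ones when they share a column
        row[column(mn)] = "|"
        row[column(mx)] = "|"
        row[column(q1)] = "["
        row[column(q3)] = "]"
        row[column(med)] = "M"
        lines.append(f"{name:>{label_width}} {''.join(row)}")
    lines.append(f"{'':>{label_width}} {low} to {high} (shared scale)")
    return "\n".join(lines)

print(draw_boxplots({"Nelson returns": [-2, 16, 27, 31, 33, 38, 46, 62]}))
```

To compare Female and Male heights, pass both lists in one call, e.g. `draw_boxplots({"Female": female_heights, "Male": male_heights})`, where `female_heights` and `male_heights` are hypothetical lists read from the Student Survey data. Because the scale comes from all groups together, the boxes line up for a fair visual comparison.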
Assignment:
- Read pp. 34-38 (Quartiles, Five number summary, Boxplots)
- Watch Boxplots with Google Sheets
- Use Google Sheets
- Make a Copy of RE East Pueblo
- For each of the 3 neighborhoods: Belmont, Eastside and University
- Use the “Selling Price” variable (see Column I)
- Generate the five-number summary
- Calculate the mean, median, 25th and 75th percentiles
- Identify the min and max values
- Produce 3 boxplots
- One boxplot per neighborhood, i.e., Belmont, Eastside and University
- Plot together on the same chart for side-by-side visual comparison
- This means using the same scale for all 3 boxplots
- You’re welcome to use Sheets or to draw the boxplots by hand (if done neatly)
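The per-neighborhood numbers for the assignment can also be checked outside Sheets. This sketch assumes the sheet has been downloaded from Sheets as a CSV file, that the neighborhood names sit in a column headed `Neighborhood` (a hypothetical header; substitute whatever the RE East Pueblo sheet actually uses), and that the `Selling Price` values may contain `$` signs and thousands separators. For each neighborhood it reports the count, mean, and five-number summary using the lesson's percentile rule. The function name `summarize_by_group` is illustrative.

```python
import csv
import math
import statistics

def percentile(data, x):
    # Lesson's rule: sort, then take position round(x/100 * count), 1-based
    ordered = sorted(data)
    position = min(max(math.floor(x / 100 * len(ordered) + 0.5), 1), len(ordered))
    return ordered[position - 1]

def summarize_by_group(csv_lines, group_column="Neighborhood",
                       value_column="Selling Price",
                       groups=("Belmont", "Eastside", "University")):
    """Count, mean, and five-number summary of value_column for each group."""
    values = {name: [] for name in groups}
    for row in csv.DictReader(csv_lines):
        if row[group_column] in values:
            # Assumption: exported prices may look like "$123,456"
            price = row[value_column].replace("$", "").replace(",", "")
            values[row[group_column]].append(float(price))
    return {
        name: {
            "n": len(v),
            "mean": statistics.mean(v),
            "min": min(v),
            "q1": percentile(v, 25),
            "median": percentile(v, 50),
            "q3": percentile(v, 75),
            "max": max(v),
        }
        for name, v in values.items()
        if v  # skip a neighborhood with no rows rather than crash
    }
```

Usage, assuming a downloaded file with the hypothetical name `re_east_pueblo.csv`: `with open("re_east_pueblo.csv", newline="") as f: print(summarize_by_group(f))`.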
2 Comments
I used Sheets on my Android phone. To use the candlestick chart, I only get 4 arguments, not 5 as in the book: it asks for the min, open, close and max. So I used the min value, Q1 for open, Q3 for close, and the max value. I then created the mean and median values separately so they are charted next to their respective data group (Belmont, mean, median … Eastside, mean, median). Would this be okay for tests, or do I HAVE to have the median in the candlestick?
Sounds like you found a good trick for making boxplots. If the visualization works well in terms of showing the spread of the data and how the different neighborhoods compare then I’m good with it.
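The candlestick trick in the comment amounts to rearranging each five-number summary into the four values the chart asks for. A minimal sketch, assuming the low, open, close, high order described in the comment (the chart editor in your version of Sheets may label these differently): it writes CSV rows with min as low, Q1 as open, Q3 as close, and max as high, and leaves the median out because a candlestick has no slot for it. The function name `candlestick_rows` is illustrative.

```python
import csv
import io

def candlestick_rows(summaries):
    """Turn five-number summaries into label, low, open, close, high rows.

    Mapping from the comment: low = min, open = Q1, close = Q3, high = max.
    The median has no candlestick slot, so it is dropped here.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Group", "Min", "Q1", "Q3", "Max"])
    for label, (mn, q1, _median, q3, mx) in summaries.items():
        writer.writerow([label, mn, q1, q3, mx])
    return out.getvalue()

# Nelson's returns from the lesson: (min, Q1, median, Q3, max)
print(candlestick_rows({"Nelson returns": (-2, 16, 31, 38, 62)}))
```

With Q1 as open and Q3 as close, the candle body spans the same range as a boxplot's box; the median still needs its own series, as the commenter did.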