Guest Lecture: Hypothesis Testing
April 16, 2017
Mon, Apr 17
Hypothesis Testing:
Purpose: to evaluate whether data provide significant evidence against a stated hypothesis
- Step 1. Set up the null hypothesis (Ho) and alternative hypothesis (Ha)
- Step 2. Calculate the appropriate test statistic
- Step 3. Find the P-value (the probability, if Ho is true, of getting a result at least as extreme as the one observed)
- Step 4. Interpret results
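The four steps can be sketched in Python for the case used in the examples that follow: a one-sample Z-test with a known population σ. This is a minimal sketch, not a reference implementation. It assumes Python 3.8+ (for `statistics.NormalDist`) and a default significance level α = 0.05, and the function name `one_sample_z_test` and its parameters are hypothetical. Step 1 is left to the caller, who chooses `mu0` and the direction of Ha (`alternative`).

```python
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alternative="greater", alpha=0.05):
    """Steps 2-4 of a one-sample Z-test with known population std dev sigma.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    # Step 2: test statistic
    z = (xbar - mu0) / (sigma / n ** 0.5)

    # Step 3: P-value from the standard Normal distribution (mean 0, std dev 1)
    std_normal = NormalDist()
    if alternative == "greater":
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p = std_normal.cdf(z)
    elif alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")

    # Step 4: reject Ho when the P-value falls below the significance level
    reject = p < alpha
    return z, p, reject
```

For example, `one_sample_z_test(509, 505, 62, 1000)` gives z ≈ 2.04 and P ≈ 0.0207 with `reject` true, matching Example A.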
- Example A: Do Math SAT scores improve significantly with coaching?
- National Math SAT scores are normally distributed with mean score = 505 and std. dev = 62
- Sampled 1,000 students who received coaching
- Sample mean score was 509
- Are these results significantly better than the national average?
- Step 1: Set up the hypothesis test
- μ = 505, σ = 62
- x̅ = 509, n = 1,000
- Ho: μ = 505
- Ha: μ > 505
- Step 2: Calculate the appropriate test statistic
- Z = (x̅ – μ)/(σ/√n)
- Z = (509 – 505)/(62/√1000) = 4/1.96 ≈ 2.04
- Step 3: Find P-value
- P = 1 – P(Z<2.04) = 1 – 0.9793 = 0.0207
- Step 4: Interpret results
- 0.0207 ≅ 2.07%: if Ho were true, there would be about a 2.07% chance of getting a sample mean of 509 or higher
- 0.0207 < 0.05 (the significance level α)
- Reject Ho
- Coaching seems to improve scores significantly
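The Example A arithmetic can be checked in a few lines of Python. This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example A inputs (from the text)
mu0, sigma = 505, 62   # national mean and std dev, assumed under Ho
xbar, n = 509, 1000    # coached sample mean and sample size

# Step 2: Z statistic
se = sigma / sqrt(n)   # standard error of the mean, about 1.96
z = (xbar - mu0) / se  # about 2.04

# Step 3: right-tailed P-value, since Ha: mu > 505
p = 1 - NormalDist().cdf(z)

# Step 4: compare with alpha = 0.05
print(f"SE = {se:.2f}, z = {z:.2f}, P = {p:.4f}, reject Ho: {p < 0.05}")
```

It prints `SE = 1.96, z = 2.04, P = 0.0207, reject Ho: True`.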
- Example B: Has a student paper been plagiarized?
- Previous student papers contain 7 unique vocabulary words on average with std dev of 2.6
- Submitted paper contains 10 unique words
- Is the submitted paper significantly different?
- Step 1: Set up the hypothesis test
- μ = 7, σ = 2.6
- x̅ = 10, n = 1 (a single submitted paper)
- Ho: μ = 7
- Ha: μ ≠ 7
- Step 2: Calculate the appropriate test statistic
- Z = (x̅ – μ)/(σ/√n)
- Z = (10 – 7)/(2.6/√1) = 3/2.6 ≈ 1.15
- Step 3: Find P-value
- P = 2*[1 – P(Z<1.15)] = 2*[1 – 0.8749] = 2*[0.1251] = 0.2502
- Step 4: Interpret results
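- 0.2502 ≅ 25.02%: if Ho were true, about a quarter of papers would differ from the mean by at least 3 unique words in either direction
- 0.2502 > 0.05 (the significance level α)
- Fail to Reject Ho
- Unique vocabulary is within the normal range; no significant evidence of plagiarism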
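Reproducing Example B in Python shows the two-tailed P-value and the role of n = 1 (one submitted paper). This is a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`. It keeps z unrounded, so the P-value comes out near 0.249 rather than the table-based 0.2502; the conclusion is the same.

```python
from statistics import NormalDist

# Example B inputs (from the text)
mu0, sigma = 7, 2.6   # mean and std dev of unique-word counts in previous papers
xbar, n = 10, 1       # a single submitted paper, so n = 1

# Step 2: with n = 1, sigma / sqrt(n) is just sigma, so z = 3 / 2.6
z = (xbar - mu0) / (sigma / n ** 0.5)

# Step 3: two-tailed P-value, since Ha: mu != 7 counts both tails
p = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: compare with alpha = 0.05
print(f"z = {z:.2f}, P = {p:.3f}, reject Ho: {p < 0.05}")
```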
- Significance Testing video (7:20 min)
- Significance testing is like paternity testing. A DNA test compares one man's DNA with the child's, so it can show whether that particular man is or is not the father, but it says nothing about whether some other man is. You evaluate one possibility at a time, just as a significance test evaluates only the hypothesis you set up.
- Activity:
- Problem 1.1. More than 200,000 people worldwide take the GMAT examination each year as they apply for MBA programs. Their scores vary Normally with mean about μ = 525 and standard deviation about σ = 100. One hundred students, n = 100, go through a rigorous training program designed to raise their GMAT scores. The students who go through the program have an average score of x̅ = 541.4. Is there evidence to suggest the training program significantly improves GMAT scores?
- Problem 1.2. A newly installed rooftop solar system has been producing energy for n = 100 days. Average energy production is 41.8 kWh per day with a standard deviation of 13.9 kWh. The solar panel manufacturer claims the panels typically produce 40 kWh per day. Is the newly installed system producing significantly more energy than estimated by the manufacturer?
Different Procedures:
- Unknown variance:
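- Use t-tests instead of Z-tests. The procedure is the same, with a few subtle differences.
- When calculating the test statistic, substitute the sample std dev (s) for the population std dev (σ): t = (x̅ – μ)/(s/√n).
- P-values come from the t distribution with n – 1 degrees of freedom instead of the standard Normal. They are somewhat larger than Z-test P-values, noticeably so for small samples and nearly the same for large samples.
- Look up the P-value and rejection region in the t distribution, not the Z table.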
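To see how t-test and Z-test P-values differ, this sketch compares the upper-tail probability of a test statistic equal to 2.0 under the t distribution (n – 1 degrees of freedom) and under the standard Normal. It sticks to the Python standard library by integrating the t density numerically with Simpson's rule; the helper names `t_pdf` and `t_upper_tail` are hypothetical, and in practice a statistics library (for example SciPy's `scipy.stats.t.sf`) would normally be used instead.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Normalizing constant computed in log space so large df does not overflow
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (steps must be even)."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)  # the density is symmetric about 0
    h = t / steps
    # Simpson's rule weights: 1, 4, 2, 4, ..., 2, 4, 1
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3  # 0.5 minus P(0 < T < t)


z_upper = 1 - NormalDist().cdf(2.0)  # Z-test upper-tail P-value for 2.0
for n in (6, 31, 1001):
    # t-test uses n - 1 degrees of freedom
    print(n, round(t_upper_tail(2.0, n - 1), 4), round(z_upper, 4))
```

For n = 6 the t-based P-value is more than double the Z-based one; by n = 1001 the two agree to about three decimal places.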
- One-tail vs Two-tail tests:
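- If Ha uses ≠, use a two-tailed test
- Watch for keywords like "different"
- If Ha uses < or >, use a one-tailed test
- Ho is sometimes written with ≤ or ≥ in this case (e.g., Ho: μ ≤ 505 vs Ha: μ > 505); the test is still one-tailed
- Watch for keywords like "more than", "less than", or similar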
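The choice of tails matters numerically: for the same Z statistic, the two-tailed P-value is twice the one-tailed one, so a result can be significant at α = 0.05 with a one-tailed test but not with a two-tailed test. This is a minimal Python sketch (3.8+); the helper `p_values` is hypothetical and assumes the one-tailed Ha points in the same direction as the observed z.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one-tailed, two-tailed) P-values for an observed Z statistic.

    The one-tailed value assumes Ha points in the direction of z
    (Ha: mu > mu0 when z > 0, Ha: mu < mu0 when z < 0).
    """
    one_tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return one_tail, 2 * one_tail            # two-tailed test counts both tails


for z in (1.15, 1.8, 2.04):
    one, two = p_values(z)
    print(f"z = {z}: one-tailed P = {one:.4f}, two-tailed P = {two:.4f}")
```

At z = 1.8, for example, the one-tailed P-value is about 0.036 and the two-tailed one about 0.072, so the wording of Ha decides whether Ho is rejected at α = 0.05.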
- Proportions:
- The test statistic for proportions is different.
- Instead of Z = (x̅ – μ)/(σ/√n), use Z = (p̂ – p₀)/√[p₀(1 – p₀)/n], where p̂ is the sample proportion and p₀ is the proportion stated in Ho.
- Aside from the test statistic, we use the same procedure.
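A minimal Python sketch of the one-sample proportion test, assuming a right-tailed alternative Ha: p > p₀ and Python 3.8+. The function name `proportion_z_test` and the input counts are hypothetical, chosen only to keep the arithmetic easy to follow. The standard error is computed from p₀, the proportion stated in Ho, which is the usual form for a significance test.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Z statistic and right-tailed P-value for Ho: p = p0 vs Ha: p > p0."""
    p_hat = successes / n                 # sample proportion
    se = sqrt(p0 * (1 - p0) / n)          # standard error computed under Ho
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)     # right tail, since Ha: p > p0
    return z, p_value


# Hypothetical data: 60 successes in 100 trials, testing Ho: p = 0.5
z, p = proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, P = {p:.4f}")
```

With these inputs p̂ = 0.6 and the standard error is √(0.5 × 0.5/100) = 0.05, so z = 0.1/0.05 = 2.0 and the right-tailed P-value is about 0.023.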
* Most example and activity problems presented are derived from Moore, D.S., McCabe, G.P., and Craig, B.A., 2009. Introduction to the Practice of Statistics, 6th Edition. New York: W.H. Freeman and Company.