## Guest Lecture: Hypothesis Testing

April 16, 2017

Mon, Apr 17

Hypothesis Testing:

Purpose: to evaluate data for evidence of significant agreement or disagreement with a hypothesis

**Step 1.** Set up the null hypothesis (Ho) and alternate hypothesis (Ha)

**Step 2.** Calculate the appropriate test statistic

**Step 3.** Find the P-value (the probability of obtaining a result at least this extreme by chance if Ho is true)

**Step 4.** Interpret results

**Example A**: **Do Math SAT scores improve significantly with coaching?**

- National Math SAT scores are normally distributed with mean score = 505 and std. dev = 62
- Sampled 1,000 students who received coaching
- Sample mean score was 509
- Are these results significantly better than the national average?

**Step 1**: Set up the hypothesis test

**μ = 505, σ = 62**

**x̅ = 509, n = 1,000**

**Ho: μ = 505**

**Ha: μ > 505**

**Step 2**: Calculate the appropriate test statistic

**Z-test = (x̅ – μ)/(σ/√n)**

- (509 – 505)/(62/√1000) = 4/1.96 = **2.04**

**Step 3**: Find P-value

- P = 1 – P(Z<2.04) = 1 – 0.9793 = **0.0207**

**Step 4**: Interpret results

- 0.0207 ≅ 2.07% probability of getting a sample mean this high by chance if Ho is true
- 0.0207 < 0.05

**Reject Ho**

- Coaching seems to improve scores significantly
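The arithmetic in Example A can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name `one_sample_z` and the variable names are illustrative choices, not part of the lecture, and the 0.05 cutoff is the one used in Step 4.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """Z statistic for a sample mean against a hypothesized population mean."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


# Values taken from Example A
z = one_sample_z(sample_mean=509, mu0=505, sigma=62, n=1000)

# Upper-tail P-value, because Ha is mu > 505
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05  # significance level used in the notes
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Fail to reject Ho")
```

Because the script does not round z before looking up the normal probability, its P-value may differ from the table-based value in the last decimal place.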

**Example B**: **Has a student paper been plagiarized?**

- Previous student papers contain 7 unique vocabulary words on average with std dev of 2.6
- Submitted paper contains 10 unique words
- Is the submitted paper significantly different?

**Step 1**: Set up the hypothesis test

**μ = 7, σ = 2.6**

**x̅ = 10, n = 1** (a single submitted paper)

**Ho: μ = 7**

**Ha: μ ≠ 7**

**Step 2**: Calculate the appropriate test statistic

**Z-test = (x̅ – μ)/(σ/√n)**

- With n = 1, σ/√n = σ, so (10 – 7)/2.6 = **1.15**

**Step 3**: Find P-value

- P = 2*[1 – P(Z<1.15)] = 2*[1 – 0.8749] = 2*[0.1251] = **0.2502**

**Step 4**: Interpret results

- 0.2502 ≅ 25.02% probability of getting a result this far from 7 by chance if Ho is true
- 0.2502 > 0.05

**Fail to Reject Ho**

- Unique vocabulary is within normal range, no evidence of plagiarism
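Example B differs from Example A in two ways: the test is two-tailed, and the "sample" is a single paper, so n = 1. The sketch below shows both, again using only the standard library; `two_sided_z_test` is a hypothetical helper name introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(x_bar, mu0, sigma, n=1):
    """Return (z, P-value) for Ha: mu != mu0 with known sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Two-tailed: double the area beyond |z| in one tail
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Values taken from Example B (one submitted paper, so n = 1)
z_b, p_b = two_sided_z_test(x_bar=10, mu0=7, sigma=2.6, n=1)

print(f"z = {z_b:.2f}, P-value = {p_b:.4f}")
print("Reject Ho" if p_b < 0.05 else "Fail to reject Ho")
```

The script keeps z unrounded (slightly above 1.15), so its P-value comes out slightly below the table-based 0.2502; the conclusion is the same.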

- Significance Testing video (7:20 min)

**Significance testing is like paternity testing.** When you check father-child DNA for a match, you can prove that one person is or is not the father, but the same test doesn’t prove that another person is the father. You’re evaluating one possibility at a time.

- Activity:

**Problem 1.1.** More than 200,000 people worldwide take the GMAT examination each year as they apply for MBA programs. Their scores vary Normally with mean about **μ = 525** and standard deviation about **σ = 100**. One hundred students, **n = 100**, go through a rigorous training program designed to raise their GMAT scores. The students who go through the program have an average score of **x̅ = 541.4**. Is there evidence to suggest the training program significantly improves GMAT scores?

**Problem 1.2.** A newly installed rooftop solar system has been producing energy for **n = 100** days. Average energy production is **41.8** kWh per day with a standard deviation of **13.9** kWh. The solar panel manufacturer claims the panels typically produce **40** kWh per day. Is the newly installed system producing significantly more energy than estimated by the manufacturer?

**Different Procedures**:

- Unknown variance:
  - Use t-tests instead of Z-tests. It’s the same procedure with a few subtle differences.
  - When calculating the t-test statistic, substitute sample std dev (s) for population std dev (σ).
  - P-values for the t-test and Z-test are not exactly the same, but they are close when the sample is large.
  - Check for slight differences in P-value and rejection region.

- One-tail vs Two-tail tests:
  - If **Ha: ≠ then 2-tail**
    - Watch for keyword “different”
  - If **Ha: < or > then 1-tail**
    - Same for ≤ and ≥
    - Watch for keywords “more than”, “less than” or similar
- Proportions:
  - Test statistics for proportions are different.
  - Instead of Z = (x̅ – μ)/(σ/√n) we use Z = (p̂ – po)/√[po(1 – po)/n] for proportions, where po is the proportion stated in Ho.
  - Aside from the test statistic, we use the same procedure.
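The one-tail versus two-tail rule and the proportion test can be combined in one small sketch. It uses the hypothesized proportion po in the standard error, which is the usual textbook form of the statistic. The helper name `proportion_z_test`, its `alternative` argument, and the data (60 successes in 100 trials against po = 0.5) are all hypothetical, chosen only to illustrate the procedure.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, P-value) for a one-sample test of a proportion."""
    p_hat = successes / n
    # Standard error computed under Ho, i.e. using p0 (an assumption of this sketch)
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p > p0, upper tail
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p < p0, lower tail
        p = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: p != p0, both tails
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


# Hypothetical data, not from the lecture
z_p, p_p = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z_p:.2f}, two-sided P-value = {p_p:.4f}")
```

Passing `alternative="greater"` to the same call gives the one-tailed P-value, which is half the two-sided value when z is positive.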

*Most example and activity problems presented are derived from* Moore, D.S., McCabe, G.P., and Craig, B.A., 2009. Introduction to the Practice of Statistics, 6th Edition. New York: W.H. Freeman and Company.