6.1 Introducing Statistics: Why Be Normal?
In Unit 5, we discovered the power of the Normal distribution when describing sampling distributions. For categorical data (proportions), we rely on the Normal model to make inferences about an entire population based on a single sample.
The Foundation of Inference: To estimate a parameter or test a claim, we must ensure the sampling distribution of our statistic (like p̂) is approximately Normal. We check this using the Large Counts Condition: the expected numbers of successes and failures, np and n(1 − p), must both be at least 10.
When this condition is met, we can use z-scores and the standard Normal curve to calculate margins of error and p-values. If the condition fails, the sampling distribution may be noticeably skewed, and Normal-based calculations can be unreliable.
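The check itself is two multiplications. A minimal sketch, assuming a hypothetical helper named `large_counts_ok` and illustrative numbers (n = 200 with a proportion of 0.70):

```python
def large_counts_ok(n, p, threshold=10):
    """Return both expected counts and whether each is at least the threshold."""
    successes = n * p          # expected number of successes
    failures = n * (1 - p)     # expected number of failures
    return successes, failures, (successes >= threshold and failures >= threshold)

# Illustrative numbers only: n = 200 and a proportion of 0.70.
successes, failures, ok = large_counts_ok(200, 0.70)
print(f"n*p = {successes:.1f}, n*(1-p) = {failures:.1f}, condition met: {ok}")
```

On an exam, write out the two products by hand in the same way; the code only mirrors that arithmetic.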
Exam Tip: Never just state "Large Counts met." Always show the actual multiplication with your specific sample size and proportion to prove that both counts are $\ge 10$.
6.2 Constructing a Confidence Interval for a Population Proportion
A Confidence Interval (CI) provides a range of plausible values for the unknown population proportion. Because sample statistics naturally vary (sampling variability), a point estimate alone isn't enough.
Formula: 1-Sample Z-Interval for p
Point Estimate ± Margin of Error
(where Margin of Error = Critical Value × Standard Error)
$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
The Four-Step Process (PANIC)
On Free Response questions, use the PANIC acronym to ensure full credit.
| Step | What to do |
|---|---|
| P | Parameter: Define the parameter p in context (e.g., "p = the true proportion of..."). |
| A | Assess Conditions: 1. Random sample/assignment 2. 10% Condition (n ≤ 10% of pop) 3. Large Counts (np̂ ≥ 10, n(1−p̂) ≥ 10) |
| N | Name Procedure: State "1-Sample Z-Interval for p". |
| I & C | Interval & Conclude: Calculate the interval and interpret it: "We are C% confident that the interval from [lower] to [upper] captures the true proportion of..." |
Example: HighFiveAP surveys a random sample of 200 high school students and finds that 140 of them prefer digital flashcards over paper. Create a 95% confidence interval for the true proportion.
1. Point Estimate (p̂):
140 / 200 = 0.70
2. Critical Value (z*):
For 95% confidence, z* = 1.96
3. Calculation:
0.70 ± 1.96 * √[(0.70)(0.30) / 200]
0.70 ± 1.96 * 0.0324
0.70 ± 0.0635 ➔ (0.6365, 0.7635)
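The same arithmetic can be sketched in Python as a cross-check on the hand calculation. The function name `one_prop_z_interval` is a hypothetical choice for illustration; `statistics.NormalDist` comes from the Python standard library (3.8+).

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(x, n, conf_level=0.95):
    """1-sample z-interval for a proportion: p_hat +/- z* * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n                                             # point estimate
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)   # critical value
    se = sqrt(p_hat * (1 - p_hat) / n)                        # standard error
    moe = z_star * se                                         # margin of error
    return p_hat - moe, p_hat + moe

# Flashcard example: 140 successes out of 200 students, 95% confidence.
lower, upper = one_prop_z_interval(140, 200, 0.95)
print(f"({lower:.4f}, {upper:.4f})")
```

The result should agree with the hand-computed interval to about four decimal places; tiny differences can appear because the code uses an unrounded z* rather than 1.96.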
Calculator Commands (TI-83/84)
STAT ➔ TESTS ➔ A: 1-PropZInt
x: success count (must be a whole number!), n: sample size, C-Level: confidence level (e.g., 0.95)
6.3 Justifying a Claim Based on a Confidence Interval
Once we have a confidence interval, we can use it to evaluate claims made about the population parameter.
Plausible Claims
If a hypothesized value IS INSIDE the confidence interval, it is a plausible value for the true parameter. We cannot reject the claim.
Implausible Claims
If a hypothesized value IS OUTSIDE the confidence interval, we have convincing evidence that the claim is false.
Scenario: A school district claims that exactly 80% of students prefer digital flashcards over paper. Based on our interval from Section 6.2 (0.6365, 0.7635), do the data support the district's claim?
Conclusion: No. Because 0.80 is not contained within our 95% confidence interval, we have convincing evidence against the district's claim. Every plausible value in the interval is below 0.80, so the data suggest the true proportion is lower.
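The plausibility check reduces to asking whether the claimed value falls between the interval's endpoints. A small sketch, with `claim_is_plausible` as a hypothetical helper name:

```python
def claim_is_plausible(claimed_value, lower, upper):
    """Return True when the claimed value lies inside the confidence interval."""
    return lower <= claimed_value <= upper

# Interval from Section 6.2.
interval = (0.6365, 0.7635)
print(claim_is_plausible(0.80, *interval))  # the district's claimed value
print(claim_is_plausible(0.70, *interval))  # a value inside the interval
```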
6.4 Setting Up a Test for a Population Proportion
A Hypothesis Test provides a formal process for weighing evidence against a specific claim. It starts by setting up two competing hypotheses.
Null Hypothesis (H₀): The claim we weigh evidence against. It represents "no difference" or the status quo. It always uses an equal sign (e.g., p = p₀).
Alternative Hypothesis (Hₐ): The claim we are trying to find evidence FOR. It uses inequalities (<, >, or ≠).
The Test Statistic
To test the claim, we calculate a test statistic (z-score) which tells us how many standard errors our sample proportion (p̂) is away from the null value (p₀).
Test Statistic Formula
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
⚠️ Critical Difference: Notice that the denominator uses p₀ (the null value), not p̂! In a hypothesis test, we build our distribution assuming the null hypothesis is true, so we use p₀ for both the standard error and the Large Counts condition check.
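As a rough sketch, the test statistic is the distance between p̂ and p₀ divided by a standard error built from p₀. The function name `one_prop_z_stat` and the numbers below (140 successes in 200 trials, tested against a null value of 0.80) are illustrative assumptions, not a worked problem from this guide.

```python
from math import sqrt

def one_prop_z_stat(x, n, p0):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n); the SE uses the null value p0."""
    p_hat = x / n
    se_null = sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    return (p_hat - p0) / se_null

# Illustrative numbers: p_hat = 0.70 compared with a null value of 0.80.
z = one_prop_z_stat(140, 200, 0.80)
print(f"z = {z:.2f}")
```

A negative z simply means the sample proportion fell below the null value.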
6.5 Interpreting p-Values
The p-value is the most critical concept in modern statistical inference. It measures the strength of your evidence against H₀.
The Golden Definition of a p-value
The probability of getting a sample statistic as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true.
Interpretation Example: "If the true proportion of cats who like the new magic wand is really 0.50 (H₀), there is only a 0.03 (3%) probability of getting a sample proportion of 0.63 or higher purely by random chance."
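Turning a z-score into a p-value depends on the direction of Hₐ. The sketch below assumes a hypothetical helper `p_value_from_z`, and the value z = 1.88 is a made-up input chosen only to illustrate a one-sided p-value near 0.03.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative):
    """Area under the standard Normal curve at least as extreme as z."""
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p > p0, use the upper tail
        return 1 - std_normal.cdf(z)
    if alternative == "less":         # Ha: p < p0, use the lower tail
        return std_normal.cdf(z)
    if alternative == "two-sided":    # Ha: p != p0, double the tail beyond |z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Hypothetical z-score for illustration.
p = p_value_from_z(1.88, "greater")
print(f"p-value is about {p:.3f}")
```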
6.6 Concluding a Test for a Population Proportion
To make a final decision, we compare our p-value to our pre-selected significance level ($\alpha$), which is usually 0.05.
| p-value ≤ α | p-value > α |
|---|---|
| Decision: Reject H₀ | Decision: Fail to Reject H₀ |
| Because the p-value is less than or equal to α, the result is statistically significant. We HAVE convincing evidence for Hₐ. | Because the p-value is greater than α, the result is not statistically significant. We DO NOT HAVE convincing evidence for Hₐ. |
🎯 Memory Trick: "If the p is low, the null must go!" (Reject). "If the p is high, the null must fly!" (Fail to Reject).
6.7 Potential Errors When Performing Tests
Because inference relies on partial data, occasionally a random sample will lead us to the wrong conclusion. We classify these into two types of errors.
Type I Error (False Positive)
Rejecting H₀ when H₀ is actually true.
- You find convincing evidence for a change/effect that doesn't actually exist.
- Probability of a Type I error = α (Significance Level).
Type II Error (False Negative)
Failing to reject H₀ when Hₐ is actually true.
- You miss a real effect or difference because your sample didn't show enough evidence.
- Probability is denoted by β.
Power of a Test: The probability that a test correctly rejects a false null hypothesis.
Power = 1 − P(Type II Error)
You can increase power by increasing sample size ($n$) or increasing the significance level ($\alpha$).
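To see both effects numerically, here is a sketch that approximates power for a one-sided test (Hₐ: p > p₀) using the Normal model. The function `approx_power` and the values p₀ = 0.50 and true p = 0.60 are assumptions for illustration; computing power by formula goes beyond what this guide covers, so treat it as a demonstration of the trend rather than a required method.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p0, p_true, n, alpha=0.05):
    """Approximate power of a one-sided (Ha: p > p0) one-proportion z-test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Spread of p_hat when the true proportion is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    # Probability that p_hat lands above the cutoff when p_true is correct.
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)

for n, alpha in [(100, 0.05), (400, 0.05), (100, 0.10)]:
    power = approx_power(0.50, 0.60, n, alpha)
    print(f"n = {n}, alpha = {alpha}: power is about {power:.3f}")
```

Comparing the first line of output with the other two shows power rising when n grows or when α is raised, at the cost of a higher Type I error rate in the second case.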
6.8 Confidence Intervals for the Difference of Two Proportions
When we want to estimate the difference between two populations (or two treatment groups in an experiment), we calculate a 2-Sample Z-Interval.
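Formula: 2-Sample Z-Interval for $p_1 - p_2$
$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$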
Conditions Update: You must check the Random, 10%, and Large Counts conditions for BOTH samples independently before proceeding.
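A minimal sketch of the two-sample interval, using the unpooled standard error (each sample contributes its own p̂). The function name `two_prop_z_interval` and the counts (60 of 100 versus 45 of 100) are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_interval(x1, n1, x2, n2, conf_level=0.95):
    """2-sample z-interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

# Hypothetical data: 60 of 100 in group 1, 45 of 100 in group 2.
lower, upper = two_prop_z_interval(60, 100, 45, 100)
print(f"({lower:.4f}, {upper:.4f})")
```

Note that the order of subtraction matters: swapping the groups flips the signs of both endpoints.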
6.9 Justifying a Claim Based on a Confidence Interval for a Difference
When analyzing an interval for $p_1 - p_2$, the most critical number to look for is Zero (0).
Interval Contains 0
Example: (-0.05, 0.12)
Since 0 is a plausible value for the difference, it is plausible that there is no difference between the two proportions. We cannot justify a claim that one group is greater than the other.
Interval Does Not Contain 0
Example: (0.04, 0.18) OR (-0.22, -0.08)
Since 0 is not plausible, we have convincing evidence that there is a difference between the groups.
6.10 Setting Up a Test for the Difference of Two Population Proportions
When testing to see if two populations differ, our Null Hypothesis assumes they are exactly the same.
H₀: p₁ = p₂ (or p₁ − p₂ = 0)
Hₐ: p₁ > p₂ (or <, or ≠)
The Pooled Proportion ($\hat{p}_C$)
If we assume under $H_0$ that the two populations have the same proportion, we shouldn't use two different $p$ values to calculate our standard error. Instead, we combine (pool) the successes and sample sizes into one giant sample.
$\hat{p}_C = \dfrac{x_1 + x_2}{n_1 + n_2}$
6.11 Carrying Out a Test for the Difference of Proportions
Finally, we calculate the z-statistic for a two-sample test using the pooled proportion.
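2-Sample Z-Test Statistic Formula
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}_C(1 - \hat{p}_C)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$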
Once you calculate the z-score, find the p-value using your normal distribution tools, compare it to $\alpha$, and state your conclusion in context, exactly as done in Section 6.6.
Calculator Commands (TI-83/84)
STAT ➔ TESTS ➔ 6: 2-PropZTest
x1, n1: sample 1 stats. x2, n2: sample 2 stats. The calculator handles the pooled proportion for you in the background!
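For readers who want to see what happens "in the background," here is a sketch of the pooled two-sample test in Python. The function name `two_prop_z_test` and the counts (60 of 100 versus 45 of 100) are hypothetical, and the p-value shown is two-sided (Hₐ: p₁ ≠ p₂).

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Pooled 2-sample z-test for H0: p1 = p2; returns (z, two-sided p-value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)            # combine successes and sample sizes
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 of 100 in group 1, 45 of 100 in group 2.
z, p_value = two_prop_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, two-sided p-value is about {p_value:.4f}")
```

For a one-sided alternative, use a single tail of the Normal curve instead of doubling.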
Unit 6 Key Takeaways
Confidence Intervals: Point Estimate ± Margin of Error
PANIC for intervals, PHANTOMS for tests
p-value: Prob of result this extreme assuming H₀ is true
Type I Error: Reject true H₀ | Type II Error: Fail to reject false H₀
Use p₀ for SE in 1-Prop Tests, Use pooled $\hat{p}_C$ for SE in 2-Prop Tests