9.1 Introducing Statistics: Do Those Points Align?
In Unit 2, we learned how to describe the relationship between two quantitative variables using a Least-Squares Regression Line (LSRL): ŷ = a + bx. But that line was only for our specific sample.
Now, we want to know whether that sample linear relationship provides convincing evidence of a true linear relationship in the entire population. Just as sample means (x̄) vary around the population mean (μ), our sample slopes (b) vary from sample to sample around the true population slope (β).
Parameters vs. Statistics for Regression:
- Sample LSRL: ŷ = a + bx (where b is the sample slope)
- True Population Regression Line: μy = α + βx (where β is the true population slope and μy is the mean of y at a given x)
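To see sample slopes vary around β, here is a minimal simulation sketch. The population values (ALPHA, BETA, SIGMA), the x-values, and the helper name `sample_slope` are all made up for illustration; they do not come from any real dataset.

```python
import random

def sample_slope(xs, ys):
    """Least-squares slope: b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical population line: mu_y = ALPHA + BETA * x, with normal scatter SIGMA.
ALPHA, BETA, SIGMA = 50.0, 3.0, 5.0

rng = random.Random(1)   # fixed seed so the run is repeatable
xs = list(range(15))     # 15 x-values, as if each sample had n = 15

slopes = []
for _ in range(1000):    # draw 1000 different random samples
    ys = [ALPHA + BETA * x + rng.gauss(0, SIGMA) for x in xs]
    slopes.append(sample_slope(xs, ys))

mean_slope = sum(slopes) / len(slopes)
print(f"smallest b = {min(slopes):.2f}, largest b = {max(slopes):.2f}")
print(f"average of the 1000 sample slopes = {mean_slope:.2f}")
```

Each individual b misses β by a little, but the slopes center on the true value. That sample-to-sample variation is what the standard error of the slope measures.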
The LINER Conditions
Before we can construct an interval or run a test, we must check five conditions, easily remembered by the acronym LINER:
- Linear: The scatterplot looks roughly linear, and the residual plot has no curved pattern.
- Independent: Observations are independent of one another. If sampling without replacement, check the 10% condition (the sample is no more than 10% of the population).
- Normal: A dotplot/histogram of the residuals shows no strong skew or outliers.
- Equal Variance: The residual plot shows roughly the same vertical spread of residuals around 0 at every x-value (no "fan" or "cone" shape).
- Random: Data comes from a random sample or randomized experiment.
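Three of these conditions (Linear, Normal, Equal Variance) are checked using residuals, so it helps to see where residuals come from. The sketch below fits an LSRL and computes residual = y − ŷ for each point. The data values and the helper name `fit_lsrl` are hypothetical, chosen only for illustration.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical sample: practice quizzes taken (x) and exam score (y).
quizzes = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [62, 66, 65, 71, 74, 73, 79, 82]

a, b = fit_lsrl(quizzes, scores)

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(quizzes, scores)]

for x, r in zip(quizzes, residuals):
    print(f"x = {x}: residual = {r:+.2f}")
```

Plot these residuals against x to check Linear and Equal Variance, and make a dotplot of them to check Normal.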
9.2 Confidence Intervals for the Slope of a Regression Model
To estimate the true population slope β, we construct a t-interval for the slope.
Formula: t-Interval for Slope β
b ± t* · SEb
b = sample slope
t* = critical value (df = n − 2)
SEb = standard error of the slope
🎯 Why n − 2 Degrees of Freedom?
In Unit 7, we used n − 1 because we had to estimate one parameter (the mean). For linear regression, we have to estimate TWO parameters (the y-intercept AND the slope) before we can look at the variation around the line. Therefore, we lose two degrees of freedom: df = n − 2.
Example: HighFiveAP wants to estimate the relationship between the number of practice quizzes a student takes (x) and their final AP Statistics exam score out of 100 (y). From a random sample of 15 students, the sample slope is 3.2 with a standard error of 0.85.
To find the 95% CI:
df = 15 − 2 = 13.
Using invT(0.975, 13), t* = 2.160. (invT(0.025, 13) returns −2.160; the critical value is its absolute value.)
Interval = 3.2 ± (2.160)(0.85) = 3.2 ± 1.836 = (1.364, 5.036)
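The arithmetic above can be sketched in a few lines of Python. The function name `slope_interval` is an illustrative choice, and t* is typed in from the example rather than computed.

```python
def slope_interval(b, se_b, t_star):
    """Confidence interval for the slope: b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

n = 15
df = n - 2  # two parameters estimated (intercept and slope)

# Values from the practice-quiz example; t* = 2.160 for 95% confidence, df = 13.
lower, upper = slope_interval(b=3.2, se_b=0.85, t_star=2.160)

print(f"df = {df}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```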
9.3 Justifying a Claim About the Slope Based on a Confidence Interval
When analyzing an interval for a slope, the most important question is: Does the interval contain zero?
Interval Does Not Contain 0
Example: (1.36, 5.04)
Since all values in the interval are positive, we have convincing evidence of a positive linear relationship between the two variables. (If all values were negative, it would be a negative relationship).
Interval Contains 0
Example: (-1.2, 3.4)
Because 0 is a plausible value for the true slope β, a flat population line (y does not change as x changes) is consistent with the data. We do not have convincing evidence of a linear relationship.
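The "does it contain zero?" rule can be written as a small decision function. This is only a sketch; the function name and the exact wording of the returned conclusions are illustrative.

```python
def interpret_slope_interval(lower, upper):
    """Classify a confidence interval for the slope by whether it contains 0."""
    if lower > 0:
        return "convincing evidence of a positive linear relationship"
    if upper < 0:
        return "convincing evidence of a negative linear relationship"
    return "no convincing evidence of a linear relationship"

# The two example intervals from this section.
print(interpret_slope_interval(1.36, 5.03))
print(interpret_slope_interval(-1.2, 3.4))
```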
9.4 Setting Up a Test for the Slope of a Regression Model
If we want to formally test if a relationship exists, we set up a t-Test for the Slope.
Hypotheses for Slope:
H₀: β = 0 (No linear relationship)
Hₐ: β ≠ 0 (There is a linear relationship)
*You can also use > 0 or < 0 if you are testing for a specifically positive or negative relationship.
Test Statistic Formula
t = (b − β₀) / SEb, with df = n − 2
Usually, H₀ states that the hypothesized slope β₀ is 0, so the formula simplifies to t = b / SEb
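Here is a rough sketch of the test statistic and a two-sided p-value, reusing the practice-quiz numbers from section 9.2 (b = 3.2, SEb = 0.85, n = 15) purely for illustration. On the exam you would use your calculator's tcdf; the numerical integration below is a stand-in so the sketch needs only Python's standard library, and all function names are assumptions.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t density between -|t| and |t| (Simpson's rule, steps even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # double the 0-to-|t| area by symmetry

def slope_t_statistic(b, se_b, beta_0=0.0):
    """t = (b - beta_0) / SE_b; beta_0 is the slope claimed by H0."""
    return (b - beta_0) / se_b

t_stat = slope_t_statistic(b=3.2, se_b=0.85)
df = 15 - 2
p_two_sided = 1 - central_area(t_stat, df)  # both tails, for Ha: beta != 0

print(f"t = {t_stat:.3f}, df = {df}, two-sided p-value = {p_two_sided:.4f}")
```

For a one-sided alternative (β > 0 or β < 0) in the direction the data point, the p-value would be half of the two-sided value.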
9.5 Carrying Out a Test: Reading Computer Output
In AP Statistics Unit 9, you rarely calculate the slope and standard error by hand. Instead, you are given a Computer Output Table. Knowing how to read this table is arguably the most important skill in this unit.
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 12.45 | 3.10 | 4.01 | 0.001 |
| String Length | 0.85 | 0.14 | 6.07 | 0.000 |
Decoding the Table
Imagine you are testing the relationship between the length of a magic wand cat toy's string (x) and the minutes a cat engages with it (y).
- Predictor: The x-variable. The "Constant" row describes the y-intercept (here a = 12.45); ignore it for inference about the slope and look ONLY at the variable row.
- Coef (b): The sample slope. Here, b = 0.85.
- SE Coef (SEb): The standard error of the slope. Here, SEb = 0.14.
- T: The test statistic. Calculated as 0.85 / 0.14 ≈ 6.07.
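- P: The p-value for a two-sided test (Hₐ: β ≠ 0). The output rounds it to 0.000, which means p < 0.0005. Since this is less than α = 0.05, we reject H₀: there is convincing evidence of a linear relationship between string length and engagement time.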
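As a quick check on reading the table, this sketch pulls the numbers out of the "String Length" row and recomputes T from Coef and SE Coef. The parsing approach and variable names are illustrative assumptions; real software output may be laid out differently.

```python
# The variable row from the computer output, copied as text.
row = "| String Length | 0.85 | 0.14 | 6.07 | 0.000 |"

# Split on the column separators and trim whitespace.
cells = [c.strip() for c in row.strip("|").split("|")]
predictor = cells[0]
coef, se_coef, t_reported, p_reported = (float(c) for c in cells[1:])

# T is just the slope divided by its standard error (H0: beta = 0).
t_computed = coef / se_coef

alpha = 0.05
reject_h0 = p_reported < alpha

print(f"{predictor}: b = {coef}, SE_b = {se_coef}")
print(f"computed t = {t_computed:.2f}, reported t = {t_reported}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```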
9.6 Skills Focus: Selecting an Appropriate Inference Procedure
Congratulations! You have learned all the inference procedures in AP Statistics. Your final challenge is looking at a dataset and knowing instantly which test to apply.
Categorical Data (Counts/Percentages)
- 1 Sample, 2 options: 1-Prop Z-Test
- 2 Samples, comparing options: 2-Prop Z-Test
- 1 Sample, 3+ options: Chi-Square GOF
- Two-Way Table: Chi-Square Homogeneity/Independence
Quantitative Data (Averages/Measurements)
- 1 Sample: 1-Sample t-Test for Mean
- 2 Independent Samples: 2-Sample t-Test for Means
- Paired Data (e.g., Before/After): Matched Pairs t-Test
- Comparing Two Quantitative Variables against each other: t-Test for Slope (Linear Regression)
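One way to drill this chart is to treat it as a lookup table. The sketch below encodes the two lists above as a Python dictionary; the key labels and the function name `select_procedure` are made up for illustration, while the procedure names come straight from the chart.

```python
# (data type, study design) -> inference procedure, following the chart above.
PROCEDURES = {
    ("categorical", "1 sample, 2 options"): "1-Prop Z-Test",
    ("categorical", "2 samples"): "2-Prop Z-Test",
    ("categorical", "1 sample, 3+ options"): "Chi-Square GOF",
    ("categorical", "two-way table"): "Chi-Square Homogeneity/Independence",
    ("quantitative", "1 sample"): "1-Sample t-Test for Mean",
    ("quantitative", "2 independent samples"): "2-Sample t-Test for Means",
    ("quantitative", "paired data"): "Matched Pairs t-Test",
    ("quantitative", "two quantitative variables"): "t-Test for Slope (Linear Regression)",
}

def select_procedure(data_type, design):
    """Look up the procedure for a given data type and study design."""
    return PROCEDURES[(data_type, design)]

print(select_procedure("quantitative", "two quantitative variables"))
print(select_procedure("categorical", "1 sample, 3+ options"))
```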
Unit 9 Key Takeaways
Degrees of Freedom: df = n − 2
LINER Conditions: Linear, Independent, Normal Residuals, Equal Variance, Random
Null Hypothesis for Slope: H₀: β = 0 (No linear relationship)
Computer Output: Always ignore the "Constant" row when doing inference for the slope.
Test Statistic: t = b / SEb
End of Unit 9 Study Guide. You've officially conquered the AP Stats curriculum!