unit-9 – HighFiveAP

9.1 Introducing Statistics: Do Those Points Align?

In Unit 2, we learned how to describe the relationship between two quantitative variables using a Least-Squares Regression Line (LSRL): ŷ = a + bx. But that line described only our specific sample.

Now, we want to know whether that sample linear relationship provides convincing evidence of a true linear relationship in the entire population. Just as sample means (x̄) vary around the population mean (μ), sample slopes (b) vary from sample to sample around the true population slope (β).

Parameters vs. Statistics for Regression:

Sample LSRL: ŷ = a + bx (where b is the sample slope)
True Population Regression Line: μ_y = α + βx (where β is the true population slope)
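
To see what "sample slopes vary around β" means, here is a minimal simulation sketch. Everything in it is made up for illustration: the population values (α = 10, β = 2, σ = 5), the x-values 0 through 14, and the function names lsrl_slope and simulate_slopes. It repeatedly draws samples from a population line plus normal noise and records each sample's slope.

```python
import random

def lsrl_slope(xs, ys):
    """Return the least-squares slope b for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(alpha, beta, sigma, xs, n_samples, seed=0):
    """Draw n_samples samples from mu_y = alpha + beta*x plus normal noise
    and return the sample slope b from each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(lsrl_slope(xs, ys))
    return slopes

# Hypothetical population (assumed values, not from the study guide).
xs = list(range(15))
slopes = simulate_slopes(alpha=10, beta=2, sigma=5, xs=xs, n_samples=2000)
mean_slope = sum(slopes) / len(slopes)

print(f"smallest b = {min(slopes):.2f}, largest b = {max(slopes):.2f}")
print(f"mean of sample slopes = {mean_slope:.2f} (true beta = 2)")
```

Individual slopes land above and below 2, but their average sits close to the true slope, which is the same behavior x̄ shows around μ.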

The LINER Conditions

Before we can construct an interval or run a test, we must check five conditions, easily remembered by the acronym LINER:

Linear: The scatterplot looks roughly linear, and the residual plot has no curved pattern.
Independent: Individual observations are independent. When sampling without replacement, check the 10% condition: the sample size n is at most 10% of the population size.
Normal: A dotplot/histogram of the residuals shows no strong skew or outliers.
Equal Variance: The residual plot shows roughly the same vertical spread of residuals at every x-value (no "fan" or "cone" shape).
Random: Data comes from a random sample or randomized experiment.
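
Three of these conditions (Linear, Normal, Equal Variance) are checked by looking at residuals, so it helps to know how they are computed. The sketch below fits an LSRL to a made-up five-point data set and lists each residual (observed y minus predicted ŷ). The data and the function names fit_lsrl and residuals are assumptions for illustration only.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Return observed minus predicted y for every point."""
    a, b = fit_lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_lsrl(xs, ys)
res = residuals(xs, ys)
print(f"y-hat = {a:.1f} + {b:.1f}x")
for x, r in zip(xs, res):
    print(f"x = {x}: residual = {r:+.2f}")
```

Plotting these residuals against x gives the residual plot used for Linear and Equal Variance; a dotplot or histogram of the same values is what you inspect for Normal.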

9.2 Confidence Intervals for the Slope of a Regression Model

To estimate the true population slope β, we construct a t-interval for the slope.

Formula: t-Interval for Slope β

b ± t* · SE_b

b = sample slope
t* = critical value (df = n − 2)
SE_b = standard error of the slope

🎯 Why n − 2 Degrees of Freedom?

In Unit 7, we used n − 1 because we had to estimate one parameter (the mean). For linear regression, we have to estimate TWO parameters (the y-intercept AND the slope) before we can look at the variation around the line. Therefore, we lose two degrees of freedom: df = n − 2.
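
The n − 2 shows up directly when SE_b is computed from data. This guide always gives SE_b as a number, so the formulas in this sketch are the standard textbook ones rather than something stated above: s = sqrt(Σ residual² / (n − 2)) and SE_b = s / sqrt(Σ(x − x̄)²). The data are made up, and the function names residual_sd and slope_se are hypothetical.

```python
import math

def residual_sd(residuals):
    """Standard deviation of the residuals, dividing by n - 2."""
    n = len(residuals)
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))

def slope_se(residuals, xs):
    """Standard error of the slope: s / sqrt(sum of squared x-deviations)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return residual_sd(residuals) / math.sqrt(sxx)

# Hypothetical x-values and the residuals left over after fitting an LSRL
# to a made-up five-point data set (assumed values, not from the guide).
xs = [1, 2, 3, 4, 5]
res = [-0.8, 0.6, 1.0, -0.6, -0.2]

s = residual_sd(res)
se_b = slope_se(res, xs)
print(f"df = {len(xs) - 2}, s = {s:.4f}, SE_b = {se_b:.4f}")
```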

Example: HighFiveAP wants to estimate the relationship between the number of practice quizzes a student takes (x) and their final AP Statistics exam score out of 100 (y). From a random sample of 15 students, the sample slope is 3.2 with a standard error of 0.85.

To find the 95% CI:
df = 15 − 2 = 13.
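Using invT(0.975, 13), t* = 2.160. (invT takes the area to the LEFT of the critical value; invT(0.025, 13) returns −2.160, the same value with the opposite sign.)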
Interval = 3.2 ± (2.160)(0.85) = 3.2 ± 1.836 = (1.364, 5.036)
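
The calculation above can be double-checked with a short Python sketch. Python's standard library has no t distribution, so this version approximates the t CDF with Simpson's rule and finds t* by bisection. That numerical approach and the function names (t_pdf, t_cdf, t_critical, slope_interval) are assumptions for illustration; on the exam you would use invT or a table.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=1000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Find t* so that the central area between -t* and t* equals confidence."""
    target = 1 - (1 - confidence) / 2   # area to the LEFT of t*
    lo, hi = 0.0, 50.0
    for _ in range(50):                 # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_interval(b, se_b, t_star):
    """Return (low, high) for b ± t* · SE_b."""
    margin = t_star * se_b
    return b - margin, b + margin

# Values from the practice-quiz example: n = 15, b = 3.2, SE_b = 0.85.
n = 15
t_star = t_critical(0.95, n - 2)
low, high = slope_interval(3.2, 0.85, t_star)
print(f"df = {n - 2}, t* = {t_star:.3f}")
print(f"95% CI for the slope: ({low:.3f}, {high:.3f})")
```

The results agree with the hand calculation to three decimal places.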

9.3 Justifying a Claim About the Slope Based on a Confidence Interval

When analyzing an interval for a slope, the most important question is: Does the interval contain zero?

Interval Does Not Contain 0

Example: (1.36, 5.04)

Since all values in the interval are positive, we have convincing evidence of a positive linear relationship between the two variables. If all values were negative, we would have convincing evidence of a negative linear relationship.

Interval Contains 0

Example: (-1.2, 3.4)

Because 0 is inside the interval, a slope of 0 is a plausible value for β. A slope of 0 means the population regression line could be perfectly flat, so the predicted y would not change as x changes. We do not have convincing evidence of a linear relationship.
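
The two cases above boil down to one check on the endpoints. Here is a minimal sketch; the function name interpret_slope_interval and the wording of the returned messages are made up for illustration.

```python
def interpret_slope_interval(low, high):
    """Decide what a confidence interval for the slope lets us claim."""
    if low > 0:
        return "convincing evidence of a positive linear relationship"
    if high < 0:
        return "convincing evidence of a negative linear relationship"
    # 0 is inside the interval, so a flat line is still plausible.
    return "no convincing evidence of a linear relationship"

print(interpret_slope_interval(1.36, 5.04))
print(interpret_slope_interval(-1.2, 3.4))
```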

9.4 Setting Up a Test for the Slope of a Regression Model

If we want to formally test if a relationship exists, we set up a t-Test for the Slope.

Hypotheses for Slope:

H₀: β = 0 (No linear relationship)

Hₐ: β ≠ 0 (There is a linear relationship)

*You can also use > 0 or < 0 if you are testing for a specifically positive or negative relationship.

Test Statistic Formula

t = (b − β₀) / SE_b

Usually, H₀ assumes β₀ = 0, so the formula simplifies to t = b / SE_b
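
As a quick sketch of the formula, the function below computes the test statistic, with β₀ defaulting to 0. The function name slope_t_statistic is hypothetical. The numbers plugged in are the sample slope and standard error from the practice-quiz example in 9.2 (b = 3.2, SE_b = 0.85, n = 15).

```python
def slope_t_statistic(b, se_b, beta_0=0.0):
    """Test statistic for H0: beta = beta_0."""
    return (b - beta_0) / se_b

n = 15
t = slope_t_statistic(3.2, 0.85)
print(f"t = {t:.2f} with df = {n - 2}")
```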

9.5 Carrying Out a Test: Reading Computer Output

In AP Statistics Unit 9, you rarely calculate the slope and standard error by hand. Instead, you are given a Computer Output Table. Knowing how to read this table is arguably the most important skill in this unit.

Predictor	Coef	SE Coef	T	P
Constant	12.45	3.10	4.01	0.001
String Length	0.85	0.14	6.07	0.000

Decoding the Table

Imagine you are testing the relationship between the length of a magic wand cat toy's string (x) and the minutes a cat engages with it (y).

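Predictor: The x-variable. The "Constant" row describes the y-intercept (here, a = 12.45), so ignore it entirely for inference about the slope! Look ONLY at the row for the explanatory variable (here, String Length).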
Coef (b): The sample slope. Here, b = 0.85.
SE Coef (SE_b): The standard error of the slope. Here, SE_b = 0.14.
T: The test statistic. Calculated as 0.85 / 0.14 ≈ 6.07.
P: The p-value for a two-sided test (Hₐ: β ≠ 0). The output rounds it to 0.000, which means p < 0.0005. Since that is less than the 0.05 significance level, we reject H₀!
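
If you want to practice pulling the right numbers out of a table like this, here is a minimal sketch that reads the output above as tab-separated text, picks the String Length row, and recomputes T from Coef and SE Coef. The function name parse_regression_output is made up, and real software output may be laid out differently.

```python
OUTPUT = """Predictor\tCoef\tSE Coef\tT\tP
Constant\t12.45\t3.10\t4.01\t0.001
String Length\t0.85\t0.14\t6.07\t0.000"""

def parse_regression_output(text):
    """Turn tab-separated regression output into {predictor: {column: value}}."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    rows = {}
    for line in lines[1:]:
        cells = line.split("\t")
        rows[cells[0]] = {name: float(value)
                          for name, value in zip(header[1:], cells[1:])}
    return rows

rows = parse_regression_output(OUTPUT)
slope_row = rows["String Length"]      # the variable row, not "Constant"
t_check = slope_row["Coef"] / slope_row["SE Coef"]

print(f"b = {slope_row['Coef']}, SE_b = {slope_row['SE Coef']}")
print(f"recomputed t = {t_check:.2f}, reported T = {slope_row['T']}")
```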

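Exam Tip: The "P" in the computer output is almost always for a TWO-SIDED test (≠). If your alternative hypothesis is ONE-SIDED (> or <) and the sample slope points in the direction of Hₐ, take the table's p-value and divide it by 2! (If the sample slope points the opposite way, the one-sided p-value is 1 − p/2 instead, and you will not reject H₀.)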
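
The conversion can be sketched in a few lines. The function name one_sided_p, its "greater"/"less" labels, and the two-sided p-value of 0.08 used in the demonstration are all made up for illustration. The sketch also handles the case where the sample slope points away from Hₐ, where halving alone would give the wrong answer.

```python
def one_sided_p(two_sided_p, b, alternative):
    """Convert a two-sided p-value for H0: beta = 0 into a one-sided one.

    alternative is "greater" (Ha: beta > 0) or "less" (Ha: beta < 0).
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    agrees = (b > 0) if alternative == "greater" else (b < 0)
    half = two_sided_p / 2
    # Halve when the sample slope agrees with Ha; otherwise use the other tail.
    return half if agrees else 1 - half

# Hypothetical output: sample slope 0.85 with a two-sided p-value of 0.08.
p_greater = one_sided_p(0.08, 0.85, "greater")
p_less = one_sided_p(0.08, 0.85, "less")
print(f"Ha: beta > 0 gives p = {p_greater:.2f}")
print(f"Ha: beta < 0 gives p = {p_less:.2f}")
```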

9.6 Skills Focus: Selecting an Appropriate Inference Procedure

Congratulations! You have learned all the inference procedures in AP Statistics. Your final challenge is looking at a dataset and knowing instantly which test to apply.

Categorical Data (Counts/Percentages)

1 Sample, 2 options: 1-Prop Z-Test
2 Samples, comparing proportions: 2-Prop Z-Test
1 Sample, 3+ options: Chi-Square GOF
Two-Way Table: Chi-Square Homogeneity/Independence

Quantitative Data (Averages/Measurements)

1 Sample: 1-Sample t-Test for Mean
2 Independent Samples: 2-Sample t-Test for Means
Paired Data (e.g., Before/After): Matched Pairs t-Test
Relationship Between Two Quantitative Variables (x and y measured on the same individuals): t-Test for Slope (Linear Regression)
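
One way to drill this decision is to treat the two lists above as a lookup table keyed by data type and study design. The sketch below does exactly that. The key strings and the names PROCEDURES and choose_procedure are made up for illustration, and the procedure names are copied from the lists.

```python
PROCEDURES = {
    ("categorical", "one sample, 2 options"): "1-Prop Z-Test",
    ("categorical", "two samples"): "2-Prop Z-Test",
    ("categorical", "one sample, 3+ options"): "Chi-Square GOF",
    ("categorical", "two-way table"): "Chi-Square Homogeneity/Independence",
    ("quantitative", "one sample"): "1-Sample t-Test for Mean",
    ("quantitative", "two independent samples"): "2-Sample t-Test for Means",
    ("quantitative", "paired data"): "Matched Pairs t-Test",
    ("quantitative", "two quantitative variables"): "t-Test for Slope (Linear Regression)",
}

def choose_procedure(data_type, design):
    """Look up the inference procedure for a data type and study design."""
    try:
        return PROCEDURES[(data_type, design)]
    except KeyError:
        raise ValueError(f"no procedure listed for {data_type!r} / {design!r}") from None

print(choose_procedure("quantitative", "two quantitative variables"))
print(choose_procedure("categorical", "one sample, 3+ options"))
```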

Unit 9 Key Takeaways

Degrees of Freedom: df = n − 2

LINER Conditions: Linear, Independent, Normal Residuals, Equal Variance, Random

Null Hypothesis for Slope: H₀: β = 0 (No linear relationship)

Computer Output: Always ignore the "Constant" row when doing inference for the slope.

Test Statistic: t = b / SE_b

End of Unit 9 Study Guide. You've officially conquered the AP Stats curriculum!

Inference for Quantitative Data: Slopes