A commonly posed question is "Are two proportions different?" For example, is the end-of-course passing rate this year significantly different from the end-of-course passing rate last year? This type of question involves comparing two proportions, both of which are estimates, so the formula needed differs from the one used before.
In this lesson we will develop the theory that allows us to apply the hypothesis testing procedure to this problem. The procedure is very similar to the one before, except that the test statistic is different: it involves two proportions. This statistic is still a z-statistic, and we will compare it to the normal scores, i.e. z-scores.
After completing this lesson you should be able to:
If two independent variables are both normally distributed, then their sum and their difference are also normally distributed. This idea can readily be proven, but we will leave that for another class. The important point is that the difference of two sample proportions is an example of the difference of two normally distributed variables. Thus we can standardize it and obtain a z-score. For example, if you have two estimated proportions, p̂1 and p̂2, and you happen to know that one has a mean of 0.6 and the other a mean of 0.4, then you know their difference p̂1 − p̂2 will have a mean of 0.6 − 0.4 = 0.2. What we now need is the standard deviation of this difference.
Assuming that each estimate is based on 300 observations, you can also determine the variance of the difference. The variance of each estimated proportion is 0.6×0.4 ⁄ 300 = 0.0008, so the variance of the difference is the sum of the two variances, or 0.0016. This means the standard deviation of the difference is √0.0016 = 0.04.
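The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in the text; the variable names are chosen here for illustration, and the numbers (means of 0.6 and 0.4, 300 observations each) are the ones assumed in the example.

```python
import math

# Values taken from the example in the text.
p1, p2 = 0.6, 0.4   # assumed known means of the two estimated proportions
n = 300             # observations behind each estimate

mean_diff = p1 - p2

# Variance of one estimated proportion: p(1 - p) / n.
var1 = p1 * (1 - p1) / n
var2 = p2 * (1 - p2) / n

# For independent estimates, the variance of the difference is the sum
# of the two variances (variances add even though the estimates subtract).
var_diff = var1 + var2
sd_diff = math.sqrt(var_diff)

print(f"mean of difference      = {mean_diff:.2f}")
print(f"variance of each        = {var1:.4f}, {var2:.4f}")
print(f"variance of difference  = {var_diff:.4f}")
print(f"std. dev. of difference = {sd_diff:.2f}")
```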
All this may seem complicated, but the bottom line is that we know the distribution of p̂1 − p̂2. Thus we can calculate confidence intervals and do hypothesis tests on this variable.
Distribution of p̂1 − p̂2
Let p̂1 be a proportion based on a sample of size n1 and p̂2 a proportion based on a sample of size n2. If both sample sizes are large enough (>100), we can assume that both estimates are normally distributed.
If we assume that the underlying proportions from the two samples are the same, i.e. p1 = p2 = p, then p̂1 − p̂2 will have:
mean 0 and standard deviation √(p(1 − p)(1 ⁄ n1 + 1 ⁄ n2)).
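This fact is useful for deriving a test statistic for comparing proportions. In this case, 95% of the time the corresponding z-score should not deviate more than 1.96 units from 0. This z-score, also called the test statistic for comparing two proportions, is:
z = (p̂1 − p̂2) ⁄ √(p̂(1 − p̂)(1 ⁄ n1 + 1 ⁄ n2)), where p̂ is the pooled proportion computed from both samples combined.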
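The claim that the z-score stays within 1.96 units of 0 about 95% of the time can be explored with a small simulation. The sketch below is not part of the lesson's derivation: the common proportion of 0.5, the sample sizes of 300, the number of trials, and the seed are all illustrative assumptions, and the function names are made up for this example. The z-score is computed with the pooled proportion from both samples combined.

```python
import math
import random

def pooled_z(x1, n1, x2, n2):
    """Two-proportion z-score using the pooled proportion of both samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

def share_within_critical(p=0.5, n1=300, n2=300, trials=2000, seed=1):
    """Draw both samples from the SAME proportion p and count how often
    the z-score lands within 1.96 units of 0."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        # Number of "successes" in each sample (a binomial draw).
        x1 = sum(rng.random() < p for _ in range(n1))
        x2 = sum(rng.random() < p for _ in range(n2))
        if abs(pooled_z(x1, n1, x2, n2)) <= 1.96:
            inside += 1
    return inside / trials

coverage = share_within_critical()
print(f"share of simulated z-scores with |z| <= 1.96: {coverage:.3f}")
```

With samples this large the printed share should land close to 0.95, though it will vary a little with the seed and the number of trials.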
If we do not assume that the two proportions are the same, then p̂1 − p̂2 estimates their difference. In this case the difference will have:
mean p1 − p2 and standard deviation σ = √(p1(1 − p1) ⁄ n1 + p2(1 − p2) ⁄ n2), which is estimated by replacing p1 and p2 with p̂1 and p̂2.
This means that if we want to estimate the difference (or gap) between two proportions, p1 and p2, then the 95% confidence interval will be:
(p̂1 − p̂2) − 1.96·σ ≤ p1 − p2 ≤ (p̂1 − p̂2) + 1.96·σ
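As a sketch, the interval can be written as a small Python function. The function name and signature are hypothetical, and σ is taken here to be the standard deviation of the difference without assuming equal proportions, i.e. the square root of the sum of the two estimated variances. The usage line reuses the earlier example with proportions 0.6 and 0.4 based on 300 observations each.

```python
import math

def diff_ci(p1_hat, n1, p2_hat, n2, z_star=1.96):
    """Confidence interval for p1 - p2 (95% when z_star is 1.96).

    sigma is the standard deviation of the difference, computed as the
    square root of the sum of the two estimated variances.
    """
    sigma = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * sigma, diff + z_star * sigma

# Earlier example: difference 0.2 with standard deviation 0.04.
low, high = diff_ci(0.6, 300, 0.4, 300)
print(f"{low:.4f} <= p1 - p2 <= {high:.4f}")
```

For that example the interval is 0.2 ± 1.96×0.04, i.e. roughly 0.12 to 0.28.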
Example: Test the assertion that the passing rate on the EOG Mathematics test for female fifth grade students is no different from that for male students. (Data for Asheville schools, 2008, NC DPI.)
STEP I: Set up two opposing hypotheses.
Let p1 be the proportion of female students passing and p2 be the proportion of male students passing.
H0: p1 = p2 vs. HA: p1 ≠ p2
STEP II: Get data.
Even though all students in each grade are tested, you still only have a sample, since your population is all students that could potentially be taught in the Asheville school system. According to the NC Department of Public Instruction,
89 of 126 (or 70.6%) female students passed the test and 119 of 142 (or 83.8%) male students passed the test.
| | # passed | Sample size | Estimated proportion |
|---|---|---|---|
| Female Students | 89 | n1 = 126 | p̂1 = 0.706 |
| Male Students | 119 | n2 = 142 | p̂2 = 0.838 |
| All students | 208 | n = 268 | p̂ = 0.776 |
STEP III: Decide on a statistical test.
We will use the two proportion z test given in this lesson.
STEP IV: Calculate your test statistic.
Every test has a formula that standardizes the estimator; in this case the estimator is p̂1 − p̂2. The statistic is:
Z-statistic = (p̂1 − p̂2) ⁄ √(p̂(1 − p̂)(1 ⁄ n1 + 1 ⁄ n2))
Z-stat = (0.706 − 0.838) ⁄ √(0.776×0.224×(1 ⁄ 126 + 1 ⁄ 142))
Z-stat ≈ −0.132 ⁄ √0.002604 ≈ −2.587.
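The same calculation can be sketched in Python starting from the raw counts in Step II. The variable names are chosen here for illustration. Because the code carries the unrounded proportions through the calculation, the result differs slightly in the third decimal from the hand calculation with rounded values (roughly −2.58 either way).

```python
import math

# Counts from Step II.
x1, n1 = 89, 126    # female students: passed, sample size
x2, n2 = 119, 142   # male students: passed, sample size

p1_hat = x1 / n1
p2_hat = x2 / n2
p_hat = (x1 + x2) / (n1 + n2)   # pooled proportion over all students

# Standard deviation of the difference, assuming p1 = p2 (the null hypothesis).
se_pooled = math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se_pooled

print(f"p1_hat = {p1_hat:.3f}, p2_hat = {p2_hat:.3f}, pooled = {p_hat:.3f}")
print(f"z-stat = {z:.3f}")
```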
STEP V: Arrive at a conclusion and state it in clear English.
Remember that the critical values for the two usual levels of certainty are z0.025 = 1.96 (95%) and z0.005 = 2.576 (99%).
Since |z-stat| = 2.587 > 2.576 = z* (the critical value at the 99% level), we reject the null hypothesis.
Conclusion: With 99% certainty we can state that a significantly larger proportion of boys than girls passed the fifth grade mathematics EOG in the Asheville school district.
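The critical values and the decision can be reproduced with Python's standard library. This is a sketch only: it uses `statistics.NormalDist` (available in Python 3.8 and later) to look up the normal scores, and it takes the z-statistic of −2.587 from Step IV as given.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Two-sided critical values: leave 2.5% (or 0.5%) in each tail.
z_95 = std_normal.inv_cdf(1 - 0.025)
z_99 = std_normal.inv_cdf(1 - 0.005)

z_stat = -2.587  # test statistic from Step IV

for label, z_star in (("95%", z_95), ("99%", z_99)):
    decision = "reject H0" if abs(z_stat) > z_star else "do not reject H0"
    print(f"{label} level: z* = {z_star:.3f} -> {decision}")
```

Note how close |z-stat| is to the 99% critical value; rounding in the intermediate steps matters at that level, while the 95% decision is not in doubt.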
Additional step: Calculate the 95% confidence interval estimating the Gap.
When evaluating a complicated statistical formula with only a calculator, it is a good idea to break the formula into parts and calculate the individual pieces. In this case we will calculate the mean and standard deviation of each estimated proportion.
| | Sample size | Estimated mean | Estimated standard deviation |
|---|---|---|---|
| Female Students | n1 = 126 | p̂1 = 0.706 | √(0.706×0.294 ⁄ 126) = 0.0406 |
| Male Students | n2 = 142 | p̂2 = 0.838 | √(0.838×0.162 ⁄ 142) = 0.0309 |
| Difference: p̂1 − p̂2 | | −0.132 | √(0.0406² + 0.0309²) = 0.051 |
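Thus the 95% confidence interval for the difference of the two proportions is:
−0.132 − 1.96×0.051 ≤ p1 − p2 ≤ −0.132 + 1.96×0.051, that is, −0.232 ≤ p1 − p2 ≤ −0.032.
Thus the passing rate on the fifth grade Math EOG in Asheville is 3 to 23 percentage points higher for boys than for girls.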
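The piece-by-piece approach carries over directly to code. The sketch below recomputes each entry of the table from the rounded proportions used in the text and then assembles the 95% interval; the variable names are illustrative.

```python
import math

# Rounded estimates from the table.
p1_hat, n1 = 0.706, 126   # female students
p2_hat, n2 = 0.838, 142   # male students

# Piece 1 and 2: standard deviation of each estimated proportion.
sd1 = math.sqrt(p1_hat * (1 - p1_hat) / n1)
sd2 = math.sqrt(p2_hat * (1 - p2_hat) / n2)

# Piece 3: mean and standard deviation of the difference.
diff = p1_hat - p2_hat
sd_diff = math.sqrt(sd1**2 + sd2**2)

# Assemble the 95% confidence interval.
margin = 1.96 * sd_diff
low, high = diff - margin, diff + margin

print(f"sd1 = {sd1:.4f}, sd2 = {sd2:.4f}, sd of difference = {sd_diff:.3f}")
print(f"{low:.3f} <= p1 - p2 <= {high:.3f}")
```

Both ends of the interval are negative, which agrees with the test result: the gap p1 − p2 is below zero, meaning the girls' passing rate is the lower one.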
For each of the problems, use the five-step process of hypothesis testing to determine the answer, then find the confidence interval. Specifically:
- State the null hypothesis and alternate hypothesis.
- Get the data and summarize it.
- Decide on a statistical test.
- Calculate the test statistic.
- Make a decision and state your conclusion.
- Find the 95% confidence interval.