Acknowledgement

These notes use content from OpenIntro Statistics Slides by

Mine Cetinkaya-Rundel.

6.1 Inference for a single proportion

This section is a revisit of what we learned in Chapter 5.

  • Parameter \(p\) and point estimate \(\hat{p}\)

  • Central Limit Theorem for sample proportions

  • Parameter \(p\) and point estimate \(\hat{p}\)

  • Standard Error

  • Margin of Error (ME)

  • Confidence Interval (CI)

  • Hypothesis Testing (HT)

  • Sample Size

Parameter and point estimate

We would like to estimate the proportion of all Americans who have good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”? What are the parameter of interest and the point estimate?

  • Parameter of interest: Proportion of all Americans who have good intuition about experimental design.

    \[\mathbf{p} \ \text{(a population proportion)}\]

  • Point estimate: Proportion of sampled Americans who have good intuition about experimental design.

    \[\hat{\mathbf{p}} \ \text{(a sample proportion)}\]

CLT: Sample proportions are nearly normally distributed

Central limit theorem for proportions Sample proportions will be nearly normally distributed with mean equal to the population proportion, \(p\), and standard error equal to \(\sqrt{\frac{p(1-p)}{n}}\).

But of course this is true only under certain conditions:

  1. Independent observations;

  2. At least 10 successes and 10 failures

Note: If \(p\) is unknown (most cases), we use \(\hat{p}\) in the calculation of the standard error.

Inference on a proportion

  • We can estimate a population proportion using a sample and \(\color{blue}{confidence \hspace{0.2cm}interval}\), which we know is always of the form

\[{\textbf{point estimate} \mathbf{\pm} \textbf{ME}}\]

  • The margin of error is

    \(\color{blue}{ME = critical\hspace{0.2cm} value \times standard \hspace{0.2cm} error}\) of the point estimate

  • The point estimate for proportions is \(\color{blue}{\hat{p}}\)

  • The critical value depends on the confidence level

  • Standard error of a sample proportion \(\color{blue}{SE_{\hat{p}}= \sqrt{\frac{p(1-p)}{n}} \sim \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}\)

Example

The GSS found that 571 out of 670 (85%) Americans answered the question on experimental design correctly. Estimate (using a 95% confidence interval) the proportion of all Americans who have good intuition about experimental design?

Given: \(n = 670, \hat{p} = 0.85\). First, check conditions.

  • Independence: The sample is random, and 670 < 10% of all Americans, therefore we can assume that one respondent’s response is independent of another.

  • Success-Failure: 571 people answered correctly (successes) and 99 answered incorrectly (failures), both are greater than 10.

Practice

We are given that \(n = 670, \hspace{0.2cm} \hat{p}= 0.85\), the standard error of the sample proportion is \(SE = \sqrt{\frac{p(1-p)}{n}} \sim \sqrt{\frac{\hat{p}(1-\hat{p)}}{n}}\). Which of the below is the correct calculation of the 95% confidence interval?

  1. \({0.85} \pm 1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}} \rightarrow \color{red}{(0.82, 0.88)}\)

  2. \(0.85 \pm 1.65 \times \sqrt{\frac{0.85 \times 0.15}{670}}\)

  3. \(0.85 \pm 1.96 \times \frac{0.85 \times 0.15}{\sqrt{670}}\)

  4. \(571 \pm 1.96 \times \sqrt{\frac{571 \times 99}{670}}\)

Choosing a (minimum) sample size

How many people should you sample in order to cut the margin of error of a 95% confidence interval down to 1%?

\[ME = z_{\frac{\alpha}{2}} \times SE\] \[ \begin{align*} 0.01 &\geq 1.96 \times \sqrt{\frac{0.85 \times 0.15}{n}}\color{red}{\rightarrow \text{Use } \hat{p} \text{ from previous study}}\\ 0.01^2 &\geq 1.96^2 \times \frac{0.85 \times 0.15}{n}\\ n &\geq \frac{1.96^2 \times 0.85 \times 0.15}{0.01^2}\\ n &\geq 4898.04 \color{red}{\rightarrow \text{n should be at least 4,899}} \end{align*} \]

Choosing the sample size given margin of error

Use \[M.E = z_{\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}} \rightarrow \frac{M.E}{z_{\frac{\alpha}{2}}} = \sqrt{\frac{p(1-p)}{n}}\]

\[\big({ \frac{M.E}{z_{{\alpha}/2}}}\big)^2 = {\frac{p(1-p)}{n}} \rightarrow n = \frac{p(1-p)}{\big({ \frac{M.E}{z_{\frac{\alpha}{2}}}}\big)^2}\]

\[n= {\frac{p(1-p)z^2_{\frac{\alpha}{2}}}{(M.E)^2}}\] So to control the ME under the specified level, take

\[ n \ge {\frac{p(1-p)z^2_{{\alpha}/2}}{(M.E)^2}}\] (i.e., take the ceiling integer)

Choosing the sample size if there isn’t a previous study (Contd)

  • The general sample size formula is: \[ n \ge {\frac{p(1-p)z^2_{{\alpha}/2}}{(M.E)^2}}\]

  • When we have no value for \(p\) from a previous study nor \(\hat{p}\), we use \(p=0.5\).

  • This is because the function \(f(p) = p(1-p)\) for \(0≤𝑝≤1\) reaches its maximum at \[p =\frac{1}{2}, f(\frac{1}{2}) = \frac{1}{4}\]

  • Substituting \(p=\frac{1}{2}\) in the general formula above gives

\[n \ge {\frac{z^2_{{\alpha}/2}}{4(M.E)^2}}\]

CI vs. HT for proportions

  • Success-Failure conditions:

    • CI: At least 10 observed successes and failures.
    • HT: At least 10 expected successes and failures, calculated using the null value \(p_0\)
  • Standard error:

    • CI: Calculate using observed sample proportion: \(SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).

    • HT: Calculate using the null value: \(SE = \sqrt{\frac{p_0(1-p_0)}{n}}\)

Example

The GSS found that 571 out of 670 (85%) Americans answered the question on experimental design correctly. Do these data provide convincing evidence that more than 80% of Americans have a good intuition about experimental design?

Solution

  • \(H_0: p = 0.80 \hspace{1.5cm} H_a: p>0.80\)

  • \(p_0=0.80, 𝑛=670, n𝑝_0=536>10,𝑛(1−𝑝_0 )=134>10\)
    \(SE = \sqrt{\frac{p_0(1-p_0)}{n}}= \sqrt{\frac{0.80(1-0.80}{670}} = 0.01545\)

If \(H_0\) were true, then \(\hat{p} \sim N(0.80,0.01545)\).

  • \(Z= \frac{0.85-0.80}{0.01545} = 3.2362\)

  • P-value = \(𝑃(\hat{p}>0.85)= 1-0.9994 = 0.0006\)

Example (Contd)

Since the p-value is low, we reject \(H_0\). The data provide convincing evidence that more than 80% of Americans have a good intuition on experimental design.

Recap - Inference for one proportion (end of 6.1)

  • Popular parameter: \(p\). point estimate: \(\hat{p}\)

  • Conditions:

    • Independence - Random sample and 10% condition
    • At least 10 successes and failures
  • Standard error: \(SE = \sqrt{\frac{p(1-p)}{n}}\)

    • for CI: use \(\hat{p}\)
    • for HT: use \(p_0\)