3.1 Point Estimates and Sampling Variability

3.1.1 Objectives

By the end of this unit, students will be able to:

  • Understand the meaning of sampling distributions.
  • Apply the central limit theorem to describe the sampling distribution of a sample proportion.
  • Identify the conditions needed for the central limit theorem to apply for sample proportions.

3.1.2 Overview

Statistical inference is the process of using sample data to make generalizations about a population.
Because complete population data are rarely available, we rely on sample statistics to estimate population parameters.

A parameter is a numerical value that summarizes a characteristic of a population (e.g., the proportion of adults supporting renewable energy).
A statistic is a numerical summary based on a sample (e.g., the proportion supporting renewable energy in a survey of 1,000 adults).

Since we usually cannot observe the true parameter value, the statistic serves as our point estimate — our best single-number guess of the unknown parameter.


Basic Concepts and Notation

Let

\[ \hat{p} = \frac{x}{n} \]

represent the sample proportion, where:
- \(x\) = number of “successes” in the sample
- \(n\) = total sample size

Then:
- \(\hat{p}\) is an unbiased point estimator of the population proportion \(p\).
- Each observed \(\hat{p}\) is a point estimate of \(p\).
- The error in estimation is \(\text{Error} = \hat{p} - p\).

Because different random samples yield different values of \(\hat{p}\), this variability is summarized through its sampling distribution.


Sampling Error and Bias

Two key types of error affect estimates:

  1. Sampling Error – Random variability from sample to sample.
    • Describes how much a statistic (like \(\hat{p}\)) tends to differ across random samples.
    • Larger samples reduce sampling error because they better represent the population.
  2. Bias – Systematic deviation from the true population value.
    • Arises when data collection favors certain outcomes (e.g., leading survey wording or unrepresentative samples).
    • Reducing bias depends on sound study design and random sampling.

Sampling Distribution of the Sample Proportion

The sampling distribution of \(\hat{p}\) is the probability distribution of all possible values of \(\hat{p}\) obtained from repeated random samples of the same size \(n\) from a population.

If we repeatedly draw simple random samples of size \(n\) and calculate \(\hat{p}\) for each, the collection of all these sample proportions forms a distribution that:
- is centered around the true proportion \(p\),
- becomes less variable as \(n\) increases, and
- tends toward a normal shape when \(n\) is sufficiently large.
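These three properties can be checked with a short simulation in R. The value \(p = 0.6\) and the sample sizes below are illustrative assumptions, not values from the text:

```r
# Simulation sketch: draw many random samples, compute phat for each,
# and inspect the resulting sampling distribution (assumed p and n values)
set.seed(1)
p <- 0.6
reps <- 10000

phat_small <- rbinom(reps, size = 50, prob = p) / 50    # n = 50
phat_large <- rbinom(reps, size = 500, prob = p) / 500  # n = 500

mean(phat_small)   # close to p: centered at the true proportion
mean(phat_large)   # also close to p
sd(phat_small)     # larger spread for the smaller n
sd(phat_large)     # smaller spread for the larger n
```

Plotting a histogram of either vector (e.g., `hist(phat_large)`) shows the approximately normal shape.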


Example: Estimating a Population Proportion

Suppose the Gallup Organization surveys 1,520 U.S. adults, and 1,135 respond that “legal immigration is a good thing.”
The sample proportion is:

\[ \hat{p} = \frac{1135}{1520} = 0.747 \]

Thus, we estimate that about 74.7% of U.S. adults believe legal immigration is a good thing.
This value (0.747) is a point estimate of the population proportion \(p\).
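This point estimate can be computed directly in R:

```r
# Point estimate for the Gallup example
x <- 1135   # number of "good thing" responses
n <- 1520   # sample size
phat <- x / n
round(phat, 3)
## [1] 0.747
```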


The Central Limit Theorem (CLT) for Proportions

The Central Limit Theorem describes the shape of the sampling distribution of \(\hat{p}\) under certain conditions.

If:
- Observations are independent,
- The sample size \(n\) is large enough that \(np \ge 10\) and \(n(1-p) \ge 10\), and
- The sample size is less than 10% of the population size (\(n < 0.10N\)),

then the sampling distribution of \(\hat{p}\) is approximately Normal with:

\[ \hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right) \]

That is, \[ Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0,1) \]


Interpretation and Conditions

  • Mean (Expected Value): \(E[\hat{p}] = p\) — the estimator is unbiased.

  • Standard Error (S.E.):
    \[ S.E._{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \] If \(p\) is unknown, replace it with \(\hat{p}\) to approximate S.E.

  • Finite Population Correction:
    When \(n \ge 0.10N\), adjust the standard error:
    \[ S.E. = \sqrt{\frac{p(1-p)}{n}} \times \sqrt{\frac{N - n}{N - 1}} \]

  • Conditions for CLT (Success–Failure Condition):
    Check \(np \ge 10\) and \(n(1-p) \ge 10\).
    When \(p\) is unknown, use \(\hat{p}\) instead.
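The success–failure check and the finite population correction can be sketched in R. The values of \(p\), \(n\), and \(N\) below are hypothetical, chosen so that the correction actually applies:

```r
# Hypothetical values for illustration
p <- 0.30
n <- 200
N <- 1500   # small population, so n >= 0.10 * N here

# Success-failure condition: np >= 10 and n(1-p) >= 10
np_ok <- (n * p >= 10) && (n * (1 - p) >= 10)
np_ok
## [1] TRUE

# Unadjusted standard error
se <- sqrt(p * (1 - p) / n)

# Apply the finite population correction when n >= 10% of N
if (n >= 0.10 * N) {
  se <- se * sqrt((N - n) / (N - 1))
}
se
```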


Example: Applying the CLT

Suppose \(p = 0.88\) (true proportion of adults supporting solar energy expansion) and \(n = 1000\).

Then: \[ \mu_{\hat{p}} = p = 0.88 \] \[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.88(1-0.88)}{1000}} \approx 0.010 \]

If we want \(P(0.86 < \hat{p} < 0.90)\): \[ Z_1 = \frac{0.86 - 0.88}{0.010} = -2, \quad Z_2 = \frac{0.90 - 0.88}{0.010} = 2 \] \[ P(0.86 < \hat{p} < 0.90) = P(-2 < Z < 2) \approx 0.9545 \]


This means approximately 95% of sample proportions will fall between 0.86 and 0.90.
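The calculation above can be reproduced in R with `pnorm`, using the same rounded standard error of 0.010:

```r
# Normal probability for the solar energy example (SE rounded to 0.010)
se <- 0.010
z1 <- (0.86 - 0.88) / se   # -2
z2 <- (0.90 - 0.88) / se   #  2
pnorm(z2) - pnorm(z1)
## [1] 0.9544997
```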

Effect of Sample Size on Sampling Variability

Larger sample sizes reduce variability in the sampling distribution of \(\hat{p}\).

Example: For \(p = 0.88\), compare \(n = 100\) vs. \(n = 1000\):

| Sample Size | \(S.E. = \sqrt{\frac{p(1-p)}{n}}\) | Interpretation |
|-------------|-----------------------------------|----------------|
| 100 | 0.0325 | Greater variability (wider spread) |
| 1000 | 0.0103 | Less variability (narrower, more precise) |
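The standard errors can be computed for any sample size with a small helper function (note that the unrounded SE for \(n = 1000\) is about 0.0103, which the text elsewhere rounds to 0.010):

```r
# Standard error of phat shrinks as n grows
p <- 0.88
se <- function(n) sqrt(p * (1 - p) / n)

round(se(100), 4)
## [1] 0.0325
round(se(1000), 4)
## [1] 0.0103
```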


Summary

  • A point estimate is a sample statistic (like \(\hat{p}\)) used to estimate a population parameter (like \(p\)).
  • Sampling error arises from natural sample-to-sample variability; bias arises from flawed design.
  • The sampling distribution of \(\hat{p}\) describes how the estimate behaves across repeated samples.
  • Under the Central Limit Theorem, when sample conditions are met, \(\hat{p}\) follows an approximately Normal distribution.
  • Increasing \(n\) decreases the standard error, improving precision.

3.1.3 Knowledge Check

3.1.4 Solved Exercises

Exercise 1

In a random sample with size \(n = 6000\), the count of “yes” is \(x = 420\).

(a) Compute the sample proportion \(\hat{p} = \frac{x}{n}\).

(b) Compute the estimated standard error of the sample proportion

\[ S.E. \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

Solution

\[ \hat{p} = \frac{420}{6000} = 0.07 \]

\[ S.E. = \sqrt{\frac{0.07(1-0.07)}{6000}} \approx 0.003294 \]

# Verify the computations in R
n <- 6000
x <- 420
phat <- x / n                        # sample proportion
se <- sqrt(phat * (1 - phat) / n)    # estimated standard error
phat
## [1] 0.07
se
## [1] 0.003293934

Exercise 2

In a random sample of \(n = 820\) adults, \(x = 295\) say they could not cover a $500 unexpected expense without borrowing money or going into debt.

(a) What population is under consideration?
(b) What parameter is being estimated?
(c) Compute a point estimate for the parameter.
(d) What is the estimated standard error?

Solution

(a) The population is all U.S. adults (or the population the sample represents).

(b) The parameter is the population proportion \(p\) of adults who could not cover the $500 expense.

(c) The sample proportion is:

\[ \hat{p} = \frac{x}{n} = \frac{295}{820} \approx 0.3598 \]

(d) The estimated standard error is:

\[ S.E. = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.3598(1 - 0.3598)}{820}} \approx 0.01676 \]

# Verify the computations in R
n <- 820
x <- 295
phat <- x / n                        # point estimate of p
se <- sqrt(phat * (1 - phat) / n)    # estimated standard error
phat
## [1] 0.3597561
se
## [1] 0.01675984

Exercise 3

Of all freshmen at a large college, 22% made the dean’s list.

(a) What is the value of the parameter of interest? State the sampling distribution of the sample proportion for samples of size \(n = 100\).

(b) If, in a random sample of 100 freshmen, 18 made the dean’s list, compute the sample proportion and its Z-score.

(c) If, in a random sample of 100 freshmen, 30 made the dean’s list, compute the sample proportion and its Z-score.

(d) What is the probability that at most 18 of 100 randomly selected freshmen made the dean’s list?

(e) What is the probability that between 18 and 30 of 100 randomly selected freshmen made the dean’s list?

Solution

# Exercise 3: Sampling Distribution and Probabilities

# (a) Given parameters
p <- 0.22
n <- 100
SE <- sqrt(p * (1 - p) / n)
SE
## [1] 0.04142463
# (b) Case 1: 18 out of 100 made the dean’s list
x1 <- 18
phat1 <- x1 / n
z1 <- (phat1 - p) / SE
phat1; z1
## [1] 0.18
## [1] -0.9656091
# (c) Case 2: 30 out of 100 made the dean’s list
x2 <- 30
phat2 <- x2 / n
z2 <- (phat2 - p) / SE
phat2; z2
## [1] 0.3
## [1] 1.931218
# (d) Probability that at most 18 made the dean’s list
p_lower <- pnorm(z1)
p_lower
## [1] 0.1671199
# (e) Probability that between 18 and 30 made the dean’s list
p_between <- pnorm(z2) - pnorm(z1)
p_between
## [1] 0.8061521
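As a sanity check, the normal approximations in (d) and (e) can be compared against exact binomial probabilities (the success–failure condition holds here: \(np = 22\) and \(n(1-p) = 78\)). The exact values differ somewhat from the normal approximations above because no continuity correction was applied:

```r
# Exact binomial probabilities for comparison with the normal approximation
p <- 0.22
n <- 100

# (d) P(X <= 18): exact binomial, vs. 0.1671 from the normal approximation
pbinom(18, size = n, prob = p)

# (e) P(18 <= X <= 30), taken as inclusive on both ends: exact binomial,
# vs. 0.8062 from the normal approximation
pbinom(30, size = n, prob = p) - pbinom(17, size = n, prob = p)
```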