4.1 Quick Review on Inference for Mean
4.1.1 Objectives
By the end of this unit, students will be able to:
- Distinguish between normal distribution and t distribution.
- Compute point estimates and confidence intervals for estimating one population mean based on one sample and paired samples.
4.1.2 Overview
One-sample means with the t-distribution
Similar to how we can model the behavior of the sample proportion \(\hat p\) using a normal distribution, the sample mean \(\bar x\) can also be modeled using a normal distribution when certain conditions are met. However, we will soon introduce a new distribution, called the \(t\)-distribution, and we will use it to construct confidence intervals and conduct hypothesis tests for the mean.
The sampling distribution of \(\bar x\)
The sample mean tends to follow a normal distribution centered at the population mean, \(\mu\), when certain conditions are met. Additionally, we can compute a standard error for the sample mean using the population standard deviation \(\sigma\) and the sample size \(n\).
Central limit theorem for the sample mean
When we collect a sufficiently large sample of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of \(\bar x\) will be nearly normal with
\[\text{Mean} = \mu ~~~~~~~~~~~~~~~~~~~~~ \text{Standard Error}(SE) = \frac{\sigma}{\sqrt{n}} \]
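To see the theorem in action, here is a minimal R simulation sketch (the exponential population, seed, and sample size are illustrative choices, not from the text): the simulated sample means center near \(\mu\) with spread near \(\sigma/\sqrt{n}\), even though the population itself is skewed.

```r
# Draw many samples of size n from a skewed population
# (exponential with mu = 1, sigma = 1) and record each sample mean
set.seed(1)                                    # illustrative seed
n <- 50
xbars <- replicate(10000, mean(rexp(n, rate = 1)))

mean(xbars)   # close to mu = 1
sd(xbars)     # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.141
hist(xbars)   # roughly bell-shaped despite the skewed population
```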
Before diving into confidence intervals and hypothesis tests using \(\bar x\), we first need to cover two topics:
When we modeled \(\hat p\) using the normal distribution, certain conditions had to be satisfied. The conditions for working with \(\bar x\) are a little more complex, and we will spend the next section discussing how to check conditions for inference.
The standard error depends on the population standard deviation, \(\sigma\). However, we rarely know \(\sigma\) and instead must estimate it. Because this estimate is itself imperfect, we use a new distribution, the \(t\)-distribution, to account for the added uncertainty.
Evaluating the two conditions required for modeling \(\bar x\)
Two conditions are required to apply the Central limit theorem for a sample mean \(\bar x\):
Independence. The sample observations must be independent. The most common way to satisfy this condition is to take a simple random sample from the population. Data that come from a random process, analogous to rolling a die, would also satisfy the independence condition.
Normality. When a sample is small, we also require that the sample observations come from a normally distributed population. We can relax this condition more and more for larger and larger sample sizes. This condition is obviously vague, making it difficult to evaluate, so next we introduce a couple rules of thumb to make checking this condition easier.
Rules of thumb: How to perform the normality check
There is no perfect way to check the normality condition, so instead we use two rules of thumb:
\(\mathbf{n < 30:}\) If the sample size \(n\) is less than 30 and there are no clear outliers in the data, then we typically assume the data come from a nearly normal distribution to satisfy the condition.
\(\mathbf{n \ge 30:}\) If the sample size \(n\) is at least 30 and there are no particularly extreme outliers, then we typically assume the sampling distribution of \(\bar x\) is nearly normal, even if the underlying distribution of individual observations is not.
In this section, you aren’t expected to develop perfect judgment on the normality condition. However, you are expected to be able to handle clear-cut cases based on the rules of thumb.
In practice, it’s typical to also do a mental check to evaluate whether we have reason to believe the underlying population would have moderate skew (if \(n < 30\)) or particularly extreme outliers (if \(n \ge 30\)) beyond what we observe in the data. For example, consider the number of followers for each individual account on Twitter, and then imagine the distribution. The large majority of accounts have a couple thousand followers or fewer, while a relatively tiny fraction have amassed tens of millions of followers, meaning the distribution is extremely skewed. When we know the data come from such an extremely skewed distribution, it takes some effort to understand what sample size is large enough for the normality condition to be satisfied.
Introducing the \(t\)-distribution
In practice, we cannot directly calculate the standard error for \(\bar x\) since we do not know the population standard deviation, \(\sigma\). We encountered a similar issue when computing the standard error for a sample proportion, which relied on the population proportion, \(p\). Our solution in the proportion context was to use the sample value in place of the population value when computing the standard error. We will employ a similar strategy for computing the standard error of \(\bar x\), using the sample standard deviation \(s\) in place of \(\sigma\):
\[ SE = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} \]
This strategy tends to work well when we have a lot of data and can estimate \(\sigma\) using \(s\) accurately. However, the estimate is less precise with smaller samples, and this leads to problems when using the normal distribution to model \(\bar x\).
We will find it useful to use a new distribution for inference called the \(\mathbf{t}\)-distribution. A \(t\)-distribution, shown as a solid line in the figure below, has a bell shape. However, its tails are thicker than the normal distribution’s, meaning observations are more likely to fall beyond two standard deviations from the mean than under the normal distribution. The extra thick tails of the \(t\)-distribution are exactly the correction needed to resolve the problem of using \(s\) in place of \(\sigma\) in the \(SE\) calculation.
The \(t\)-distribution is always centered at zero and has a single parameter: degrees of freedom. The degrees of freedom \(\mathbf{(df)}\) describes the precise form of the bell-shaped \(t\)-distribution. Several \(t\)-distributions are shown in the figure below in comparison to the normal distribution.
In general, we will use a \(t\)-distribution with \(df = n-1\) to model the sample mean when the sample size is \(n\). That is, when we have more observations, the degrees of freedom will be larger and the \(t\)-distribution will look more like the standard normal distribution: when the degrees of freedom is about 30 or more, the \(t\)-distribution is nearly indistinguishable from the normal distribution.
Degrees of Freedom \(\mathbf{(df)}\)
The degrees of freedom describes the shape of the \(t\)-distribution. The larger the degrees of freedom, the more closely the distribution approximates the normal model.
When modeling \(\bar x\) using the \(t\)-distribution, use \(df = n - 1\).
The \(t\)-distribution allows us greater flexibility than the normal distribution when analyzing numerical data. In practice, it’s common to use statistical software, such as R, Python, or SAS, for these analyses. Alternatively, a graphing calculator or a \(t\)-table may be used; the \(t\)-table is similar to the normal distribution table.
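For instance, we can verify the convergence to the normal model directly in R by comparing upper-tail critical values (an illustrative check, not from the original text):

```r
# 97.5th percentiles of t-distributions with increasing degrees of freedom
qt(0.975, df = c(5, 10, 30, 100))   # 2.571, 2.228, 2.042, 1.984
qnorm(0.975)                        # 1.960 for the standard normal
```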


In the normal model, we used \(z^*\) and the standard error to determine the width of a confidence interval. We revise the confidence interval formula slightly when using the \(t\)-distribution:
\[ point~estimate \pm t^*_{df} \times SE ~~~~~\rightarrow~~~~~ \bar x \pm t^*_{df} \times \frac{s}{\sqrt{n}} \]
Confidence interval for a single mean
Once you’ve determined a one-mean confidence interval would be helpful for an application, there are four steps to constructing the interval:
Prepare. Identify \(\bar x, s, n,\) and determine what confidence level you wish to use.
Check. Verify the conditions to ensure \(\bar x\) is nearly normal.
Calculate. If the conditions hold, compute \(SE\), find \(t^*_{df}\), and construct the interval (see the sketch after these steps).
Conclude. Interpret the confidence interval in the context of the problem.
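Here is a minimal R sketch of the Calculate step, assuming hypothetical sample summaries (\(n = 25\), \(\bar x = 5.2\), \(s = 1.1\)) and a 95% confidence level:

```r
n <- 25; xbar <- 5.2; s <- 1.1     # hypothetical sample summaries
se <- s / sqrt(n)                  # standard error of the mean
t_star <- qt(0.975, df = n - 1)    # critical value for 95% confidence
xbar + c(-1, 1) * t_star * se      # lower and upper interval bounds
```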
One sample \(t\)-tests
Is the typical US runner getting faster or slower over time? We consider this question in the context of the Cherry Blossom Race, a 10-mile race held in Washington, DC each spring.
The average time for all runners who finished the Cherry Blossom Race in 2006 was 93.29 minutes (93 minutes and about 17 seconds). We want to determine, using data from 100 participants in the 2017 Cherry Blossom Race, whether runners in this race are getting faster or slower, versus the other possibility that there has been no change.
When completing a hypothesis test for a one-sample mean, the process is nearly identical to a hypothesis test for a single proportion. First, we find the Z-score using the observed value, null value, and standard error; however, we call it a T-score since we use a \(t\)-distribution to calculate the tail area. Then we find the p-value using the same ideas as before: find the one-tail area under the sampling distribution, and double it.

With both the independence and normality conditions satisfied, we can proceed with a hypothesis test using the \(t\)-distribution.
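A minimal R sketch of this test, assuming hypothetical 2017 sample summaries (the actual sample mean and standard deviation are not reproduced here):

```r
xbar <- 98.8; s <- 17.0; n <- 100   # hypothetical 2017 sample summaries
mu0 <- 93.29                        # null value: the 2006 average time

t_score <- (xbar - mu0) / (s / sqrt(n))       # T-score
p_value <- 2 * pt(abs(t_score), df = n - 1,   # double the one-tail area
                  lower.tail = FALSE)
t_score
p_value
```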
4.1.3 Solved Problem
Just like with the normal distribution, we can use R to find the area below a given cutoff under a \(t\)-distribution.
We can find the area to the left of 1.75 standard deviations with 12 degrees of freedom by using the following:
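```r
pt(1.75, 12)
```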
## [1] 0.9471902
What proportion of the \(t\)-distribution with 18 degrees of freedom falls below -2.10?
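Using the same approach (the call below is a reconstruction; `pt` returns the lower-tail area):

```r
pt(-2.10, 18)   # approximately 0.025
```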
To find the two-tailed proportion of a \(t\)-distribution with 2 degrees of freedom that lies more than 3 units from the mean, we can do any of the following:
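```r
2 * pt(-3, 2)   # double the lower-tail area below -3
```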
## [1] 0.09546597
OR
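```r
2 * (1 - pt(3, 2))   # complement rule on the upper tail
```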
## [1] 0.09546597
OR
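```r
2 * pt(3, 2, lower.tail = FALSE)   # upper-tail area directly, doubled
```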
## [1] 0.09546597
We can see that there are multiple ways to obtain the same result by modifying the arguments and using the complement rule.
4.1.4 Exercises
- The sample mean \(\bar{x}\) is an unbiased point estimator for the population mean \(\mu\)
- A value of \(\bar{x}\) is a point estimate
- Error = \(\bar{x} - \mu\)
Central Limit Theorem (Sampling distribution of sample mean)
When taking samples of fixed size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\), if the observations are independent (e.g., random samples of fixed size \(n\), taken without replacement) and the sample size \(n \geq 30\), then the sample mean \(\bar{x}\) is approximately normal: \(\bar{x} \sim N(\mu, \frac{\sigma}{\sqrt{n}})\).
When the population is known to be normal, \(\bar{x} \sim N(\mu, \frac{\sigma}{\sqrt{n}})\) regardless of the sample size.
Notes:
- When using \(\bar{x}\) to estimate \(\mu\), the Standard Error of \(\bar{x}\) is the standard deviation of its sampling distribution: \(S.E. = \frac{\sigma}{\sqrt{n}}\)
- Usually \(\sigma\) is unknown, so we use \(s\) in its place: \(S.E. \approx \frac{s}{\sqrt{n}}\)
When can the CLT be applied
- If \(n \geq 30\) and \(\sigma\) is known
- If the population is normal and \(\sigma\) is known
- Otherwise, we use the t-distribution: \(T = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{df}\), where \(df = n-1\) is the degrees of freedom.
t-distribution
Similar to the standard normal distribution, the probability density curve of a t-distribution is centered at 0 and is bell-shaped. But the tails of a t-distribution are thicker than those of the standard normal distribution; moreover, the lower the \(df\), the thicker the tails.
Using R to find probability under t-distribution with \(df=n-1\):
- For \(P(T < b)\):
pt(b, df)
- For \(P(T > a)\):
pt(a, df, lower.tail = FALSE)
or `1 - pt(a, df)`
- For \(P(a < T < b)\):
pt(b, df) - pt(a, df)
To find the cut-off point \(t\) (critical value \(t^*\) or \(t_{\alpha/2}\)) for a given cumulative probability with \(df=n-1\):
- Find \(t\) for \(P(T < t) = p\):
qt(p, df)
- Find \(t\) for \(P(T > t) = p\):
qt(1 - p, df)
or `qt(p, df, lower.tail = FALSE)`
- \(t_{\alpha/2}\): the value with \(P(T > t_{\alpha/2}) = \alpha/2\): `qt(alpha/2, df, lower.tail = FALSE)`
100\((1-\alpha)\%\) Confidence interval for mean \(\mu\)
Using a sample of size \(n\) with sample mean \(\bar{x}\), sample standard deviation \(s\), and critical value \(t_{\alpha/2}\): \(\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}\)
Margin of Error (M.E.)
\(M.E. = t_{\alpha/2} \times S.E. = t_{\alpha/2} \times \frac{s}{\sqrt{n}}\)
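For example, with hypothetical values \(n = 16\), \(s = 4\), and 95% confidence (so \(\alpha = 0.05\)), the margin of error can be computed in R as:

```r
n <- 16; s <- 4; alpha <- 0.05                           # hypothetical values
t_crit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)  # t_{alpha/2}
me <- t_crit * s / sqrt(n)                               # margin of error
me                                                       # about 2.13
```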