2.4 Normal Distribution

2.4.1 Objectives

By the end of this unit, students will be able to:

  • Understand the notion and characteristics of continuous probability distributions.
  • Use the normal distribution to model continuous random variables.

2.4.2 Overview

2.4.2.1 Introduction

The Normal distribution, also known as the Gaussian distribution, is the most widely used continuous probability distribution in statistics.
It describes many natural and social phenomena such as human height, IQ scores, and measurement errors, which tend to cluster symmetrically around a central mean.

A random variable \(X\) that follows a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) is written as:

\[ X \sim N(\mu, \sigma) \]

Its probability density function (PDF) is given by:

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \]

This bell-shaped curve is symmetric about the mean \(\mu\), and values close to the mean occur more frequently than those farther away.
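
As a quick check on the density formula, the following sketch evaluates it by hand and compares the result with R's built-in `dnorm()`. The helper name `normal_pdf` and the values chosen for `mu`, `sigma`, and `x` are illustrative assumptions, not part of the unit.

```r
# Hand-coded normal density; normal_pdf is an illustrative name
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((x - mu) / sigma)^2)
}

mu <- 10      # assumed mean, for illustration only
sigma <- 2    # assumed standard deviation, for illustration only
x <- c(6, 8, 10, 12, 14)

by_hand  <- normal_pdf(x, mu, sigma)
built_in <- dnorm(x, mean = mu, sd = sigma)

all.equal(by_hand, built_in)   # TRUE when the formula matches dnorm()
x[which.max(by_hand)]          # the density is highest at the mean
```

The densities at 8 and 12 (and at 6 and 14) come out equal, which is the symmetry about \(\mu\) described above.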


2.4.2.2 Characteristics of the Normal Distribution

For \(X \sim N(\mu, \sigma)\):

  1. The total area under the curve equals 1, representing total probability.
  2. The curve is symmetrical about the mean \(\mu\).
  3. The mean, median, and mode are equal.
  4. The standard deviation (\(\sigma\)) determines the spread or width of the curve:
    • Large \(\sigma\): flatter, wider curve.
    • Small \(\sigma\): narrower, taller curve.
  5. Probabilities are found by measuring the area under the curve, not by individual points (since \(P(X = k) = 0\)).
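
The characteristics listed above can be checked numerically. This is a minimal sketch that assumes the \(N(4, 1.5)\) distribution for concreteness; the variable names and the comparison values of 0.5 and 3 for the standard deviation are illustrative choices.

```r
mu <- 4       # assumed mean
sigma <- 1.5  # assumed standard deviation

# Property 1: the total area under the density is 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf,
                        mean = mu, sd = sigma)$value

# Property 2: symmetry, so equal tail areas at equal distances from the mean
d <- 2
left_tail  <- pnorm(mu - d, mean = mu, sd = sigma)
right_tail <- 1 - pnorm(mu + d, mean = mu, sd = sigma)

# Property 3: the median (50th percentile) equals the mean
median_x <- qnorm(0.5, mean = mu, sd = sigma)

# Property 4: a smaller sigma gives a taller peak at the mean
peak_narrow <- dnorm(mu, mean = mu, sd = 0.5)
peak_wide   <- dnorm(mu, mean = mu, sd = 3)

total_area
c(left_tail, right_tail)
median_x
c(peak_narrow, peak_wide)
```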

Insert Image: Diagram showing the symmetry and area of a normal curve with 50% on each side of the mean.


2.4.2.3 The Empirical Rule (68–95–99.7 Rule)

The Empirical Rule summarizes how data are distributed in a normal curve:

| Range from Mean | Approx. % of Data | Interpretation |
|---|---|---|
| \(\mu \pm 1\sigma\) | 68% | About two-thirds of values fall within 1 standard deviation |
| \(\mu \pm 2\sigma\) | 95% | About 95% of values fall within 2 standard deviations |
| \(\mu \pm 3\sigma\) | 99.7% | Almost all values fall within 3 standard deviations |
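
One way to see the rule at work on data is to simulate normal observations and count how many land within 1, 2, and 3 standard deviations of the mean. The seed, sample size, and the values of `mu` and `sigma` below are arbitrary assumptions made for illustration.

```r
set.seed(1)                                 # arbitrary seed, for reproducibility
mu <- 100                                   # assumed mean
sigma <- 15                                 # assumed standard deviation
x <- rnorm(100000, mean = mu, sd = sigma)   # simulated observations

# Share of simulated values within k standard deviations of the mean
within_k <- sapply(1:3, function(k) mean(abs(x - mu) < k * sigma))
round(within_k, 3)
```

The three proportions should land close to 0.68, 0.95, and 0.997, with small differences due to sampling variability.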

Insert Image: A normal curve shaded to show 68%, 95%, and 99.7% regions.


2.4.2.4 The Standard Normal Distribution and Z-Scores

To compare observations from different normal distributions, we standardize them using a z-score, which converts any normal random variable into a standard normal random variable:

\[ Z = \frac{X - \mu}{\sigma} \] where \(Z \sim N(0, 1)\).

A z-score indicates how many standard deviations an observation \(X\) lies from the mean:

  • \(z > 0\): above the mean
  • \(z < 0\): below the mean
  • Large \(|z|\) values: far into the tails (unusual or rare observations)
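
Standardizing changes the scale but not the probability: the area to the left of \(x\) under \(N(\mu, \sigma)\) equals the area to the left of its z-score under \(N(0, 1)\). The sketch below illustrates this with assumed values (a score of 76 from a distribution with mean 70 and standard deviation 5).

```r
mu <- 70      # assumed mean
sigma <- 5    # assumed standard deviation
x <- 76       # assumed observation

z <- (x - mu) / sigma                          # standardize the observation

p_original <- pnorm(x, mean = mu, sd = sigma)  # area left of x under N(mu, sigma)
p_standard <- pnorm(z)                         # area left of z under N(0, 1)

z
all.equal(p_original, p_standard)              # TRUE: both areas agree
```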

Insert Image: Standard normal curve labeled with z-scores (-3 to +3).


2.4.2.5 Example: Comparing Test Scores

Suppose two students took different standardized exams:

  • SAT: \(\mu = 1050, \sigma = 200\)
  • ACT: \(\mu = 21, \sigma = 6\)

Maria scored 1350 on the SAT, and James scored 29 on the ACT.

Their standardized z-scores are:

\[ z_{\text{Maria}} = \frac{1350 - 1050}{200} = 1.50, \quad z_{\text{James}} = \frac{29 - 21}{6} \approx 1.33 \]

Because Maria’s z-score is higher, she performed better relative to her group.

Maria’s percentile corresponds to the shaded area in the distribution below:


There are many ways to compute percentiles. Before statistical software was widely available, people converted observed values to \(z\)-scores and then looked up the percentile in a table. Luckily, R computes percentiles directly with its ‘pnorm()’ function.
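
As an illustration, the percentiles for the two scores in the example above can be computed either from the raw score or from the z-score (the software equivalent of a table lookup). The variable names are illustrative; the means and standard deviations are the ones given for the SAT and ACT.

```r
# Percentile straight from the raw score
p_maria <- pnorm(1350, mean = 1050, sd = 200)

# Same percentile via the z-score, as one would do with a table
z_maria <- (1350 - 1050) / 200
p_maria_via_z <- pnorm(z_maria)

# James's percentile on the ACT scale
p_james <- pnorm(29, mean = 21, sd = 6)

all.equal(p_maria, p_maria_via_z)                      # both routes agree
round(100 * c(Maria = p_maria, James = p_james), 1)    # percentiles, in percent
```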

For these first few questions I’ll draw pictures for you, but you should be prepared to draw your own shortly.

Question 1: Remember that \(Z\sim N\left(\mu = 0, \sigma = 1\right)\).

Question 2: Find \(\mathbb{P}\left[Z > \_\_\_\right]\).

Question 3: Find \(\mathbb{P}\left[\_\_\_ < Z < \_\_\_\right]\).

The last three problems used only the standard normal distribution, that is, the \(Z\)-distribution \(N\left(\mu = 0, \sigma = 1\right)\). We can find probabilities for arbitrary normal distributions (normal distributions with any mean and any standard deviation) using the same ‘pnorm()’ function: just supply the appropriate ‘mean’ and ‘sd’ arguments to ‘pnorm()’ in place of the default values 0 and 1.
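
A short sketch of this, using the SAT parameters from the earlier example (the cutoffs 850, 1250, and 1350 are chosen for illustration). It also shows the `lower.tail = FALSE` argument of `pnorm()`, an alternative to subtracting from 1 when an upper-tail probability is needed.

```r
mu <- 1050     # SAT mean from the example
sigma <- 200   # SAT standard deviation from the example

# Upper-tail probability P(X > 1350), computed two equivalent ways
upper_a <- 1 - pnorm(1350, mean = mu, sd = sigma)
upper_b <- pnorm(1350, mean = mu, sd = sigma, lower.tail = FALSE)

# Probability of an interval: P(850 < X < 1250)
between <- pnorm(1250, mean = mu, sd = sigma) - pnorm(850, mean = mu, sd = sigma)

all.equal(upper_a, upper_b)   # TRUE: the two forms agree
between                       # mu +/- 1 sigma, so roughly 0.68
```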

2.4.3 Knowledge Check

2.4.4 Solved Exercises

Exercise 1 For \(Z \sim N(0, 1)\) (the standard normal distribution, mean = 0, standard deviation = 1), use R to find each probability and sketch the region that it represents.

  1. \(P(Z < -1.2)\)
  2. \(P(Z > 2.1)\)
  3. \(P(-0.8 < Z < 1.5)\)
  4. \(P(|Z| < 1.8)\)
  5. \(P(Z > 0.75)\)

Solution

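# (1) P(Z < -1.2): area to the left of -1.2
pnorm(-1.2)
## [1] 0.1150697
# (2) P(Z > 2.1): area to the right of 2.1
1 - pnorm(2.1)
## [1] 0.01786442
# (3) P(-0.8 < Z < 1.5): area between -0.8 and 1.5
pnorm(1.5) - pnorm(-0.8)
## [1] 0.7213374
# (4) P(|Z| < 1.8): area between -1.8 and 1.8
pnorm(1.8) - pnorm(-1.8)
## [1] 0.9281394
# (5) P(Z > 0.75): area to the right of 0.75
1 - pnorm(0.75)
## [1] 0.2266274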
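
For the sketching part of the exercise, one possible approach uses base R graphics. The helper `shade_normal()` below is hypothetical (it is not a built-in R function): it draws the density with `curve()`, shades the requested region with `polygon()`, and invisibly returns the shaded probability.

```r
# Hypothetical helper: shade P(lower < X < upper) under a normal curve
shade_normal <- function(lower, upper, mu = 0, sigma = 1) {
  # Plot four standard deviations on either side of the mean
  x_min <- mu - 4 * sigma
  x_max <- mu + 4 * sigma
  curve(dnorm(x, mu, sigma), from = x_min, to = x_max,
        xlab = "x", ylab = "density")

  # Clip infinite bounds to the plotted range before shading
  lo <- max(lower, x_min)
  hi <- min(upper, x_max)
  xs <- seq(lo, hi, length.out = 200)
  polygon(c(lo, xs, hi), c(0, dnorm(xs, mu, sigma), 0), col = "grey80")

  # Return the exact probability of the shaded region
  invisible(pnorm(upper, mu, sigma) - pnorm(lower, mu, sigma))
}

shade_normal(-Inf, -1.2)       # region for P(Z < -1.2)
p <- shade_normal(-0.8, 1.5)   # region for P(-0.8 < Z < 1.5)
p
```

Passing `mu` and `sigma` shades regions under other normal curves, for example `shade_normal(3, 5.5, mu = 4, sigma = 1.5)`.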

Exercise 2 For \(X \sim N(4, 1.5)\) (a normal distribution with mean = 4 and standard deviation = 1.5), use R to find each probability and sketch the region that it represents.

  1. \(P(X < 3)\)
  2. \(P(X > 6)\)
  3. \(P(3 < X < 5.5)\)

Solution

# Given: mean = 4, sd = 1.5
# (1)
pnorm(3, mean = 4, sd = 1.5)
## [1] 0.2524925
# (2)
1 - pnorm(6, mean = 4, sd = 1.5)
## [1] 0.09121122
# (3)
pnorm(5.5, mean = 4, sd = 1.5) - pnorm(3, mean = 4, sd = 1.5)
## [1] 0.5888522

Exercise 3 For \(X \sim N(4, 1.5)\), compute the z-score of the given x-values:

  1. \(x = 3\)
  2. \(x = 4\)
  3. \(x = 5.5\)

Solution

mu <- 4
sigma <- 1.5
z1 <- (3 - mu) / sigma
z2 <- (4 - mu) / sigma
z3 <- (5.5 - mu) / sigma
z1; z2; z3
## [1] -0.6666667
## [1] 0
## [1] 1

Exercise 4

  1. State the Empirical Rule.

  2. Use R to verify the Empirical Rule: find \(P(|Z| < 1)\), \(P(|Z| < 2)\), and \(P(|Z| < 3)\).

Solution

(1) Empirical Rule:

  • About 68% of the data lie within 1 standard deviation of the mean.
  • About 95% lie within 2 standard deviations.
  • About 99.7% lie within 3 standard deviations.

# (2) Probability of falling within 1, 2, and 3 standard deviations of the mean
p1 <- pnorm(1) - pnorm(-1)
p2 <- pnorm(2) - pnorm(-2)
p3 <- pnorm(3) - pnorm(-3)
p1; p2; p3
## [1] 0.6826895
## [1] 0.9544997
## [1] 0.9973002

Exercise 5

The weights of newborn babies follow a normal distribution with a mean of 3.2 kg and standard deviation of 0.5 kg. Find the probability that a randomly chosen baby weighs:

  1. Over 3.8 kg

  2. Less than 2.6 kg

  3. Between 2.8 kg and 3.6 kg

Solution

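mu <- 3.2
sigma <- 0.5   # named sigma so that it does not mask R's sd() function
# (1) Over 3.8 kg
1 - pnorm(3.8, mean = mu, sd = sigma)
## [1] 0.1150697
# (2) Less than 2.6 kg
pnorm(2.6, mean = mu, sd = sigma)
## [1] 0.1150697
# (3) Between 2.8 kg and 3.6 kg
pnorm(3.6, mean = mu, sd = sigma) - pnorm(2.8, mean = mu, sd = sigma)
## [1] 0.5762892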

Exercise 6 The weights of newborn babies follow a normal distribution with a mean of 3.2 kg and standard deviation of 0.5 kg.

  1. What is the cutoff weight for the lowest 25% of babies? (Round to 1 decimal place.)

  2. What is the cutoff weight for the highest 5% of babies? (Round to 1 decimal place.)

Solution

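# (1) Cutoff below which the lowest 25% of weights fall
qnorm(0.25, mean = 3.2, sd = 0.5)
## [1] 2.862755
# (2) Cutoff above which the highest 5% fall, i.e. the 95th percentile
qnorm(0.95, mean = 3.2, sd = 0.5)
## [1] 4.022427

Rounded to 1 decimal place, the cutoffs are about 2.9 kg and 4.0 kg.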
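
The function `qnorm()` is the inverse of `pnorm()`: `pnorm()` turns a value into the probability to its left, and `qnorm()` turns that probability back into the value. The sketch below illustrates this with the newborn-weight distribution; the variable names are illustrative.

```r
mu <- 3.2      # mean newborn weight (kg)
sigma <- 0.5   # standard deviation (kg)

# Cutoff for the lowest 25%, then confirm the area to its left is 0.25
cutoff_low <- qnorm(0.25, mean = mu, sd = sigma)
pnorm(cutoff_low, mean = mu, sd = sigma)

# Cutoff for the highest 5%, asked for directly as an upper-tail quantile
cutoff_high <- qnorm(0.05, mean = mu, sd = sigma, lower.tail = FALSE)

round(c(cutoff_low, cutoff_high), 1)
```

Using `lower.tail = FALSE` with 0.05 gives the same cutoff as `qnorm(0.95, ...)`, which can be easier to read when a question is phrased in terms of the top of the distribution.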

Exercise 7

The daily time (in hours) students spend studying follows a normal distribution with a mean of 5.4 hours and standard deviation of 1.1 hours.

Find the (standardized) z-score corresponding to a student who studies 4.2 hours.

Solution

x <- 4.2
mu <- 5.4
sigma <- 1.1
z <- (x - mu) / sigma
z
## [1] -1.090909

Exercise 8

Maria scored 76 on a statistics test with a mean of 70 and a standard deviation of 5. Liam scored 88 on a chemistry test with a mean of 82 and a standard deviation of 4. Find the z-scores for Maria’s and Liam’s test results and determine who performed better relative to their class.

Solution

z_maria <- (76 - 70) / 5
z_liam <- (88 - 82) / 4
z_maria; z_liam
## [1] 1.2
## [1] 1.5

Liam’s z-score (1.5) is higher than Maria’s (1.2), so Liam performed better relative to his class.

Exercise 9

Scores on the verbal portion of the Graduate Record Examination (GRE) are approximately normally distributed with a mean of 475 points and a standard deviation of 108 points. Fill in the blanks. Approximately:

(a) 68% of students who took the verbal portion of the GRE scored between _______ and ________

(b) 95% of students who took the verbal portion of the GRE scored between ______ and ________

(c) 99.7% of students who took the verbal portion of the GRE scored between ______ and ________

Solution

According to the Empirical Rule:

\[ \begin{aligned} 68\% &:\ \mu \pm 1\sigma \\ 95\% &:\ \mu \pm 2\sigma \\ 99.7\% &:\ \mu \pm 3\sigma \end{aligned} \]

Given:
\(\mu = 475,\ \sigma = 108\)

Compute each range:

mu <- 475
sigma <- 108

# (a) 68% within 1 standard deviation
low_68 <- mu - sigma
high_68 <- mu + sigma

# (b) 95% within 2 standard deviations
low_95 <- mu - 2 * sigma
high_95 <- mu + 2 * sigma

# (c) 99.7% within 3 standard deviations
low_997 <- mu - 3 * sigma
high_997 <- mu + 3 * sigma

low_68; high_68
## [1] 367
## [1] 583
low_95; high_95
## [1] 259
## [1] 691
low_997; high_997
## [1] 151
## [1] 799