Acknowledgement

These notes use content from OpenIntro Statistics Slides by

Mine Cetinkaya-Rundel.

Chapter Outline

  • 3.1 Defining Probability
  • 3.2 Conditional Probability
  • 3.3 Sampling from a Small Population
  • 3.4 Random Variables
  • 3.5 Continuous Distributions

3.1 Defining Probability

In this section, we introduce basic concepts and rules of probability, which is the foundation of statistical inference.

List of terms

  • Probability of equally likely outcomes
  • Event and probability of an event
  • Union and intersection of events; complement of an event
  • Mutually exclusive (or disjoint) events
  • General addition rule (and special case)
  • Rule of complements
  • Probability distribution
  • Independent events and multiplication rule
  • Finding probability from a table

Introduction to Probability

Probability is a way to quantify (measure) uncertainty.

Example. If we toss a coin, what is the probability of getting heads?

  • If the coin is fair, then the chance of getting heads is the same as the chance of getting tails. Since there are two possible outcomes, the chance is 1-in-2, that is, 1/2 (equally likely outcomes; theoretical)

  • If the coin is unfair, then we need to observe many tosses to estimate the probability of getting heads, that is, use the proportion of times heads would occur if we observed the random process an infinite number of times (frequentist interpretation).

Random processes

  • A random process is a situation in which we know what outcomes could happen, but we don’t know which particular outcome will happen.

  • Equivalently, a random process is the process of observing a random phenomenon.

  • Examples:

  1. coin tosses (possible outcomes are either “Head” or “Tail”, but we do not know which one occurs for sure)

  2. die rolls (possible outcomes are “1”, “2”, “3”, “4”, “5”, “6”)

  3. the stock market tomorrow (possible outcomes: it goes “up” or “down”)

Law of large numbers

The law of large numbers states that as more observations are collected, the proportion of occurrences of a particular outcome, \(\hat{p}_n\), converges to the probability of that outcome, \(p\).

Example. Roll a die. Let \(\hat{p}_n\) be the proportion of outcomes that are “1” after the first \(n\) rolls.

  • The figure below is a plot of \(\hat{p}_n\) versus the number of rolls (\(n\)) for \(n\) between 1 and 10,000.
  • It shows that after 10,000 rolls, \(\hat{p}_n\) is very close to \(P(1)=1/6\approx 0.167\) (rounded to three decimals), the theoretical probability of rolling a 1 on a fair die.

Law of large numbers (R Practice)

Example. Roll a die. Let \(\hat{p}_n\) be the proportion of outcomes that are “1” after the first \(n\) rolls.

  • Run the code below for different numbers of rolls (try max n = 10, 100, 1000, 10000). What do you notice?
n = 1:10
outcomes = 1:6
# simulate one sequence of rolls, then track the running proportion of ones,
# i.e. the proportion of ones among the first i rolls
rolls = sample(outcomes, length(n), replace = TRUE)
p.1 = cumsum(rolls == 1) / n
plot(n, p.1, type = "l", col = "blue", ylab = "Proportion of One")
abline(h = 1/6, col = "red")

Law of large numbers (R Practice)

Example. Roll a die. Let \(\hat{p}_n\) be the proportion of outcomes that are “1” after the first \(n\) rolls.

  • Fix max n = 1000 and run the code multiple times. What do you notice?
n = 1:1000
outcomes = 1:6
# simulate one sequence of rolls, then track the running proportion of ones,
# i.e. the proportion of ones among the first i rolls
rolls = sample(outcomes, length(n), replace = TRUE)
p.1 = cumsum(rolls == 1) / n
plot(n, p.1, type = "l", col = "blue", ylab = "Proportion of One")
abline(h = 1/6, col = "red")

Small vs. Large Numbers of Observations

With a small number of observations, outcomes of random phenomena may look quite different from what you expect.

Example. Toss a fair coin twice: you may see “HH” or “TT”, not necessarily the expected mix “HT” or “TH”.

With a large number of observations, summary statistics settle down and get increasingly close to particular values: as we make more observations, the proportion of times a particular outcome occurs gets closer and closer to the number we would expect.

Example. Toss a fair coin 1 million times and you will see heads nearly half of the time and tails nearly half of the time.

The long-run proportion provides the basis for the definition of probability.

Probability

For any random process, the probability of a particular outcome is the proportion of times that outcome would occur in a long run of observations.

The sample space associated with an experiment (or a trial) is the set of \(\underline {\text{all possible outcomes}}.\)

Example.

  1. Flip a coin once, the sample space S= {H,T}
  2. Flip a coin twice, the sample space S= {HH,HT,TH,TT}
  3. Flip a coin three times, the sample space
    S={HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
Notes:
  1. The outcomes in a sample space are listed without repetition, and the order of the list does not matter.
  2. A tree diagram can help organize the outcomes.
  • Two basic probability rules:

    • \(0 \leq P(x_i) \leq 1\)
    • \(\sum P(x_i) = 1\) where the summation is taken over all possible outcomes in sample space.

Law of large numbers (continued)

When tossing a fair coin, heads and tails are expected to come up equally often.

If heads comes up on each of the first 10 tosses, what do you think the chance is that another head will come up on the next toss: 0.5, less than 0.5, or more than 0.5?

\[ {\underline{H} \underline{H} \underline{H} \underline{H} \underline{H} \underline{H} \underline{H} \underline{H} \underline{H} \underline{H} \underline{?}} \] - The probability is still 0.5, or there is still a 50% chance that another head will come up on the next toss.

\[ {P(H \text{ } on \text{ } 11^{th} \text{ } toss) = P(T \text{ } on \text{ } 11^{th} \text{ } toss) = 0.5}\] - The coin is not “due” for a tail.

  • The common misunderstanding of the LLN is that random processes are supposed to compensate for whatever happened in the past; this is simply not true. The misconception is also called the gambler’s fallacy (or the law of averages).

Probability and Events

For a random phenomenon, the probability of a particular outcome is the proportion of times that outcome would occur in a long run of observations.

An event is a subset of the sample space. An event consists of a particular outcome (a simple event) or a group of possible outcomes possessing a designated feature.

We use capital letters for events.

Example. A test consists of 3 multiple-choice questions. A student’s answer to each question is either correct (C) or incorrect (I).

  1. What is the sample space?

  2. What is the event of getting all three questions correct?

  3. What is the event of getting at least two questions correct?

Sample space: {CCC, CCI, CIC, CII, ICC, ICI, IIC, III}.

Event A = student answers all 3 questions correctly = {CCC}

Event B = student passes (at least 2 correct) = {CCI, CIC, ICC, CCC}.

Finding Probability

For a sample space with equally likely outcomes, the probability of an event is the number of outcomes in the event divided by the total number of outcomes (a relative ratio).

Examples

  1. Flip a fair coin; the probability of getting a head or a tail is 1/2, or 0.5. \[P(H)=\frac{1}{2} \ \ \ \text{and} \ \ \ P(T)=\frac{1}{2}\]

  2. Roll a fair die; the probabilities of getting the numbers 1, 2, 3, 4, 5, 6 are all \(\frac{1}{6}.\)

  3. A teacher gives 1 quiz question with 4 multiple choices, only one of which is correct. Assume the student is not prepared and selects a choice by guessing. What is the probability that the student guessed correctly?

\[P(\text{student guessed correctly})=\frac{1}{4}\]

  4. Flip a coin twice; what is the probability of getting heads twice? The sample space is {HH, HT, TH, TT} and each outcome is equally likely, so \[P(HH) =\frac{1}{4}\]

Finding Probability

The probability of an event is the sum of probabilities of outcomes in the event.

Examples

  1. Roll a fair die, the sample space has six outcomes: S = {1, 2, 3, 4, 5, 6}. Let A be the event that the outcomes are even numbers, find P(A).

Solution. A = {2, 4, 6}, so \(P(A)=P(2)+P(4)+P(6)=\frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{3}{6} =\frac{1}{2}\). Another way to think about it: the outcomes are equally likely, there are 3 outcomes in A and 6 outcomes in total, so \(P(A) =\frac{3}{6} =\frac{1}{2}\) (relative ratio)

  2. Flip a fair coin twice; what is the probability of getting exactly one head? Solution. The sample space is {HH, HT, TH, TT}. Let B be the event of getting exactly one head = {HT, TH}; then \(P(B)= \frac{2}{4} = \frac{1}{2}\)

  3. A teacher gives 1 quiz question with 4 multiple choices, only one of which is correct. Assume the student is not prepared and selects a choice by guessing.

What is the probability that the student guessed wrong? Ans: \(\frac{3}{4}\)
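These equally-likely calculations can be checked in R by enumerating the sample space (a quick sketch, not from the original slides; the variable names are ours):

```r
# Enumerate the sample space of two fair coin flips.
flips = expand.grid(first = c("H", "T"), second = c("H", "T"))
n.heads = rowSums(flips == "H")   # number of heads in each of the 4 outcomes
p.one.head = mean(n.heads == 1)   # equally likely outcomes: count / total
p.one.head                        # 1/2
```

Because the outcomes are equally likely, `mean()` of the logical vector is exactly the count-over-total ratio used above.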

Intersection, Union, and Disjoint Events

  • The intersection of A and B consists of outcomes that are in both \(\color{blue}{\text{A and B}}\). (Use \(\color{blue}{\text{A and B}}\) for the intersection, or use \(\color{blue}{A\cap B}\).)

  • The union of A and B consists of outcomes that are in A, in B, or in both. (Use \(\color{blue}{\text{A or B}}\) for the union, or use \(\color{blue}{A\cup B}\).)

  • Two events A and B are disjoint if they do not have any common outcomes: (A and B) \(=\emptyset\) (the empty set), or \(A\cap B=\emptyset\)

Two events A and B are non-disjoint if they have common outcome(s).

Addition Rule: Probability of the Union of Two Events

For the union of two events,
P(A or B) = P(A) + P(B) -P(A and B).

– general addition rule.

For the probability of the union (outcomes in A, in B, or in both), add P(A) to P(B) and subtract P(A and B) to adjust for outcomes counted twice.

Question: When does P(A or B) = P(A) + P(B)?

Answer: if the events are disjoint, then P(A and B) = 0, so P(A or B) = P(A) + P(B) – the special addition rule.

Examples of non-disjoint events

Let A be the event of drawing a red card and B the event of drawing a jack; then A and B are non-disjoint events.

Union of non-disjoint events (use addition rule)

Example. Find P(red), P(jack), P(jack and red), and P(jack or red).

Solution. There are 52 cards in total. \[{P(jack \text{ } or \text{ } red) = P(jack) + P(red) - \color{red}{P(jack \text{ } and \text{ } red)}}\]

\[{= \frac{4}{52} + \frac{26}{52} - \frac{2}{52} = \frac{28}{52}}= \frac{7}{13}\approx 0.538\] (rounded to three decimals; to four decimals, 0.5385)
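The card calculation can be verified in R by building the deck explicitly (a sketch, not from the original slides; names like `deck` are ours):

```r
# Build a 52-card deck and check the general addition rule.
deck = expand.grid(rank = c(2:10, "J", "Q", "K", "A"),
                   suit = c("hearts", "diamonds", "clubs", "spades"))
red  = deck$suit %in% c("hearts", "diamonds")
jack = deck$rank == "J"
p.union = mean(red | jack)                          # P(jack or red) = 28/52
p.rule  = mean(jack) + mean(red) - mean(jack & red) # addition rule, same value
```

Both computations give 28/52, confirming that subtracting the double-counted red jacks is exactly what the addition rule does.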

Example (addition rule)

Examples.

  1. Assume that in a meeting, 38% of attendees are Biology majors, 68% are sophomores, and 20% are both sophomores and Biology majors. What percentage of meeting attendees are either sophomores or Biology majors?

Solution.

\(P(\text{Biology majors})=0.38, \hspace{0.4cm} P(\text{Sophomores})=0.68, \hspace{0.4cm} P(B \text{ and } S)=0.20\)

Using the addition rule, \(P(B \text{ or } S)=P(B) +P(S) - P(B \text{ and } S)= 0.38+0.68-0.2=0.86\),

so 86% of meeting attendees are either sophomores or Biology majors.

  2. Given \(P(A)=0.25\), \(P(B)=0.64\), and \(P(A \text{ or } B)=0.72\), find \(P(A \text{ and } B)\).

Solution.

From the addition rule, \(P(A \text{ or } B) = P(A) + P(B) - \color{blue}{P(A \text{ and } B)}\), so

\(\color{blue}{P(A \text{ and } B)} = P(A) + P(B) - P(A \text{ or } B)\)

                                    = 0.25+0.64-0.72 = 0.17

Note: students can construct other similar questions using the addition rule.

Complementary Events, Rule of Complements

The complement of event A, denoted as \(A^c\), (or \(\bar{A}\)) is the event of all outcomes that are not in A.

A and \(A^c\) are mutually exclusive (disjoint), and together they cover the sample space, so \(P(A)+P(A^c)=1\)

This implies

\[P(A^c)=1-P(A) \ \text{and} \ P(A)=1-P(A^c)\]

Examples

  1. If the sample space is S = {HH, HT, TH, TT} and event A = {HH}, then \(A^c\) = {HT, TH, TT}.

  2. If \(P(A)=0.81\), then \(P(A^c)=0.19\)

Probability from Frequency Table

Example. Below is a frequency table of class rank from a survey of 406 students. If one student is randomly selected, what is the probability that the student is:

  1. Freshman

  2. not a Freshman

  3. a Freshman or a Senior.

Answers (round to 3 decimals)

  1. 0.362

  2. 0.638

  3. 0.522

\[ \begin{align*} &\color{blue}{\text{class rank}} \\ \hline \color{green}{\text{class}} && \color{red}{\text{Frequency}} \\ \hline \text{Freshman}&& 147 \\ \text{Sophomore} && 96 \\ \text{Junior} && 98 \\ \text{Senior} && 65 \\ \hline \text{Total} && 406 \end{align*} \]

Probability Distributions

Independence

Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other.

Examples

  • Knowing that the coin landed on a head on the first toss does not provide any useful information for determining what the coin will land on in the second toss. β†’
    Outcomes of two tosses of a coin are independent.
  • Knowing that the first card drawn from a deck is an ace does provide useful information for determining the probability of drawing an ace in the second draw. β†’
    Outcomes of two draws from a deck of cards (without replacement) are dependent.

Product rule for independent events

For independent events A and B, the product rule holds: P(A and B) = P(A) × P(B), or more generally, \(P(A_1 \text{ and } \dots \text{ and } A_k) = P(A_1) \times \dots \times P(A_k)\)

Examples

  1. If you toss an unfair coin twice, what is the probability of getting 2 heads? Assume the coin has a 60% chance of showing heads.

  2. If a quiz contains 3 questions, each with 4 multiple choices of which only one is correct, what is the probability that a student gets all questions correct by guessing?

  3. If a quiz contains 3 questions, where #1 has 4 multiple choices, #2 is T/F, and #3 has 6 multiple choices, what is the probability that a student gets all questions correct by guessing?

Answer

  1. \(0.6^2\)=0.36

  2. \((\frac{1}{4})^3= \frac{1}{64}\)=0.015625

  3. \((\frac{1}{4})(\frac{1}{2})(\frac{1}{6})\)=0.02083
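The three answers are one-line products in R (a quick sketch, not from the original slides):

```r
# Multiplication rule for independent events.
p1 = 0.6^2                  # two heads with a 60%-heads coin
p2 = (1/4)^3                # three 4-choice questions, all guessed correctly
p3 = (1/4) * (1/2) * (1/6)  # 4-choice, true/false, and 6-choice questions
round(c(p1, p2, p3), 5)
```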

Practice

A recent Gallup poll suggests that 25.5% of Texans do not have health insurance as of June 2012. Assuming that the uninsured rate stayed constant, what is the probability that two randomly selected Texans are both
uninsured?
  1. \(25.5^2\)
  2. \(0.255^2\)
  3. \(0.255 \times 2\)
  4. \((1-0.255)^2\)

Practice

A recent Gallup poll suggests that 25.5% of Texans do not have health insurance as of June 2012. Assuming that the uninsured rate stayed constant, what is the probability that two randomly selected Texans are both
uninsured?
  1. \(25.5^2\)
  2. \(\color{red}{0.255^2}\)
  3. \(0.255 \times 2\)
  4. \((1-0.255)^2\)

Practice

A recent Gallup poll suggests that 25.5% of Texans do not have health insurance as of June 2012. Assuming that the uninsured rate stayed constant, what is the probability that two randomly selected Texans are both
insured?
  1. \(25.5^2\)
  2. \(0.255^2\)
  3. \(0.255 \times 2\)
  4. \((1-0.255)^2\)

Example

Roughly 20% of undergraduates at a university are vegetarian or vegan. What is the probability that, among a random sample of 3 undergraduates, at least one is vegetarian or vegan?

Solution. We break the problem down into the following parts:

  1. {At least 1 is vegetarian or vegan} is the complement of {none is vegetarian or vegan}
  2. For each of the 3 students in the sample, whether s/he is vegetarian or vegan is independent of the others
  3. 80% of undergraduates are not vegetarian or vegan
So, using the rule of complements and the product rule for independent events:
  1. \(1-0.2 \times 3\)
  2. \(1-0.2^3\)
  3. \(0.8^3\)
  4. \(1-0.8 \times 3\)
  5. \(\color{red}{1-0.8^3}\)

\[\text{P(at least 1 from veg)} \\= 1 - \text{P(none veg)} \\= 1 - (1 - 0.2)^3 \\= 1 - 0.8^3 \\= 1 - 0.512 \\= 0.488\]
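The complement calculation can also be checked by simulation (an R sketch, not from the original slides; the seed and `n.sims` are our choices):

```r
# Simulate samples of 3 undergraduates, each vegetarian/vegan with probability 0.2.
set.seed(1)
n.sims = 100000
at.least.one = replicate(n.sims, any(runif(3) < 0.2))
mean(at.least.one)   # simulated estimate, close to 0.488
1 - 0.8^3            # exact answer by the rule of complements
```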

Probability from a Contingency Table

Example: The following contingency table classifies a sample of 100 people by gender and dominant hand. If we randomly select a person from the surveyed group, use the table to find the probability that

  1. The person is right-handed
  2. The person is male
  3. The person is male and right-handed
  4. The person is male, given that the person is right-handed
  5. The person is right-handed, given that the person is male

\[ \begin{align} && \text{right-handed} && \text{left-handed} && \text{total} \\ \hline \text{male} && 43 && 9 && 52 \\ \text{female} && 44 && 4 && 48 \\ \hline \text{total} && 87 && 13 && 100 \end{align} \]

Ans:

  1. 0.87 (marginal probability)

  2. 0.52 (marginal probability)

  3. 0.43 (joint probability)

  4. 0.49 (conditional probability)

  5. 0.83 (conditional probability)

Discussions (end of 3.1)

Disjoint vs. Complementary; Disjoint vs. Independent

Does the sum of the probabilities of two disjoint events always equal 1?

Not necessarily; there may be more than 2 disjoint events in the sample space, e.g. party affiliation.

Does the sum of the probabilities of two complementary events always equal 1?

Yes, that is the definition of complementary events, e.g. heads and tails. Complementary events form a special case of disjoint events.

Examples.
1) If \(P(A)=0.3\), \(P(B)=0.6\), and events A and B are independent, find \(P(A \text{ or } B)\)
2) If \(P(A)=0.3\), \(P(B)=0.6\), and events A and B are disjoint, find \(P(A \text{ or } B)\)

Solution.
1) By the product rule, \(P(A \text{ and } B)=P(A)P(B)=0.18\);
then by the addition rule, \(P(A \text{ or } B)= P(A)+P(B)-P(A \text{ and } B)\)
= 0.3+0.6-0.18 = 0.72
2) \(P(A \text{ and } B)=0\), so \(P(A \text{ or } B)=P(A)+P(B)=0.9\)

3.2 Conditional Probability

In this section, we discuss whether the probability of an event is related to (affected by) another event that has taken place.

The probability of A given that B has occurred is called the conditional probability of A given B, denoted \(P(A \mid B)\).

  • Marginal probability
  • Joint probability
  • Conditional Probability
  • General multiplication rule
  • Independence and conditional probability
  • Bayes’ Theorem

Example – Relapse

Researchers randomly assigned 72 chronic users of cocaine into three groups: desipramine (an antidepressant), lithium (a standard treatment for cocaine addiction), and placebo. Results of the study are summarized below.

\[ \begin{align} && \text{relapse} && \text{no relapse} &&\text{total} \\ \hline \text{desipramine} && 10 && 14 && 24 \\ \text{lithium} && 18 && 6 && 24 \\ \text{placebo} && 20 && 4 && 24 \\ \hline \text{total} && 48 && 24 && 72 \end{align} \]

Marginal probability

What is the probability that a patient relapsed?

\[ \begin{align} && \text{relapse} && \text{no relapse} && \text{total} \\ \hline \text{desipramine} && 10 && 14 && \color{green}{24} \\ \text{lithium} && 18 && 6 && 24 \\ \text{placebo} && 20 && 4 && 24 \\ \hline \text{total} && \color{red}{48} && 24 && \color{purple}{72} \end{align} \]

\(P(relapsed) = \frac{\color{red}{48}}{\color{purple}{72}} = \frac{2}{3} \approx 0.67\)

What is the probability that a patient received desipramine?

\(\text{P(desipramine)} = \frac{\color{green}{24}}{\color{purple}{72}}= \frac{1}{3} \approx 0.33\)

These are called marginal probabilities.

Joint probability

What is the probability that a patient received the antidepressant (desipramine) \(\underline{\color{red}{and}}\) relapsed?

\[ \begin{align} && \text{relapse} && \text{no relapse} && \text{total} \\ \hline \text{desipramine} && \color{red}{10} && 14 && 24 \\ \text{lithium} && 18 && 6 && 24 \\ \text{placebo} && 20 && 4 && 24 \\ \hline \text{total} && 48 && 24 && \color{red}{72} \end{align} \] P(relapsed and desipramine) = \(\frac{10}{72}=\frac{5}{36}\approx 0.14\) (joint probability)

Review of the addition rule: find P(relapsed or desipramine).

P(relapsed or desipramine) = P(relapsed) + P(desipramine) - P(relapsed and desipramine)

\(\approx 0.67+0.33-0.14=0.86 \ \left(\text{or } \frac{48+24-10}{72}=\frac{62}{72}\approx 0.86\right)\)

Marginal and Joint Probability

Dividing each entry by the grand total gives a table showing the joint probabilities and marginal probabilities.

\[ \begin{align} && \text{relapse} && \text{no relapse} && \text{total} \\ \hline \text{desipramine} && 10 && 14 && 24 \\ \text{lithium} && 18 && 6 && 24 \\ \text{placebo} && 20 && 4 && 24 \\ \hline \text{total} && 48 && 24 && 72 \end{align} \]

\[ \begin{align} && \text{relapse} && \text{no relapse} && \text{total} \\ \hline \text{desipramine} && \frac{10}{72} \approx 0.14 && \frac{14}{72}\approx 0.19 && \frac{24}{72}\approx0.33 \\ \text{lithium} && \frac{18}{72} \approx0.25 && \frac{6}{72}\approx 0.08 && \frac{24}{72}\approx0.33 \\ \text{placebo} && \frac{20}{72}\approx 0.28 && \frac{4}{72}\approx 0.06 && \frac{24}{72}\approx0.33 \\ \hline \text{total} && \frac{48}{72}\approx 0.67 && \frac{24}{72} \approx 0.33 && \frac{72}{72} = 1 \end{align} \] \[ \begin{align} && \text{relapse} && \text{no relapse} && \text{total} \\ \hline \text{desipramine} && 0.14 && 0.19 && 0.33 \\ \text{lithium} && 0.25 && 0.08 && 0.33\\ \text{placebo} && 0.28 && 0.06 && 0.33 \\ \hline \text{total} && 0.67 && 0.33 && 1 \end{align} \]
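In R, `prop.table()` produces the joint and marginal probabilities directly from the counts (a sketch, not from the original slides; the matrix name is ours):

```r
# Relapse study counts: rows are treatments, columns are outcomes.
relapse = matrix(c(10, 18, 20, 14, 6, 4), nrow = 3,
                 dimnames = list(c("desipramine", "lithium", "placebo"),
                                 c("relapse", "no relapse")))
round(prop.table(relapse), 2)              # joint probabilities (each count / 72)
round(colSums(relapse) / sum(relapse), 2)  # marginal probabilities of the outcomes
```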

Conditional probability

The conditional probability of the outcome of interest A given condition B is calculated as

\[{P(A|B) = \frac{P(A \text{ and }B)}{P(B)}}\]

Example : If we know that a patient received the antidepressant (desipramine), what is the probability that they relapsed?

\[ \begin{align} && relapse && no relapse && total \\ \hline desipramine && 10 && 14 && 24 \\ lithium && 18 && 6 && 24 \\ placebo && 20 && 4 && 24 \\ \hline total && 48 && 24 && 72 \end{align} \]

\[ \begin{eqnarray*} P(relapse|desipramine) &=& \frac{P(relapse \text{ }and \text{ }desipramine)}{P(desipramine)} \\ &=& \frac{10/72}{24/72} \\ &=& \frac{10}{24} \\ &\approx& 0.42 \end{eqnarray*} \]

Conditional probability

Another way to understand/compute P(A|B) is the portion of A in B. For contingency table,

\[P(A|B) = \frac{\text{count of (A and B)}}{\text{the count of B}}\]

In this example,

\[ \begin{align} && relapse && no relapse && total \\ \hline desipramine && \color{red}{10} && 14 && 24 \\ lithium && 18 && 6 && 24 \\ placebo && 20 && 4 && 24 \\ \hline total && \color{red}{48} && 24 && 72 \end{align} \]

\[P(relapse|desipramine)= \frac{\text{count of (relapse and desipramine)}}{\text{the count of desipramine}}= \frac{10}{24}= \frac{5}{12} \approx 0.42\]

Question. If we know that a patient relapsed, what is the probability that they received the antidepressant (desipramine)?

\(P(desipramine | relapse) = \frac{10}{48} \approx 0.21\)

Conditional probability

\[ \begin{align} && relapse && no relapse && total \\ \hline desipramine && 10 && 14 && 24 \\ lithium && 18 && 6 && 24 \\ placebo && 20 && 4 && 24 \\ \hline total && 48 && 24 && 72 \end{align} \]

1) If we know that a patient received lithium, what is the probability that they relapsed?
2) If we know that a patient received placebo, what is the probability that they relapsed?
3) If we know that a patient relapsed, what is the probability that they received lithium?
4) If we know that a patient relapsed, what is the probability that they received placebo?

Answers:

- \(P(\text{relapsed} \mid \text{lithium})=18/24\approx 0.75\)
- \(P(\text{relapsed} \mid \text{placebo})=20/24\approx 0.83\)
- \(P(\text{lithium} \mid \text{relapsed})=18/48\approx 0.38\)
- \(P(\text{placebo} \mid \text{relapsed})=20/48\approx 0.42\)

General multiplication rule

  • Earlier we saw that if two events are independent, their joint probability is simply the product of their probabilities. \({\text{P(A and B)} = P(A) \times P(B)}\)

  • If the events are not believed to be independent, the joint probability is calculated slightly differently.

  • If A and B represent two outcomes or events, then

\({\text{P(A and B)} = P(A | B) \times P(B)}\hspace{0.2cm} or \hspace{0.2cm} {\text{P(A and B)} = P(B|A) \times P(A)}\)

  • Note that this formula is simply the conditional probability formula, rearranged.

  • It is useful to think of A as the outcome of interest and B as the condition.

We can generalize the general multiplication rule: \(P(A_1 \text{ and } A_2) = P(A_1)P(A_2|A_1),\)

\(P(A_1 \text{ and } A_2 \text{ and } A_3) = P(A_1 \text{ and } A_2)P(A_3 \mid A_1 \text{ and } A_2)= P(A_1)P(A_2|A_1)P(A_3|A_1,A_2),\)

\(P(A_1 \text{ and } \dots \text{ and } A_n)= P(A_1)P(A_2|A_1)P(A_3 \mid A_1 \text{ and } A_2) \cdots P(A_n \mid A_1 \text{ and } \dots \text{ and } A_{n-1})\)

Example

In a bag there are 10 marbles: 2 red, 3 blue, and 5 white. If we draw 3 marbles without replacement, what is the probability that the 1st is red, the 2nd is blue, and the 3rd is white?

Solution. \(P(RBW)=P(R)\,P(B \mid R)\,P(W \mid R \text{ and } B)=\frac{2}{10} \left(\frac{3}{9}\right)\left(\frac{5}{8}\right)= \frac{1}{24}\approx 0.042\)

Practice: Find

  1. P(WWR)
  2. P(RBWB)
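The chained multiplication rule can be checked exactly and by simulation (an R sketch, not from the original slides; it covers the worked \(P(RBW)\) only, leaving the practice problems to you):

```r
# Exact: P(red, then blue, then white) when drawing without replacement.
p.RBW = (2/10) * (3/9) * (5/8)          # 1/24

# Simulation: draw 3 marbles from the bag many times.
set.seed(1)
bag = rep(c("R", "B", "W"), times = c(2, 3, 5))
draws = replicate(100000, paste(sample(bag, 3), collapse = ""))
mean(draws == "RBW")                    # close to 1/24, about 0.042
```

`sample()` without `replace = TRUE` draws without replacement, which is exactly what the conditional factors 3/9 and 5/8 encode.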

Independence and conditional probabilities

Consider the following (hypothetical) distribution of gender and major of students in an introductory statistics class:

\[ \begin{align} && social science && non-social science && total \\ \hline female && 30 && 20 && 50 \\ male && 30 && 20 && 50 \\ \hline total && 60 && 40 && 100 \end{align} \]

  • The probability that a randomly selected student is a social science major is \(\frac{60}{100} = 0.6.\)

  • The probability that a randomly selected student is a social science major given that they are female is \(\frac{30}{50} = 0.6.\)

  • Since P(SS | F) = 0.6 = P(SS), the major of students in this class does not depend on their gender: P(SS | F) = P(SS).

Independence and conditional probabilities

Generically, if P(A | B) = P(A) then the events A and B are said to be independent.

  • Conceptually: knowing B doesn’t tell us anything about A.

  • Mathematically: We know that if events A and B are independent, P(A and B) = P(A) \(\times\) P(B). Then,

\[{P(A|B) = \frac{P(A \text{ }and \text{ }B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A)}\] So, there are two ways to determine/use the independence of two events:

  1. P(A and B) = P(A) \(\times\) P(B) or
  2. P(A|B) = P(A) (or P(B|A) = P(B))

Example

Between January 9-12, 2013, SurveyUSA interviewed a random sample of 500 NC residents asking them whether they think widespread gun ownership protects law abiding citizens from crime, or makes society more dangerous. 58% of all respondents said it protects citizens. 67% of White respondents, 28% of Black respondents, and 64% of Hispanic respondents shared this view.

Claim: opinion on gun ownership and race/ethnicity are dependent.

Check if P(A|B) = P(A).

  • P(randomly selected NC resident says gun ownership protects citizens ) = 0.58

  • P(protects citizens | White) = 0.67 ≠ 0.58

  • P(protects citizens | Black) = 0.28 ≠ 0.58

  • P(protects citizens | Hispanic) = 0.64 ≠ 0.58

  • P(protects citizens) varies by race/ethnicity; therefore, opinion on gun ownership and race/ethnicity are most likely dependent (based on this sample).

Review Basic Rules of Probability

  • \(P(A) + P(A^C)=1\)
  • \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\)

  • if the events are disjoint, then

P(A and B)=0, P(A or B)= P(A)+P(B).

  • if A and B are independent, then all three statements below are true

\(P(A \text{ and } B)=P(A)\times P(B)\)

\(P(A \mid B)=P(A), \quad P(B \mid A)=P(B)\)

  • Note: \((A \text{ or } B)^c = (A^c \text{ and } B^c)\) (the green part in the lower right diagram)

Bayes Theorem

  • Bayes’ Theorem inverts conditional probabilities: it expresses \(P(A \mid B)\) in terms of \(P(B \mid A)\) and \(P(A)\).
  • We first discuss the rule of total probability (for \(P(B)\)): \(P(B)=P(B \text{ and } A)+P(B \text{ and } A^c)\)

Notice that \(A\) and \(A^c\) form a partition of the sample space S:

  1. \(A\) and \(A^c\) are disjoint (mutually exclusive): \(A\cap A^c=\emptyset\)

  2. their union is the whole sample space: \(A\cup A^c=S\)

  • Combining the conditional probability formula, the general multiplication rule, and the rule of total probability, we have Bayes’ Theorem \[P(A \mid B)= \frac{P(B \mid A)P(A)}{P(B \mid A)P(A)+P(B \mid A^c)P(A^c)},\] where usually \(A\) is an event of one variable and \(B\) is an event of another variable.

The formula uses the rule of total probability together with \[P(A \mid B)= \frac{P(A \text{ and } B)}{P(B)}, \quad P(A \text{ and } B)= P(B \mid A)P(A), \quad P(A^c \text{ and } B)= P(B \mid A^c)P(A^c)\]

Example

In a statistics class, 80% of students can construct box plots. Of those who can construct box plots, 86% passed, while only 65% of those who cannot construct box plots passed. What is the probability that a student can construct box plots, given that the student passed?

Solution.

Let A = {students can construct box plots} and B = {students passed}; then \[P(A)=0.80, \quad P(B \mid A)=0.86, \quad P(B \mid A^c)=0.65\]

To find \(P(A \mid B)\), use Bayes’ Theorem \[ \begin{eqnarray*}P(A \mid B)&=& \frac{P(B \mid A)P(A)}{P(B \mid A)P(A)+P(B \mid A^c)P(A^c)}\\ &=&\frac{0.86\times 0.80}{0.86\times 0.80+0.65\times 0.20}=\frac{0.688}{0.688+0.13} = \frac{0.688}{0.818}\approx 0.84 \end{eqnarray*} \]

Answer: Of those students who passed, 84% can draw box plots.

We can also draw a contingency table to explain – see the next slide.

Example

Draw a contingency table to explain. A = {students can construct box plots}, B = {students passed}; then \(P(A)=0.80\), \(P(B \mid A)=0.86\), \(P(B \mid A^c)=0.65\). To find \(P(A \mid B)\): \[ \begin{eqnarray*} && A \ (\text{can draw box plots}) && A^c \ (\text{cannot draw box plots}) && \\ \hline B \ (\text{passed}) && \color{blue}{0.86\times0.80= 0.688}&& \color{purple}{0.65\times0.20=0.13} && \color{red}{0.688+0.13=0.818}\\ \ B^c \ (\text{not passed}) && \color{purple}{0.80-0.688=0.112}&& \color{red}{0.20-0.13=0.07} && 0.112+0.07=0.182 \\ \ && 0.80 && 0.20 && 1 \end{eqnarray*} \]

\(P(\text{can draw box plots} \mid \text{passed})=P(A \mid B)= \frac{0.688}{0.818} \approx 0.84\)

Use Tree Diagram

Let A ={ students can construct box plots} and B ={ students passed}, then

\[P(A)=0.80, \quad P(B \mid A)=0.86, \quad P(B \mid A^c)=0.65\]

\[P(\text{can draw box plots} \mid \text{passed})=P(A \mid B)= \frac{0.688}{0.818} \approx 0.84\]
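The two-event form of Bayes’ Theorem is short enough to wrap in a function (an R sketch, not from the original slides; the function name is ours):

```r
# Bayes' theorem for event A and condition B, given P(A), P(B|A), and P(B|A^c).
bayes2 = function(p.A, p.B.given.A, p.B.given.Ac) {
  p.B = p.B.given.A * p.A + p.B.given.Ac * (1 - p.A)  # rule of total probability
  p.B.given.A * p.A / p.B
}
bayes2(0.80, 0.86, 0.65)   # box-plot example: 0.688 / 0.818, about 0.84
```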

Bayes Theorem

For a variable with events \(A_1, A_2, \dots , A_k\): if the events are mutually disjoint (\(A_i\cap A_j=\emptyset\) for all \(i\neq j\)) and their union forms the whole sample space (\(S = A_1\cup A_2\cup \dots \cup A_k\)), then for an event B of another variable,

\[P(A_i \mid B)= \frac{P(B \mid A_i)P(A_i)}{P(B \mid A_1)P(A_1)+P(B \mid A_2)P(A_2)+\dots+P(B \mid A_k)P(A_k)}\]

Note: this is really \[P(A_i \mid B)= \frac{P(B \text{ and } A_i)}{P(B)}\]

Example

Joe visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Joe comes to campus and finds the garage full, what is the probability that there is a sporting event?

The outcome of interest is that there is a sporting event (call this \(A_1\)), and the condition is that the garage is full (call this \(B\)). Let \(A_2\) represent an academic event and \(A_3\) represent there being no event on campus. Then the given probabilities can be written as:

\[P(A_1) = 0.20 \hspace{1.5cm} P(A_2) = 0.35\hspace{1.5cm} P(A_3)= 0.45\] \[P(B|A_1) =0.70\hspace{1.5cm} P(B|A_2) =0.25\hspace{1.5cm} P(B|A_3) =0.05\]

Example (end of 3.2)

\(P(A_1) = 0.2 \hspace{1.5cm}P(A_2) = 0.35\hspace{1.5cm}P(A_3)= 0.45\)
\(P(B|A_1) =0.7\hspace{1.5cm}P(B|A_2) =0.25\hspace{1.5cm}P(B|A_3) =0.05\)

Bayes’ Theorem can be used to compute the probability of a sporting event (\(A_1\)) under the condition that the parking garage is full (\(B\)):

\(P(A_1 \mid B)= \frac{P(B \mid A_1)P(A_1)}{P(B \mid A_1)P(A_1)+P(B \mid A_2)P(A_2)+P(B \mid A_3)P(A_3)}\)

\(=\frac{0.7\times 0.2}{0.7\times 0.2+0.25\times 0.35+0.05\times 0.45}=\frac{0.14}{0.25}=0.56\)

Based on the information that the garage is full, there is a 56% chance that a sporting event is being held on campus that evening.
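With vectors, the k-event version of Bayes’ Theorem is a one-liner (an R sketch of the garage example, not from the original slides; the vector names are ours):

```r
# Priors for sporting event, academic event, no event, and P(garage full | each).
p.A = c(sporting = 0.20, academic = 0.35, none = 0.45)
p.full.given.A = c(0.70, 0.25, 0.05)
posterior = p.A * p.full.given.A / sum(p.A * p.full.given.A)
posterior   # P(sporting | full) is the first entry, 0.56
```

The denominator `sum(p.A * p.full.given.A)` is the rule of total probability, so the posterior entries automatically sum to 1.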

3.4 Random Variables

List of topics

  • Random variable
  • Expected value (of a discrete R.V.)
  • Variance and standard deviation (of a discrete R.V.)

Random variables

A random variable is a numerical measurement of the outcome of a random phenomenon. A random variable assumes numerical values associated with the random outcomes of an experiment.

  • We use a capital letter, like \(X\), to denote a random variable.

  • The values of a random variable are denoted with a lowercase letter, in this case \(x\).

  • There are two types of random variables:

    • Discrete random variables often take integer values (countable). Example. Flip a coin three times. X = number of heads in the 3 flips; X denotes a random variable. π‘₯=0, 1, 2, 3 are possible values of the random variable.

    • Continuous random variables take values in an interval of real numbers.

Examples. time, age, and size measurements such as height and weight.
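The coin-flip example above can be made concrete by enumerating all eight equally likely outcomes. This short Python sketch (illustrative only) builds the probability distribution of \(X\) = number of heads:

```python
from itertools import product

# X = number of heads in 3 coin flips (a discrete random variable).
# Enumerate the 8 equally likely outcomes and tally the value of X for each.
outcomes = list(product("HT", repeat=3))   # ('H','H','H'), ('H','H','T'), ...
counts = {}
for outcome in outcomes:
    x = outcome.count("H")                 # value of X for this outcome
    counts[x] = counts.get(x, 0) + 1

# Probability distribution: P(X = x) = (# outcomes with x heads) / 8
dist = {x: n / len(outcomes) for x, n in sorted(counts.items())}
print(dist)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```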

Expected Value (mean)

  • We are often interested in the average outcome of a random variable.

  • We call this the expected value (mean); it is a weighted average of the possible outcomes.

  • For a discrete random variable \(X\), the expected value is defined/calculated by \(\mu = E(X) = \sum x_i\,p(x_i)\),

    where the summation is taken over all possible values \(x_i\).

Expected value of a discrete random variable

In a game of cards you win $1 if you draw a heart, $5 if you draw an ace (including the ace of hearts), $10 if you draw the king of spades, and nothing for any other card you draw. Write the probability model for your winnings, and calculate your expected winnings.

Solution. Let \(X\) be the dollar amount won (a discrete r.v.). Then \(E(X)= \sum{x_ip(x_i)}=1(\frac{12}{52}) + 5(\frac{4}{52}) + 10(\frac{1}{52})+ 0(\frac{35}{52})=\frac{42}{52} \approx 0.81\)

\[
\begin{array}{lccc}
\text{Event} & X & P(X) & X \cdot P(X) \\
\hline
\text{Heart (not ace)} & 1 & \frac{12}{52} & \frac{12}{52} \\
\text{Ace} & 5 & \frac{4}{52} & \frac{20}{52} \\
\text{King of spades} & 10 & \frac{1}{52} & \frac{10}{52} \\
\text{All else} & 0 & \frac{35}{52} & 0 \\
\hline
\text{Total} & & & E(X) = \frac{42}{52} \approx 0.81
\end{array}
\]
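The expected-value computation in the table can be verified with a short Python sketch (illustrative only, using exact fractions to avoid rounding):

```python
from fractions import Fraction

# Probability model for the card game: winnings x and P(X = x).
model = [
    (1, Fraction(12, 52)),   # heart that is not an ace
    (5, Fraction(4, 52)),    # any ace
    (10, Fraction(1, 52)),   # king of spades
    (0, Fraction(35, 52)),   # every other card
]

# Expected value: E(X) = sum of x_i * p(x_i)
expected = sum(x * p for x, p in model)
print(expected, float(expected))  # E(X) = 42/52 ≈ 0.81
```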

Expected value of a discrete random variable

A bar plot of the probability distribution of winnings from this game (figure omitted) can also be used to approximate the expected value from the rounded bar heights:

\[E(X) \approx 0.23 + 0.40 + 0.20 = 0.83,\]

which agrees with the exact value \(42/52 \approx 0.81\) up to rounding.

Variability and Standard Deviation

We are also often interested in the variability in the values of a random variable.

The variance is
\[{\sigma^2 = Var(X) = \sum_{i=1}^{k}(x_i-\mu)^2P(X=x_i)}\]
and the standard deviation is
\[{\sigma = SD(X) = \sqrt{Var(X)}}= \sqrt{\sum_{i=1}^{k}(x_i-\mu)^2P(X=x_i)},\]
where \(x_1, \dots, x_k\) are the possible values of \(X\).

Variability of a discrete random variable

For the previous card game example, how much would you expect the winnings to vary from game to game?

\[
\begin{array}{ccccc}
X & P(X) & X \cdot P(X) & (X - E(X))^2 & P(X) \cdot (X - E(X))^2 \\
\hline
1 & \frac{12}{52} & \frac{12}{52} & (1-0.81)^2 = 0.0361 & \frac{12}{52} \times 0.0361 = 0.0083 \\
5 & \frac{4}{52} & \frac{20}{52} & (5-0.81)^2 = 17.5561 & \frac{4}{52} \times 17.5561 = 1.3505 \\
10 & \frac{1}{52} & \frac{10}{52} & (10-0.81)^2 = 84.4561 & \frac{1}{52} \times 84.4561 = 1.6242 \\
0 & \frac{35}{52} & 0 & (0-0.81)^2 = 0.6561 & \frac{35}{52} \times 0.6561 = 0.4416 \\
\hline
 & & E(X) = 0.81 & & V(X) = 3.4246 \\
 & & & & SD(X) = \sqrt{3.4246} \approx 1.85
\end{array}
\]
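The variance and standard-deviation computations can be reproduced with a short Python sketch (illustrative; it uses the exact mean \(42/52\) rather than the rounded 0.81, so the results agree with the table to two decimal places):

```python
from math import sqrt

# Winnings x and probabilities P(X = x) from the card game.
xs = [1, 5, 10, 0]
ps = [12/52, 4/52, 1/52, 35/52]

mu = sum(x * p for x, p in zip(xs, ps))               # E(X) = 42/52 ≈ 0.81
var = sum(p * (x - mu) ** 2 for x, p in zip(xs, ps))  # Var(X) ≈ 3.42
sd = sqrt(var)                                        # SD(X) ≈ 1.85
print(round(mu, 2), round(var, 2), round(sd, 2))
```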

Practice (end of 3.4)

For a given probability distribution, compute the mean and standard deviation.


Check answer:

πœ‡=0.8
𝜎=0.6928203