Probability

```{r global_options, include=FALSE}
knitr::opts_chunk$set(eval = TRUE, results = FALSE, fig.show = "hide", message = FALSE)
library(tidyverse)
library(openintro)
```

## Getting Started

### Load packages

In this lab, we will explore and visualize the data using the `tidyverse` suite of packages. The data can be found in the companion package for OpenIntro labs, **openintro**.

Let's load the packages.

```{r load-packages, message=FALSE}
library(tidyverse)
library(openintro)
```

## Simulations in R

In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. As a simple example, you can simulate flipping a fair coin with the following.

```{r head-tail}
coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)
```

The vector `coin_outcomes` can be thought of as a hat with two slips of paper in it: one slip says `heads` and the other says `tails`. The function `sample` draws one slip from the hat and tells us if it was a head or a tail.

Run the second command listed above several times. Just like when flipping a coin, sometimes you'll get a heads, sometimes you'll get a tails, but in the long run, you'd expect to get roughly equal numbers of each.

If you wanted to simulate flipping a fair coin 100 times, you could either run the function 100 times or, more simply, adjust the `size` argument, which governs how many samples to draw (the `replace = TRUE` argument indicates we put the slip of paper back in the hat before drawing again). Save the resulting vector of heads and tails in a new object called `sim_fair_coin`.

```{r sim-fair-coin}
sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)
```

To view the results of this simulation, type the name of the object and then use `table` to count up the number of heads and tails.
```{r table-sim-fair-coin}
sim_fair_coin
table(sim_fair_coin)
```

Since there are only two elements in `coin_outcomes`, the probability that we "flip" a coin and it lands heads is 0.5. Say we're trying to simulate an unfair coin that we know only lands heads 20% of the time. We can adjust for this by adding an argument called `prob`, which provides a vector of two probability weights.

```{r sim-unfair-coin}
sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE,
                          prob = c(0.2, 0.8))
```

`prob = c(0.2, 0.8)` indicates that for the two elements in the `coin_outcomes` vector, we want to select the first one, `heads`, with probability 0.2 and the second one, `tails`, with probability 0.8. Another way of thinking about this is to think of the outcome space as a bag of 10 chips, where 2 chips are labeled "head" and 8 chips "tail". Therefore at each draw, the probability of drawing a chip that says "head" is 20%, and "tail" is 80%.

1.  In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you *Knit* it, you should also "set a seed" **before** you sample. Read more about setting a seed below.

::: {#boxedtext}
**A note on setting a seed:** Setting a seed will cause R to select the same sample each time you knit your document. This will make sure your results don't change each time you knit, and it will also ensure reproducibility of your work (by setting the same seed it will be possible to reproduce your results). You can set a seed like this:

```{r set-seed}
set.seed(35797)                 # make sure to change the seed
```

The number above is completely arbitrary. If you need inspiration, you can use your ID, birthday, or just a random string of numbers. The important thing is that you use each seed only once in a document. Remember to do this **before** you sample in the exercise above.
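
As a minimal sketch of what a seed does, the chunk below sets the same seed before two identical calls to `sample` and checks that the draws match. The seed value `2024`, the chunk label, and the object names `first_draw` and `second_draw` are arbitrary choices made for this illustration. Reusing the seed is deliberate here, purely to make the effect visible, and is not something to copy into the exercise.

```{r seed-demo}
# Illustrative only: the seed value and object names are assumptions.
coin_outcomes <- c("heads", "tails")

set.seed(2024)
first_draw <- sample(coin_outcomes, size = 10, replace = TRUE)

# Resetting the same seed restarts the random number stream,
# so the next call reproduces the previous draw exactly.
set.seed(2024)
second_draw <- sample(coin_outcomes, size = 10, replace = TRUE)

identical(first_draw, second_draw)
```

Without the second call to `set.seed`, the two draws would generally differ, which is why an unseeded document changes every time it is knit.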
:::

In a sense, we've shrunk the size of the slip of paper that says "heads", making it less likely to be drawn, and we've increased the size of the slip of paper saying "tails", making it more likely to be drawn. When you simulated the fair coin, both slips of paper were the same size. This happens by default if you don't provide a `prob` argument; all elements in the `coin_outcomes` vector have an equal probability of being drawn.

If you want to learn more about `sample` or any other function, recall that you can always check out its help file.

```{r help-sample,tidy = FALSE}
?sample
```

## Binomial Experiments and the Binomial Distribution

### Binomial Experiments

**Binomial Experiments:** A binomial experiment satisfies each of the following three criteria:

+ There are $n$ repeated trials.
+ Each trial has two possible outcomes (usually called *success* and *failure* for convenience).
+ The trials are independent of one another. That is, for each trial, the probability of success is $p$ (which remains constant).

### Binomial Distribution

**Binomial Distribution:** Let $X$ be the number of successes resulting from a binomial experiment with $n$ trials. We can compute the following probabilities:

+ The probability of exactly $k$ successes is given by $\displaystyle{\mathbb{P}\left[X = k\right] = \binom{n}{k}\cdot p^k\left(1 - p\right)^{n-k} \approx \tt{dbinom(k, n, p)}}$
+ The probability of at most $k$ successes is given by $\displaystyle{\mathbb{P}\left[X \leq k\right] = \sum_{i=0}^{k}{\binom{n}{i}\cdot p^i\left(1 - p\right)^{n-i}} \approx \tt{pbinom(k, n, p)}}$

In the equations above, $\binom{n}{k} = \frac{n!}{k!\left(n-k\right)!}$ counts the number of ways to arrange the $k$ successes amongst the $n$ trials. That being said, the `R` functions `dbinom()` and `pbinom()` allow us to bypass the messy formulas, but you'll still need to know what these functions do in order to use them correctly!
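
To see what these two functions compute, the following sketch evaluates the formulas by hand and compares them with the built-in results. The values `n = 10`, `p = 0.3`, and `k = 4`, along with the chunk label and object names, are assumptions chosen only for illustration. `choose()` is base R's binomial coefficient function.

```{r binomial-by-hand}
# Illustrative values (assumptions, not taken from the lab)
n <- 10   # number of trials
p <- 0.3  # probability of success on each trial
k <- 4    # number of successes of interest

# P[X = k] from the formula, with choose(n, k) as the binomial coefficient
exact_by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
exact_builtin <- dbinom(k, n, p)

# P[X <= k] as the sum of the exact probabilities for 0, 1, ..., k successes
at_most_by_hand <- sum(dbinom(0:k, n, p))
at_most_builtin <- pbinom(k, n, p)

# Both comparisons should agree up to floating-point tolerance
all.equal(exact_by_hand, exact_builtin)
all.equal(at_most_by_hand, at_most_builtin)
```

Note the argument order in both functions: the number of successes comes first, then the number of trials, then the success probability.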

**Tip:** Use the binomial distribution to find the probability of a given number of successes (or failures) when *we do not know for certain the trials on which the successes (or failures) occur*.

For example, suppose we want the probability of getting exactly 30 heads in 50 tosses of a fair coin. We can calculate it using `dbinom()`.

```{r 50-30-head-toss}
dbinom(30, 50, 0.5)
```

We obtain $\mathbb{P}\left[X = 30\right] \approx 0.04185915$, so there is about a 4.19% chance of getting exactly 30 heads from 50 coin tosses.

Now suppose we want the probability of getting at most 8 tails in 20 tosses of a fair coin. We can calculate it using `pbinom()`.

```{r less-8-20-toss}
pbinom(8, 20, 0.5)
```

We obtain $\mathbb{P}\left[X \leq 8\right] \approx 0.2517223$, so there is about a 25.17% chance of getting at most 8 tails out of 20 coin tosses.

A student takes an exam with 35 multiple choice questions. Each question has 4 choices. Assume the student guesses at random on every question, so each answer is correct with probability $1/4$. Based on this information, answer the following questions:

1.  Find the probability that the student got at most 25 questions right.

2.  Find the probability that the student got at least 10 questions right.

3.  Find the probability that the student got a maximum of 31 and a minimum of 19 questions right.

------------------------------------------------------------------------

![Creative Commons License](https://i.creativecommons.org/l/by-sa/4.0/88x31.png){style="border-width:0"}
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Introduction to Probability & Statistics