Getting Started

Load packages

In this lab, we will explore and visualize the data using the tidyverse suite of packages. The data can be found in the companion package for OpenIntro labs, openintro.

Let’s load the packages.

library(tidyverse)
library(openintro)

Simulations in R

In a simulation, you set the ground rules of a random process and then the computer uses random numbers to generate an outcome that adheres to those rules. As a simple example, you can simulate flipping a fair coin with the following.

coin_outcomes <- c("heads", "tails")
sample(coin_outcomes, size = 1, replace = TRUE)

The vector coin_outcomes can be thought of as a hat with two slips of paper in it: one slip says heads and the other says tails. The function sample draws one slip from the hat and tells us if it was a head or a tail.

Run the second command listed above several times. Just like when flipping a coin, sometimes you’ll get a heads, sometimes you’ll get a tails, but in the long run, you’d expect to get roughly equal numbers of each.

If you wanted to simulate flipping a fair coin 100 times, you could either run the function 100 times or, more simply, adjust the size argument, which governs how many samples to draw (the replace = TRUE argument indicates we put the slip of paper back in the hat before drawing again). Save the resulting vector of heads and tails in a new object called sim_fair_coin.

sim_fair_coin <- sample(coin_outcomes, size = 100, replace = TRUE)

To view the results of this simulation, type the name of the object and then use table to count up the number of heads and tails.

sim_fair_coin
table(sim_fair_coin)
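To see the "long run" behavior described earlier, one option is to repeat the simulation with more and more flips and track the proportion of heads. The following is a minimal sketch; the seed and the flip counts are arbitrary choices made for illustration, not values from the lab.

```r
# Assumption: the seed and the flip counts are arbitrary illustrative values.
set.seed(1234)
coin_outcomes <- c("heads", "tails")

# Proportion of heads in simulations of increasing size
for (n_flips in c(10, 100, 10000)) {
  flips <- sample(coin_outcomes, size = n_flips, replace = TRUE)
  prop_heads <- mean(flips == "heads")  # share of flips that landed heads
  cat(n_flips, "flips: proportion of heads =", prop_heads, "\n")
}
```

With only 10 flips the proportion can be far from 0.5, while with 10,000 flips it should land close to it.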

Since there are only two elements in coin_outcomes, the probability that we “flip” a coin and it lands heads is 0.5. Say we’re trying to simulate an unfair coin that we know only lands heads 20% of the time. We can adjust for this by adding an argument called prob, which provides a vector of two probability weights.

sim_unfair_coin <- sample(coin_outcomes, size = 100, replace = TRUE, 
                          prob = c(0.2, 0.8))

prob = c(0.2, 0.8) indicates that for the two elements in the outcomes vector, we want to select the first one, heads, with probability 0.2 and the second one, tails, with probability 0.8. Another way of thinking about this is to picture the outcome space as a bag of 10 chips, where 2 chips are labeled “head” and 8 chips are labeled “tail”. Therefore, at each draw, the probability of drawing a chip that says “head” is 20%, and “tail” is 80%.
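The bag-of-chips picture can be simulated directly and compared with the prob approach. This is only a sketch: the object names chips, sim_chips and sim_prob, the seed, and the number of draws are assumptions introduced here for illustration.

```r
# Assumption: seed, number of draws, and object names are illustrative only.
set.seed(2024)

# A bag of 10 chips: 2 say "heads", 8 say "tails"
chips <- c(rep("heads", 2), rep("tails", 8))
sim_chips <- sample(chips, size = 10000, replace = TRUE)

# The same coin described with probability weights instead
sim_prob <- sample(c("heads", "tails"), size = 10000, replace = TRUE,
                   prob = c(0.2, 0.8))

# Both sets of proportions should be close to 0.2 and 0.8
prop.table(table(sim_chips))
prop.table(table(sim_prob))
```

The two simulations will not match flip for flip, but their proportions of heads should both be near 20%.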

  1. In your simulation of flipping the unfair coin 100 times, how many flips came up heads? Include the code for sampling the unfair coin in your response. Since the markdown file will run the code, and generate a new sample each time you Knit it, you should also “set a seed” before you sample. Read more about setting a seed below.

A note on setting a seed: Setting a seed will cause R to select the same sample each time you knit your document. This will make sure your results don’t change each time you knit, and it will also ensure reproducibility of your work (by setting the same seed it will be possible to reproduce your results). You can set a seed like this:

set.seed(35797)                 # make sure to change the seed

The number above is completely arbitrary. If you need inspiration, you can use your ID, birthday, or just a random string of numbers. The important thing is that you use each seed only once in a document. Remember to do this before you sample in the exercise above.
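To make the effect of a seed concrete, the sketch below draws the same sample twice after setting the same seed. Reusing a seed here is purely for demonstration; the object names first_run, second_run and third_run are hypothetical.

```r
coin_outcomes <- c("heads", "tails")

# Same seed, same code: the two samples are identical
set.seed(35797)
first_run <- sample(coin_outcomes, size = 10, replace = TRUE)

set.seed(35797)
second_run <- sample(coin_outcomes, size = 10, replace = TRUE)

identical(first_run, second_run)  # TRUE

# Sampling again without resetting the seed continues the random
# number stream, so this draw will generally differ from the first two
third_run <- sample(coin_outcomes, size = 10, replace = TRUE)
third_run
```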

In a sense, by supplying prob = c(0.2, 0.8) for the unfair coin we’ve shrunk the slip of paper that says “heads”, making it less likely to be drawn, and enlarged the slip of paper that says “tails”, making it more likely to be drawn. When you simulated the fair coin, both slips of paper were the same size. This happens by default if you don’t provide a prob argument; all elements in the outcomes vector have an equal probability of being drawn.

If you want to learn more about sample or any other function, recall that you can always check out its help file.

?sample

Binomial Experiments and the Binomial Distribution

Binomial Experiments

Binomial Experiments: A binomial experiment satisfies each of the following three criteria:

  • There are \(n\) repeated trials.
  • Each trial has two possible outcomes (usually called success and failure for convenience).
  • The trials are independent of one another, and the probability of success, \(p\), is the same on every trial.
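As an illustration of these criteria, a single binomial experiment can be simulated with sample and then summarized by counting successes. This is a sketch under assumed values: n = 10 trials, p = 0.2, the seed, and the object names are all chosen here for illustration.

```r
# Assumption: n, p, the seed, and the object names are illustrative only.
set.seed(99)
n <- 10   # number of repeated trials
p <- 0.2  # probability of success on each trial

# One binomial experiment: n independent trials with two outcomes each
trials <- sample(c("success", "failure"), size = n, replace = TRUE,
                 prob = c(p, 1 - p))
x <- sum(trials == "success")  # number of successes in this experiment
x

# rbinom() returns the success count directly; here, 5 separate experiments
counts <- rbinom(5, size = n, prob = p)
counts
```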

Binomial Distribution

Binomial Distribution: Let \(X\) be the number of successes resulting from a binomial experiment with \(n\) trials. We can compute the following probabilities:

  • The probability of exactly \(k\) successes is given by
    \(\displaystyle{\mathbb{P}\left[X = k\right] = \binom{n}{k}\cdot p^k\left(1 - p\right)^{n-k} \approx \tt{dbinom(k, n, p)}}\)
  • The probability of at most \(k\) successes is given by
    \(\displaystyle{\mathbb{P}\left[X \leq k\right] = \sum_{i=0}^{k}{\binom{n}{i}\cdot p^i\left(1 - p\right)^{n-i}} \approx \tt{pbinom(k, n, p)}}\)

In the equations above, \(\binom{n}{k} = \frac{n!}{k!\left(n-k\right)!}\) counts the number of ways to arrange the \(k\) successes amongst the \(n\) trials. The R functions dbinom() and pbinom() allow us to bypass the messy formulas, but you’ll still need to know what these functions do in order to use them correctly!
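As a quick sanity check on the formulas above, the sketch below computes both probabilities "by hand" with choose() and compares them with dbinom() and pbinom(). The values n = 10, k = 3 and p = 0.2 are arbitrary and chosen only for illustration.

```r
# Assumption: n, k, and p are arbitrary illustrative values.
n <- 10
k <- 3
p <- 0.2

# P[X = k] from the formula, compared with dbinom()
by_hand_exact <- choose(n, k) * p^k * (1 - p)^(n - k)
all.equal(by_hand_exact, dbinom(k, size = n, prob = p))

# P[X <= k] as a sum over i = 0, ..., k, compared with pbinom()
i <- 0:k
by_hand_at_most <- sum(choose(n, i) * p^i * (1 - p)^(n - i))
all.equal(by_hand_at_most, pbinom(k, size = n, prob = p))
```

Both comparisons should agree up to floating-point rounding, which is why all.equal() is used rather than ==.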

Tip: Use the binomial distribution when you want the probability of a given number of successes (or failures) and it does not matter, or is not known, on which particular trials those successes (or failures) occur.

For example, suppose we would like to find the probability of getting exactly 30 heads when tossing a fair coin 50 times. We can calculate it using dbinom().

dbinom(30, size = 50, prob = 0.5)

We obtain \(P(X = 30) = 0.04185915\), so there is about a 4.19% chance of getting exactly 30 heads in 50 tosses of a fair coin.
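One way to connect this result back to the simulations from earlier in the lab is to repeat the 50-toss experiment many times and see how often exactly 30 heads occur. The sketch below does this with rbinom(); the seed, the number of repetitions, and the object names are assumptions made for illustration.

```r
# Assumption: seed, number of repetitions, and object names are illustrative.
set.seed(50)
n_experiments <- 100000

# Each value is the number of heads in one run of 50 fair coin tosses
heads_counts <- rbinom(n_experiments, size = 50, prob = 0.5)

# Share of experiments with exactly 30 heads, next to the exact probability
estimate <- mean(heads_counts == 30)
exact <- dbinom(30, size = 50, prob = 0.5)
c(simulated = estimate, exact = exact)
```

The simulated proportion should be close to, but not exactly equal to, the value returned by dbinom().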

Similarly, to find the probability of getting at most 8 tails in 20 tosses of a fair coin (treating tails as the “success”), we can use pbinom().

pbinom(8, size = 20, prob = 0.5)

We obtain \(P(X \le 8) = 0.2517223\), so there is about a 25.17% chance of getting at most 8 tails in 20 coin tosses.
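Because pbinom() only answers "at most" questions directly, "at least" and "between" questions need a small rearrangement: \(P(X \ge k) = 1 - P(X \le k - 1)\) and \(P(a \le X \le b) = P(X \le b) - P(X \le a - 1)\). The sketch below applies these identities to the 20-toss coin example; the cutoffs 12, 5 and 8 and the object names are arbitrary choices for illustration.

```r
# Assumption: the cutoffs and object names are illustrative only.
n <- 20
p <- 0.5

# "At least 12": complement of "at most 11"
at_least_12 <- 1 - pbinom(11, size = n, prob = p)

# Same quantity via the upper tail, which gives P[X > 11]
at_least_12_alt <- pbinom(11, size = n, prob = p, lower.tail = FALSE)

# "Between 5 and 8 inclusive": subtract everything up to 4
between_5_and_8 <- pbinom(8, size = n, prob = p) - pbinom(4, size = n, prob = p)

# Cross-check by adding up the individual probabilities
all.equal(at_least_12, sum(dbinom(12:20, size = n, prob = p)))
all.equal(between_5_and_8, sum(dbinom(5:8, size = n, prob = p)))
```

Note the off-by-one in both identities: the cutoff passed to pbinom() is one less than the smallest count you want to keep.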

A student takes an exam with 35 multiple choice questions. Each question has 4 choices. Assuming the student guesses at random on every question, answer the following questions:

  1. Find the probability that the student got at most 25 questions right.

  2. Find the probability that the student got at least 10 questions right.

  3. Find the probability that the student got at least 19 and at most 31 questions right.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.