If you’ve never coded before (or even if you have), type print("Your Name") in the interactive R chunk below and run it by pressing Ctrl+Enter (or Cmd+Enter on a Mac).
In this workbook, we continue our exploration of statistical inference. We’ll cover the basic hypothesis testing framework and discuss more formally the confidence intervals you were introduced to in Workbook 10.
We’ll motivate this workbook with three videos from volunteers at OpenIntro.org. The first two are from Dr. David Diez, a data scientist at YouTube, and the last is from Dr. Shannon McLintock, a member of the statistics faculty at Cal Poly. After each video, you’ll work through a hands-on application of its content to a new scenario.
Our first video discusses variability in point estimates. Some of the content will likely sound familiar, since we were working with point estimates and their variability in the last workbook. Watch the video below, and then we’ll engage with the ideas Dr. Diez discusses by working through an example together.
An Example: A June 2020 Pew Research survey revealed that 74% of Americans support offering a path to citizenship for undocumented immigrants who were brought to the US illegally as children – often referred to as DREAMers.
We’ve discussed the impossibility of a true census, so the Pew study did not poll every American to arrive at its estimate. Instead, it surveyed 9,654 US adults between June 4 and June 10, 2020. You can find out more about the study logistics here. This means that the 74% reported in the article is the proportion of surveyed individuals who were in favor of a path to citizenship for the DREAMers: a point estimate from the sample, not the proportion of all American adults.
Answer the questions below to check your understanding of some of our terminology.
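# Simulate one survey of 9,654 adults, assuming the true support rate is p = 0.74
samp <- sample(c("Support Citizenship", "Do Not Support Citizenship"), size = 9654, prob = c(0.74, 0.26), replace = TRUE)
table(samp)
# Sample proportion supporting a path to citizenship (matched by label, not by table position)
p_hat <- mean(samp == "Support Citizenship")
paste0("The proportion supporting the Citizenship option is: ", p_hat)
# Sampling error: the sample proportion minus the assumed population proportion
paste0("This is a sampling error of: ", p_hat - 0.74)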
By running the code block above multiple times, you’ve probably seen that most samples produced a sample proportion well within one percentage point (0.01) of the assumed proportion \(p = 0.74\).
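Rather than re-running the chunk by hand, a sketch like the one below repeats the simulated survey many times at once. Like the chunk above, it assumes the true proportion is \(p = 0.74\); unlike it, it uses rbinom() to draw the number of supporters in each simulated survey directly. The 10,000 repetitions and the variable names are illustrative choices, not part of the original workbook.

```r
# Assumed true proportion and the Pew sample size
p <- 0.74
n <- 9654

# Simulate 10,000 surveys: each draw is the number of "Support" responses out of n
n_reps <- 10000
supporters <- rbinom(n_reps, size = n, prob = p)

# Sample proportion and sampling error for each simulated survey
p_hats <- supporters / n
sampling_errors <- p_hats - p

# Share of simulated surveys whose sample proportion lands within 0.01 of p
mean(abs(sampling_errors) < 0.01)
```

The printed share changes slightly from run to run, but it is usually a little above 0.95, which is consistent with what repeated runs of the single-survey chunk suggest.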
In the video Dr. Diez discusses how we can use the Central Limit Theorem to quantify how much variability we should see in the point estimate from one sample to the next. In the case of a single proportion, the Central Limit Theorem states the following:
Central Limit Theorem: When observations are independent and the sample size is sufficiently large, the sample proportion \(\hat{p}\) will tend to follow a normal distribution with \(\mu = p\) (the true population proportion) and standard error \(\displaystyle{S_E = \sqrt{\frac{p\left(1-p\right)}{n}}}\). That is, \(\displaystyle{\hat{p} \sim N\left(\mu = p, ~S_E = \sqrt{\frac{p\left(1-p\right)}{n}}\right)}\).
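The standard error formula from the Central Limit Theorem is a one-line computation in R. The sketch below wraps it in a small helper, se_prop(), which is a hypothetical name used only for illustration. Because the true \(p\) is unknown in practice, it plugs in the sample proportion 0.74 from the Pew study in place of \(p\).

```r
# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
se_prop <- function(p, n) {
  sqrt(p * (1 - p) / n)
}

# Pew example: plug in 0.74 for p and the sample size n = 9654
se <- se_prop(p = 0.74, n = 9654)
se
```

The result is roughly 0.0045, or about half of a percentage point.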
Good work. Notice that the standard error is about half of a percentage point (close to \(0.005\)). Twice the standard error, about \(0.009\), closely matches what we observed in our simulations: nearly every sampling error fell within one percentage point of \(p\). This brings us to our next topic: confidence intervals.
You’ll start again with a video from Dr. Diez. Once you’ve watched it, we’ll continue with our example about the 2020 Pew Research study on the proportion of American adults who are in favor of a citizenship option for the DREAMers.
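As Dr. Diez mentions, a confidence interval can be used to capture a population parameter with some degree of certainty. In general, we construct a confidence interval using the following formula: \[\displaystyle{\left(\tt{point~estimate}\right)\pm \left(\tt{critical~value}\right)\cdot S_E}\] where the point estimate is our sample statistic (here, the sample proportion \(\hat{p}\)), the critical value is determined by the chosen level of confidence, and \(S_E\) is the standard error of the point estimate.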
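The confidence interval formula translates directly into a short R function. In the sketch below, conf_int() is a hypothetical helper, not a built-in. It assumes the Pew numbers (\(\hat{p} = 0.74\), \(n = 9654\)) and the familiar 95% critical value of 1.96 for a normal sampling distribution.

```r
# Confidence interval: point estimate +/- critical value * standard error
conf_int <- function(point_estimate, critical_value, se) {
  margin <- critical_value * se
  c(lower = point_estimate - margin, upper = point_estimate + margin)
}

# Pew example: sample proportion, its standard error, and a 95% critical value
p_hat <- 0.74
se <- sqrt(p_hat * (1 - p_hat) / 9654)
ci <- conf_int(point_estimate = p_hat, critical_value = 1.96, se = se)
ci
```

With these inputs, the margin of error is a little under one percentage point, so the interval runs from roughly 0.731 to 0.749.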
Recall that we’ve been working with a 2020 Pew Research study which included 9,654 participants. The study resulted in 74% of participants being in favor of a path to citizenship for the DREAMers, and we computed the standard error to be approximately \(0.0045\).
If we are sure that a sampling distribution is well-modeled by a normal distribution, we have the following critical values associated with several common levels of confidence.
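Critical values for a normal sampling distribution can also be computed with R's qnorm(). A two-sided interval at confidence level \(C\) leaves \((1 - C)/2\) in each tail of the standard normal distribution. The sketch below assumes 90%, 95%, and 99% as the common levels; any other level works the same way.

```r
# Confidence levels to look up (illustrative choices)
conf_levels <- c(0.90, 0.95, 0.99)

# For a two-sided interval, leave (1 - level) / 2 in each tail of N(0, 1)
z_star <- qnorm(1 - (1 - conf_levels) / 2)

data.frame(confidence = conf_levels, critical_value = round(z_star, 3))
```

The rounded critical values are 1.645, 1.96, and 2.576, respectively.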
Use what you learned in the video and your knowledge of the Pew Research study to answer the following questions. You can use the code block to make any necessary computations.
So far, so good! There’s one more topic to go. Sometimes we’ll want to test a claim about a population parameter rather than build a confidence interval for it. Inferential statistics provides a formal framework called the hypothesis test for evaluating statistical claims such as
Here’s one more video, from Dr. Shannon McLintock (also of OpenIntro.org), introducing the notion of the hypothesis test.
A 2018 poll and story from NPR reported that 65% of Americans supported a path to citizenship for DREAMers. Does the new poll from Pew Research provide evidence that support for a path to citizenship for DREAMers has grown over the past two years? Use an \(\alpha = 0.05\) level of significance.
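One way to set up this test is sketched below, under several assumptions. It treats the 2018 NPR figure of 65% as a fixed null value, ignoring the sampling variability of the NPR poll itself and any differences in question wording between the two polls. It also relies on the normal approximation from the Central Limit Theorem, computing the standard error from the null value \(p_0 = 0.65\). The hypotheses are \(H_0: p = 0.65\) versus \(H_A: p > 0.65\).

```r
# Hypotheses: H0: p = 0.65 (2018 NPR figure) vs HA: p > 0.65
p0 <- 0.65
p_hat <- 0.74
n <- 9654
alpha <- 0.05

# Under H0, the standard error is computed from the null value p0
se_null <- sqrt(p0 * (1 - p0) / n)

# Test statistic: how many standard errors p_hat lies above p0
z <- (p_hat - p0) / se_null

# One-sided (upper-tail) p-value from the standard normal distribution
p_value <- pnorm(z, lower.tail = FALSE)

c(z = z, p_value = p_value)
p_value < alpha  # TRUE means reject H0 at the 0.05 level
```

Here the test statistic is about 18.5 standard errors, and the p-value is essentially zero. Under these assumptions, we would reject \(H_0\) and conclude that the data provide evidence that support has grown.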
As a recap, this workbook covered the following main points and ideas: