Topic 15: Hypothesis Tests and Confidence Intervals for Numerical Data

```{r setup, include=FALSE} #knitr::opts_chunk$set(eval = FALSE) library(dplyr) library(ggplot2) library(learnr) library(gradethis) #remotes::install_github("rstudio/gradethis") library(learnrhash) #devtools::install_github("rundel/learnrhash") tutorial_options(exercise.timelimit = 60, exercise.checker = gradethis::grade_learnr) set.seed(241) boundary1 <- round(runif(1, 0.75, 2), 2) df1 <- sample(5:25, size = 1) boundary2 <- round(runif(1, -2, -1.5), 2) df2 <- sample(5:25, size = 1) area3 <- round(runif(1, 0.05, 0.95), 2) df3 <- sample(5:25, size = 1) clevel4 <- sample(c(90, 95, 98, 99), size = 1) df4 <- sample(5:25, size = 1) SDNY <- read.csv("https://raw.githubusercontent.com/drs22Col/FedSentencing/master/Data/NYSouthernDistrictData.csv") SDNYdrug <- SDNY[grepl("[D,d]rug", SDNY$Offense) & SDNY$SentenceMonths >= 0, ] SDNYdrug$Race <- ifelse(SDNYdrug$Race == 1, "White", "Non-White") whiteSentences <- SDNYdrug$SentenceMonths[SDNYdrug$Race == "White"] nonWhiteSentences <- SDNYdrug$SentenceMonths[SDNYdrug$Race == "Non-White"] ``` If you've never coded before (or even if you have), type `print("Your Name")` in the interactive R chunk below and run it by hitting `crtl+Enter` or `cmd+Enter` for MAC users. ```{r Student-Name, exercise = TRUE} ``` ## Hypothesis Testing and Confidence Intervals for Numerical Data ### In this workbook, we continue our exploration of statistical inference. Through the past few workbooks you became more comfortable with hypothesis testing and confidence intervals associated with categorical variables; here we extend to numerical variables. First you'll be reminded of the normal distribution and be formally introduced to the family of $t$-distributions. After that, we'll work on a pair of applications to sentencing data from the Southern District of the State of New York. ### As a reminder, we have the [*standard errror decision tree*](www/StdErrorDecisionTree.pdf){target="_blank"} which you'll continue to make use of. In case you need a refresher on the tree, you can find my walkthrough video below: ### While you were watching my walkthrough video you probably noticed that the side of the standard error decision tree corresponding to numerical data (inference for the mean, $\mu$) is much more involved than the side corresponding to inference for proportions. I mentioned in that video a bit about why that side of the tree is more involved and much of it stems from the idea that using a sample standard deviation as an approximation for the population standard deviation adds uncertainty to our approach. In order to counter this added uncertainty, we utilize a class of *penalized* normal distributions, called the $t$-distributions. Watch at least one of the videos below for a bit of history on the $t$-distributions.

**A Detailed Introduction**

**A Shorter Introduction**

### What does the $t$-distribution look like? ### So we've identified some scenarios for which we should utilize a $t$-distribution instead of the normal ($z$) distribution -- the simple rule of thumb that I've given you is that **any time we use a sample standard deviation as a proxy for the population standard deviation in the standard error estimate, we should utilize the $t$-distribution**. ### There are several other rules of thumb which people follow -- such as, even if you utilize the sample standard deviation in place of the population standard deviation but your sample size is *large enough*, then we can safely use the normal distribution. Since access to powerful statistical software makes distribution lookup tables unnecessary, my feeling is that these rules of thumb are no longer required. That is, we should use the $t$-distibution any time we use the sample standard deviation in place of the population distribution. ### Okay, so what does the $t$-distribution actually look like? As we've mentioned before and was cited in the video introductions, the $t$-distribution is a family of distributions identified by a parameter called *degrees of freedom*. Below you can see a standard normal distribution in **black**, a $t$-distribution with 3 degrees of freedom in **red**, and a $t$-distribution with 12 degrees of freedom in **blue**. ```{r normal-and-two-ts, echo = FALSE, eval = TRUE, message = FALSE, warning = FALSE, results = FALSE} ggplot() + geom_line(mapping = aes(x = seq(-3.5, 3.5, length.out = 200), y = dnorm(seq(-3.5, 3.5, length.out = 200), 0, 1)), color = "black", lwd = 1.25) + geom_line(mapping = aes(x = seq(-3.5, 3.5, length.out = 200), y = dt(seq(-3.5, 3.5, length.out = 200), 3)), color = "red", lwd = 1.25) + geom_line(mapping = aes(x = seq(-3.5, 3.5, length.out = 200), y = dt(seq(-3.5, 3.5, length.out = 200), 12)), color = "blue", lwd = 1.25) + labs(title = "Normal and t-Distributions", x = "", y= "") ``` ### Notice that all three of the distributions are bell-shaped, but that the $t$-distributions have **fatter tails** than the normal distribution does. Also, notice that the $t$-distribution with 12 degrees of freedom is more similar to the normal distribution than the $t$-distribution with 3 degrees of freedom. As degrees of freedom increase, our $t$-distribution becomes closer and closer to the normal distribution. ### Using the $t$-distribution(s) ### When we introduced the normal distribution, we identified two helper functions: + The function `pnorm(q, mean, sd)` can be used to find the probability that a randomly selected observation is less than the boundary value $q$ from a population which is normally distributed with mean `mean` and standard deviation `sd`. + We've simply said that the `pnorm()` function can be used to find area to the left of some boundary value under a normal distribution. + The function `qnorm(p, mean, sd)` can be used to find the boundary value such that the probability of a randomly selected observation falling below that observation is `p`. That is, `qnorm(p, mean, sd)` finds the $p^{th}$ percentile in the normal distribution having the `mean` and `sd` indicated. + We've simply said that the `qnorm()` function can be used to find the cutoff value for which the area to the left, underneath a normal distribution, is `p`. ### We have analogous functions for the $t$-distribution. + The function `pt(q, df)` can be used to find the probability of falling to the left of the boundary value `q` in a $t$-distribution with `df` degrees of freedom. + The function `qt(p, df)` can be used to find the cutoff value for which the area to the left of that cutoff, in a $t$-distribution with `df` degrees of freedom, is `p`. ### Notice that our functions for the $t$-distribution do not have parameters for the mean or standard deviation. This means that we must always work with standardized variables (see the formula for the test statistic on the standard error decision tree) when working with the $t$-distributions. ### Let's start with some practice using the `pt()` and `qt()` functions. Don't forget to draw your pictures -- you are much more likely find incorrect answers if you omit this step. **Question 1:** Use the code block below to find $\mathbb{P}\left[t < \right.$ `r boundary1` $\left.\right]$ in a $t$-distribution with `r df1` degrees of freedom. ```{r pt-ex1-left, exercise = TRUE} ``` ```{r pt-ex1-left-check} grade_result( pass_if(~ (abs(.result - pt(boundary1, df1)) < 0.001)) ) ``` ### **Question 2:** Find $\mathbb{P}\left[t > \right.$ `r boundary2` $\left.\right]$ in a $t$-distribution with `r df2` degrees of freedom. ```{r pt-ex2-right, exercise = TRUE} ``` ```{r pt-ex2-right-check} grade_result( pass_if(~ (abs(.result - (1 - pt(boundary2, df2))) < 0.001)) ) ``` ### **Question 3:** Find the cutoff value in a $t$ distribution with `r df3` degrees of freedom for which the area to the left of the cutoff value is `r area3`. ```{r qt-ex3, exercise = TRUE} ``` ```{r qt-ex3-check} grade_result( pass_if(~ (abs(.result - qt(area3, df3)) < 0.01)) ) ``` ### **Question 4:** Find the critical value associated with a `r clevel4`% confidence interval using a $t$-distribution with `r df4` degrees of freedom. ```{r qt-ex4, exercise = TRUE} ``` ```{r qt-ex4-check} grade_result( pass_if(~ (abs(.result - qt((1 - 0.01*(100 - clevel4)/2), df4)) < 0.01)) ) ``` ### Okay, good -- now that you've had some practice working with the $t$-distribution, let's move on to some applications. ### Applications to Criminal Sentencing ### We'll work with a [dataset on Federal Sentencing from the Southern District of New York State](https://github.com/drs22Col/FedSentencing){target="_blank"}. A subset of the data, consisting only of drug-related charges, has been loaded for you as `SDNYdrug`. ### **Question 1:** Compute a 95% confidence interval for the average sentence length for a drug-related charge in the Southern District of New York State. ```{r sentencing-ci-1, echo = FALSE} question_radio( "To answer the question as asked, we should", answer("Compute a confidence interval", correct = TRUE), answer("Conduct a hypothesis test"), answer("Compute a probability"), answer("Find a required sample size"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### Use the code block below to compute the point estimate for average sentence length (`SentenceMonths`). ```{r sentencing-ci-2, exercise = TRUE} ``` ```{r sentencing-ci-2-check} grade_result( pass_if(~ (abs(.result - mean(SDNYdrug$SentenceMonths)) < 0.01)) ) ```

The variable `SentenceMonths` is a column within the `SDNYdrug` data frame. Remember that you can access columns in a data frame with the `$` operator.

### ```{r sentencing-ci-3, echo = FALSE} question_radio( "The standard error formula is", answer("$\\displaystyle{S_E = \\sqrt{\\frac{p\\left(1-p\\right)}{n}}}$"), answer("$\\displaystyle{S_E = s/\\sqrt{n}}$", correct = TRUE), answer("$\\displaystyle{S_E = \\sigma/\\sqrt{n}}$", message = "Ooops -- do you really know the population standard deviation for sentence lengths?"), answer("$\\displaystyle{S_E = \\sqrt{\\frac{p_1\\left(1 - p_1\\right)}{n_1} + \\frac{p_2\\left(1 - p_2\\right)}{n_2}}}$"), answer("$\\displaystyle{S_E = \\sqrt{\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2}}}$"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### Use the code block below to compute the standard error ```{r sentencing-ci-4, exercise = TRUE} ``` ```{r sentencing-ci-4-check} grade_result( pass_if(~ (abs(.result - (sd(SDNYdrug$SentenceMonths)/sqrt(nrow(SDNYdrug)))) < 0.0005)) ) ```

Remember that you can compute the standard deviation in R with the `sd()` function. Additionally, you can find the number of rows in a data frame by passing the name of the data frame to R's `nrow()` function. The `sqrt()` function will also be useful to you.

### ```{r sentencing-ci-5, echo = FALSE} question_radio( "The distribution to be used is", answer("Normal"), answer("t-distribution with $df = n - 1$", correct = TRUE), answer("t-distribution with $df = \\min\\{n_1 - 1, n_2 - 1\\}$"), answer("t-distribution with $df = n_{\\text{diff}} - 1$"), answer("We should not use either the normal- or t- distributions"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ci-6, echo = FALSE} question_radio( "The desired level of confidence is", answer("90%"), answer("0.95%"), answer("95%", correct = TRUE), answer("98%"), answer("99%"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### Use the code block below to enter or compute the critical value associated with your confidence interval. ```{r sentencing-ci-7, exercise = TRUE} ``` ```{r sentencing-ci-7-check} grade_result( pass_if(~ (abs(.result - qt(0.975, df = nrow(SDNYdrug) - 1)) < 0.005)) ) ```

Did you try using 1.96? Remember that we can't use the normal distribution here. You can find the critical value you need by making use of the `qt()` function.

### Did you remember to use the `qt()` function to determine the critical value? With so many observations, the correct critical value differed very little from the 1.96 value used with the normal distribution. Remember that the critical values provided on the *standard error decision tree* are for use with the normal distribution only. Any time we use a sample standard deviation as a "stand-in" for the population standard deviation while computing standard error, we should be using critical values from a $t$-distribution. We can get the critical value using R's `qt()` function -- this will make a real difference if sample sizes are smaller. ### Use the code block below to compute the lower bound for the 95% confidence interval. ```{r sentencing-ci-8, exercise = TRUE} ``` ```{r sentencing-ci-8-check} grade_result( pass_if(~ (abs(.result - (mean(SDNYdrug$SentenceMonths) - ((qt(0.975, nrow(SDNYdrug) - 1))*(sd(SDNYdrug$SentenceMonths)/sqrt(nrow(SDNYdrug)))))) < 0.01)) ) ``` ### And use the code block below to compute the upper bound for the 95% confidence interval. ```{r sentencing-ci-9, exercise = TRUE} ``` ```{r sentencing-ci-9-check} grade_result( pass_if(~ (abs(.result - (mean(SDNYdrug$SentenceMonths) + ((qt(0.975, nrow(SDNYdrug) - 1))*(sd(SDNYdrug$SentenceMonths)/sqrt(nrow(SDNYdrug)))))) < 0.01)) ) ``` ### ```{r try-it-1a-question-10, echo = FALSE} question_radio( "The interpretations of the values you identified are:", answer("We are 95% confident that the true average sentence lengths for drug-related offenders in the Southern District of New York State is between the lower bound and upper bounds calculated.", correct = TRUE), answer("95% of sentence lengths for drug-related charges in the Southern District of New York State will fall between the lower and upper bounds calculated."), answer("We are 95% confident that the average sentence lengths for drug-related crimes within the Southern District of New York State in this sample fall between the lower and upper bounds calculated"), answer("There is a 95% chance a criminal charged with a drug-related offense in the Southern District of New York State will receive a sentence between the lower and upper bounds calculated."), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ci-11, echo = FALSE} question_radio( "Does this sample provide evidence to suggest that the average sentence length for drug-related charges in the Southern District of New York State exceeds three years (36 months) in jail?", answer("No. The confidence interval shows that an average sentence below 36 months is plausible."), answer("Yes. The confidence interval for the average sentence length includes only values exceeding 36 months.", correct = TRUE), answer("No. Some sentence lengths are below 36 months while others exceed 36 months."), answer("It is impossible to say, since a different sample would result in a different confidence interval."), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### **Question 2:** Conduct a hypothesis test at the $\alpha = 0.10$ level of significance to determine whether the sample data provides significant evidence to suggest that the average sentence length for white offenders and average sentence length for non-white offenders differs for drug-related cases in the Southern District of New York State. ### ```{r sentencing-ht-1, echo = FALSE} question_radio( "To answer the question as asked, we should", answer("Compute a confidence interval"), answer("Conduct a hypothesis test", correct = TRUE), answer("Compute a probability"), answer("Find a required sample size"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-2, echo = FALSE} question_radio( "What is the level of significance associated with this test?", answer("$\\alpha = 0.01$"), answer("$\\alpha = 0.10$", correct = TRUE), answer("$\\alpha = 0.05$"), answer("The $p$-value"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-3, echo = FALSE} question_radio( "Does this hypothesis test involve testing a statement about the mean ($\\mu$), a proportion ($p$), or something else?", answer("One or more means", correct = TRUE), answer("One or more proportions"), answer("Something else altogether"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-4, echo = FALSE} question_radio( "How many groups are being compared in this test?", answer("The test involves only a single group"), answer("The test compares two groups", correct = TRUE), answer("The test compares more than two groups"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-5, echo = FALSE} question_radio( "Which of the following are the hypotheses associated with this test?", answer("$\\begin{array}{ll} H_0: & \\mu_{\\text{white}} - \\mu_{\\text{non-white}} = 0\\\\ H_a: & \\mu_{\\text{white}} - \\mu_{\\text{non-white}} \\neq 0\\end{array}$", correct = TRUE), answer("$\\begin{array}{ll} H_0: & \\mu_{\\text{white}} - \\mu_{\\text{non-white}} = 0\\\\ H_a: & \\mu_{\\text{white}} - \\mu_{\\text{non-white}} > 0\\end{array}$"), answer("$\\begin{array}{ll} H_0: & \\mu_{\\text{white}} - \\mu_{\\text{non-white}} = 0\\\\ H_a: & \\mu_{\\text{white}} - \\mu_{\\text{non-white}} < 0\\end{array}$"), answer("$\\begin{array}{ll} H_0: & \\mu_{\\text{white}} \\neq \\mu_{\\text{non-white}}\\\\ H_a: & \\mu_{\\text{white}} = \\mu_{\\text{non-white}}\\end{array}$"), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-6, echo = FALSE} question_radio( "Do we know the population standard deviation(s), $\\sigma$, for sentence length in each group?", answer("Yes, we can compute the population standard deviation of sentence lenghts from the data with the `sd()` function"), answer("Yes, we are provided the population standard deviation of sentence lengths."), answer("No, we can compute the standard deviation in sentence lengths from the data, but this will give us sample standard deviations rather than the population standard deviations.", correct = TRUE), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-7, echo = FALSE} question_radio( "Are the observations in the two groups (sentences handed to white offenders and sentences handed to non-white offenders) paired?", answer("No. There is no reason to suggest that sentences are paired.", correct = TRUE), answer("Yes. Every sentence handed to a white offender can be naturally paired to a sentence handed to a non-white offender."), allow_retry = TRUE, random_answer_order = TRUE ) ``` ### ```{r sentencing-ht-8, echo = FALSE} quiz( question_radio( "Which standard error formula should be used in the computation of the test statistic?", answer("$\\displaystyle{S_E = \\sqrt{\\frac{s_1^2}{n_1} + \\frac{s_2^2}{n_2}}}$", correct = TRUE), answer("$\\displaystyle{S_E = s/\\sqrt{n}}$"), answer("$\\displaystyle{S_E = \\sqrt{\\frac{\\sigma_1^2}{n_1} + \\frac{\\sigma_2^2}{n_2}}}$"), answer("$\\displaystyle{S_E = \\sqrt{\\frac{p\\left(1 - p\\right)}{n}}}$"), allow_retry = TRUE, random_answer_order = TRUE ), question_radio( "Which distribution does the test statistic follow for this hypothesis test?", answer("The normal distribution"), answer("The $t$-distribution with $n-1$ degrees of freedom."), answer("The $t$-distribution with $\\min\\{n_1 - 1, n_2 - 1\\}$ degrees of freedom.", correct = TRUE), answer("The $t$-distribution with $n_{\\text{diff}} - 1$ degrees of freedom."), allow_retry = TRUE, random_answer_order = TRUE ) ) ``` ### I've stored the sentence lengths (in months) handed down to white offenders in an object called `whiteSentences` and the sentence lengths for non-white offenders in an object called `nonWhiteSentences`. Use the code blocks below to answer the corresponding questions. Find the number of white offenders sentenced in our dataset. ```{r sentencing-ht-9, exercise = TRUE} ``` ```{r sentencing-ht-9-check} grade_result( pass_if(~ (.result == length(whiteSentences))) ) ```

Since `whiteSentences` and `nonWhiteSentences` are vectors of sentence lengths rather than full data frames, the `nrow()` function won't work on them. Try using the `length()` function instead.

### Find the number of non-white offenders sentenced in our dataset. ```{r sentencing-ht-10, exercise = TRUE} ``` ```{r sentencing-ht-10-check} grade_result( pass_if(~ (.result == length(nonWhiteSentences))) ) ``` Compute the average sentence length in months for the white offenders. ```{r sentencing-ht-11, exercise = TRUE} ``` ```{r sentencing-ht-11-check} grade_result( pass_if(~ (abs(.result - mean(whiteSentences)) < 0.01)) ) ``` Compute the average sentence length in months for the non-white offenders. ```{r sentencing-ht-12, exercise = TRUE} ``` ```{r sentencing-ht-12-check} grade_result( pass_if(~ (abs(.result - mean(nonWhiteSentences)) < 0.01)) ) ``` ### Compute the standard deviation in sentence lengths for the white offenders. ```{r sentencing-ht-13, exercise = TRUE} ``` ```{r sentencing-ht-13-check} grade_result( pass_if(~ (abs(.result - sd(whiteSentences)) < 0.01)) ) ``` Compute the standard deviation in sentence lengths for the non-white offenders. ```{r sentencing-ht-14, exercise = TRUE} ``` ```{r sentencing-ht-14-check} grade_result( pass_if(~ (abs(.result - sd(nonWhiteSentences)) < 0.01)) ) ``` ### Okay, we've answered lots of questions that give us pieces of our analysis. Now, let's think about putting to pieces together to compute the test statistic, $p$-value, and complete the hypothesis test. ```{r sentencing-ht-15, echo = FALSE} quiz( question_radio( "The population parameter in question for this hypothesis test is", answer("$\\mu_{\\text{white}} - \\mu_{\\text{non-white}}$", correct = TRUE), answer("$\\mu_{\\text{white}}$"), answer("$\\mu_{\\text{non-white}}$"), answer("$\\bar{x}_{\\text{white}} - \\bar{x}_{\\text{non-white}}$", message = "Oops. This is the point estimate!"), allow_retry = TRUE, random_answer_order = TRUE ), question_radio( "The *null value* is", answer("0", correct = TRUE), answer("0.5"), answer("14"), answer("44.79"), answer("39.25"), allow_retry = TRUE, random_answer_order = TRUE ) ) ``` ### Compute the point estimate ```{r sentencing-ht-16, exercise = TRUE} ``` ```{r sentencing-ht-16-check} grade_result( pass_if(~ (abs(abs(.result) - abs(mean(whiteSentences) - mean(nonWhiteSentences))) < 0.01)) ) ``` ### Compute the standard error ```{r sentencing-ht-17, exercise = TRUE} ``` ```{r sentencing-ht-17-check} grade_result( pass_if(~ (abs(.result - (sqrt((sd(whiteSentences)^2/length(whiteSentences)) + (sd(nonWhiteSentences)^2/length(nonWhiteSentences))))) < 0.01)) ) ``` ### Compute the test statistic ```{r sentencing-ht-18, exercise = TRUE} ``` ```{r sentencing-ht-18-check} grade_result( pass_if(~ (abs(.result - ((mean(whiteSentences) - mean(nonWhiteSentences)) - 0)/((sqrt((sd(whiteSentences)^2/length(whiteSentences)) + (sd(nonWhiteSentences)^2/length(nonWhiteSentences)))))) < 0.01)) ) ``` ### Compute the degrees of freedom for this test. ```{r sentencing-ht-18_5, exercise = TRUE} ``` ```{r sentencing-ht-18_5-check} grade_result( pass_if(~ (.result - (min(length(whiteSentences), length(nonWhiteSentences)) - 1)) == 0) ) ``` ### Compute the p-value ```{r sentencing-ht-19, exercise = TRUE} ``` ```{r sentencing-ht-19-check} grade_result( pass_if(~ (abs(.result - (2*(1 - pt(abs(((mean(whiteSentences) - mean(nonWhiteSentences)) - 0)/((sqrt((sd(whiteSentences)^2/length(whiteSentences)) + (sd(nonWhiteSentences)^2/length(nonWhiteSentences)))))), df = min(length(whiteSentences) - 1, length(nonWhiteSentences) - 1))))) < 0.01)) ) ``` ### Answer the following to complete the hypothesis test. ```{r sentencing-ht-20, echo = FALSE} quiz( question_radio( "What is the result of the test?", answer("Since $p \\geq \\alpha$, we do not have enough evidence to reject the null hypothesis."), answer("Since $p \\geq \\alpha$, we accept the null hypothesis."), answer("Since $p < \\alpha$, we reject the null hypothesis and accept the alternative hypothesis.", correct = TRUE), answer("Since $p < \\alpha$, we fail to reject the null hypothesis."), answer("It is impossible to determine."), allow_retry = TRUE, random_answer_order = TRUE ), question_radio( "The result of the test means that", answer("The sample data did not provide significant evidence to suggest that there is difference in average sentence length handed to white versus non-white defendants for drug-related cases in the Southern District of New York State."), answer("The sample data proved that there is no difference in the average length of sentences handed down to white versus non-white defendants for drug-related cases in the Southern District of New York State."), answer("The sample data provided significant evidence to suggest a difference in the average length of sentences handed down to white versus non-white defendants for drug-related cases in the Southern District of New York State.", correct = TRUE), answer("Sample data cannot be used to test a claim about a population -- different samples yeild different results and so any claim about a population is nonsensicle unless a census was conducted."), allow_retry = TRUE, random_answer_order = TRUE ), question_radio( "Which of the following is an implication of our result", answer("We have proved that racial bias exists in sentencing for drug-related cases in the Southern District of New York State"), answer("We have found evidence to support the hypothesis that racial bias exists in sentencing for drug-related cases in the Southern District of New York State. Our result suggests that a more formal audit of sentencing recommendations should be undertaken.", correct = TRUE), answer("We have found evidence to support the hypothesis that racial bias exists in the courts across all case types."), answer("We have found evidence to suggest that racial minorities receive more lengthy sentences in drug-related cases in the Southern District of New York State.") ) ) ``` ### Submit ```{r context="server"} learnrhash::encoder_logic(strip_output = TRUE) ``` ```{r encode, echo=FALSE} learnrhash::encoder_ui( ui_before = shiny::div( "If you have completed this tutorial and are happy with all of your", "solutions, please click the button below to generate your hash and", "submit it using the corresponding tutorial assignment tab on Blackboard", shiny::tags$br() ), ui_after = shiny::tags$a(href = "https://blackboard.ncat.edu/", "NCAT Blackboard") ) ``` ### Summary Wow -- good work through all of that! I hope you found the application to sentence lengths in the SDNY to be interesting and eye-opening. The approaches to these problems were really drawn out, step-by-step, to help you identify the necessary steps for computing a confidence interval or conducting a hypothesis test with numerical data. Think about each individual step you were asked to complete and how that step relates to the larger process of completing the inference task. Doing this will help you become a better independent problem solver. Your next workbook will provide you an opportunity to practice with a mixture of inference tasks dealing with both numerical and categorical data. You'll become more comfortable with the content and applying our statistical techniques as you work through more and more problems.

Introduction to Probability & Statistics