Inference for categorical data Solution
```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(tidyverse) library(openintro) library(infer) ``` ## Exercise 1 (2 Points) ```{r, warning=F, message=F} yrbss %>% count(text_while_driving_30d) ``` ## Exercise 2 (3 Points) ### For Section 001 ```{r, warning=F, message=F} yrbss %>% select(helmet_12m, text_while_driving_30d)%>% na.omit()%>% filter(text_while_driving_30d != "did not drive", helmet_12m == "never")%>% count(text_while_driving_30d)%>% mutate(prop = n/sum(n)) ``` ### For Section 007 ```{r, warning=F, message=F} no_helmet <- yrbss %>% filter(helmet_12m == "never") no_helmet <- no_helmet %>% mutate(text_ind = ifelse(text_while_driving_30d == "30", "yes", "no"))%>% filter(!is.na(text_ind)) no_helmet%>% count(text_ind)%>% mutate(p = n/sum(n)) ``` ## Exercise 3 (3 Points) ```{r, warning=F, message=F} no_helmet <- yrbss %>% filter(helmet_12m == "never") no_helmet <- no_helmet %>% mutate(text_ind = ifelse(text_while_driving_30d == "30", "yes", "no"))%>% filter(!is.na(text_ind)) prop_test(no_helmet, text_ind ~ NULL, success = "yes", z = TRUE, conf_int = TRUE, conf_level = 0.95, correct = FALSE) # 2 Point (0.07770443 - 0.06519769)/2 # 1 Point ``` ## Exercise 4 (3 Points) ```{r, warning=F, message=F} dat = yrbss%>% select(gender)%>% na.omit() # 1 Point prop_test(dat, gender ~ NULL, success = "male", z = TRUE, conf_int = TRUE, conf_level = 0.95, correct = FALSE) # 1 Point (0.5205266 - 0.5037094)/2 # 1 Point ``` ## Exercise 5 (3 Points) **2 Point** When p = 0 & 1, ME = 0, but as p increases ME increases. ME maximizes when p = 0.5, after that it starts decreasing as p increases. ```{r, warning=F, message=F} n <- 1000 #0.25 p <- seq(from = 0, to = 1, by = 0.01) # 0.25 Point me <- 2 * sqrt(p * (1 - p)/n) # 0.25 Point dd <- data.frame(p = p, me = me) ggplot(data = dd, aes(x = p, y = me)) + geom_line() + labs(x = "Population Proportion", y = "Margin of Error") # 0.25 Point ``` ## Exercise 6 (3 Points) **2 Points** We would reject the Null Hypothesis $H_0$ because p-values is less than 0.05 ```{r, warning = FALSE, message = FALSE} prop_test(no_helmet, text_ind~NULL, success = "yes", p = 0.08, z = TRUE) # 1 Point ``` ## Exercise 7 (3 Points) **1 Point for question** Hypothesis testing on the gender variable to see if the male proportion is 50% or not ```{r, warning = FALSE, message = FALSE} prop_test(dat, gender~NULL, success = "male", p = 0.50, z = TRUE) # 1 Point ``` **1 Point for inference** Since the p-value is less than 0.05, we will reject the null hypothesis of $H_0: p = 0.5$