```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ```{r, echo=F, message=F, warning=F} library(readr) library(openintro) library(here) library(tidyverse) library(xtable) data(COL) ``` # Difference of two means ## Experiment - A 2003 American Journal of Health Education Study investigated the effects of cell phone use on reaction time while driving. - In the study, 60 participants were randomly selected and placed into one of two groups: - Treatment Group - Access to text documents on a cell phone. - Control Group - No distractions - Participants in each group were then asked to take a computerized reaction time test. - Researchers then recorded each subject's reaction time in seconds. ## Data Summary \begin{center} \begin{tabular}{l | c | c} & {\footnotesize \textcolor{blue}{Treatment}} & {\footnotesize \textcolor{blue}{Control}} \\ & Phone & No Phone \\ \hline $\bar{x}$ & 0.546 & 0.356 \\ $s$ & 0.213 & 0.245 \\ $n$ & 30 & 30 \end{tabular} \end{center} ## Parameter and point estimate - **Parameter of interest:** Average difference between the reaction time of **all** drivers using a phone or not. \centering{$\mu_{C} - \mu_{T}$} \pause - **Point estimate:** Average difference between the reaction time of **participants** in the treatment and control group. \centering{$\bar{x}_{C} - \bar{x}_{T}$} ## Hypotheses \alert{Which of the following is the correct set of hypotheses for testing if the average reaction time of the drivers using a phone ($\mu_T$) is higher than the average reaction time of the drivers not using a phone ($\mu_C$)?} \begin{enumerate}[A)] \item $H_0: \mu_{C} = \mu_{T}$ \\ $H_A: \mu_{C} \ne \mu_{pT}$ \item $H_0: \mu_{C} = \mu_{T}$ \\ $H_A: \mu_{C} > \mu_{T}$ \item $H_0: \mu_{C} = \mu_{T}$ \\ $H_A: \mu_{C} < \mu_{T}$ \item $H_0: \bar{x}_{C} = \bar{x}_{T}$ \\ $H_A: \bar{x}_{C} < \bar{x}_{T}$ \end{enumerate} ## Hypotheses \alert{Which of the following is the correct set of hypotheses for testing if the average reaction time of the drivers using a phone ($\mu_T$) is higher than the average reaction time of the drivers not using a phone ($\mu_C$)?} \begin{enumerate}[A)] \item $H_0: \mu_{C} = \mu_{T}$ \\ $H_A: \mu_{C} \ne \mu_{pT}$ \item $H_0: \mu_{C} = \mu_{T}$ \\ $H_A: \mu_{C} > \mu_{T}$ \item \textcolor{red}{$H_0: \mu_{C} = \mu_{T}$} \\ \textcolor{red}{$H_A: \mu_{C} < \mu_{T}$} \item $H_0: \bar{x}_{C} = \bar{x}_{T}$ \\ $H_A: \bar{x}_{C} < \bar{x}_{T}$ \end{enumerate} ## Conditions \alert{Which of the following does \underline{not} need to be satisfied in order to conduct this hypothesis test using theoretical methods?} A) The reaction time of drivers not using a phone in the sample should be independent of another, and the reaction time of drivers using a phone should independent of another as well. B) The reaction times of drivers using and not using a phone in the sample should be independent. C) Distributions of reaction times of drivers in both groups should not be extremely skewed. D) Both sample sizes should be the same. ## Conditions \alert{Which of the following does \underline{not} need to be satisfied in order to conduct this hypothesis test using theoretical methods?} A) The reaction time of drivers not using a phone in the sample should be independent of another, and the reaction time of drivers using a phone should independent of another as well. B) The reaction times of drivers using and not using a phone in the sample should be independent. C) Distributions of reaction times of drivers in both groups should not be extremely skewed. D) \alert{Both sample sizes should be the same.} ## Test statistics The test statistic for inference on the difference of two means where $\sigma_1$ and $\sigma_2$ are unknown is the T statistic. \centering{$T_{df} = \frac{\text{point estimate} - \text{null value}}{SE}$} \raggedright where \centering{$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \hspace{1cm} \text{and} \hspace{1cm} df = min(n_1 - 1, n_2 - 1)$} \raggedright\noindent\rule{4cm}{0.4pt} \alert{Note:} The calculation of the $df$ is actually much more complicated. For simplicity we'll use the above formula to \underline{estimate} the true $df$ when conducting the analysis by hand. ## Test statistic \begin{center} \begin{tabular}{l | c | c} & {\footnotesize \textcolor{blue}{Treatment}} & {\footnotesize \textcolor{blue}{Control}} \\ & Phone & No Phone \\ \hline $\bar{x}$ & 0.546 & 0.356 \\ $s$ & 0.213 & 0.245 \\ $n$ & 30 & 30 \end{tabular} \end{center} \alert{in context...} \pause \begin{eqnarray*} T &=& \frac{\text{point estimate} - \text{null value} }{SE} \\ \pause &=& \frac{(0.356-0.546) - 0}{\sqrt{\frac{0.213^2}{30} + \frac{0.245^2}{30} }} \\ \pause &=& \frac{-0.19}{0.0593} \\ \pause &=& -3.20 \end{eqnarray*} ## Test statistic - Coincidentally, in the experiment the number of participants in both groups is the same. - Thus, $df = min(30-1, 30-1) = min(29, 29) = 29.$ ## p-value \alert{Which of the following is the correct p-value for this hypothesis test?} \centering{$T = -3.20 \hspace{1cm} df = 29$} A) Between 0.0005 and 0.001 B) Between 0.001 and 0.0025 C) Between 0.002 and 0.005 D) Between 0.01 and 0.02 ## p-value \alert{Which of the following is the correct p-value for this hypothesis test?} \centering{$T = -3.20 \hspace{1cm} df = 29$} A) Between 0.0005 and 0.001 B) \alert{Between 0.001 and 0.0025} C) Between 0.002 and 0.005 D) Between 0.01 and 0.02 ## Synthesis \alert{What is the conclusion of the hypothesis test? How (if at all) would this conclusion change your behavior if you went diamond shopping?} \pause - p-value is small so reject $H_0$. The data provide convincing evidence to suggest that the average reaction time of drivers not using a phone is faster than the drivers using a phone while driving. - Try not to use your phone while driving because you may never know when you would require the fast reaction time to avoid an accident. ## Equivalent confidence level \alert{What is the equivalent confidence level for a one-sided hypothesis test at $\alpha = 0.05$?} $\vspace{1cm}$ \begin{columns} \begin{column}{0.3\textwidth} A) 90% B) 92.5% C) 95% D) 97.5% \end{column} \begin{column}{0.7\textwidth} \end{column} \end{columns} ## Equivalent confidence level \alert{What is the equivalent confidence level for a one-sided hypothesis test at $\alpha = 0.05$?} \begin{columns} \begin{column}{0.3\textwidth} A) \alert{90\%} B) 92.5% C) 95% D) 97.5% \end{column} \begin{column}{0.7\textwidth} ```{r, echo=F, message=F, warning=F,out.width="100%",fig.align='center'} plot(c(-4, 4), c(0, dnorm(0)), type='n', axes=FALSE, ylab = "", xlab = "") axis(1, at = c(-5,-1.72, 0, 1.72, 5), labels = c(NA, NA, 0, NA, NA)) abline(h=0) X <- seq(-8, 8, 0.01) Y <- dt(X, 9) lines(X, Y) these <- which(X < -1.72) yy <- c(0, Y[these], 0) these <- c(these[1], these, rev(these)[1]) xx <- X[these] polygon(xx, yy, col=COL[1]) these <- which(X > 1.72) yy <- c(0, Y[these], 0) these <- c(these[1], these, rev(these)[1]) xx <- X[these] polygon(xx, yy, col=COL[1]) text(0,0.6*max(Y), "90%", col = "red", cex = 3) text(2,0.05*max(Y), "5%", col = "red", cex = 2) text(-2,0.05*max(Y), "5%", col = "red", cex = 2) ``` \end{column} \end{columns} ## Critical value \alert{What is the appropriate $t^*$ for a confidence interval for the average difference between using a phone and not using it while driving?} A) 1.32 B) 1.70 C) 2.07 D) 2.82 ## Critical value \alert{What is the appropriate $t^*$ for a confidence interval for the average difference between using a phone and not using it while driving?} A) 1.32 B) \alert{1.70} C) 2.07 D) 2.82 ## Confidence interval **Calculate the interval, and interpret it in context** \pause \centering{$\text{point estimate} \pm ME$} \pause \begin{eqnarray*} (\bar{x}_{C} - \bar{x}_{T}) \pm t^\star_{df} \times SE &=& (0.356-0.546) \pm 1.70 \times 0.0593 \\ \pause &=& -0.19 \pm 0.1008 \\ \pause &=& (-0.2908, -0.0892) \end{eqnarray*} \pause \raggedright We are 90% confident that they average reaction time of a driver not using a phone is 0.0892 to 0.2908 seconds faster than the average reaction time of a driver using a phone. ## Recap: Inference using difference of two small sample means - If $\sigma_1$ or $\sigma_2$ is unknown, difference between the sample means follow a $t$-distribution with $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_1}}$. \pause - Conditions: - Independence within groups (often verified by a random sample, and if sampling without replacement, n < 10% of population) and between groups. - No extreme skew in either group. \pause - Hypothesis testing: \centering{$T_{df} = \frac{\text{point estimate} - \text{null value}}{SE}, \text{ where } df = min(n_1-1, n_2-1).$} \pause - Confidence interval: \centering{$\text{point estimate} \pm t^*_{df} \times SE$}