Chapter 6

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

```{r, echo=F, message=F, warning=F}
library(readr)
library(openintro)
library(here)
library(tidyverse)
data(COL)
```

# Chi-square test of GOF

## Weldon's dice

\begin{multicols*}{2}
\begin{itemize}
\small
\item Walter Frank Raphael Weldon (1860 - 1906) was an English evolutionary biologist and a founder of biometry. He was the joint founding editor of Biometrika, with Francis Galton and Karl Pearson.
\item In 1894, he rolled 12 dice 26,306 times, and recorded the number of 5s or 6s (which he considered to be a success).
\end{itemize}
\columnbreak
\includegraphics[width=0.8\columnwidth]{weldon.jpeg}
\end{multicols*}

\small

- It was observed that 5s or 6s occurred more often than expected, and Pearson hypothesized that this was probably due to the construction of the dice. Most inexpensive dice have hollowed-out pips, and since opposite sides add to 7, the face with 6 pips is lighter than its opposing face, which has only 1 pip.

## Labby's dice

\begin{multicols}{2}
\begin{itemize}
\item In 2009, Zacariah Labby (U of Chicago) repeated Weldon's experiment using a homemade dice-throwing, pip-counting machine. \newline \url{http://www.youtube.com/watch?v=95EErdouO2w}
\item The rolling-imaging process took about 20 seconds per roll.
\end{itemize}
\columnbreak
\includegraphics[width=1\columnwidth]{labby.png}
\end{multicols}

- Each day there were $\sim$ 150 images to process manually.
- At this rate, Weldon's experiment was repeated in a little more than six full days.

## Labby's dice

- Labby did not observe the same phenomenon that Weldon observed (a higher frequency of 5s and 6s).
- Automation allowed Labby to collect more data than Weldon did in 1894; instead of recording "successes" and "failures", Labby recorded the individual number of pips on each die.

\begin{center}
\includegraphics[width=0.6\columnwidth]{labbyPipCounts.png}
\end{center}

## Expected counts

\alert{Labby rolled 12 dice 26,306 times. If each side is equally likely to come up, how many 1s, 2s, ..., 6s would he expect to have observed?}

A) $\frac{1}{6}$
B) $\frac{12}{6}$
C) $\frac{26,306}{6}$
D) $\frac{12 \times 26,306}{6}$

## Expected counts

\alert{Labby rolled 12 dice 26,306 times. If each side is equally likely to come up, how many 1s, 2s, ..., 6s would he expect to have observed?}

A) $\frac{1}{6}$
B) $\frac{12}{6}$
C) $\frac{26,306}{6}$
D) $\textcolor{red}{\frac{12 \times 26,306}{6} = 52,612}$
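A quick arithmetic check of answer D (a minimal sketch in R, using only the totals quoted above):

```{r expected-count-check, echo=TRUE}
# 12 dice per throw times 26,306 throws gives the total number of die
# outcomes; under a fair die, each of the 6 faces is expected in 1/6 of them
12 * 26306 / 6
```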
## Summarizing Labby's results

\renewcommand\arraystretch{1.2}
\begin{tabular}{c | c c}
Outcome & Observed & Expected \\
\hline
1 & 53,222 & 52,612 \\
2 & 52,118 & 52,612 \\
3 & 52,465 & 52,612 \\
4 & 52,338 & 52,612 \\
5 & 52,244 & 52,612 \\
6 & 53,285 & 52,612 \\
\hline
Total & 315,672 & 315,672
\end{tabular}

\pause

\alert{Why are the expected counts the same for all outcomes while the observed counts differ? At first glance, does there appear to be an inconsistency between the observed and expected counts?}

## Setting the hypotheses

\alert{Do these data provide convincing evidence of an inconsistency between the observed and expected counts?}

\pause

\begin{itemize}
\item[$H_0$:] There is no inconsistency between the observed and the expected counts. \textbf{The observed counts follow the same distribution as the expected counts.}
\pause
\item[$H_A$:] There is an inconsistency between the observed and the expected counts. \textbf{The observed counts \alert{do not} follow the same distribution as the expected counts.} There is a bias in which side comes up on the roll of a die.
\end{itemize}

## Evaluating the hypotheses

- To evaluate these hypotheses, we quantify how different the observed counts are from the expected counts.

\pause

- Large deviations from what would be expected based on sampling variation (chance) alone provide strong evidence for the alternative hypothesis.

\pause

- This is called a **goodness of fit** test since we're evaluating how well the observed data fit the expected distribution.

## Anatomy of a test statistic

- The general form of a test statistic is

\centering{$\frac{\text{point estimate} - \text{null value}}{\text{SE of point estimate}}$}

\pause

- This construction is based on
\begin{enumerate}
\item identifying the difference between a point estimate and an expected value if the null hypothesis were true, and
\item standardizing that difference using the standard error of the point estimate.
\end{enumerate}

\pause

\raggedright These two ideas will help in the construction of an appropriate test statistic for count data.

## Chi-square statistic

When dealing with counts and investigating how far the observed counts are from the expected counts, we use a new test statistic called the **chi-square** ($\boldsymbol{\chi^2}$) **statistic**.

\pause

$\chi^2$ statistic

\centering{$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$ where $k =$ total number of cells}

## Calculating the chi-square statistic

\renewcommand\arraystretch{1.8}
\begin{tabular}{c | c c | c}
Outcome & Observed & Expected & $\frac{(O - E)^2}{E}$ \\
\hline
1 & 53,222 & 52,612 & $\frac{(53,222 - 52,612)^2}{52,612} = 7.07$ \\ \pause
2 & 52,118 & 52,612 & $\frac{(52,118 - 52,612)^2}{52,612} = 4.64$ \\ \pause
3 & 52,465 & 52,612 & $\frac{(52,465 - 52,612)^2}{52,612} = 0.41$ \\ \pause
4 & 52,338 & 52,612 & $\frac{(52,338 - 52,612)^2}{52,612} = 1.43$ \\ \pause
5 & 52,244 & 52,612 & $\frac{(52,244 - 52,612)^2}{52,612} = 2.57$ \\ \pause
6 & 53,285 & 52,612 & $\frac{(53,285 - 52,612)^2}{52,612} = 8.61$ \\
\hline \pause
Total & 315,672 & 315,672 & 24.73
\end{tabular}
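The same bookkeeping can be done in one line of R (a minimal sketch; the observed counts are copied from the table above):

```{r chisq-statistic, echo=TRUE}
# Observed pip counts for faces 1 through 6 of Labby's rolls
observed <- c(53222, 52118, 52465, 52338, 52244, 53285)
expected <- rep(52612, 6)   # 12 * 26306 / 6 for each face
# Chi-square statistic: sum of (O - E)^2 / E over the k = 6 cells
sum((observed - expected)^2 / expected)
```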
## Why square?

Squaring the difference between the observed and the expected outcome does two things:

\pause

- Any standardized difference that is squared will now be positive.

\pause

- Differences that already looked unusual will become much larger after being squared.

\pause

\alert{When have we seen this before?}

## The chi-square distribution

- In order to determine whether the $\chi^2$ statistic we calculated is unusually high, we first need to describe its distribution.

\pause

- The chi-square distribution has one parameter, the **degrees of freedom (df)**, which influences the shape, center, and spread of the distribution.

\pause

\footnotesize

\alert{Remember:} So far we've seen three other continuous distributions:

- Normal distribution: unimodal and symmetric, with two parameters: mean and standard deviation.
- T distribution: unimodal and symmetric, with one parameter: degrees of freedom.
- F distribution: unimodal and right skewed, with two parameters: numerator degrees of freedom (between-group variance) and denominator degrees of freedom (within-group variance).

## Practice

\alert{Which of the following is false?}

```{r, echo=F, message=F, warning=F, fig.width=4, fig.height=1.8, fig.align='center'}
COL <- c('#225588', '#558822CC', '#88225599', '#88552266')
par(mar=c(2, 0.5, 0.25, 0.5), mgp=c(2.1, 0.8, 0), las=1)
x <- c(0, seq(0.0000001, 40, 0.05))
DF <- c(2.0000001, 4, 9)
y <- list()
for(i in 1:length(DF)){
  y[[i]] <- dchisq(x, DF[i])
}
plot(0, 0, type='n', xlim=c(0, 25), ylim=range(c(y, recursive=TRUE)), axes=FALSE)
for(i in 1:length(DF)){
  lines(x, y[[i]], lty=i, col=COL[i], lwd=3)
}
abline(h=0)
axis(1)
legend('topright', col=COL, lty=1:4, legend=paste(round(DF)), title='Degrees of Freedom', cex=1, lwd=3)
```

As the df increases,

A) The center of the $\chi^2$ distribution increases as well.
B) The variability of the $\chi^2$ distribution increases as well.
C) The shape of the $\chi^2$ distribution becomes more skewed (less like a normal).

## Practice

\alert{Which of the following is false?}

```{r, echo=F, message=F, warning=F, fig.width=4, fig.height=1.8, fig.align='center'}
COL <- c('#225588', '#558822CC', '#88225599', '#88552266')
par(mar=c(2, 0.5, 0.25, 0.5), mgp=c(2.1, 0.8, 0), las=1)
x <- c(0, seq(0.0000001, 40, 0.05))
DF <- c(2.0000001, 4, 9)
y <- list()
for(i in 1:length(DF)){
  y[[i]] <- dchisq(x, DF[i])
}
plot(0, 0, type='n', xlim=c(0, 25), ylim=range(c(y, recursive=TRUE)), axes=FALSE)
for(i in 1:length(DF)){
  lines(x, y[[i]], lty=i, col=COL[i], lwd=3)
}
abline(h=0)
axis(1)
legend('topright', col=COL, lty=1:4, legend=paste(round(DF)), title='Degrees of Freedom', cex=1, lwd=3)
```

As the df increases,

A) The center of the $\chi^2$ distribution increases as well.
B) The variability of the $\chi^2$ distribution increases as well.
C) **The shape of the** $\mathbf{\chi^2}$ **distribution becomes more skewed (less like a normal).**

## Finding areas under the chi-square curve

- p-value = tail area under the chi-square distribution (as usual).

\pause

- For this we can use technology or a chi-square probability table.
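As an illustration of the technology route (a minimal sketch; the df and tail area below are arbitrary illustrative values):

```{r tail-area-lookup, echo=TRUE}
# qchisq() finds the cutoff that leaves a given area in the upper tail...
qchisq(p = 0.05, df = 5, lower.tail = FALSE)
# ...and pchisq() goes the other way, from a cutoff back to a tail area
pchisq(q = 11.07, df = 5, lower.tail = FALSE)
```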
## Finding areas under the chi-square curve

\alert{Estimate the shaded area (above the cutoff value of 10) under the $\chi^2$ curve with $df=6$.}

\pause

```{r, echo=TRUE}
pchisq(q = 10, df = 6, lower.tail = FALSE)
```

## Finding areas under the chi-square curve

\alert{Estimate the shaded area (above the cutoff value of 17) under the $\chi^2$ curve with $df=9$.}

\begin{multicols}{2}
```{r, echo=F, message=F, warning=F, out.width="100%", fig.align='center'}
chiTail <- function(df, U, showdf = TRUE, axes = TRUE){
  par(mar=c(2, 1, 1, 1), mgp=c(2.1, 0.8, 0), las=1)
  sd <- sqrt(2*df)
  xmax <- max(df + 4.000102*sd, U+2)
  x <- seq(0, xmax, 0.05)
  y <- dchisq(x, df)
  plot(x, y, type='l', axes=FALSE)
  if(axes == TRUE){
    abline(h=0)
    axis(1, at=c(0, U, xmax+3), labels = c(0, U, NA), cex.axis = 3)
  }
  if(axes == FALSE){
    lines(c(0, xmax), c(0, 0))
  }
  if(showdf == TRUE){
    text(x = 0.8*xmax, y = 0.5*max(y), paste("df =", df), cex = 3)
  }
  these <- which(x > U)
  X <- x[c(these[1], these, rev(these)[1])]
  Y <- c(0, y[these], 0)
  polygon(X, Y, col='#569BBD')
}
chiTail(df = 9, U = 17)
```
\columnbreak
A) 0.05
B) 0.02
C) between 0.02 and 0.05
D) between 0.05 and 0.1
E) between 0.01 and 0.02
\end{multicols}

## Finding areas under the chi-square curve

\alert{Estimate the shaded area (above the cutoff value of 17) under the $\chi^2$ curve with $df=9$.}

\begin{multicols}{2}
```{r, echo=F, message=F, warning=F, out.width="100%", fig.align='center'}
chiTail <- function(df, U, showdf = TRUE, axes = TRUE){
  par(mar=c(2, 1, 1, 1), mgp=c(2.1, 0.8, 0), las=1)
  sd <- sqrt(2*df)
  xmax <- max(df + 4.000102*sd, U+2)
  x <- seq(0, xmax, 0.05)
  y <- dchisq(x, df)
  plot(x, y, type='l', axes=FALSE)
  if(axes == TRUE){
    abline(h=0)
    axis(1, at=c(0, U, xmax+3), labels = c(0, U, NA), cex.axis = 3)
  }
  if(axes == FALSE){
    lines(c(0, xmax), c(0, 0))
  }
  if(showdf == TRUE){
    text(x = 0.8*xmax, y = 0.5*max(y), paste("df =", df), cex = 3)
  }
  these <- which(x > U)
  X <- x[c(these[1], these, rev(these)[1])]
  Y <- c(0, y[these], 0)
  polygon(X, Y, col='#569BBD')
}
chiTail(df = 9, U = 17)
```
\columnbreak
A) 0.05
B) 0.02
C) \textbf{between 0.02 and 0.05}
D) between 0.05 and 0.1
E) between 0.01 and 0.02
\end{multicols}

```{r, echo=T}
pchisq(q = 17, df = 9, lower.tail = FALSE)
```

## Finding areas under the chi-square curve

\alert{Estimate the shaded area (above 30) under the $\chi^2$ curve with $df = 10$.}

\begin{multicols}{2}
```{r, echo=F, message=F, warning=F, out.width="100%", fig.align='center'}
chiTail(df = 10, U = 30)
```
\columnbreak
A) greater than 0.3
B) between 0.001 and 0.005
C) less than 0.001
D) greater than 0.001
E) cannot tell using this table
\end{multicols}

## Finding areas under the chi-square curve

\alert{Estimate the shaded area (above 30) under the $\chi^2$ curve with $df = 10$.}

\begin{multicols}{2}
```{r, echo=F, message=F, warning=F, out.width="100%", fig.align='center'}
chiTail(df = 10, U = 30)
```
\columnbreak
A) greater than 0.3
B) between 0.001 and 0.005
C) \textbf{less than 0.001}
D) greater than 0.001
E) cannot tell using this table
\end{multicols}

```{r, echo=TRUE}
pchisq(q = 30, df = 10, lower.tail = FALSE)
```
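These answers can also be sanity-checked by simulation (a minimal sketch; the number of draws and the seed are arbitrary):

```{r simulate-tail-area, echo=TRUE}
# Draw many chi-square values with 10 df and count the proportion
# exceeding 30; this approximates the exact upper-tail area
set.seed(1)
mean(rchisq(n = 100000, df = 10) > 30)
```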
## Back to Labby's dice

\begin{itemize}
\item The research question was: Do these data provide convincing evidence of an inconsistency between the observed and expected counts?
\pause
\item The hypotheses were:
\begin{itemize}
\item[$H_0$:] There is no inconsistency between the observed and the expected counts. The observed counts follow the same distribution as the expected counts.
\item[$H_A$:] There is an inconsistency between the observed and the expected counts. The observed counts \alert{do not} follow the same distribution as the expected counts. There is a bias in which side comes up on the roll of a die.
\end{itemize}
\pause
\item We had calculated a test statistic of \textbf{$\chi^2 = 24.73$}.
\pause
\item All we need is the $df$, and then we can calculate the tail area (the p-value) and make a decision on the hypotheses.
\end{itemize}

## Degrees of freedom for a goodness of fit test

- When conducting a goodness of fit test to evaluate how well the observed data follow an expected distribution, the degrees of freedom are calculated as the number of cells ($k$) minus 1.

\centering{$\mathbf{df = k - 1}$}

\pause

- For dice outcomes, $k = 6$, therefore

\centering{$df = 6 - 1 = 5$}

## Finding a p-value for a chi-square test

The **p-value** for a chi-square test is defined as the **tail area above the calculated test statistic**.

\begin{multicols}{2}
```{r, echo=F, message=F, warning=F, out.width="100%", fig.align='center'}
chiTail(df = 5, U = 24.73)
```
\columnbreak
p-value $= P(\chi^2_{df=5} > 24.73)$ is less than 0.001
\end{multicols}

## Conclusion of the hypothesis test

\alert{We calculated a p-value less than 0.001. At 5\% significance level, what is the conclusion of the hypothesis test?}

A) Reject $H_0$, the data provide convincing evidence that the dice are fair.
B) Reject $H_0$, the data provide convincing evidence that the dice are biased.
C) Fail to reject $H_0$, the data provide convincing evidence that the dice are fair.
D) Fail to reject $H_0$, the data provide convincing evidence that the dice are biased.

## Conclusion of the hypothesis test

\alert{We calculated a p-value less than 0.001. At 5\% significance level, what is the conclusion of the hypothesis test?}

A) Reject $H_0$, the data provide convincing evidence that the dice are fair.
B) **Reject** $\mathbf{H_0}$**, the data provide convincing evidence that the dice are biased.**
C) Fail to reject $H_0$, the data provide convincing evidence that the dice are fair.
D) Fail to reject $H_0$, the data provide convincing evidence that the dice are biased.

## Turns out...

- The 1-6 axis is consistently shorter than the other two (2-5 and 3-4), thereby supporting the hypothesis that the faces with one and six pips are larger than the other faces.
- Pearson's claim that 5s and 6s appear more often due to the carved-out pips is not supported by these data.
- Dice used in casinos have flush faces, where the pips are filled in with a plastic of the same density as the surrounding material, and are precisely balanced.

\begin{multicols}{2}
\includegraphics[width=1\columnwidth]{regular.jpeg}
\columnbreak
\includegraphics[width=1\columnwidth]{casino.jpeg}
\end{multicols}

## Recap: p-value for a chi-square test

- The p-value for a chi-square test is defined as the tail area **above** the calculated test statistic.
- This is because the test statistic is always positive, and a higher test statistic means a stronger deviation from the null hypothesis.

```{r, echo=F, message=F, warning=F, fig.width=4.5, fig.height=2.4, fig.align='center'}
chiTail(df = 6, U = 10, showdf = FALSE, axes = FALSE)
text(x = 12, y = 0.005, "p-value", col = "red", cex = 1)
```
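For completeness, base R's `chisq.test()` carries out this whole goodness of fit test in one call (a minimal sketch; the counts repeat the earlier table):

```{r labby-gof-test, echo=TRUE}
# Goodness of fit test of Labby's counts against a fair-die null,
# i.e. probability 1/6 for each face
observed <- c(53222, 52118, 52465, 52338, 52244, 53285)
chisq.test(observed, p = rep(1/6, 6))
```

The reported statistic and df match the hand calculation above ($\chi^2 = 24.73$, $df = 5$), with a p-value well below 0.001.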
## Conditions for the chi-square test

\begin{enumerate}
\item \textbf{Independence:} Each case that contributes a count to the table must be independent of all the other cases in the table.
\pause
\item \textbf{Sample size:} Each particular scenario (i.e. cell) must have at least 5 \alert{expected} cases.
\pause
\item \textbf{df $>$ 1:} Degrees of freedom must be greater than 1.
\end{enumerate}

\pause

Failing to check conditions may unintentionally affect the test's error rates.

## 2009 Iran Election

\alert{There was lots of talk of election fraud in the 2009 Iran election. We'll compare the data from a poll conducted before the election (observed data) to the reported votes in the election to see if the two follow the same distribution.}

\begin{center}
\begin{tabular}{l | r r}
& \footnotesize{Observed \# of} & \footnotesize{Reported \% of} \\
\footnotesize{Candidate} & \footnotesize{voters in poll} & \footnotesize{votes in election} \\
\hline
(1) Ahmadinejad & 338 & 63.29\% \\
(2) Mousavi & 136 & 34.10\% \\
(3) Minor candidates & 30 & 2.61\% \\
\hline
Total & 504 & 100\% \\ \pause
& \alert{$\downarrow$} & \alert{$\downarrow$} \\
& \alert{observed} & \alert{expected} \\
& & \alert{distribution}
\end{tabular}
\end{center}

## Hypotheses

\alert{What are the hypotheses for testing if the distributions of reported and polled votes are different?}

\pause

\begin{itemize}
\item[$H_0$:] The observed counts from the poll follow the same distribution as the reported votes.
\item[$H_A$:] The observed counts from the poll do not follow the same distribution as the reported votes.
\end{itemize}

## Calculation of the test statistic

\small
\begin{center}
\begin{tabular}{l | r r r}
& \footnotesize{Observed \# of} & \footnotesize{Reported \% of} & \footnotesize{Expected \# of} \\
\footnotesize{Candidate} & \footnotesize{voters in poll} & \footnotesize{votes in election} & \footnotesize{votes in poll} \\
\hline
\footnotesize{(1) Ahmadinejad} & 338 & 63.29\% & 504 $\times$ 0.6329 = 319 \\
\footnotesize{(2) Mousavi} & 136 & 34.10\% & 504 $\times$ 0.3410 = 172 \\
\footnotesize{(3) Minor candidates} & 30 & 2.61\% & 504 $\times$ 0.0261 = 13 \\
\hline
Total & 504 & 100\% & 504
\end{tabular}
\end{center}

\pause

\begin{eqnarray*}
\frac{(O_1 - E_1)^2}{E_1} = \frac{(338 - 319)^2}{319} &=& 1.13 \\ \pause
\frac{(O_2 - E_2)^2}{E_2} = \frac{(136 - 172)^2}{172} &=& 7.53 \\ \pause
\frac{(O_3 - E_3)^2}{E_3} = \frac{(30 - 13)^2}{13} &=& 22.23 \\ \pause
\chi^2_{\alert{df = 3 - 1 = 2}} &=& 30.89
\end{eqnarray*}

## Conclusion

\alert{Based on these calculations what is the conclusion of the hypothesis test?}

A) p-value is low, $H_0$ is rejected. The observed counts from the poll do \underline{not} follow the same distribution as the reported votes.
B) p-value is high, $H_0$ is not rejected. The observed counts from the poll follow the same distribution as the reported votes.
C) p-value is low, $H_0$ is rejected. The observed counts from the poll follow the same distribution as the reported votes.
D) p-value is low, $H_0$ is not rejected. The observed counts from the poll do \underline{not} follow the same distribution as the reported votes.

## Conclusion

\alert{Based on these calculations what is the conclusion of the hypothesis test?}

A) \textbf{p-value is low,} $\mathbf{H_0}$ \textbf{is rejected. The observed counts from the poll do \underline{not} follow the same distribution as the reported votes.}
B) p-value is high, $H_0$ is not rejected. The observed counts from the poll follow the same distribution as the reported votes.
C) p-value is low, $H_0$ is rejected. The observed counts from the poll follow the same distribution as the reported votes.
D) p-value is low, $H_0$ is not rejected.
The observed counts from the poll do \underline{not} follow the same distribution as the reported votes.
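For reference, the same test via `chisq.test()` (a minimal sketch; R keeps the expected counts unrounded, so the statistic comes out slightly below the hand calculation's 30.89):

```{r iran-gof-test, echo=TRUE}
# Poll counts tested against the reported vote shares as the null distribution
poll <- c(338, 136, 30)
chisq.test(poll, p = c(0.6329, 0.3410, 0.0261))
```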