Chapter 1

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(datasets)
library(tidyverse)
library(shiny)
library(scales)
library(jpeg)
library(openintro)
library(dplyr)
library(ggplot2)
library(learnr)
library(readr)
library(knitr)
library(png)
library(gradethis)   # remotes::install_github("rstudio/gradethis")
library(learnrhash)  # devtools::install_github("rundel/learnrhash")
library(grid)
library(tinytex)
data("COL")          # COL is provided by the openintro package
```

## Acknowledgement

These notes use content from OpenIntro Statistics Slides by Mine Cetinkaya-Rundel.

## Outline of the Chapter

In this introductory chapter, we will

1) Begin with a case study and use it to discuss the concepts of
    - Research question
    - Statistical method to find the answer to the question
    - Proportion (percentage) of items of interest
2) Discuss data basics, variables and types
3) Introduce basic data collection techniques
4) Discuss aspects of experiments (case studies)

# 1.1 Case Study

## Treating Chronic Fatigue Syndrome

- Objective: Evaluate the effectiveness of cognitive-behavior therapy for chronic fatigue syndrome.
- Participant pool: 142 patients who were recruited from referrals by primary care physicians and consultants to a hospital clinic specializing in chronic fatigue syndrome.
- Actual participants: Only 60 of the 142 referred patients entered the study. Some were excluded because they didn’t meet the diagnostic criteria, some had other health issues, and some refused to be a part of the study.

## Study Design

- Patients were randomly assigned to treatment and control groups, 30 patients in each group:
    - **Treatment**: Cognitive behavior therapy - collaborative, educative, and with a behavioral emphasis. Patients were shown how activity could be increased steadily and safely without exacerbating symptoms.
    - **Control**: Relaxation - No advice was given about how activity could be increased. Instead, progressive muscle relaxation, visualization, and rapid relaxation skills were taught.
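
Random assignment of this kind can be sketched in a few lines of R. The sketch is illustrative only: the patient IDs, the seed, and the object names (`patient_id`, `labels`, `assignment`) are assumptions, and the study's actual randomization procedure is not described in these notes.

```r
# Illustrative only: hypothetical patient IDs 1..60, not the study's data.
set.seed(2024)  # arbitrary seed so the shuffle is reproducible

patient_id <- 1:60

# Build 30 "treatment" and 30 "control" labels, then shuffle their order
# so that each patient is equally likely to end up in either group.
labels <- rep(c("treatment", "control"), each = 30)
assignment <- data.frame(
  patient = patient_id,
  group   = sample(labels)
)

# Each group has exactly 30 patients, by construction.
table(assignment$group)
```

Shuffling a fixed set of labels, rather than flipping a coin for each patient, is one simple way to guarantee equal group sizes.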

## Results

The table below shows the distribution of patients with good outcomes at 6-month follow-up. Note that 7 patients dropped out of the study: 3 from the treatment and 4 from the control group.

$$
\begin{array}{llccc}
 & & \text{Good outcome: Yes} & \text{Good outcome: No} & \text{Total} \\
\hline
\text{Group} & \text{Treatment} & 19 & 8 & 27 \\
 & \text{Control} & 5 & 21 & 26 \\
\hline
 & \text{Total} & 24 & 29 & 53
\end{array}
$$

- Proportion with good outcomes in treatment group:

```{r Ex1_1, exercise=TRUE}

```

$$19/27 \approx 0.70 \rightarrow 70\%$$

- Proportion with good outcomes in control group:

```{r Ex1_2, exercise=TRUE}

```

$$5/26 \approx 0.19 \rightarrow 19\%$$

## Understanding the results

Do the data show a "real" difference between the groups?
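
Before answering, it helps to have the two proportions and their difference in hand. The following is a minimal sketch that rebuilds the Results table in R; the object names (`outcomes`, `props`, `diff_obs`) are illustrative assumptions, and only the four counts come from the table.

```r
# Counts copied from the Results table (6-month follow-up).
outcomes <- matrix(
  c(19,  8,
     5, 21),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    group        = c("Treatment", "Control"),
    good_outcome = c("Yes", "No")
  )
)

# Row-wise proportions: each row sums to 1.
props <- prop.table(outcomes, margin = 1)
round(props[, "Yes"], 2)

# Difference in the proportion of good outcomes (treatment minus control).
diff_obs <- props["Treatment", "Yes"] - props["Control", "Yes"]
round(diff_obs, 2)
```

The rounded values should match the 70% and 19% reported in the Results, with a difference of about 0.51.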

- Suppose you flip a coin 100 times. While the chance a coin lands heads in any given coin flip is 50%, we probably won't observe exactly 50 heads. This type of fluctuation is part of almost any type of data generating process.
- The observed difference between the two groups ($70\% - 19\% = 51$ percentage points) may be real, or may be due to natural variation.
- Since the difference is quite large, it is more believable that the difference is real.
- We need statistical tools to determine if the difference is so large that we should reject the notion that it was due to chance.
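
Both ideas in the list above can be explored by simulation. The sketch below is an illustration under stated assumptions, not the analysis used in the study: the seed, the 10,000 repetitions, the object names, and the choice of a simple shuffling approach are all assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Part 1: coin flips. Repeat "flip a fair coin 100 times" 10,000 times
# and record the number of heads each time.
heads <- rbinom(10000, size = 100, prob = 0.5)
range(heads)           # counts scatter around 50
mean(heads == 50)      # share of runs with exactly 50 heads
dbinom(50, 100, 0.5)   # exact probability of exactly 50 heads, about 0.08

# Part 2: shuffle the outcomes of the 53 patients who completed the study.
group   <- rep(c("treatment", "control"), times = c(27, 26))
outcome <- rep(c("good", "not good"), times = c(24, 29))

# Difference in the proportion of good outcomes (treatment minus control).
diff_in_props <- function(outcome, group) {
  mean(outcome[group == "treatment"] == "good") -
    mean(outcome[group == "control"] == "good")
}

diff_obs <- 19 / 27 - 5 / 26  # observed difference from the Results table

# If therapy made no difference, the group labels would be arbitrary, so
# reshuffling the outcomes mimics differences that arise by chance alone.
diff_sim <- replicate(10000, diff_in_props(sample(outcome), group))

# Share of shuffles with a difference at least as large as the observed one
# (the small tolerance guards against floating-point ties).
mean(diff_sim >= diff_obs - 1e-9)
```

In a run of this sketch, shuffled differences as large as the observed one should turn up only rarely (well under 1% of shuffles), which is the informal sense in which the difference looks too large to be chance alone.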

## Generalizing the results

Are the results of this study generalizable to all patients with chronic fatigue syndrome?

Introduction to Probability & Statistics