Tutorial 2: An Introduction to R
```{r setup, include=FALSE}
#knitr::opts_chunk$set(eval = FALSE)
library(dplyr)
library(ggplot2)
library(learnr)
library(gradethis)
#remotes::install_github("rstudio/gradethis")
library(learnrhash)
#devtools::install_github("rundel/learnrhash")

tutorial_options(exercise.timelimit = 60, exercise.checker = gradethis::grade_learnr)
```

## Why R?

If you've never coded before (or even if you have), type `"Your Name"` in the interactive R chunk below and run it by hitting `ctrl+Enter` (or `cmd+Enter` for Mac users).

```{r Student-Name, exercise = TRUE}

```

Through these tutorials you'll be interacting with data using the R programming language. Alongside Python, R is one of the most widely used tools for statistics and data science. This tutorial is a first introduction to embedded R chunks.

###

You might be asking why you need to use R rather than different software like Minitab, SPSS, SAS, or a number of others. Consider the following three points.

###

+ R is *open source* -- it is free to use, and its open-source license is designed to keep it freely available.
+ Since R is a language rather than a fully developed piece of software with a point-and-click interface, you aren't limited to what a team of developers thought you might want to do at the time the software was developed. With R, you can do virtually anything you want (although I admit that is easier said than done).
+ We live in a world where we are constantly interacting with software. It isn't a bad thing to learn what code looks like -- plus, you'll be learning a transferable skill in addition to learning statistics.

###

This tutorial will introduce you to interacting with R through executable code chunks. Whenever you see a code chunk, feel free to play around with any code that appears in the chunk and see how your changes impact the results of the code. You can execute code chunks by hitting the *Run* button or by using `ctrl+Enter` (`cmd+Enter` on a Mac).

If you see a *Submit Answer* button, then you are interacting with an exercise that can give you feedback. Click *Submit Answer* rather than the *Run* button to run your code and see if you got it right. Don't worry if you get it wrong -- you can always try again!

###

Try editing the following chunk to execute `2 + 3` rather than `2 + 2`, as is preset.

```{r a-first-operation, exercise = TRUE, exercise.eval = TRUE}
#We can use R to add, subtract, multiply, and divide
2 + 2
```

```{r a-first-operation-check}
grade_result(
  pass_if(~ (.result == 2 + 3))
)
```

### A few warnings

Coding can be frustrating and you are certain to run into errors. Here are a few common things to watch out for.

+ Spelling counts! If you are getting an error, check to see that you've spelled everything correctly, including both function and object names.
+ Capitalization counts! R is case-sensitive, so check to see that everything is typed exactly as it should be.
+ Ask for help!
    + For most functionality, R has built-in help documentation. You can find help on a function, package, or dataset by typing a question mark (`?`) followed immediately by its name and executing that line. For example, running `?max` will open the help page for R's `max()` function. Unfortunately the help documentation won't populate within these interactive tutorials, but if you go back to a local R instance, then running these commands from the prompt (`>`) in the console will work as described.
    + If you are struggling with troubleshooting your code, ask a teacher, mentor, or friend. Speaking from experience, it is really hard to identify small issues like a misplaced comma in code. A fresh set of eyes often does the trick.
    + Google is your friend here. If you are having an issue, chances are that someone else has had the same issue. The *Stack Exchange* family of sites (including Stack Overflow) will likely become some of your most frequently visited sites.
+ If you find yourself smashing keys and getting frustrated, it is time for a break.
Those fresh eyes don't always need to be somebody else's -- coding is hard work and takes time, but with patience and practice you'll improve quickly.

### Hello World!

If you've never coded before (or even if you have), type `print("Hello World!")` in the interactive R chunk below and run it by hitting `ctrl+Enter` (or `cmd+Enter` for Mac users).

```{r HelloWorld, exercise = TRUE}

```

```{r HelloWorld-solution}
print("Hello World!")
```

```{r HelloWorld-check}
grade_code()
```

## An R Primer

The following subsections constitute a primer for working in R. You'll be exposed to R as a simple calculator and also as a tool for interfacing with data frames. Think of data frames (sometimes called data matrices or data arrays) as tables of data, just like Excel spreadsheets. We'll start by working with R as a calculator though, which you've actually done already in this tutorial -- when you changed the code cell to evaluate `2 + 3` instead of `2 + 2`.

###

### R can be used as a Calculator

You can (and will) use R to perform basic calculation tasks. You will not need an additional calculator for this course -- R is capable of everything your old calculator was and also much, much more. Use the code block below to experiment.

```{r simpleOperations, exercise = TRUE, exercise.eval = TRUE}
#As you've seen, we can use R to add, subtract, multiply, and divide
1 + 1 - (5*2) + (8/4)
```

###

Use the code block below to evaluate the expression `4^3 - 16`.

```{r arithmeticExercise, exercise = TRUE}

```

```{r arithmeticExercise-solution}
4^3 - 16
```

```{r arithmeticExercise-check}
grade_result(
  pass_if(~ identical(.result, 4^3 - 16))
)
```

###

R has built-in functions to handle operations like the square root (`sqrt()`), the logarithm (`log()`), etc. Google is your friend here -- if you need to carry out an operation, do a quick internet search to discover how you can call it in R.

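A few of these built-in functions can be sketched as follows. This is only a small illustration, not a complete list. One detail worth knowing is that `log()` computes the natural logarithm by default, while `log10()` or the `base` argument gives other bases.

```r
# Square root
sqrt(16)              # 4

# log() is the natural logarithm (base e) by default
log(exp(1))           # 1

# Other bases: log10(), or the base argument of log()
log10(1000)           # 3
log(8, base = 2)      # 3

# Absolute value and rounding to two decimal places
abs(-7.5)             # 7.5
round(3.14159, 2)     # 3.14
```
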
We can also piece together more complicated expressions in R by making use of parentheses `()` for grouping. Beware that none of the other brackets (square brackets or curly braces) may be used for grouping since they have special meaning in R -- we'll see them in action later in our course. For now, use the code block below to evaluate $\displaystyle{\frac{54 - 50}{8/\sqrt{20}}}$.

```{r testStatEval, exercise = TRUE}

```

```{r testStatEval-solution}
(54 - 50)/(8/sqrt(20))
```

```{r testStatEval-check}
grade_result(
  pass_if(~ identical(.result, (54 - 50)/(8/sqrt(20))))
)
```

###

Add parentheses to the expression below so that it evaluates to `1`.

```{r parensExercise1, exercise = TRUE, exercise.eval = TRUE}
8 - 6 / 2
```

```{r parensExercise1-solution}
(8 - 6)/2
```

```{r parensExercise1-check}
grade_result(
  pass_if(~ (.result == 1))
)
```

Add parentheses to the expression below so that it evaluates to `0`.

```{r parensExercise2, exercise = TRUE, exercise.eval = TRUE}
5*3*4 - 12
```

```{r parensExercise2-solution}
5*(3*4 - 12)
```

```{r parensExercise2-check}
grade_result(
  pass_if(~ (.result == 0))
)
```

###

Were you able to get the expressions to evaluate to `1` and `0` just by introducing parentheses? If not, check the solution code for possible answers.

## Defining Objects in R

In R we can define objects via the `<-` (assignment) operator. For example, we can assign the value 4 to the container `x` using `x <- 4`. Use the code block below to change the value stored in `x` from 4 to `-22`.

```{r redefine, exercise = TRUE, exercise.eval = TRUE}
x <- 4
```

```{r redefine-solution}
x <- -22
```

```{r redefine-check}
grade_code()
```

###

Use the code block below to store the value `-3` in the container `y` and then compute the quantity `x*y - y`.

```{r defineAndCompute-setup}
x <- -22
```

```{r defineAndCompute, exercise = TRUE}

```

```{r defineAndCompute-solution}
y <- -3
x*y - y
```

```{r defineAndCompute-check}
grade_code()
```

###

In R, it is helpful to think of objects as either vectors (ordered collections of values) or data frames (tables), because that's how R treats its objects. We can create an object called `numVec` which is a vector of six numerical values (1, 8, -2, 2, -99, 43) using `numVec <- c(1, 8, -2, 2, -99, 43)`. Any time we wish to reference or provide a list of items in R, we use the `c()` function to *combine* those items. Once we have a vector, we can operate with it.

```{r vector-operations}
numVec <- c(1, 8, -2, 2, -99, 43)
numVec + c(1, 1, 1, 0, -2, 4)
1 + numVec
2*numVec
```

###

Notice that we can add two vectors of the same length to one another and the addition is completed element by element. We can also add a single value to a vector and that single value is *recycled* and added to each component of the vector (each of the values). The same works for subtraction, multiplication, and division. While this behavior may seem convenient (and it is), it does require that we be aware of a few possible pitfalls. If we try to add two vectors that do not have the same length, R will still perform the operation, but it only warns us when the shorter vector is not *recycled* a whole number of times (that is, when the longer length is not a multiple of the shorter length). If the recycling works out evenly, then R does not report a warning to us. This can be problematic, because typically we don't want to add vectors of different sizes -- we need to pay close attention. In the chunk below, `numVec1 + numVec2` produces a warning (6 is not a multiple of 4), while `numVec1 + numVec3` silently recycles `numVec3` three times.

```{r vecRecycle, eval = TRUE}
numVec1 <- c(1, 2, 9, 7, 3, 4)
numVec2 <- c(2, 9, -2, 4)
numVec3 <- c(1, 7)

numVec1 + numVec2
numVec1 + numVec3
```

## Data Frames

We typically aren't just working with scalars (single values) or a handful of vectors. Usually a statistician, analyst, or data scientist will be working with a table (or many tables) filled with raw data.
We'll call these tables *data frames*, and you can see the first six rows of the `diamonds` data frame (from our first tutorial) below.

```{r data-frame-diamonds}
head(diamonds)
```

###

###

We can access a column of a data frame in several different ways. For example, `diamonds$cut`, `diamonds[ , "cut"]`, and `diamonds[ , 2]` all access the `cut` column (the second column) of the `diamonds` data frame, but the first method returns a vector of values while the two methods making use of brackets return a data frame. Alter the code below so that you access the `carat` column as a vector instead of the `cut` column as a vector.

```{r columnAccess, exercise = TRUE, exercise.eval = TRUE}
diamonds$cut
```

```{r columnAccess-solution}
diamonds$carat
```

```{r columnAccess-check}
grade_code()
```

## Submit

```{r context="server"}
learnrhash::encoder_logic(strip_output = TRUE)
```

```{r encode, echo=FALSE}
learnrhash::encoder_ui(
  ui_before = shiny::div(
    "If you have completed this tutorial and are happy with all of your",
    "solutions, please click the button below to generate your hash and",
    "submit it using the corresponding tutorial assignment tab on Blackboard",
    shiny::tags$br()
  ),
  ui_after = shiny::tags$a(href = "https://blackboard.ncat.edu/", "NCAT Blackboard")
)
```

## Summary

Okay, that's enough of an introduction for now. Here's a quick recap.

+ R can be used as a simple calculator -- in the same way as any hand-held calculator you've used in your previous coursework.
    + When doing multiplication in R, we must include the asterisk (`*`); otherwise R will throw an error. For example, `5(4)` results in an error stating that you asked R to apply a non-function to the value 4, but `5*(4)` results in 20, as expected.
+ The real advantage with R comes in interacting with large datasets (called data frames).
    + A *data frame* is just like a single sheet of an Excel or Google Sheets file.
    + A data frame has rows and columns, and we can access columns of a data frame in multiple ways.
The most common is with the `$` operator, which returns the column as a vector (an ordered list of values).

We'll learn more about R as we progress through these tutorials. Remember, you are **not** expected to have a programming background as a prerequisite -- stumbling a bit with the coding is to be expected.
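
As a recap, the points above about multiplication and column access can be sketched in a few lines. The data frame `scores` and its columns `student` and `points` are hypothetical, made up purely for illustration; they are not part of the tutorial's data.

```r
# Multiplication always needs an explicit asterisk
5 * (4)                       # 20
# Writing 5(4) instead would stop with an error, because 5 is not a function

# A small, made-up data frame (hypothetical, for illustration only)
scores <- data.frame(
  student = c("A", "B", "C"),
  points  = c(88, 92, 79)
)

# The $ operator returns the column as a vector
scores$points                 # 88 92 79
mean(scores$points)           # 86.33333

# Single brackets with just a column name keep the result as a data frame
scores["points"]
```

Because `scores$points` is an ordinary vector, it can be passed straight to functions such as `mean()`, which is why the `$` form is usually the most convenient one for calculations.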