Working with R and GitHub

You need to install R and RStudio, two free programs, on your computer. We will go through the installation steps below. You will also learn basic R commands and the RStudio interface so that you can start programming.

1. Using RStudio to interact with R

1.1 Installing R and RStudio

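R and RStudio are two separate pieces of software: R is the programming language and statistical computing environment, and RStudio is an integrated development environment (IDE) that makes R easier to use.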

In this class we will use RStudio to interact with R. The following steps outline a simple and effective process for installing R and RStudio on your computers.

Windows

MacOS

Linux

If R and RStudio are already installed, check whether your versions are up to date.
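As a rough sketch, the running R version can be checked from the console. The minimum version used here ("4.0.0") is an assumption chosen only for illustration, not a course requirement.

```r
# Query the version of the R installation that is currently running
current <- getRversion()
print(R.version.string)

# Hypothetical minimum version, chosen only for illustration
minimum <- "4.0.0"
if (current < minimum) {
  message("R ", as.character(current), " is older than ", minimum, "; consider updating.")
} else {
  message("R ", as.character(current), " meets the assumed minimum of ", minimum, ".")
}
```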

1.2 Updating R and RStudio

Updating R

Updating RStudio

To update RStudio, open RStudio and click Help > Check for Updates. RStudio also notifies you from time to time when a new version is available.

1.3 Setting up RStudio Cloud

If it isn’t feasible to install R and RStudio Desktop on your computer, you can use RStudio Cloud and run R in a browser window. To do so, sign up for an RStudio Cloud account.

Once you log in, look for a few things. On the left-hand side, you should see a column that displays your “Spaces”, where you can check for Learning, as well as some additional information on the system status and the terms and conditions.

On the right-hand side, you should see a small chart showing your Account Usage. This is what you want to keep track of. How quickly you use up your allowance depends on how much time you spend actually running code; the standard free account provides 15 hours per month.

Getting Set Up

1.4 Installing R packages

Most of the work in R is done by functions, which are bundled into packages. Besides functions, many R packages include built-in sample data sets.

By default, a set of R packages is installed along with R. Many other packages need to be installed later from central repositories such as CRAN or Bioconductor, or from developer repositories such as R-Forge (https://r-forge.r-project.org/) or GitHub.

Installing R packages

To install R packages, open RStudio, copy and paste the following commands into the console window, then run them.

# Install packages from CRAN one at a time
install.packages("tidyverse")
install.packages("RSQLite")
# Alternatively, install several packages in one call
install.packages(c("tidyverse", "RSQLite"))
# Install a package from GitHub (requires devtools)
install.packages("devtools")
devtools::install_github("grssnbchr/hexbin")
# Optional: install a package from R-Forge instead of CRAN
# install.packages("patchwork", repos = "http://R-Forge.R-project.org")

Alternatively, you can install the packages using the RStudio interface by going to Tools > Install Packages and typing the names of the packages separated by commas.

When the installation has finished, you can try to load the packages by pasting the following code into the console:
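A minimal sketch of that loading step, assuming the two CRAN packages from the install commands above (tidyverse and RSQLite). It first checks which of them are actually installed, so it runs without error even if an installation failed.

```r
# Packages installed earlier in this section
pkgs <- c("tidyverse", "RSQLite")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Attach only the packages that are available
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```

If a package shows FALSE, re-run install.packages() for it and read the error messages in the console.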

Updating R packages

It is recommended to keep your R version and all packages up to date. To update the packages that you have installed, click Update in the Packages tab in the bottom right panel of RStudio, or go to Tools > Check for Package Updates....
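The same check can be sketched from the console. Listing installed versions works offline; the calls that contact a repository are left commented out here because they need a network connection and may take a while.

```r
# List the installed packages and their versions (works offline)
inst <- as.data.frame(
  installed.packages()[, c("Package", "Version")],
  stringsAsFactors = FALSE
)
head(inst)

# These calls contact the package repository, so they need network access:
# old.packages()                # show packages with newer versions available
# update.packages(ask = FALSE)  # update all of them without prompting
```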

2. Creating an R project

2.1 Using a project to organize your work

Several years ago, the usual way to start working in R was to first set a working directory using setwd(), which takes an absolute file path as input and sets it as the current working directory of the R process, and then to use getwd() to check that the working directory was set correctly. The problem with this approach is that setwd() relies on an absolute file path. Such paths break very easily, which makes it very difficult to share your analysis with others.

An RStudio project solves the problem of ‘fragile’ file paths by making file paths relative. An RStudio project is defined by a file with the extension .Rproj that sits in the root directory. When your RStudio session is opened through the .Rproj file, the current working directory points to the root folder where that .Rproj file is saved.

The .Rproj file can be created by going to File > New Project… in RStudio, which links it to a specified folder or directory that is stand-alone and portable. You read data from and write data to files within this directory, except in cases where your analysis requires interacting with an Internet data source, such as web scraping. To open an existing project, open the .Rproj file first and then open the R scripts (files with the extension .R) from within the RStudio session, rather than opening the R scripts directly. There is plenty of documentation on RStudio projects, including detailed information on .RData and .Rhistory files.
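A small sketch of the difference between relative and absolute paths. The file name data/survey.csv is hypothetical and used only for illustration; the file does not need to exist for this code to run.

```r
# Where is R currently working? Inside a project this is the project root.
project_root <- getwd()

# A path relative to the project root; it stays valid when the whole
# project folder is moved or shared
relative_path <- file.path("data", "survey.csv")
print(relative_path)

# The same file as an absolute path; this one is specific to this machine
absolute_path <- file.path(project_root, relative_path)
print(absolute_path)
```

Only the relative form should appear in scripts that other people will run.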

2.2 Working directory structure

The directory structure ‘template’ described here can provide a good starting point for organizing projects if this workflow is new to you. However, different projects have different needs, so think about what is needed and what will happen in the project while setting up the working directory structure. A template of an R project looks like this:

The data folder

The data folder is the subfolder where data are stored. This includes any source files, such as SPSS, Excel/CSV, or .RDS files, and some generated ones. Some people like to split this subfolder into three parts:

The src folder

This folder stores R script files (with the extension .R) and R Markdown files (with the extension .Rmd) for data analysis and visualization. There are three types of R scripts:

The output folder

Save all your outputs in the output folder, including plots, HTML files, and data exports.
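The three folders described above can be created from R. This is a sketch only: the project name my_project is hypothetical, and the project is created inside a temporary directory so that nothing on your machine is overwritten.

```r
# Hypothetical project root inside R's temporary directory
project_dir <- file.path(tempdir(), "my_project")

# The three subfolders described in this section
folders <- file.path(project_dir, c("data", "src", "output"))

# recursive = TRUE also creates the project root if it does not exist yet
for (folder in folders) {
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
}

# Show the resulting top-level structure
print(list.dirs(project_dir, full.names = FALSE, recursive = FALSE))
```

For a real project, replace the temporary directory with the folder that holds your .Rproj file.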

2.3 General settings

2.4 Running scripts automatically

There is usually one main runner script, or potentially a small number of them. These run the other scripts in sequence. In such a case it is sensible to set up sequentially named subfolders or sequentially numbered scripts that are executed in order. Typically, this model works best when there are at most a handful of distinct pipelines.
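One possible shape for such a runner script is sketched below. The function run_pipeline(), the naming pattern (a numeric prefix followed by an underscore), and the two demo scripts are all assumptions made for illustration; the demo scripts are written to a temporary directory so the example is self-contained.

```r
# Run every numbered script in a folder, in order, in one shared environment
run_pipeline <- function(script_dir) {
  scripts <- sort(list.files(script_dir, pattern = "^[0-9]+_.*\\.R$",
                             full.names = TRUE))
  shared <- new.env()
  for (script in scripts) {
    message("Running ", basename(script))
    source(script, local = shared)  # objects persist from one script to the next
  }
  invisible(shared)
}

# Demo: two tiny hypothetical scripts in a temporary folder
demo_dir <- file.path(tempdir(), "pipeline_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("values <- c(3, 1, 2)", file.path(demo_dir, "01_load.R"))
writeLines("result <- sort(values)", file.path(demo_dir, "02_sort.R"))

env <- run_pipeline(demo_dir)
print(get("result", envir = env))
```

Zero-padded prefixes (01, 02, ..., 10) keep alphabetical order and execution order the same once there are ten or more scripts.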

3. Connecting an R project to GitHub

3.1 Checking that RStudio can find Git

The first task is to ensure that Git can be located by RStudio on your machine. To do this, open RStudio and go to Tools > Global Options > Git/SVN.

Under “Git executable”, you should be able to see a path to Git. On Ubuntu, for example, it will be /usr/bin/git.

If Git is not in this location, or you want to check where the Git executable is, open a terminal on Ubuntu or the Command Prompt on Windows. Type which git (Ubuntu) or where git (Windows) to reveal the Git executable file path.
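The same lookup can also be sketched from the R console with Sys.which(), which searches the system PATH. Note that it reports what R can find, which may differ from the path configured in the RStudio dialog.

```r
# Look for the Git executable on the system PATH
git_path <- Sys.which("git")

# Sys.which() returns an empty string when nothing is found
if (nzchar(git_path)) {
  message("Git found at: ", git_path)
} else {
  message("Git was not found on the PATH.")
}
```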

If it doesn’t match the dialogue box in RStudio, click on “Browse…” and navigate to your Git executable file. Once complete, press “OK”.

3.2 Adding a new R project to GitHub

If you want to start a new RStudio project and have it backed up on GitHub, follow these steps:

3.3 Adding an existing R project to GitHub

If you have an existing project in RStudio and decide later that maintaining version control on GitHub would be a good idea, follow these steps:

# git remote add origin https://github.com/yourusername/yourrepo
# git pull origin main
# git push -u origin main

4. Writing R code

4.1 Using scripts

You tell R what to do, and how to do it, by writing instructions in a language that R understands. We call such a set of written instructions a “script”.

Creating R script

A script is simply a text file that contains a set of commands and comments. It can be saved and reused later. It can also be edited so you can execute a modified version of the commands.

To create a new script in RStudio, open a new empty script by clicking File > New File > R Script. The script editor opens with an empty script, which is ready for text entry. Here is an example to familiarize you with the Script Editor interface.
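A hypothetical script to type into the editor. The only detail taken from this section is that line 3 holds x = 34; every other line is an illustrative assumption.

```r
# First script
# Assign two values and add them together
x = 34
y = 16
z = x + y
print(z)
```

Lines 1 and 2 are comments, so only lines 3 to 6 produce any effect when run.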

Saving R script

You can save your script by clicking on the Save icon at the top of the Script Editor panel. When you do that, a Save File dialog will open.

The default script name is Untitled.R. The Untitled part is highlighted. You will save this script as First script.R. Start typing First script. RStudio overwrites the highlighted default name with your new name, leaving the .R file extension.

Notice that RStudio will save your script to your current working folder. Press the Save button and your script is saved to your working folder. Notice that the tab at the top of the Script Editor panel now shows your saved script file name.

While it is not necessary to use an .R file extension for your R scripts, it does make it easier for RStudio to work with them if you use this file extension.

Opening R script

Click on the Open an existing file icon in the RStudio toolbar. A choose file dialog will open. Select the R script you want to open and click the Open button. Your script will open in the Script Editor panel with the script name in an editor tab.

Commenting R script

In scripts, it can be very useful to save a bit of text which is not to be evaluated by R. You can leave a note to yourself about what the next line is supposed to do, what its strengths and limitations are, or anything else you want to remember later. To leave a note, we use “comments”, which are a line of text that starts with the hash symbol #. Anything on a line after a # will be ignored by R.

# This is a comment. Running this in R will have no effect.

Executing R script

The Run button in the Script Editor panel toolbar will run either the current line of code or any block of selected code. You can use your First script.R code to gain familiarity with this functionality.

Place the cursor anywhere in line 3 of your script [x = 34]. Now press the Run button in the Script Editor panel toolbar. Three things happen: 1) the code is transferred to the command console, 2) the code is executed, and 3) the cursor moves to the next line in your script. Press the Run button three more times. RStudio executes lines 4, 5, and 6 of your script.

4.2 Using R Markdown

To learn how to construct an R Markdown file, please visit the site.

4.3 Objects and operators

R objects

In RStudio, to create an object called x and give it a value of 4 using the symbol “<-”, we would write:

x <- 4
x
[1] 4

The “<-” in the middle tells R to assign the value on the right to the object on the left. After running the command above, whenever x is used in a command, it is replaced by its value 4. For example, adding 3 to x gives 7, as expected.

x + 3
[1] 7

Objects in R can store more than just simple numbers. They can store lists of numbers, functions, graphics, etc., depending on what values are assigned to them.

You can always assign a new value to an object. For example, to tell R that x is equal to 32:
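A brief sketch of objects that hold something other than a single number. All names and values here are made up for illustration.

```r
# A vector of several numbers
weights <- c(61.5, 72.0, 58.3)

# A function stored in an object
average <- function(values) sum(values) / length(values)

# A list that bundles different kinds of values together
patient <- list(id = "A01", weights = weights)

# Objects can then be used together
print(average(patient$weights))
print(class(weights))
print(class(average))
print(class(patient))
```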

x <- 32

then x takes its new value:

x
[1] 32

Naming objects and functions in R is pretty flexible. A name has to start with a letter (or a dot that is not followed by a digit), and that can be followed by letters, numbers, dots, or underscores. Names in R are case-sensitive, which means that Weights and weights are completely different things to R. It is a good idea to make object names as descriptive as possible, so that you will know what you meant when you look at them later on. Sometimes clear naming means that it is best to have multiple words in the name, but names can’t contain spaces. A common approach is to chain the words with underscores, as in weights_before_hospital.
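A short illustration of these naming rules. The values are made up; only the name weights_before_hospital comes from the text above.

```r
# Names are case-sensitive: these are two different objects
weights <- c(70, 82)
Weights <- c(150, 180)
print(identical(weights, Weights))

# Multiple words are chained with underscores, since spaces are not allowed
weights_before_hospital <- weights
print(weights_before_hospital)

# make.names() shows how R would turn an invalid name into a valid one
print(make.names("weights before hospital"))
```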

R operators

An operator is a symbol that tells R to perform a specific mathematical or logical manipulation. The R language is rich in built-in operators and provides the following types of operators.
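A quick sketch of three common operator families (arithmetic, relational, and logical), using two made-up values. It is not a complete list of R's operators.

```r
a <- 7
b <- 2

# Arithmetic operators
print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # division
print(a %% b)   # remainder after division
print(a %/% b)  # integer division
print(a ^ b)    # exponentiation

# Relational operators return TRUE or FALSE
print(a > b)
print(a == b)
print(a != b)

# Logical operators combine TRUE/FALSE values
print((a > 5) & (b > 5))  # AND
print((a > 5) | (b > 5))  # OR
print(!(a > b))           # NOT
```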