You need to install R and RStudio on your computer. We will go through the steps to install these two free programs. You will also learn basic R commands and the RStudio interface so that you can start programming.
R and RStudio are two separate pieces of software:
In this class we will use RStudio to interact with R. The following steps outline a simple and effective process for installing R and RStudio on your computers.
On Windows, run the .exe file that was just downloaded.
On macOS, select the .pkg file for the latest R version. Double click on the downloaded file to install R.
On Linux, you can install R from the terminal with sudo apt-get install r-base for Debian/Ubuntu and sudo yum install R for Fedora, but the installed versions of R are usually out of date.
To install RStudio, run sudo dpkg -i rstudio-YYYY.MM.X-ZZZ-amd64.deb for Debian/Ubuntu at the terminal.
If R and RStudio are already installed, check whether your R and RStudio versions are up to date.
The R version that RStudio is using is shown under Tools > Global Options > General > Basic.
There is also a package called installr that can help you upgrade your R version and migrate your package library.
To update RStudio, open RStudio and click on Help > Check for Updates. RStudio also notifies you automatically from time to time when a new version is available.
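The installed versions can also be checked from the R console. The following is a minimal sketch using base R only; the stats package is chosen just as an example because it ships with every R installation.

```r
# Version of R that is currently running, as a single string
r_version <- R.version.string
print(r_version)

# The same information as a version object that can be compared
print(getRversion())

# Version of an installed package (here the built-in "stats" package)
print(packageVersion("stats"))
```

Comparing the printed R version with the latest release listed on CRAN shows whether an upgrade is due.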
If it isn’t feasible to install R and RStudio Desktop on your computer, you can use RStudio Cloud and run R in a browser window. For this purpose, you need to sign up and create an RStudio Cloud account.
Once you log in, you should look for a few things. On the left-hand side, you should see a column that displays your “Spaces”, where you can check for Learning, as well as some additional information on the system status and the terms and conditions.
On the right-hand side, you should see a small chart showing your Account Usage. This is what you want to keep track of. Your usage depends on how much time you spend actually running code; the standard free account provides 15 hours per month.
Most of the work in R is done by functions, which are bundled into packages. In addition to functions, many R packages include built-in sample data sets.
By default, a set of R packages is installed during R installation. Many other packages need to be installed later from central repositories like CRAN or Bioconductor, or from developer repositories like R-Forge (https://r-forge.r-project.org/) or GitHub.
To install R packages, open RStudio, copy and paste the following commands into the console window, and then run them.
install.packages("tidyverse")
install.packages("RSQLite")
# Alternatively
install.packages(c("tidyverse", "RSQLite"))
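# Install a package from GitHub using devtools
install.packages("devtools")
devtools::install_github("grssnbchr/hexbin")
# Install a package from R-Forge (left commented out)
# install.packages("patchwork", repos = "http://R-Forge.R-project.org")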
Alternatively, you can install packages through the RStudio interface by going to Tools > Install Packages and typing the names of the packages separated by commas.
When the installation has finished, you can try to load the packages by pasting the following code into the console:
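A minimal sketch of such a check, assuming the tidyverse and RSQLite packages from the install step above. library() is the usual way to load a package; requireNamespace() is used here instead so that a missing package is reported as FALSE rather than stopping with an error.

```r
# Packages installed in the previous step (assumed)
pkgs <- c("tidyverse", "RSQLite")

# TRUE for each package that can be loaded, FALSE otherwise
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(available)

# In an interactive session, load each package with library():
# library(tidyverse)
# library(RSQLite)
```

If any entry is FALSE, rerun install.packages() for that package and read the messages it prints.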
It is recommended to keep your R version and all packages up to date. To update the packages that you have installed, click Update in the Packages tab in the bottom right panel of RStudio, or go to Tools > Check for Package Updates...
Several years ago, the usual practice in R was to first set a working directory using setwd(), typically giving it an absolute file path that becomes the current working directory of the R process, and then to use getwd() to check that the working directory was set correctly. The problem with this approach is that it relies on an absolute file path. Such paths break easily when files move or the code runs on another machine, which makes it very difficult to share your analysis with others.
An RStudio project solves the problem of ‘fragile’ file paths by making file paths relative. An RStudio project is identified by a file with the extension .Rproj that sits in the project's root directory. When your RStudio session is opened through the .Rproj file, the current working directory points to the root folder where that .Rproj file is saved.
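The difference between the two kinds of path can be sketched as follows. The folder and file names (data, survey.csv) and the absolute path are hypothetical and serve only as an illustration.

```r
# Absolute path: only valid on the machine where it was written (hypothetical)
fragile_path <- "C:/Users/me/Documents/my_project/data/survey.csv"

# Relative path: resolved against the current working directory, which is
# the project root when the session was opened through the .Rproj file
portable_path <- file.path("data", "survey.csv")
print(portable_path)

# The directory that relative paths are resolved against
print(getwd())
```

Because the relative path does not mention any user name or drive, the same script keeps working when the whole project folder is copied to another computer.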
This .Rproj file can be created by going to File > New Project… in RStudio, which links the project to a specified folder or directory that is stand-alone and portable. You can read data from and write data to files within this directory; the exception is analysis that requires interacting with an Internet data source, such as web scraping. To work on an existing project, open the .Rproj file first and then open the R scripts (files with the extension .R) from within that RStudio session, rather than opening the R scripts directly. There is plenty of documentation on RStudio projects, including detailed information on the .RData and .Rhistory files.
This directory structure ‘template’ can provide a good starting point for organizing projects if this workflow is new to you. However, different projects have different needs, so think about what is needed and what will happen in the project when setting up the working directory structure. A template for an R project looks like this:
The data folder is the subfolder where data are stored. It includes source files, such as SPSS, Excel/CSV or .RDS files, and some generated ones. Some people like to split this subfolder into three parts:
This folder stores R script files (with the extension .R) and R Markdown files (with the extension .Rmd) for data analysis and visualization. There are three types of R scripts:
Save all your outputs in the Output folder, including plots, HTML, and data exports.
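A folder layout along these lines can be created from R. This is a minimal sketch: the folder names data, scripts and output are assumptions based on the descriptions above, and a temporary directory stands in for the real project root.

```r
# Temporary directory used as a stand-in for the project root
project_root <- file.path(tempdir(), "my_project")

# Hypothetical folder names following the data / scripts / output layout
folders <- c("data", "scripts", "output")

# Create each folder; recursive = TRUE also creates the project root itself
for (folder in folders) {
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List the folders that now exist directly under the project root
print(list.dirs(project_root, full.names = FALSE, recursive = FALSE))
```

In a real project you would drop project_root and create the folders relative to the directory that holds the .Rproj file.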
There is one main runner script, or potentially a small number of them. These run the other scripts in sequence. In such a case it is sensible to set up sequentially numbered subfolders or sequentially numbered scripts that are executed in order. Typically, this model works better if there are at most a handful of distinct pipelines.
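A runner script of this kind can be sketched as below. The script names (01_import.R, 02_analyse.R) and the run_log variable are hypothetical; the two scripts are written to a temporary folder so that the example is self-contained.

```r
# Stand-in for a project's scripts folder, created in a temporary directory
scripts_dir <- file.path(tempdir(), "pipeline_scripts")
dir.create(scripts_dir, showWarnings = FALSE)

# Two hypothetical numbered scripts; each one appends its name to run_log
writeLines('run_log <- c(run_log, "01_import")',
           file.path(scripts_dir, "01_import.R"))
writeLines('run_log <- c(run_log, "02_analyse")',
           file.path(scripts_dir, "02_analyse.R"))

# Runner: find the numbered scripts and source them in sorted order
run_log <- character(0)
script_files <- sort(list.files(scripts_dir,
                                pattern = "^[0-9]+_.*\\.R$",
                                full.names = TRUE))
for (script in script_files) {
  source(script, local = TRUE)
}

print(run_log)
```

The numeric prefix is what fixes the execution order, so zero-padding (01, 02, ..., 10) keeps the sort correct once there are more than nine scripts.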
The first task is to ensure that Git can be located by RStudio on your machine. To do this, open RStudio and go to Tools > Global Options > Git/SVN.
Under “Git executable”, you should be able to see a path to Git. On Ubuntu, for example, it will be /usr/bin/git.
If Git is not in this location or you want to check where the Git executable is, open a terminal on Ubuntu or the Command Prompt on Windows. Type which git (Ubuntu) or where git (Windows) to reveal the Git executable file path.
If it doesn’t match the dialogue box in RStudio, click on “Browse…” and navigate to your Git executable file. Once complete, press “OK”.
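The same lookup can be done from the R console. This is a minimal sketch using the base R function Sys.which(), which searches the system PATH; it returns an empty string when Git is not found.

```r
# Look up the Git executable on the system PATH
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  message("Git found at: ", git_path)
} else {
  message("Git was not found on the PATH")
}
```

If a path is reported, it can be compared with the one shown in the “Git executable” field.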
If you want to start a new RStudio project and have it backed up on GitHub, follow these steps:
First, create a GitHub account and then a new repository on the GitHub website. You can choose public or private as the visibility setting.
Next, open RStudio, go to File > New Project… > Version Control, and click on the “Git” option.
Fill in the URL of the new GitHub repository that you just created in the “Repository URL” field. The “Project Directory Name” will auto-fill. Click on “Open in New Session” and then click on “Create Project”. A new project window will open in RStudio containing your new project. Under the “Files” window you will notice some files, including .gitignore, .Rproj and README.md. The last file was pulled down from your GitHub repository.
Next, to demonstrate how changes can be saved, you will create a new script file and add some code. This will be saved locally. After that, you will “push” your changes to GitHub so that they are also saved remotely. To do this, go to File > New File > R Script in RStudio. Write an R script and save it.
This has saved your work to your computer, but not to GitHub. To save it to GitHub, go to the “Git” tab in the upper right pane. Check the “Staged” box for any files whose existence or modifications you want to commit.
Click on “Commit” and a new dialogue box will open. Under “Commit Message”, add a brief description of the changes that you have made.
Click on “Commit”. A Git Commit dialogue box will be displayed showing that the files have been committed to your local repository. You may close this second window by clicking “Close”.
Complete the final step by pressing Push. This will upload the R files to GitHub. A dialogue box will confirm this with a string followed by HEAD -> main.
If you have an existing project in RStudio and decide later that maintaining version control in GitHub would be a good idea, follow these steps:
# git remote add origin https://github.com/yourusername/yourrepo
# git pull origin main
# git push -u origin main
To get R to do what you want, you tell it what to do and how to do it in a language that it understands. We refer to this written description as a “script”.
A script is simply a text file that contains a set of commands and comments. It can be saved and reused later. It can also be edited so you can execute a modified version of the commands.
To create a new script in RStudio, open a new empty script by clicking File > New File > R Script. The script editor opens with an empty script, which is ready for text entry. Here is an example to familiarize you with the Script Editor interface.
You can save your script by clicking on the Save icon at the top of the Script Editor panel. When you do that, a Save File dialog will open.
The default script name is Untitled.R. The Untitled part is highlighted. You will save this script as First script.R. Start typing First script. RStudio overwrites the highlighted default name with your new name, leaving the .R file extension.
Notice that RStudio will save your script to your current working folder. Press the Save button and your script is saved to your working folder. Notice that the tab at the top of the Script Editor panel now shows your saved script file name.
While it is not necessary to use an .R file extension for your R scripts, it does make it easier for RStudio to work with them if you use this file extension.
Click on the Open an existing file icon in the RStudio toolbar. A choose file dialog will open. Select the R script you want to open and click the Open button. Your script will open in the Script Editor panel with the script name in an editor tab.
In scripts, it can be very useful to include text that R does not evaluate. You can leave a note to yourself about what the next line is supposed to do, what its strengths and limitations are, or anything else you want to remember later. To leave a note, use “comments”, which are pieces of text that start with the hash symbol #. Anything on a line after a # is ignored by R.
# This is a comment. Running this in R will have no effect.
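A comment can also follow code on the same line. A minimal sketch, with an arbitrary variable name chosen for illustration:

```r
# A full-line comment: R ignores this line entirely
total <- 2 + 3  # an end-of-line comment: only the code before the # runs
print(total)
```

Only the assignment and the print() call are evaluated; both pieces of comment text are skipped.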
The Run button in the Script Editor panel toolbar will run either the current line of code or any block of selected code. You can use your First script.R code to gain familiarity with this functionality.
Place the cursor anywhere in line 3 of your script [x = 34]. Now press the Run button in the Script Editor panel toolbar. Three things happen: 1) the code is transferred to the command console, 2) the code is executed, and 3) the cursor moves to the next line in your script. Press the Run button three more times. RStudio executes lines 4, 5, and 6 of your script.
To learn how to construct an R Markdown file, please visit the site.
In RStudio, if you want to create an object called x and give it a value of 4 using the symbol “<-”, you would write:
x <- 4
x
[1] 4
The “<-” in the middle tells R to assign the value on the right to the object on the left. After running the command above, wherever x appears in a command it is replaced by its value, 4. If you add 3 to x, you would expect to get 7.
x + 3
[1] 7
Objects in R can store more than just simple numbers. They can store lists of numbers, functions, graphics, etc., depending on what values get assigned to the object.
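A minimal sketch of objects holding different kinds of values. All names and numbers here are made up for illustration.

```r
# A numeric vector (an ordered collection of numbers)
weights <- c(61.5, 72.0, 58.3)

# A function stored in an object
double_it <- function(value) value * 2

# A list can mix different kinds of values, including the two above
record <- list(name = "sample_1", weights = weights, transform = double_it)

print(double_it(weights))
print(class(record))
```

The same assignment symbol is used in every case; what the object holds depends only on the value on the right-hand side.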
You can always assign a new value to an object. If you tell R that x is equal to 32:
x <- 32
then x takes its new value:
x
[1] 32
Naming objects and functions in R is pretty flexible. A name should start with a letter, which can be followed by letters, numbers, underscores, or periods. Names in R are case-sensitive, which means that Weights and weights are completely different things to R. It is a good idea to make object names as descriptive as possible, so that you will know what you meant when you look at them later. Sometimes clear naming means that it is best to have multiple words in the name, but names cannot contain spaces. A common approach is to chain the words with underscores, as in weights_before_hospital.
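Case sensitivity and underscore naming can be illustrated with a short sketch; the values are arbitrary.

```r
# Two different objects whose names differ only in case
weights <- c(70, 82)
Weights <- c(150, 180)

# They hold different values, so they are not identical
print(identical(weights, Weights))  # FALSE

# A descriptive multi-word name chained with underscores
weights_before_hospital <- weights
print(weights_before_hospital)
```

Mixing up the two spellings would silently use the wrong data, which is one reason to prefer descriptive, consistently cased names.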
An operator is a symbol that tells R to perform a specific mathematical or logical operation. The R language is rich in built-in operators and provides the following types of operators.
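A short illustration of three common operator groups (arithmetic, relational, and logical). The values of a and b are arbitrary.

```r
a <- 7
b <- 2

# Arithmetic operators
print(a + b)    # 9
print(a %/% b)  # integer division: 3
print(a %% b)   # remainder: 1

# Relational operators return TRUE or FALSE
print(a > b)    # TRUE

# Logical operators combine TRUE/FALSE values
print(a > 5 & b > 5)  # FALSE
print(a > 5 | b > 5)  # TRUE
```

The assignment symbol “<-” used on the first two lines is itself an operator.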