Why R?
Navigating RStudio
Getting your data in
Cleaning it up a little
Maybe... probably not... summary stats/plotting
You will need a laptop and access to wifi, but we will use RStudio Cloud, so don't worry about installation.
Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara
Let's introduce ourselves: why do you want to learn R?
Image credit Darren Dahly @statsepi
Link to RStudio Cloud Workspace
https://rstudio.cloud/project/637830
Go to this link https://rstudio.cloud/project/637830
Sign up for an RStudio Cloud account
Wait for project to chug
Click Save a Permanent Copy
Image credit to Jessica Ward from @RLadiesNCL
From the top menu choose
Then try out a few Editor themes and choose one you like.
A package is a group of functions that some kind person has written, tested, bundled together, and given away for everyone to use.
To install a package, type install.packages("packagename") into your console. Just do this once.
To use a package, load it using the library() function. You need to do this every time you want to use the functions in a package.
install.packages("packagename")
library(packagename)
The tidyverse is a mega-package (i.e. a package of other packages) designed to make data wrangling and visualisation easier.
library(tidyverse)
is the same as...
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(tibble)
library(stringr)
library(forcats)
It's a good idea to put library(tidyverse) at the top of every analysis document. You will probably need it.
Step 1. In your console, install the following packages...
install.packages("packagename")
Step 2. Then open the Rmd_demo_babynames.Rmd file and find the grey "chunk" that has library(tidyverse) in it. Add the lines below to that chunk to load the following packages.
library(janitor)
library(here)
library(beepr)
library(skimr)
Run the code in the chunk using the little green arrow on the right
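If you are not sure whether the install step worked, a quick check like the one below can help. This is only a sketch: it assumes the four package names from the step above, and the variable names (pkgs, is_installed, still_missing) are made up for illustration.

```r
# Packages named in the step above
pkgs <- c("janitor", "here", "beepr", "skimr")

# requireNamespace() returns TRUE if a package is installed and FALSE if not,
# without attaching it the way library() does
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(is_installed)

# Names of anything still missing; install these once from the console, e.g.
# install.packages(still_missing)
still_missing <- pkgs[!is_installed]
print(still_missing)
```

Anything listed in still_missing needs install.packages() before library() will work for it.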
RMarkdown documents (or .Rmd files) are key to reproducible analysis.
Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara
Change a few things about the text in the demo.Rmd doc
Then choose "Knit" and check out the output.
The here package will help avoid file path drama. You can use it to tell R where your data is, relative to the top level of your project folder.
We can use read_csv() from readr, together with here(), to read a .csv file.
Image credit: Allison Horst @allison_horst from @RLadiesSB
my_data_in_r <- read_csv(here("data_folder", "filename.csv"))
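It can help to see what here() actually returns before handing it to read_csv(). The sketch below uses the placeholder names from the template ("data_folder" and "filename.csv" are not real files), so it only reads the file if it happens to exist; csv_path is a made-up variable name.

```r
library(readr)
library(here)

# here() only builds a file path, starting from the top level of the project;
# nothing is read at this point
csv_path <- here("data_folder", "filename.csv")
print(csv_path)

# Read the file only if it really exists at that path
if (file.exists(csv_path)) {
  my_data_in_r <- read_csv(csv_path)
} else {
  message("No file found at: ", csv_path)
}
```

If the message appears, check the folder and file names against the Files pane before changing anything else.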
Step 1: Make a new chunk (Cmd-Option-I)
Step 2: Copy the template code
my_data_in_r <- read_csv(here("data_folder", "filename.csv"))
Step 3: Adapt the template to read in the SAbabynames.csv data
Step 4: Play with the code in each chunk to plot...
Step 5: Knit your document (** use the cog to change the display option to Show Code and Output)
Image credit Dani Navarro Tidyverse for Beginners
Step 1: Upload your data into the data folder
Step 2: Make a new RMarkdown document (File > New File > R Markdown), save as my_analysis.Rmd
Step 3: Make a new chunk (Cmd-Option-I) and load packages using library()
Step 4: Make another new chunk, copy the template code
my_data_in_R <- read_csv(here("data_folder", "filename.csv"))
Step 5: Adapt the template to match your datafile
Step 6: Run the chunk to read your data in
Step 7: Have a look at your data. Any obvious problems?
Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara
janitor
better_data_in_R <- your_data_in_R %>% clean_names()
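To see what clean_names() does, here is a small sketch on a made-up data frame. The column names and values are invented for illustration; your own data will look different.

```r
library(dplyr)
library(janitor)

# A tiny made-up data frame with awkward column names (spaces, capitals)
your_data_in_R <- data.frame(
  "First Name" = c("Ana", "Ben"),
  "Test Score" = c(90, 85),
  check.names = FALSE
)
names(your_data_in_R)
# "First Name" "Test Score"

# clean_names() converts the names to lower case with underscores
better_data_in_R <- your_data_in_R %>% clean_names()
names(better_data_in_R)
# "first_name" "test_score"
```

The data itself is untouched; only the column names change.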
dplyr
betterdata <- yourdata %>% rename(newname1 = oldname1, newname2 = oldname2)
NOTE: I can never remember the order of the names in this function. Is it old = new, or new = old? I have resorted to a stupid mnemonic: the order happens to be alphabetical (L-M-N-O, so New = Old).
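A quick way to convince yourself of the new = old order is to try it on a toy data frame. The column names below (Participant, RT) are made up for illustration.

```r
library(dplyr)

# A tiny made-up data frame
yourdata <- data.frame(Participant = 1:3, RT = c(512, 430, 601))

# New name on the left, old name on the right
betterdata <- yourdata %>%
  rename(id = Participant, reaction_time = RT)

names(betterdata)
# "id" "reaction_time"
```

If you get the order backwards, rename() stops with an error because it cannot find a column with the "old" name you gave it, so the mistake is easy to spot.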
dplyr
Keep just the columns you name
better_data_in_R <- your_data_in_R %>% select(age, gender, score)
Keep columns 1 thru 5, and 8 thru 12
better_data_in_R <- your_data_in_R %>% select(1:5, 8:12)
Use - to drop a named variable, keeping everything else
better_data_in_R <- your_data_in_R %>% select(-date)
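The three flavours of select() above can be compared side by side on a small made-up data frame. The columns and values are invented for illustration, and the position example uses 1:2 and 4 because this toy data only has five columns.

```r
library(dplyr)

# A tiny made-up data frame with five columns
your_data_in_R <- data.frame(
  id     = 1:2,
  age    = c(21, 34),
  gender = c("f", "m"),
  score  = c(7, 9),
  date   = c("2019-10-01", "2019-10-02")
)

# Keep just the columns you name
by_name <- your_data_in_R %>% select(age, gender, score)
names(by_name)
# "age" "gender" "score"

# Keep columns by position
by_position <- your_data_in_R %>% select(1:2, 4)
names(by_position)
# "id" "age" "score"

# Drop one named column, keep everything else
without_date <- your_data_in_R %>% select(-date)
names(without_date)
# "id" "age" "gender" "score"
```

Selecting by name is usually safer than selecting by position, because positions shift if a column is added to the raw file.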
Use clean_names() if your variable names are weird
better_data_in_R <- your_data_in_R %>% clean_names()
Use select() to pare down to just the variables you need
better_data_in_R <- your_data_in_R %>% select(age, gender, score)
Use rename() to give your variables more useful names
better_data_in_R <- your_data_in_R %>% rename(newname1 = oldname1)
https://rladiessydney.org/courses/ryouwithme/
https://education.rstudio.com/
Image credit: R4DS
Read more about tidy data in Wickham (2014)
tidyr
longdata <- wide %>% pivot_longer(cols = col1:col6, names_to = "new_names_col", values_to = "new_values_col")
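Here is a small sketch of what pivot_longer() does to a wide data frame. The data and the column names (cond_a, cond_b, condition, response) are made up for illustration.

```r
library(tidyr)

# A tiny made-up wide data frame: one row per participant,
# one column per condition
wide <- data.frame(
  id     = 1:2,
  cond_a = c(10, 20),
  cond_b = c(11, 21)
)

# Stack the condition columns: the old column names go into "condition",
# the values go into "response"
longdata <- wide %>%
  pivot_longer(cols = cond_a:cond_b,
               names_to = "condition",
               values_to = "response")

print(longdata)
# 4 rows (2 participants x 2 conditions), columns: id, condition, response
```

Columns that are not listed in cols (here, id) are repeated on each new row, so you can still tell which participant each response belongs to.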
Take a look at your data and work out ...
longdata <- wide %>% pivot_longer(cols = col1:col6, names_to = "new_names_col", values_to = "new_values_col")
ready_to_plot <- raw_data %>%
  select(Just, The, Columns, You, Want) %>%
  rename(just = Just, the = The, columns = Columns, you = You, want = Want) %>%
  pivot_longer(cols = columns:want, names_to = "condition", values_to = "response")
head(yourdata)
str(yourdata)
length(yourdata)
glimpse(yourdata)
names(yourdata)
There are lots of ways to get summary stats in R.
summary(yourdata)
skimr package
skim(yourdata)
dplyr package
summary_stats <- yourdata %>%
  group_by(something_interesting) %>%
  summarise(mean = mean(resp), stdev = sd(resp), n = n(), se = stdev/sqrt(n))
Try out summary() or skim(), or create a dataframe using this code template
summary_stats <- yourdata %>% group_by(something_interesting) %>% summarise(mean = mean(resp), stdev = sd(resp), n = n(), se = stdev/sqrt(n))
Heads Up: do you need na.rm = TRUE?
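To see why na.rm = TRUE matters, here is a sketch on a made-up data frame with one missing response. The data are invented, and counting n with sum(!is.na(resp)) instead of n() is a choice made for this example so that the missing value is not counted; it is not part of the template above.

```r
library(dplyr)

# A tiny made-up data frame with one missing response
yourdata <- data.frame(
  group = c("a", "a", "b", "b"),
  resp  = c(1, 3, 5, NA)
)

# Without na.rm = TRUE, a single NA makes the whole mean NA
mean(yourdata$resp)
# NA

# With na.rm = TRUE, missing values are dropped before calculating
summary_stats <- yourdata %>%
  group_by(group) %>%
  summarise(mean  = mean(resp, na.rm = TRUE),
            stdev = sd(resp, na.rm = TRUE),
            n     = sum(!is.na(resp)),  # count only non-missing responses
            se    = stdev / sqrt(n))

print(summary_stats)
```

Group b still gets NA for stdev and se, because a standard deviation cannot be calculated from a single value.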
ggplot(data, aes(x = __, y = __)) + geom_point()
data %>% filter(interesting_variable > z) %>% ggplot(aes(x = __, y = __)) + geom_point()
ggplot(data, aes(x = __, y = __)) + geom_point()
data %>%
  filter(interesting_variable > z) %>%
  ggplot(aes(x = __, y = __, colour = condition)) +
  geom_point() +
  facet_wrap(~ group)
ggsave("your_first_ggplot.png")
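If you do not have your own data ready, the template can be tried on mtcars, a dataset that comes with R. This is only a sketch: the choice of variables is arbitrary, and the plot is saved to a temporary folder (out_file is a made-up name) rather than into your project.

```r
library(dplyr)
library(ggplot2)

# Filter, then plot: heavier cars on the x axis, fuel economy on the y axis,
# coloured by number of cylinders, one panel per transmission type
p <- mtcars %>%
  filter(hp > 100) %>%
  ggplot(aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  facet_wrap(~ am)

# Save the plot; naming the plot explicitly avoids relying on
# "the last plot displayed"
out_file <- file.path(tempdir(), "your_first_ggplot.png")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Note that the pipe (%>%) is used up to ggplot(), and + is used to add layers after it.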
Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara
Or check out the R Graph Gallery for inspiration and code