
Let's start at the very beginning

A very good place to start

Jen Richmond · R-Ladies Sydney

https://jennyslides.netlify.app/byod/

1

The Plan

  1. Why R?

  2. Navigating RStudio

  3. Getting your data in

  4. Cleaning it up a little

  5. Maybe....probably not... summary stats/plotting


You will need a laptop and access to wifi, but we will use RStudio Cloud, so don't worry about installation.

Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara


2

Disclaimer

Today we are really unlikely to know the answers to all your questions, but we can help you google!

Google is every R user's best friend, even Hadley Wickham's.

3

1. Why R?

Let's introduce ourselves: why do you want to learn R?

4

Why I R...

Image credit Darren Dahly @statsepi

5

Warning: You will want to quit

6

Image credit Dani Navarro Workflow

7

Image credit Dani Navarro Workflow

8

Image credit Dani Navarro Workflow

9

Navigating RStudio

Link to RStudio Cloud Workspace

https://rstudio.cloud/project/637830

10

Your turn 1: Set up RStudio Cloud


03:00
11

Think of RStudio as your kitchen

Image credit to Jessica Ward from @RLadiesNCL

12

Your turn 2: Let's start by redecorating

From the top menu choose

  • Tools
  • Global Options
  • Appearance


Then try out a few Editor themes and choose one you like.

04:00
13

What questions do you have?

14

Packages

A package is a group of functions that some kind person has written, tested, bundled together and given away for everyone to use.


To install a package, type the line below into your console. You only need to do this once.

install.packages("packagename")

To use a package, load it using the library() function. You need to do this in every new R session in which you want to use the functions in that package.

library(packagename)
15

What is the tidyverse?

The tidyverse is a mega-package (i.e. a package of other packages) designed to make data wrangling and visualisation easier.

library(tidyverse)

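is the same as loading each of the core packages one at a time...

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(tibble)
library(stringr)
library(forcats)
library(lubridate)  # part of the core tidyverse since version 2.0.0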

It's a good idea to put library(tidyverse) at the top of every analysis document. You will probably need it.

16

Your turn 3: Install some packages

Step 1. In your console, install the following packages...

  • janitor
  • here
  • beepr
  • skimr
install.packages("packagename")

# or install all four at once
install.packages(c("janitor", "here", "beepr", "skimr"))

Step 2. Then open the Rmd_demo_babynames.Rmd file and find the grey "chunk" that has library(tidyverse) in it. Below that line, add the following to load the new packages:

library(janitor)
library(here)
library(beepr)
library(skimr)

Run the code in the chunk using the little green arrow on the right of the chunk.

04:00
17

What questions do you have?



18

An aside about RMarkdown

RMarkdown documents (or .Rmd files) are key to reproducible analysis.

  • notes/explanations + "chunks" of R code
  • "knit" the document (notes + code + output) into a nice format that other people (who may not know R) can access.

Tricks and tips

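  • Insert a chunk using the shortcut Cmd-Option-I (Mac) or Ctrl-Alt-I (Windows)
  • Use hashes (#, ##, ###) to get levels of heading
  • Use asterisks to get *italics* (one on each side) and **bold** (two on each side)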

Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara

19

Your turn 4: "Knit" your RMarkdown document

Change a few things about the text in the demo.Rmd doc

  • Change the title at the top
  • Add a few headings using #
  • Make words bold by putting them in double asterisks (**like this**)

Then choose "Knit" and check out the output.


02:00
20

What questions do you have?

21

Getting data in

22

Where is the data?

The here package will help you avoid file path drama. You can use it to tell R where your data is, relative to the top level of your project folder.

We can use read_csv() from readr together with here() to read in a .csv file.

Image credit: Allison Horst @allison_horst from @RLadiesSB


Template code

my_data_in_r <- read_csv(here("data_folder", "filename.csv"))
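To see read_csv() in action without needing a project folder, here is a minimal sketch that writes a tiny, made-up CSV file (tiny_babynames.csv and its contents are hypothetical, not the workshop data) to a temporary folder and reads it back. In your own project you would swap csv_path for a here() call like the template above. The show_col_types argument assumes readr version 2.0 or later.

```r
library(readr)

# Made-up example data: write a tiny CSV to a temporary folder so this
# runs anywhere, even outside an RStudio project
csv_path <- file.path(tempdir(), "tiny_babynames.csv")
writeLines(c("name,year,count",
             "Olivia,2018,120",
             "Jack,2018,110"),
           csv_path)

# read_csv() guesses each column's type; show_col_types = FALSE hides that message
tiny_data <- read_csv(csv_path, show_col_types = FALSE)
tiny_data
```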
23

Your turn 5: Read in the babynames data

  • Step 1: Make a new chunk (Cmd-Option-I)

  • Step 2: Copy the template code

my_data_in_r <- read_csv(here("data_folder", "filename.csv"))
  • Step 3: Adapt the template to read in the SAbabynames.csv data

  • Step 4: Play with the code in each chunk to plot...

    • How popular is your name?
    • How popular is your name relative to your neighbours?
  • Step 5: Knit your document (tip: use the cog to change the display option to Show Code and Output)

10:00
24

What questions do you have?

25

Before we break... a key idea

Image credit Dani Navarro Tidyverse for Beginners

26

Time for a break...

Next up.... your own data

05:00
27

Your turn 6: Let's get YOUR data in


  • Step 1: Upload your data into the data folder

  • Step 2: Make a new R Markdown document (File > New File > R Markdown) and save it as my_analysis.Rmd

  • Step 3: Make a new chunk (Cmd-Option-I) and load packages using library()

  • Step 4: Make another new chunk, copy the template code

my_data_in_R <- read_csv(here("data_folder", "filename.csv"))
  • Step 5: Adapt the template to match your datafile

  • Step 6: Run the chunk to read your data in

  • Step 7: Have a look at your data, any obvious problems?

05:00
28

YAY your data is in R!

But it's probably "dirty".

Maybe...

... the variable names are weird and inconsistent

... you regret your choice of variable names

... you don't need all of those variables

Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara

29

clean_names() from janitor

Problem 1: all of the variable names are weird and inconsistent

better_data_in_R <- your_data_in_R %>%
  clean_names()
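Here is a small sketch of what clean_names() does, using a made-up data frame (messy_data and its columns are hypothetical). By default it converts every name to snake_case: lower case, with spaces replaced by underscores.

```r
library(dplyr)    # for the %>% pipe
library(janitor)

# Made-up data with inconsistent names; check.names = FALSE keeps the spaces
messy_data <- data.frame(`First Name` = c("Ann", "Ben"),
                         AGE = c(31, 45),
                         `test score` = c(88, 92),
                         check.names = FALSE)

clean_data <- messy_data %>%
  clean_names()

names(clean_data)
# "first_name" "age" "test_score"
```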

30

rename() from dplyr

Problem 2: you regret your variable names

better_data_in_R <- your_data_in_R %>%
  rename(newname1 = oldname1, newname2 = oldname2)



NOTE: I can never remember the order of the names in this function. Is it old = new, or new = old? I have resorted to a stupid mnemonic: it happens to be alphabetical (L, M, New = Old, P), so the new name always goes first.
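A minimal sketch of rename() on a made-up data frame (survey, q1, q2, happiness and owns_pet are all hypothetical names), showing the new = old order in action:

```r
library(dplyr)

# Made-up data with names you might regret
survey <- data.frame(q1 = c(5, 3), q2 = c("yes", "no"))

better_survey <- survey %>%
  rename(happiness = q1, owns_pet = q2)   # new name = old name

names(better_survey)
# "happiness" "owns_pet"
```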

31

select() from dplyr

Problem 3: you don't need all of those variables

Keep just the columns you name

better_data_in_R <- your_data_in_R %>%
  select(age, gender, score)

Keep columns 1 to 5 and 8 to 12

better_data_in_R <- your_data_in_R %>%
  select(1:5, 8:12)

Use - to drop a named variable and keep everything else

better_data_in_R <- your_data_in_R %>%
  select(-date)
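To compare the three ways of selecting side by side, here is a sketch on a small made-up data frame (scores and its columns are hypothetical):

```r
library(dplyr)

# Made-up data: five columns
scores <- data.frame(id = 1:3,
                     age = c(21, 34, 29),
                     gender = c("f", "m", "f"),
                     score = c(10, 14, 12),
                     date = as.Date(c("2019-11-01", "2019-11-02", "2019-11-03")))

by_name     <- scores %>% select(age, gender, score)  # keep the named columns
by_position <- scores %>% select(1:2, 4)              # keep columns 1, 2 and 4
no_date     <- scores %>% select(-date)               # drop date, keep the rest

names(by_name)      # "age" "gender" "score"
names(by_position)  # "id" "age" "score"
names(no_date)      # "id" "age" "gender" "score"
```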
32

Your turn 7: Clean up your own data

Use clean_names() if your variable names are weird

better_data_in_R <- your_data_in_R %>%
clean_names()

Use select() to pare down to just the variables you need

better_data_in_R <- your_data_in_R %>%
select(age, gender, score)

Use rename() to give your variables more useful names

better_data_in_R <- your_data_in_R %>%
  rename(newname1 = oldname1)
15:00
33

What questions do you have?

34

Well done! Your data is in and a little cleaner.

I think this is as far as we will get today. The rest of these slides cover...

  1. how to make your data tidy (i.e. convert it from wide format into long format)
  2. how to use summary(), skim(), and group_by() + summarise() to get summary stats
  3. the basics of ggplot
35

Recommended learning resources

RYouWithMe (online course)

https://rladiessydney.org/courses/ryouwithme/

R for Data Science (free e-book)

https://r4ds.had.co.nz/

RStudio Education site (lots of great resources)

https://education.rstudio.com/

36

Until next time....

via GIPHY

37

Keep working through these slides to learn about

pivot_longer() and pivot_wider()

summary(), skim(), group_by() %>% summarise()

ggplot()

38

Your data is clean-er, but is it "tidy"?

Three rules for tidy data:

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.

Image credit: R4DS




Read more about tidy data in Wickham (2014).

39

Wide to Long

40

pivot_longer() from tidyr

pivot_longer wants to know 3 things

  1. names_to = what you want to call the new column containing the names of the columns
  2. values_to = what you want to call the new column containing the values
  3. cols = the range of columns that you want to make long (e.g. col1:col6)


longdata <- wide %>%
  pivot_longer(cols = col1:col6,
               names_to = "new_names_col",
               values_to = "new_values_col")
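A minimal worked sketch, assuming a made-up wide data set with one row per person and one column per trial (wide, trial1, trial2, trial and score are hypothetical names):

```r
library(dplyr)   # also provides tibble()
library(tidyr)

# Made-up wide data: one row per person, one column per trial
wide <- tibble(id = c(1, 2),
               trial1 = c(10, 12),
               trial2 = c(11, 15))

long <- wide %>%
  pivot_longer(cols = trial1:trial2,   # which columns to make long
               names_to = "trial",     # old column names go here
               values_to = "score")    # old cell values go here

long
# 4 rows: each id now has one row per trial
```

Each person now appears once per trial, which is the shape ggplot and group_by() work best with.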
41

My turn: Make favourite things long

42

Your turn 8: make your wide data long

Take a look at your data and work out ...

  1. which columns need to become long
  2. what the values column should be called
  3. what the "names" column should be called


longdata <- wide %>%
  pivot_longer(cols = col1:col6,
               names_to = "new_names_col",
               values_to = "new_values_col")
15:00
43

Before we break... reminding you about a key idea

The pipe %>% allows you to string together a series of functions and accomplish a LOT in just a few lines of code

ready_to_plot <- raw_data %>%
  select(Just, The, Columns, You, Want) %>%
  rename(just = Just, the = The, columns = Columns, you = You, want = Want) %>%
  pivot_longer(names_to = "condition", values_to = "response", columns:want)
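One way to see what the pipe does: x %>% f(y) is the same as f(x, y), so whatever is on the left becomes the first argument of the next function. A small sketch on made-up data (scores is hypothetical) comparing nested calls with the piped version:

```r
library(dplyr)

# Made-up data
scores <- tibble(id = 1:4, score = c(3, 8, 5, 9))

# Nested: has to be read from the inside out
nested <- arrange(filter(scores, score > 4), desc(score))

# Piped: reads top to bottom, one step per line
piped <- scores %>%
  filter(score > 4) %>%
  arrange(desc(score))

all.equal(nested, piped)   # TRUE: same result, easier to read
```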
44

Time for a break...

Next up.... summary stats & plotting

10:00
45

Get a quick summary of...

... the structure of your data

head(yourdata)
str(yourdata)
length(yourdata)
glimpse(yourdata)
names(yourdata)
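A quick sketch on a made-up data frame (the yourdata below is hypothetical). One thing worth knowing: for a data frame, length() counts columns, not rows; nrow() counts rows.

```r
library(dplyr)   # glimpse() is available once dplyr is loaded

# Made-up data: 3 rows, 2 columns
yourdata <- data.frame(name = c("Ann", "Ben", "Cat"),
                       age = c(31, 45, 27))

head(yourdata)      # first few rows (up to 6)
str(yourdata)       # structure: column types and first values
length(yourdata)    # 2: the number of columns
nrow(yourdata)      # 3: the number of rows
glimpse(yourdata)   # the tidyverse take on str()
names(yourdata)     # "name" "age"
```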
46

Get some quick summary stats

There are lots of ways to get summary stats in R.

base R

summary(yourdata)

skimr package

skim(yourdata)

dplyr package

summary_stats <- yourdata %>%
  group_by(something_interesting) %>%
  summarise(mean = mean(resp),
            stdev = sd(resp),
            n = n(),
            se = stdev / sqrt(n))
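A small worked sketch of the group_by() + summarise() template, using made-up data where the numbers are easy to check by hand (condition and resp are hypothetical column names). Note that se can use stdev and n because summarise() lets later columns refer to ones created earlier in the same call.

```r
library(dplyr)

# Made-up data: two conditions, two responses each
yourdata <- tibble(condition = c("a", "a", "b", "b"),
                   resp = c(2, 4, 6, 10))

summary_stats <- yourdata %>%
  group_by(condition) %>%
  summarise(mean = mean(resp),
            stdev = sd(resp),
            n = n(),
            se = stdev / sqrt(n))   # uses stdev and n created just above

summary_stats
# condition a: mean 3, n 2, se 1
# condition b: mean 8, n 2, se 2
```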
47

My turn: summarise my favourite things

48

Your turn 9: Get some summary stats from your data

Try out summary() or skim(), or create a dataframe using this code template

summary_stats <- yourdata %>%
  group_by(something_interesting) %>%
  summarise(mean = mean(resp),
            stdev = sd(resp),
            n = n(),
            se = stdev / sqrt(n))

Heads Up: do you need na.rm = TRUE?
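Why the heads up matters, in a tiny made-up example: a single missing value makes mean() (and sd()) return NA unless you tell them to drop it. Be aware that n() still counts the rows where the value is missing.

```r
# A made-up vector with one missing value
x <- c(4, NA, 8)

mean(x)                  # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)    # 6: the NA is removed before averaging

# Inside summarise() the same argument goes into each function, e.g.
# summarise(mean = mean(resp, na.rm = TRUE), stdev = sd(resp, na.rm = TRUE))
```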

10:00
49

What questions do you have?

50

We made it! Let's make a ggplot!

51

The grammar of graphics

ggplot wants to know 3 things

  1. what data you want to plot
  2. the "aesthetics" i.e. which variables you want on the x and y, what things look like (colour/shape/fill)
  3. which "geom" you want to use (point, line, col, histogram, violin)
ggplot(data, aes(x = __, y = __)) +
  geom_point()

WATCH OUT! You can pipe data into ggplot (and combine with dplyr functions) but within ggplot you need to ADD LAYERS with +

data %>%
  filter(interesting_variable > z) %>%
  ggplot(aes(x = __, y = __)) +
  geom_point()
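A runnable sketch of the piped version, using made-up "favourite things" ratings (things, thing and rating are hypothetical names, not the data from the demo). Note the switch from %>% to + once ggplot() starts; geom_col() is used here because each thing has a single rating.

```r
library(dplyr)
library(ggplot2)

# Made-up data
things <- tibble(thing = c("tea", "cats", "rain", "books"),
                 rating = c(9, 10, 4, 8))

p <- things %>%
  filter(rating > 5) %>%                # dplyr step: joined with %>%
  ggplot(aes(x = thing, y = rating)) +  # from here on, add layers with +
  geom_col()

p   # printing the plot object draws it
```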
52

My turn: plot favourite things

53

Your turn 10: Plot your data

ggplot(data, aes(x = __, y = __)) +
  geom_point()

data %>%
  filter(interesting_variable > z) %>%
  ggplot(aes(x = __, y = __, colour = condition)) +
  geom_point() +
  facet_wrap(~ group)

# save the last plot you made to a file
ggsave("your_first_ggplot.png")

Image credit: Allison Horst @allison_horst from R-Ladies Santa Barbara

10:00
54

Google: How do I add/get rid of __ in ggplot in R?

1. error bars

2. axis labels

3. a title

4. legend

5. colour ....

Or check out the R Graph Gallery for inspiration and code

55

The End....

via GIPHY

56