Let’s start at the very beginning

# Let's start at the very beginning

## A very good place to start

### Jen Richmond &#183; R-Ladies Sydney

#### [https://jennyslides.netlify.app/byod/](https://jennyslides.netlify.app/byod/)

---
layout: true

<div class="my-footer"><span>https://jennyslides.netlify.app/byod/</span></div>

---

# The Plan

1. Why R?

2. Navigating RStudio

3. Getting your data in

4. Cleaning it up a little

5. Maybe....probably not... summary stats/plotting

<br>

> You will need a laptop and access to wifi, but we will use RStudio Cloud, so don't worry about installation.

]

Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara

]

---

# Disclaimer

#### Today we are really unlikely to know the answers to all your questions, but we can help you google!

#### Google is every R user's best friend, even Hadley Wickham.

]

]

---

Lets introduce ourselves, why do you want to learn R?

---

# Why I R...

### [Reproducibility](https://www.nature.com/articles/s41559-017-0160)

### [One stop shop](https://twitter.com/kara_woo/status/1100908125396193281)

### [Equity](https://rfortherestofus.com/2019/07/equity/)

### [Community](https://qz.com/work/1661486/r-ladies-made-data-science-inclusive/)

### [Fun!](https://headrush.typepad.com/creating_passionate_users/2005/06/kicking_ass_is_.html)

]

Image credit Darren Dahly @statsepi

]

---

# Warning: You will want to quit

---

Image credit Dani Navarro [Workflow](https://slides.com/djnavarro/workflow#/)

---

Image credit Dani Navarro [Workflow](https://slides.com/djnavarro/workflow#/)

---

Image credit Dani Navarro [Workflow](https://slides.com/djnavarro/workflow#/)

---

# Navigating RStudio

Link to RStudio Cloud Workspace

https://rstudio.cloud/project/637830

---

# Your turn 1: Set up RStudio Cloud

- Go to this link  https://rstudio.cloud/project/637830

- Sign up for an RStudio Cloud account

- Wait for project to chug

- Click Save a Permanent Copy

<br>

]

---

# Think of R-Studio as your kitchen

Image credit to Jessica Ward from @RLadiesNCL

---
class: inverse

# Your turn 2: Let's start by redecorating

From the top menu choose
- Tools
- Global Options
- Appearance

<br>

Then try out a few Editor themes and choose one you like. 
]

]

---

# What questions do you have ?

---

# Packages

A package is a group functions that some kind person has written, tested, bundled together and given away for everyone to use.

<br>

To install packages, type into your console. Just do this once.

<br>

To use a package, load it using the `library()` function. You need to do this every time you want to use the functions in a package.

]

```
install.packages("packagename")

```

<br>

```
library(packagename)

```

]

---

# What is the tidyverse?

The tidyverse is a mega-package (i.e. a package of other packages) designed to make data wrangling and visualisation easier.

```r
library(tidyverse)
```

is the same as...

```r
library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(purrr)
library(tibble)
library(stringr)
library(forcats)
```

]

]

It's a good idea to put `library(tidyverse)` at the top of every analysis document. You will probably need it. 
---

# Your turn 3: Install some packages

Step 1. In your console, install the following packages...

- janitor
- here
- beepr
- skimr

```r
install.packages("packagename")
```

Step 2. Then open the Rmd_demo_babynames.Rmd file and find the grey "chunk" that has  `library(tidyverse)` in it. Add these below to load the following packages.

```
library(janitor) 
library(here)
library(beepr)
library(skimr)

```
Run the code in the chunk using the little green arrow on the right

---

# What questions do you have ?

---

# An aside about RMarkdown

RMarkdown documents (or .Rmd files) are key to reproducible analysis. 
- notes/explanations + "chunks" of R code 
- "knit" the document (notes + code + output) into a nice format that other people (who may not know R) can access.

### Tricks and tips

- Insert a chunk using shortcut Alt-Cmd-I
- Hash to get levels of heading
- Asterisks to get bold and italics
]

Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara
]

---

# Your turn 4: "Knit" your RMarkdown document

Change a few things about the text in the demo.Rmd doc

- Change the title at the top
- Add a few headings using #
- Make words bold by putting them in double asterik

Then choose "Knit" and check out the output. 
]

]

<br>

---
class: center

# What questions do you have ?

---

# Getting data in

---

# Where is the data?

]

The `here` package will help avoid file path drama. You can use it to tell R where you data is, relative to the top level of your project folder.

We can use the `read_csv()` from `readr` and `here()` to read a .csv file.

Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from @RLadiesSB

]

<br>

#### Template code

```
my_data_in_r <- read_csv(here("data_folder", "filename.csv"))

```

---

# Your turn 5: Read in the babynames data

- Step 1: Make a new chunk (Cmd-Option-I)

- Step 2: Copy the template code

```
my_data_in_r <- read_csv(here("data_folder", "filename.csv"))

```

- Step 3: Adapt the template to read in the SAbabynames.csv data

- Step 4: Play with the code in each chunk to plot...

- How popular is your name?
   - How popular is your name relative to your neighbours?
   
- Step 5: Knit your document (** use the cog to change the display option to Show Code and Output)
   
<div class="countdown" id="timer_61b2b545" style="right:0;bottom:0;" data-warnwhen="0">
<code class="countdown-time"><span class="countdown-digits minutes">10</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

---
class: center, middle

# What questions do you have ?

---

# Before we break... a key idea

Image credit Dani Navarro [Tidyverse for Beginners](https://slides.com/djnavarro/tidyverse-for-beginners#/)

---

# Time for a break...

### Next up.... your own data

---

# Your turn 6: Lets get YOUR data in

<br>

- Step 1: Upload your data into the data folder

- Step 2: Make a new RMarkdown document (File-NewFile-RMarkdown), save as my_analysis.Rmd

- Step 3: Make a new chunk (Cmd-Option-I) and load packages using library()

- Step 4: Make another new chunk, copy the template code

```
my_data_in_R <- read_csv(here("data_folder", "filename.csv"))

```

- Step 5: Adapt the template to match your datafile

- Step 6: Run the chunk to read your data in 
   
- Step 7: Have a look at your data, any obvious problems?

---

# YAY your data is in R!

## But its probably "dirty".

#### Maybe...

#### ... the variable names are weird and inconsistent
#### ... you regret your choice of variable names
#### ... you don't need all of those variables

]

Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara

]

---

## clean_names() from `janitor`

### Problem 1: all of the variable names are weird and inconsistent

```
better_data_in_R <- your_data_in_R %>%
               clean_names()
```

---

## rename() from `dplyr`

### Problem 2: you regret your variable names

```
betterdata <- yourdata %>%
  rename(newname1 = oldname1, newname2 = oldname2)
  
```
<br>

<br>

> NOTE: I can never remember the order of the names in this function. 
Is it old = new? or new = old? 
I have resorted to a stupid mnemonic... it happens to be alphabetical (i.e. L-M-New = Old)

---

## select() from `dplyr`

### Problem 3: you don't need all of those variables

Keep just the columns you name

``` 
better_data_in_R <- your_data_in_R %>%
  select(age, gender, score) 
```
Keep columns 1 thru 5, and 8 thru 12

```
better_data_in_R <- your_data_in_R %>%
  select(1:5, 8:12)

```

Use - to drop named variable, keeping everything else

```
better_data_in_R <- your_data_in_R %>%
  select(-date)

```
---

# Your turn 7: Clean up your own data

Use clean_names() if your variable names are weird

```
better_data_in_R <- your_data_in_R %>%
               clean_names()
```

Use select() to pare down to just the variables you need

``` 
better_data_in_R <- your_data_in_R %>%
  select(age, gender, score)

```
Use rename() to give your variables more useful names

```
better_data_in_R <- your_data_in_R %>%
  rename(newname1 = oldname1)
  
```

---

# What questions do you have ?

---

# Well done! Your data is in and a little cleaner.

### I think we are going to get to here, the rest of these slides cover...

1. how to make your data tidy (i.e. convert it from wide format into long format)
2. how to use summary(), skim(), and group_by() + summarise() to get summary stats
3. the basics of ggplot

]

]

---

# Recommended learning resources

### RYouWithMe (online course)

https://rladiessydney.org/courses/ryouwithme/

### R for Data Science (free e-book)

https://r4ds.had.co.nz/

### RStudio Education site (lots of great resources)

https://education.rstudio.com/

---

# Until next time....

---

# Keep working through these slides to learn about

### pivot_longer() and pivot_wider()
### summary(), skim(), group_by() %>% summarise()
### ggplot()

---

##  Your data is clean-er, but is it "tidy"?

## Three rules for tidy data:

- Each variable must have its own column.
- Each observation must have its own row.
- Each value must have its own cell.

Image credit: [R4DS](https://r4ds.had.co.nz/tidy-data.html)

Read more about tidy data [Wickham (2014)](https://www.jstatsoft.org/article/view/v059i10)

---

# Wide to Long

---

# pivot_longer() from `tidyr`

### pivot_longer wants to know 3 things
1. names_to = what you want to call the new column containing the names of the columns 
2. values_to = what you want to call the new column containing the values 
3. the range of columns that you want to make long (col1:col6)
 
<br>

```
longdata <- wide %>% 
            pivot_longer(names_to = "new_names_col", 
                    values_to = "new_values_col", col1:col6)
```
---

# My turn: Make favourite things long

---

# Your turn 8: make your wide data long

Take a look at your data and work out ...

1. which columns need to become long
2. what the values column should be called
3. what the "names" column should be called

<br>

```
longdata <- wide %>% 
      pivot_longer(names_to = "new_names_col", 
                  values_to = "new_values_col", col1:col6)
```

---

# Before we break... reminding you about a key idea

### The pipe %>% allows you to string together a series of functions and accomplish a LOT in just a few lines of code

```
ready_to_plot <- raw_data %>%
  select(Just, The, Columns, You, Want) %>%
  rename(just = Just, the = The, columns = Columns, you = You, want = Want) %>%
  pivot_longer(names_to = "condition", values = "response", columns:want)
  
```
---

# Time for a break...

### Next up.... summary stats & plotting

---

# Get a quick summary of...

## ... the structure of your data

```
head(yourdata)

str(yourdata)

length(yourdata)

glimpse(yourdata)

names(yourdata)

```
---

# Get some quick summary stats

There are lots of ways to get summary stats in R.

#### base R

```
summary(yourdata)

```
#### `skimr` package

```
skim(yourdata)

```

]

#### `dplyr` package

```
summary_stats <- yourdata %>%
  group_by(something_interesting) %>%
  summarise(mean = mean(resp), 
            stdev = sd(resp), 
            n = n(), 
            se = stdev/sqrt(n))
            
```
]

---

# My turn: summarise my favourite things

---

# Your turn 9: Get some summary stats from your data

Try out summary() or skim(), or create a dataframe using this code template

```
summary_stats <- yourdata %>%
  group_by(something_interesting) %>%
  summarise(mean = mean(resp), 
            stdev = sd(resp), 
            n = n(), 
            se = stdev/sqrt(n))
            
```

> Heads Up: do you need na.rm = TRUE?

---

# What questions do you have ?

---

# We made it! Lets make a ggplot!

---

# The grammar of graphics

### ggplot wants to know 3 things
1. what data you want to plot
2. the "aesthetics" i.e. which variables you want on the x and y, what things look like (colour/shape/fill)
3. which "geom" you want to use (point, line, col, histogram, violin)

```
ggplot(data, aes(x = __, y = __)) +
  geom_point()
```
#### WATCH OUT! You can pipe data into ggplot (and combine with dplyr functions) but within ggplot you need to ADD LAYERS with +

```
data %>%
  filter(interesting_variable > z) %>%
  ggplot(aes(x = __, y = __)) +
  geom_point() 
  
  
```
---

# My turn: plot favourite things

---

# Your turn 10: Plot your data

```
data %>%
  filter(interesting_variable > z) %>%
  ggplot(aes(x = __, y = __, colour = condition)) +
  geom_point() +
  facet_wrap(~ group)
  
  ggsave("your_first_ggplot.png")
  
```

]

Image credit: [Allison Horst](https://github.com/allisonhorst/stats-illustrations) @allison_horst from R-Ladies Santa Barbara

]

---

# Google: How to I add/get rid of  ________ ggplot in R?

### 1. error bars
### 2. axis labels
### 3. a title
### 4. legend
### 5. colour ....

]

]

Or check out the [R Graph Gallery](https://www.r-graph-gallery.com/index.html) for inspiration and code

---

# The End....

---