--- title: "Merging and Joining Lab Key" author: "Data Wrangling in R" output: html_document --- ## Part 1 Read in the data and use functions of your choice to preview it. ```{r, message = FALSE} library(tidyverse) crash <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/crashes.csv") road <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/roads.csv") head(crash) head(road) ``` 1. Join data to retain only complete data, (using an inner join) e.g. those observations with road lengths and districts. Merge without using `by` argument, then merge using `by = "Road"`. call the output `merged.` How many observations are there? ```{r} merged <- inner_join(crash, road) merged <- inner_join(crash, road, by = "Road") nrow(merged) ``` 2. Join data using a `full_join.` Call the output `full.` How many observations are there? ```{r} full <- full_join(crash, road) nrow(full) ``` 3. Do a left join of the `road` and `crash`. ORDER matters here! How many observations are there? ```{r} left <- left_join(road, crash) nrow(left) ``` 4. Repeat above with a `right_join` with the same order of the arguments. How many observations are there? ```{r} right <- right_join(road, crash) nrow(right) ``` 5. What `road` data is missing from `crash`? ```{r} roads1 <- road %>% pull(Road) roads2 <- crash %>% pull(Road) setdiff(roads1, roads2) # This value is in `road` but not `crash` # Could also search for NAs created by the join full %>% filter(is.na(N_Crashes)) anti_join(road, crash) ``` 5. What `crash` data is missing from `road``? ```{r} roads1 <- road %>% pull(Road) roads2 <- crash %>% pull(Road) setdiff(roads2, roads1) # These values are in `crash` but not `road` # Could also search for NAs created by the join. Would be good to summarize with `count` full %>% filter(is.na(District)) %>% count(Road) anti_join(crash, road) ```