Read in the data and use functions of your choice to preview it.
library(tidyverse)
crash <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/crashes.csv")
road <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/roads.csv")
head(crash)
## # A tibble: 6 × 4
## Year Road N_Crashes Volume
## <dbl> <chr> <dbl> <dbl>
## 1 1991 Interstate 65 25 40000
## 2 1992 Interstate 65 37 41000
## 3 1993 Interstate 65 45 45000
## 4 1994 Interstate 65 46 45600
## 5 1995 Interstate 65 46 49000
## 6 1996 Interstate 65 59 51000
head(road)
## # A tibble: 5 × 3
## Road District Length
## <chr> <chr> <dbl>
## 1 Interstate 65 Greenfield 262
## 2 Interstate 70 Vincennes 156
## 3 US-36 Crawfordsville 139
## 4 US-40 Greenfield 150
## 5 US-52 Crawfordsville 172
by
argument, then merge using
by = "Road"
. call the output merged.
How many
observations are there?merged <- inner_join(crash, road)
## Joining with `by = join_by(Road)`
merged <- inner_join(crash, road, by = "Road")
nrow(merged)
## [1] 88
full_join.
Call the output
full.
How many observations are there?full <- full_join(crash, road)
## Joining with `by = join_by(Road)`
nrow(full)
## [1] 111
road
and crash
.
ORDER matters here! How many observations are there?left <- left_join(road, crash)
## Joining with `by = join_by(Road)`
nrow(left)
## [1] 89
right_join
with the same order of
the arguments. How many observations are there?right <- right_join(road, crash)
## Joining with `by = join_by(Road)`
nrow(right)
## [1] 110
road
data is missing from crash
?roads1 <- road %>% pull(Road)
roads2 <- crash %>% pull(Road)
setdiff(roads1, roads2) # This value is in `road` but not `crash`
## [1] "US-52"
# Could also search for NAs created by the join
full %>% filter(is.na(N_Crashes))
## # A tibble: 1 × 6
## Year Road N_Crashes Volume District Length
## <dbl> <chr> <dbl> <dbl> <chr> <dbl>
## 1 NA US-52 NA NA Crawfordsville 172
anti_join(road, crash)
## Joining with `by = join_by(Road)`
## # A tibble: 1 × 3
## Road District Length
## <chr> <chr> <dbl>
## 1 US-52 Crawfordsville 172
crash
data is missing from `road``?roads1 <- road %>% pull(Road)
roads2 <- crash %>% pull(Road)
setdiff(roads2, roads1) # These values are in `crash` but not `road`
## [1] "Interstate 275"
# Could also search for NAs created by the join. Would be good to summarize with `count`
full %>% filter(is.na(District)) %>% count(Road)
## # A tibble: 1 × 2
## Road n
## <chr> <int>
## 1 Interstate 275 22
anti_join(crash, road)
## Joining with `by = join_by(Road)`
## # A tibble: 22 × 4
## Year Road N_Crashes Volume
## <dbl> <chr> <dbl> <dbl>
## 1 1991 Interstate 275 27 20350
## 2 1992 Interstate 275 26 21200
## 3 1993 Interstate 275 22 23200
## 4 1994 Interstate 275 21 21200
## 5 1995 Interstate 275 28 23200
## 6 1996 Interstate 275 22 20000
## 7 1997 Interstate 275 27 18000
## 8 1998 Interstate 275 21 19500
## 9 1999 Interstate 275 22 21000
## 10 2000 Interstate 275 29 20700
## # ℹ 12 more rows