Read in the data and use functions of your choice to preview it.
library(tidyverse)
crash <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/crashes.csv")
road <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/roads.csv")
head(crash)
# A tibble: 6 × 4
Year Road N_Crashes Volume
<dbl> <chr> <dbl> <dbl>
1 1991 Interstate 65 25 40000
2 1992 Interstate 65 37 41000
3 1993 Interstate 65 45 45000
4 1994 Interstate 65 46 45600
5 1995 Interstate 65 46 49000
6 1996 Interstate 65 59 51000
head(road)
# A tibble: 5 × 3
Road District Length
<chr> <chr> <dbl>
1 Interstate 65 Greenfield 262
2 Interstate 70 Vincennes 156
3 US-36 Crawfordsville 139
4 US-40 Greenfield 150
5 US-52 Crawfordsville 172
Join the data using `inner_join`. First merge without specifying the `by` argument, then merge using `by = "Road"`. Call the output `merged`. How many observations are there?
merged <- inner_join(crash, road)
merged <- inner_join(crash, road, by = "Road")
nrow(merged)
[1] 88
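The two calls above give the same result because, when `by` is omitted, dplyr joins on every column name the two tables share and prints a message naming the columns it chose. A minimal sketch of that behaviour, using small hypothetical data frames (`crash_toy` and `road_toy` are made up for illustration and are not the lab files):

```r
library(dplyr)

# Hypothetical toy data, not the lab's CSV files
crash_toy <- data.frame(
  Road = c("A", "A", "B", "C"),
  N_Crashes = c(10, 12, 7, 3),
  stringsAsFactors = FALSE
)
road_toy <- data.frame(
  Road = c("A", "B", "D"),
  Length = c(100, 50, 75),
  stringsAsFactors = FALSE
)

# No `by`: dplyr uses all shared column names (here only "Road")
natural <- inner_join(crash_toy, road_toy)

# Explicit `by`: same join, no message
explicit <- inner_join(crash_toy, road_toy, by = "Road")

# Compare the two results
identical(natural, explicit)

# Only roads present in both tables survive: "A" twice and "B" once
nrow(explicit)  # 3
```

Spelling out `by` is usually the safer habit, since an unexpected shared column name would otherwise silently become part of the join key.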
Join the data using `full_join`. Call the output `full`. How many observations are there?
full <- full_join(crash, road)
nrow(full)
[1] 111
Do a left join of the `road` and `crash` data. ORDER matters here! How many observations are there?
left <- left_join(road, crash)
nrow(left)
[1] 89
Repeat the above with a `right_join`, keeping the same order of the arguments. How many observations are there?
right <- right_join(road, crash)
nrow(right)
[1] 110
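The four row counts are tied together: a full join holds the matched rows plus the unmatched rows from each side, so the full count equals left + right - inner (here 89 + 110 - 88 = 111). The sketch below checks the same identity on small hypothetical data frames (`crash_toy` and `road_toy` are illustrative stand-ins, not the lab data):

```r
library(dplyr)

# Hypothetical toy data standing in for `crash` and `road`
crash_toy <- data.frame(
  Road = c("A", "A", "B", "C", "C"),
  N_Crashes = c(10, 12, 7, 3, 4),
  stringsAsFactors = FALSE
)
road_toy <- data.frame(
  Road = c("A", "B", "D"),
  Length = c(100, 50, 75),
  stringsAsFactors = FALSE
)

# Same argument order in every call, as in the lab
n_inner <- nrow(inner_join(road_toy, crash_toy, by = "Road"))  # matched rows only
n_left  <- nrow(left_join(road_toy, crash_toy, by = "Road"))   # matched + unmatched road rows
n_right <- nrow(right_join(road_toy, crash_toy, by = "Road"))  # matched + unmatched crash rows
n_full  <- nrow(full_join(road_toy, crash_toy, by = "Road"))   # matched + unmatched from both

c(inner = n_inner, left = n_left, right = n_right, full = n_full)

# Check the identity: full = left + right - inner
n_full == n_left + n_right - n_inner
```

With this toy data the counts are 3, 4, 5 and 6, so the check is 6 == 4 + 5 - 3.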
What observations from the `road` data are missing from `crash`?
roads1 <- road %>% pull(Road)
roads2 <- crash %>% pull(Road)
setdiff(roads1, roads2) # This value is in `road` but not `crash`
[1] "US-52"
# Could also search for NAs created by the join
full %>% filter(is.na(N_Crashes))
# A tibble: 1 × 6
Year Road N_Crashes Volume District Length
<dbl> <chr> <dbl> <dbl> <chr> <dbl>
1 NA US-52 NA NA Crawfordsville 172
anti_join(road, crash)
# A tibble: 1 × 3
Road District Length
<chr> <chr> <dbl>
1 US-52 Crawfordsville 172
What observations from the `crash` data are missing from `road`?
roads1 <- road %>% pull(Road)
roads2 <- crash %>% pull(Road)
setdiff(roads2, roads1) # These values are in `crash` but not `road`
[1] "Interstate 275"
# Could also search for NAs created by the join. Would be good to summarize with `count`
full %>% filter(is.na(District)) %>% count(Road)
# A tibble: 1 × 2
Road n
<chr> <int>
1 Interstate 275 22
anti_join(crash, road)
# A tibble: 22 × 4
Year Road N_Crashes Volume
<dbl> <chr> <dbl> <dbl>
1 1991 Interstate 275 27 20350
2 1992 Interstate 275 26 21200
3 1993 Interstate 275 22 23200
4 1994 Interstate 275 21 21200
5 1995 Interstate 275 28 23200
6 1996 Interstate 275 22 20000
7 1997 Interstate 275 27 18000
8 1998 Interstate 275 21 19500
9 1999 Interstate 275 22 21000
10 2000 Interstate 275 29 20700
# ℹ 12 more rows
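As the two answers above suggest, `anti_join(x, y)` keeps the rows of `x` that have no match in `y`, so swapping the arguments answers the opposite question. A minimal sketch on hypothetical toy data (`crash_toy` and `road_toy` are made up for illustration), with `by` written out explicitly:

```r
library(dplyr)

# Hypothetical toy data standing in for `crash` and `road`
crash_toy <- data.frame(
  Road = c("A", "A", "B", "C", "C"),
  N_Crashes = c(10, 12, 7, 3, 4),
  stringsAsFactors = FALSE
)
road_toy <- data.frame(
  Road = c("A", "B", "D"),
  Length = c(100, 50, 75),
  stringsAsFactors = FALSE
)

# Rows of road_toy whose Road never appears in crash_toy
road_only <- anti_join(road_toy, crash_toy, by = "Road")

# Rows of crash_toy whose Road never appears in road_toy
crash_only <- anti_join(crash_toy, road_toy, by = "Road")

road_only$Road            # "D"
unique(crash_only$Road)   # "C"

# The same road names via setdiff on the key column
setdiff(road_toy$Road, crash_toy$Road)
```

Unlike `setdiff` on the key column, `anti_join` returns the full unmatched rows, which is why the `crash` direction above yields one row per year rather than a single road name.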