Bike Lanes Dataset: BikeBaltimore is the Department of Transportation’s bike program. The data is from http://data.baltimorecity.gov/Transportation/Bike-Lanes/xzfj-gyms
You can download it as a CSV into your current working directory. Note that it's also available at: https://sisbid.github.io/Data-Wrangling/labs/Bike_Lanes.csv
If you haven’t installed naniar yet, you will need to use install.packages("naniar") first.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(naniar)
bike <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/Bike_Lanes.csv")
## Rows: 1631 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): subType, name, block, type, project, route
## dbl (3): numLanes, length, dateInstalled
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Use the is.na() and any() functions to check if the bike dateInstalled variable has any NA values. Hint: You first need to pull out the vector version of this variable to use the is.na() function.
bike %>%
pull(dateInstalled) %>%
is.na() %>%
any()
## [1] FALSE
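To see what each step of that pipe contributes, here is a minimal sketch on a made-up vector that does contain an NA. The vector `installed` and its values are hypothetical stand-ins for a column such as dateInstalled, not data from the bike file.

```r
# Hypothetical toy vector standing in for a pulled column (values are made up)
installed <- c(2006, NA, 2010, 2007)

# is.na() returns one TRUE/FALSE flag per element
missing_flags <- is.na(installed)
missing_flags

# any() collapses those flags into a single TRUE/FALSE answer
has_missing <- any(missing_flags)
has_missing

# sum() counts the TRUE flags, i.e. how many values are NA
n_missing <- sum(missing_flags)
n_missing
```

Because is.na() works element by element, it needs a vector rather than a whole data frame column selection, which is why the hint asks for pull() first.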
Filter the rows of bike that are not missing the route variable, using drop_na. Assign this to the object have_route.
have_route <- bike %>% drop_na(route)
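As a rough illustration of how drop_na() treats the column you name, the sketch below uses a small invented data frame. The object `toy` and its values are hypothetical and exist only for this example.

```r
library(tidyr)

# Hypothetical toy data; the values are invented for illustration
toy <- data.frame(
  name  = c("A ST", "B ST", NA, "D ST"),
  route = c("NORTHSIDE", NA, "COLLEGETOWN", NA)
)

# Drop rows only where `route` is NA; the NA in `name` is left alone
kept <- drop_na(toy, route)
nrow(kept)

# With no columns named, drop_na() removes rows with an NA in any column
complete_rows <- drop_na(toy)
nrow(complete_rows)
```

Naming the column matters: on the bike data, route is missing in 1269 of the 1631 rows, so dropping on route alone should leave 362 rows, while dropping on every column would remove even more.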
Use naniar to make a visual of the amount of data missing for each variable of bike (use gg_miss_var()). Check out more about this package here: https://www.njtierney.com/post/2018/06/12/naniar-on-cran/
gg_miss_var(bike)
What percentage of the subType variable of bike is complete? Hint: use another naniar function.
pull(bike, subType) %>% pct_complete() # this
## [1] 99.75475
miss_var_summary(bike) # or this
## # A tibble: 9 × 3
##   variable      n_miss pct_miss
##   <chr>          <int>    <num>
## 1 route           1269   77.8
## 2 block            215   13.2
## 3 project           74    4.54
## 4 name              12    0.736
## 5 type               9    0.552
## 6 subType            4    0.245
## 7 numLanes           0    0
## 8 length             0    0
## 9 dateInstalled      0    0
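The two answers agree with each other, which can be checked by hand. This is a small base R sketch using only the counts printed above (1631 rows, 4 missing subType values); the variable names are chosen here for illustration.

```r
# Counts taken from the output above: 1631 rows, 4 missing subType values
n_rows <- 1631
n_miss <- 4

# Percent missing, as reported in the pct_miss column
pct_miss <- n_miss / n_rows * 100

# Percent complete is whatever is left over
pct_complete <- 100 - pct_miss

round(pct_miss, 3)
round(pct_complete, 5)
```

The rounded values should line up with the 0.245 shown for subType in the summary table and the 99.75475 returned by pct_complete().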