Data used

Bike Lanes Dataset: BikeBaltimore is the Department of Transportation’s bike program. The data is from http://data.baltimorecity.gov/Transportation/Bike-Lanes/xzfj-gyms

You can download the data as a CSV file to your current working directory. Note that it's also available at: https://sisbid.github.io/Data-Wrangling/labs/Bike_Lanes.csv

If you haven’t installed naniar yet, run install.packages("naniar") first.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(naniar)

bike <- read_csv("https://sisbid.github.io/Data-Wrangling/labs/Bike_Lanes.csv")
## Rows: 1631 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): subType, name, block, type, project, route
## dbl (3): numLanes, length, dateInstalled
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  1. Use the is.na() and any() functions to check if the bike dateInstalled variable has any NA values. Hint: You first need to pull out the vector version of this variable to use the is.na() function.
bike %>%
  pull(dateInstalled) %>%
  is.na() %>%
  any()
## [1] FALSE
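To see what each step of that pipeline does, here is a minimal sketch on two made-up vectors (the names no_missing and some_missing are hypothetical and are not columns of bike). It also shows anyNA(), a base R shortcut that should give the same answer as any(is.na(x)).

```r
# Hypothetical toy vectors standing in for a column such as dateInstalled
no_missing   <- c(2006, 2010, 2007)
some_missing <- c(2006, NA, 2007)

# is.na() returns one TRUE/FALSE per element
is.na(some_missing)

# any() collapses that logical vector to a single TRUE/FALSE
any(is.na(no_missing))    # FALSE: nothing is missing
any(is.na(some_missing))  # TRUE: at least one NA

# anyNA() is a base R shortcut for any(is.na(x))
anyNA(some_missing)
```

This is why the hint asks for pull() first: is.na() followed by any() is easiest to reason about on a plain vector rather than on a whole data frame.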
  2. Filter rows of bike, so that only rows remain that do NOT have missing values for the route variable, using drop_na(). Assign this to the object have_route.
have_route <- bike %>% drop_na(route)
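As a rough check on what drop_na() is doing, the sketch below compares it with filter(!is.na(...)) on a small made-up tibble. The object toy and its values are hypothetical; only the column name route is borrowed from bike.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; only the column name `route` mirrors bike
toy <- tibble(
  name  = c("A ST", "B ST", "C ST"),
  route = c("NORTHSIDE", NA, NA)
)

# Drop rows where route is NA, two ways
with_drop_na <- toy %>% drop_na(route)
with_filter  <- toy %>% filter(!is.na(route))

# Both approaches keep the same single row
nrow(with_drop_na)
identical(with_drop_na$name, with_filter$name)
```

Calling drop_na() with no column names would instead drop rows that have an NA in any column, so naming route matters here.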
  3. Use naniar to make a visual of the amount of data missing for each variable of bike (use gg_miss_var()). Check out more about this package here: https://www.njtierney.com/post/2018/06/12/naniar-on-cran/
gg_miss_var(bike)

  4. What percentage of the subType variable of bike is complete? Hint: use another naniar function.
pull(bike, subType) %>% pct_complete() # this
## [1] 99.75475
miss_var_summary(bike) # or this
## # A tibble: 9 × 3
##   variable      n_miss pct_miss
##   <chr>          <int>    <dbl>
## 1 route           1269   77.8  
## 2 block            215   13.2  
## 3 project           74    4.54 
## 4 name              12    0.736
## 5 type               9    0.552
## 6 subType            4    0.245
## 7 numLanes           0    0    
## 8 length             0    0    
## 9 dateInstalled      0    0
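The percentages above can also be reproduced by hand, which may help when checking the naniar output. The sketch below uses a made-up vector x (hypothetical, not from bike) and then redoes the subType arithmetic using only the counts printed above: 4 missing values out of 1631 rows.

```r
# Hypothetical toy vector standing in for a column such as subType
x <- c("a", NA, "b", "c")

# mean() of a logical vector is the proportion of TRUE values
pct_complete_by_hand <- mean(!is.na(x)) * 100  # 3 of 4 values present
pct_missing_by_hand  <- mean(is.na(x)) * 100   # 1 of 4 values missing

# subType: 4 missing out of 1631 rows, per the summary table above
subtype_pct_complete <- (1 - 4 / 1631) * 100
subtype_pct_missing  <- 4 / 1631 * 100
```

The two subType values should agree with the 99.75475 from pct_complete() and the 0.245 shown in the pct_miss column.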