Data used

Charm City Circulator Ridership dataset: the data come from https://data.baltimorecity.gov/Transportation/Charm-City-Circulator-Ridership/wwvu-583r

Available on: https://sisbid.github.io/Data-Wrangling/data/Charm_City_Circulator_Ridership.csv

library(tidyverse)

circ <- read_csv("https://sisbid.github.io/Data-Wrangling/data/Charm_City_Circulator_Ridership.csv")
## Rows: 1146 Columns: 15
## ── Column specification ────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): day, date
## dbl (13): orangeBoardings, orangeAlightings, orangeAverage, purpleBoardings,...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
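The message above suggests two ways to quiet the column specification output. A minimal sketch of both follows, assuming readr 2.0 or later (needed for `show_col_types` and for literal data wrapped in `I()`). The `toy_csv` string and its values are made up for illustration and are not rows from the ridership file.

```r
library(readr)

# Hypothetical two-row CSV standing in for the real file (values are made up)
toy_csv <- I("day,date,daily\nMonday,01/11/2010,100\nTuesday,01/12/2010,200\n")

# Option 1: keep type guessing but suppress the message
toy <- read_csv(toy_csv, show_col_types = FALSE)

# Option 2: state the column types explicitly, which also suppresses the message
toy_typed <- read_csv(
  toy_csv,
  col_types = cols(.default = col_character(), daily = col_double())
)
```

Stating the types explicitly is a little more typing, but it guards against a column being guessed differently if the file changes.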
  1. Each row is a different day. How many days are in the data set?
nrow(circ)
## [1] 1146
dim(circ)
## [1] 1146   15
circ %>% 
  nrow()
## [1] 1146
  1. What is the total (sum) number of boardings on the green bus (greenBoardings column)?
sum(circ$greenBoardings, na.rm = TRUE)
## [1] 935564
circ %>% pull(greenBoardings) %>% sum(na.rm = TRUE)
## [1] 935564
count(circ, wt = greenBoardings)
## # A tibble: 1 × 1
##        n
##    <dbl>
## 1 935564
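All three answers above drop missing values before summing. The small sketch below shows why that matters, using a made-up vector (`boardings` is hypothetical, not taken from the data).

```r
# Made-up boardings with one missing day
boardings <- c(120, NA, 95)

total_with_na <- sum(boardings)                # NA: one missing value propagates
total_no_na   <- sum(boardings, na.rm = TRUE)  # 215: missing values dropped first
n_missing     <- sum(is.na(boardings))         # 1: how many values were dropped
```

Checking the number of missing values alongside the total is a cheap way to see how much data the sum actually covers.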
  1. Group the data by day of the week (day) and find the mean daily ridership (daily column) for each day. (Hint: use the group_by() and summarize() functions.)
circ %>% 
  group_by(day) %>% 
  summarize(mean = mean(daily, na.rm = TRUE))
## # A tibble: 7 × 2
##   day        mean
##   <chr>     <dbl>
## 1 Friday    8961.
## 2 Monday    7340.
## 3 Saturday  6743.
## 4 Sunday    4531.
## 5 Thursday  7639.
## 6 Tuesday   7642.
## 7 Wednesday 7779.
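The rows above come out in alphabetical order because `day` is a character column. One possible way to get weekday order instead is to convert `day` to a factor before grouping. This is only a sketch on made-up data: the `toy` tibble, the `week_levels` vector, and the choice to start the week on Monday are all assumptions for illustration.

```r
library(dplyr)

# Made-up ridership values, not taken from the real data
toy <- tibble(
  day   = c("Monday", "Sunday", "Monday", "Friday", "Sunday"),
  daily = c(100, 40, NA, 80, 60)
)

# Assumed ordering: week starts on Monday
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")

by_day <- toy %>%
  mutate(day = factor(day, levels = week_levels)) %>%  # order by weekday, not alphabet
  group_by(day) %>%
  summarize(
    mean   = mean(daily, na.rm = TRUE),  # mean over non-missing days
    n_days = sum(!is.na(daily))          # number of days that contributed
  )
```

Here the result has rows for Monday, Friday, and Sunday in that order, and `n_days` shows that only one Monday contributed to its mean.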
  1. Take the median of orangeBoardings (use median()), grouping by day of the week.
circ %>% 
  group_by(day) %>% 
  summarize(median = median(orangeBoardings, na.rm = TRUE))
## # A tibble: 7 × 2
##   day       median
##   <chr>      <dbl>
## 1 Friday     4014.
## 2 Monday     3336 
## 3 Saturday   2963 
## 4 Sunday     1900 
## 5 Thursday   3485 
## 6 Tuesday    3484 
## 7 Wednesday  3576

Practice on your own

  1. Group by day of the week, then find the mean of all numeric columns using summarize(across()) with where(is.numeric).
circ %>% 
  group_by(day) %>% 
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
## # A tibble: 7 × 14
##   day       orangeBoardings orangeAlightings orangeAverage purpleBoardings
##   <chr>               <dbl>            <dbl>         <dbl>           <dbl>
## 1 Friday              3744.            3840.         3745.           5172.
## 2 Monday              3076.            3160.         3088.           4158.
## 3 Saturday            2861.            2937.         2873.           3774.
## 4 Sunday              1878.            1916.         1887.           2545.
## 5 Thursday            3215.            3281.         3204.           4404.
## 6 Tuesday             3153.            3235.         3158.           4300.
## 7 Wednesday           3256.            3338.         3244.           4493.
## # ℹ 9 more variables: purpleAlightings <dbl>, purpleAverage <dbl>,
## #   greenBoardings <dbl>, greenAlightings <dbl>, greenAverage <dbl>,
## #   bannerBoardings <dbl>, bannerAlightings <dbl>, bannerAverage <dbl>,
## #   daily <dbl>
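The same across() pattern can compute more than one summary per column by passing a named list of functions. The sketch below uses a made-up tibble (`toy` and its values are hypothetical) and assumes dplyr 1.0 or later. The `.names` glue pattern controls how the output columns are named.

```r
library(dplyr)

# Made-up data: `date` is character, so where(is.numeric) skips it
toy <- tibble(
  day             = c("Monday", "Monday", "Sunday", "Sunday"),
  date            = c("d1", "d2", "d3", "d4"),
  orangeBoardings = c(10, 30, NA, 20),
  purpleBoardings = c(5, 15, 8, 12)
)

toy_summary <- toy %>%
  group_by(day) %>%
  summarize(across(
    where(is.numeric),
    list(
      mean   = ~ mean(.x, na.rm = TRUE),
      median = ~ median(.x, na.rm = TRUE)
    ),
    .names = "{.col}_{.fn}"  # e.g. orangeBoardings_mean
  ))
```

Each numeric column yields one output column per function, so this toy example returns `day` plus four summary columns.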