--- title: "Data Summarization Lab Key" output: html_document editor_options: chunk_output_type: console --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ## Data used Circulator Lanes Dataset: the data is from https://data.baltimorecity.gov/Transportation/Charm-City-Circulator-Ridership/wwvu-583r Available on: https://sisbid.github.io/Data-Wrangling/data/Charm_City_Circulator_Ridership.csv ```{r} library(tidyverse) circ <- read_csv("https://sisbid.github.io/Data-Wrangling/data/Charm_City_Circulator_Ridership.csv") ``` 1. How many days are in the data set? You can assume each observation/row is a different day (hint: get the number of rows). ```{r q1} nrow(circ) dim(circ) circ %>% nrow() ``` 2. What is the total (sum) number of boardings on the green bus (`greenBoardings` column)? ```{r q2} sum(circ$greenBoardings, na.rm = TRUE) circ %>% pull(greenBoardings) %>% sum(na.rm = TRUE) count(circ, wt = greenBoardings) ``` 3. How many days are missing daily ridership (`daily` column)? Use `is.na()` and `sum()`. ```{r q3} daily <- circ %>% pull(daily) sum(is.na(daily)) # Can also circ %>% count(is.na(daily)) ``` 4. Group the data by day of the week (`day`). Next, find the mean daily ridership (`daily` column) and the sample size. (hint: use `group_by` and `summarize` functions) ```{r q4} circ %>% group_by(day) %>% summarise(mean = mean(daily, na.rm = TRUE), n = n()) ``` ## **Extra practice:** 5. What is the median of `orangeBoardings`(use `median()`). ```{r q6} circ %>% summarise(median = median(orangeBoardings, na.rm = TRUE)) # OR circ %>% pull(orangeBoardings) %>% median(na.rm = TRUE) ``` 6. Take the median of `orangeBoardings`(use `median()`), but this time stratify by day of the week. ```{r q7} circ %>% group_by(day) %>% summarise(median = median(orangeBoardings, na.rm = TRUE)) ```