library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.3.0      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
  1. Read in the UFO dataset (used in the Data IO lectures) as an R object called ufo. You can read it directly from the web: https://raw.githubusercontent.com/SISBID/Module1/gh-pages/data/ufo/ufo_data_complete.csv (you can ignore the “problems” reported for some rows).
library(readr)
ufo <- read_csv("https://raw.githubusercontent.com/SISBID/Module1/gh-pages/data/ufo/ufo_data_complete.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 88875 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): datetime, city, state, country, shape, duration (hours/min), comme...
## dbl  (1): duration (seconds)
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
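
The parsing warning above can be inspected rather than ignored. The following is a minimal sketch on a made-up two-column CSV (the `toy_csv` string and its `city`/`score` columns are assumptions for illustration, not part of the UFO file). It declares the column types up front, which also quiets the column-specification message, and then lists the values that failed to parse.

```r
library(readr)

# Hypothetical CSV text; the second data row has a non-numeric score.
toy_csv <- I("city,score\nseattle,1.5\nboston,oops\n")

# Declaring column types avoids type guessing and the specification message.
toy <- read_csv(
  toy_csv,
  col_types = cols(city = col_character(), score = col_double())
)

# problems() returns one row per value that could not be parsed;
# the offending value itself is stored as NA in the data.
probs <- problems(toy)
probs
```

The same `problems(ufo)` call should work on the real data if you want to see which rows triggered the warning.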

Clean up the column/variable names of the ufo dataset to remove spaces and non-alphanumeric characters. You can use the dplyr::rename() function or look into the janitor::clean_names() function.

library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
ufo <- clean_names(ufo)
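
The question also mentions dplyr::rename() as an alternative. A minimal sketch of that route is shown here on a toy tibble (the `toy` object and its values are made up for illustration) that carries two of the awkward column names from the UFO file. Names with spaces or punctuation have to be wrapped in backticks.

```r
library(dplyr)
library(tibble)

# Toy tibble with two of the original, non-syntactic column names.
toy <- tibble(
  `duration (seconds)` = c(120, 30),
  `duration (hours/min)` = c("2 minutes", "30 sec")
)

# rename() takes new_name = old_name pairs.
toy_renamed <- toy %>%
  rename(
    duration_seconds = `duration (seconds)`,
    duration_hours_min = `duration (hours/min)`
  )

names(toy_renamed)
```

With many columns to fix, clean_names() is less typing; rename() is useful when you want to choose each new name yourself.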
  1. How many UFO sightings were reported on a time scale of minutes, based on the column originally named duration (hours/min)? (Hint: use str_detect and filter. You can ignore observations such as “1/2 hour” that do not contain some version of the word “minutes”.)
ufo %>% filter(str_detect(duration_hours_min, "min")) %>% nrow()
## [1] 50112
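
The pattern "min" is case-sensitive, so entries written as “Min” or “MINUTES” are not counted. A small sketch on a made-up vector (the `durations` values are assumptions, not taken from the dataset) shows how stringr's regex() helper with ignore_case = TRUE changes the match. Applied to the real column, it may give a different count from the one above.

```r
library(stringr)

# Hypothetical free-text durations, including mixed case and a missing value.
durations <- c("5 minutes", "10 Min", "1/2 hour", "45 MINUTES", NA, "20 seconds")

# Case-sensitive match: only lowercase "min" is detected.
n_sensitive <- sum(str_detect(durations, "min"), na.rm = TRUE)

# Case-insensitive match: "Min" and "MINUTES" are detected as well.
n_insensitive <- sum(
  str_detect(durations, regex("min", ignore_case = TRUE)),
  na.rm = TRUE
)

c(sensitive = n_sensitive, insensitive = n_insensitive)
```

str_detect() returns NA for missing values; filter() drops those rows, and na.rm = TRUE does the equivalent here.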
  1. How accurate is the column formerly named duration (seconds)? That is, how many of the minutes-scale observations above have durations of less than 60 seconds or more than 3600 seconds?
sub <- ufo %>% filter(str_detect(duration_hours_min, "min"),
                      duration_seconds < 60 | duration_seconds > 3600)
nrow(sub)
## [1] 449
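
The single count combines two different kinds of mismatch. As a sketch, the two conditions can be tallied separately with summarise(); the toy tibble below is made up for illustration and only reuses the cleaned column names.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical observations using the cleaned column names.
toy <- tibble(
  duration_hours_min = c("5 minutes", "2 min", "90 minutes", "1 hour", "10 minutes"),
  duration_seconds   = c(300, 2, 5400, 3600, 600)
)

# Keep minutes-scale rows, then count each out-of-range condition on its own.
breakdown <- toy %>%
  filter(str_detect(duration_hours_min, "min")) %>%
  summarise(
    under_60  = sum(duration_seconds < 60, na.rm = TRUE),
    over_3600 = sum(duration_seconds > 3600, na.rm = TRUE)
  )

breakdown
```

In this toy data the “90 minutes” row is flagged as over 3600 seconds even though 5400 seconds agrees with its text, so a flagged row is not automatically an error.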
  1. How many UFO sighting cities end in (a) “field”, (b) “ton” and (c) “port”? (Hint: remember that stringr functions take character vectors, such as a column pulled from the data frame.)
ufo %>% pull(city) %>% str_subset("field$") %>% length() # A
## [1] 979
ufo %>% pull(city) %>% str_subset("ton$") %>% length() # B
## [1] 4245
ufo %>% pull(city) %>% str_subset("port$") %>% length() # C
## [1] 497
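
The three calls above differ only in the pattern, so they can be collapsed into one pass over a named vector of patterns. This is a minimal sketch on a made-up vector of city names (`cities` and `suffixes` are hypothetical names introduced here), using the same `$` end-of-string anchor as above.

```r
library(stringr)

# Hypothetical city names, including a missing value.
cities <- c("springfield", "boston", "bridgeport", "seattle",
            "newport", "fieldston", NA)

# One regular expression per suffix; "$" anchors the match to the end.
suffixes <- c(field = "field$", ton = "ton$", port = "port$")

# Count matches for each pattern; the names of `suffixes` carry over.
counts <- sapply(
  suffixes,
  function(p) sum(str_detect(cities, p), na.rm = TRUE)
)

counts
```

As in the solution above, this counts rows (sightings), so a city that appears in several sightings is counted once per sighting.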