library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Read the UFO data into an R object named ufo. You can read directly from the web here:
https://raw.githubusercontent.com/SISBID/Module1/gh-pages/data/ufo/ufo_data_complete.csv
You can ignore the “problems” with some rows.
library(readr)
ufo <- read_csv("https://raw.githubusercontent.com/SISBID/Module1/gh-pages/data/ufo/ufo_data_complete.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 88875 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): datetime, city, state, country, shape, duration (hours/min), comme...
## dbl (1): duration (seconds)
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
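The parsing warning above can be inspected rather than only ignored. The following is a minimal sketch, assuming readr 2.x (which accepts literal data wrapped in I()). The toy data and the column name x are made up for illustration and are not part of the UFO file.

```r
library(readr)

# Toy CSV (hypothetical, not the UFO data): the second value is not a number.
toy <- read_csv(
  I("x\n1\nabc\n3\n"),
  col_types = cols(x = col_double()),  # force the column to be numeric
  show_col_types = FALSE               # silence the column specification message
)

# Values that fail to parse become NA and are recorded as parsing problems.
toy
problems(toy)
```

Calling problems(ufo) in the same way should list the rows of the real file that triggered the warning.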
Clean up the column/variable names of the ufo dataset to remove spaces and non-alphanumeric characters. You can use the dplyr::rename() function or look into the janitor::clean_names() function.
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
ufo <- clean_names(ufo)
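To make the two suggested approaches concrete, here is a small sketch on a toy tibble. The values are hypothetical; only the two column names are taken from the file. It assumes the janitor and dplyr packages are installed.

```r
library(tibble)
library(dplyr)
library(janitor)

# Toy tibble (hypothetical values) with two of the awkward column names from the file.
toy <- tibble(
  `duration (seconds)`   = c(300, 7200),
  `duration (hours/min)` = c("5 minutes", "2 hours")
)

# Option 1: clean every name at once.
cleaned <- clean_names(toy)
names(cleaned)

# Option 2: rename columns one by one (new_name = old_name).
renamed <- toy %>%
  rename(
    duration_seconds   = `duration (seconds)`,
    duration_hours_min = `duration (hours/min)`
  )
names(renamed)
```

clean_names() is the more convenient choice when many columns need fixing, while rename() gives full control over each new name.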
How many sightings are reported in minutes in the originally-named duration (hours/min) column? (hint: use str_detect and filter; you can ignore observations like “1/2 hour” and similar ones that don’t contain some version of the word “minutes”).
ufo %>% filter(str_detect(duration_hours_min, "min")) %>% nrow()
## [1] 50112
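The pattern "min" in the answer above is case-sensitive. The sketch below uses made-up duration strings to show what that means and how a case-insensitive match could be written. Whether the real column contains capitalized spellings is not checked here.

```r
library(stringr)

# Toy duration strings (hypothetical, in the style of the free-text column).
durations <- c("5 minutes", "about 10 min", "1/2 hour", "2 Minutes", NA)

# Case-sensitive match, as in the answer above: "2 Minutes" is missed.
case_sensitive <- str_detect(durations, "min")
case_sensitive

# Case-insensitive variant.
case_insensitive <- str_detect(durations, regex("min", ignore_case = TRUE))
case_insensitive
```

Missing values stay NA in both results, and filter() drops rows whose condition is NA.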
Look at the duration (seconds) column: how many of the above minutes-scale observations have durations greater than 14400 seconds (or 4 hours)?
sub <- ufo %>%
  filter(str_detect(duration_hours_min, "min"),
         duration_seconds > 14400)
nrow(sub)
## [1] 10
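The answer above passes two conditions to filter() separated by a comma, which keeps only rows where both are true. A minimal sketch on a toy tibble (hypothetical values) shows that this matches combining the conditions with &:

```r
library(dplyr)
library(stringr)

# Toy data (hypothetical): only the second row mentions minutes AND exceeds 14400 seconds.
toy <- tibble(
  duration_hours_min = c("5 minutes", "300 min", "6 hours"),
  duration_seconds   = c(300, 18000, 21600)
)

# Comma-separated conditions are combined with AND.
both_comma <- toy %>%
  filter(str_detect(duration_hours_min, "min"), duration_seconds > 14400)

# The same filter written with an explicit &.
both_and <- toy %>%
  filter(str_detect(duration_hours_min, "min") & duration_seconds > 14400)

both_comma
nrow(both_and)
```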
How many values in the city column end in “port”? (hint: stringr uses vectors or variables)
ufo %>% pull(city) %>% str_subset("port$") %>% length() # C
## [1] 497
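The $ in "port$" anchors the match to the end of the string, so names that merely contain “port” are not counted. A short sketch with made-up city names:

```r
library(stringr)

# Toy city names (hypothetical): two end in "port", two only contain it.
cities <- c("newport", "portland", "bridgeport", "port angeles", NA)

# str_subset() returns the matching elements and drops missing values.
ends_in_port <- str_subset(cities, "port$")
ends_in_port
length(ends_in_port)

# Equivalent count with str_detect(); na.rm = TRUE skips missing values.
n_port <- sum(str_detect(cities, "port$"), na.rm = TRUE)
n_port
```

Because stringr functions work on vectors, the city column has to be extracted first, for example with pull(city) as in the answer above.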