library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.3.0      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
  1. Read in the UFO dataset (used in the Data IO lectures) as an R object called ufo. You can read it directly from the web: https://raw.githubusercontent.com/SISBID/Module1/gh-pages/data/ufo/ufo_data_complete.csv . You can ignore the parsing “problems” reported for some rows.
library(readr)
ufo <- read_csv("https://raw.githubusercontent.com/SISBID/Module1/gh-pages/data/ufo/ufo_data_complete.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 88875 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): datetime, city, state, country, shape, duration (hours/min), comme...
## dbl  (1): duration (seconds)
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
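The warning above means readr hit values it could not parse as expected, set them to NA, and recorded them for `problems()`. As a minimal sketch of that mechanism, the example below uses a hypothetical three-line inline CSV instead of the real file. It is wrapped in `I()`, which readr 2.x uses to mark literal data, and its column types are declared explicitly with `cols()`, which also silences the column-specification message. One value deliberately fails to parse as a number:

```r
library(readr)

# Hypothetical inline data (not the real UFO file); I() marks it as literal text
toy_csv <- I("city,state,duration (seconds)\nedna,tx,20\nsanta fe,nm,abc\n")

# Declare types up front: everything is character except the duration column.
# "abc" cannot be parsed as a double, so it becomes NA and is logged.
toy_ufo <- read_csv(toy_csv,
                    col_types = cols(.default = col_character(),
                                     `duration (seconds)` = col_double()))

# problems() lists each row/column that failed to parse
problems(toy_ufo)
```

On the real data, `problems(ufo)` gives the same kind of listing, which is a quick way to confirm that the affected rows are safe to ignore.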
  1. Clean up the column/variable names of the ufo dataset to remove spaces and non-alphanumeric characters. You can use the dplyr::rename() function or look into the janitor::clean_names() function. Save the result as ufo_clean.
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
ufo_clean <- clean_names(ufo)
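The task allows either `dplyr::rename()` or `janitor::clean_names()`. As a sketch of how the two compare, the hypothetical one-row tibble below uses column names modeled on the UFO file. `clean_names()` converts every name to snake_case in one call, while `rename()` needs each `new_name = \`old name\`` pair spelled out by hand:

```r
library(dplyr)
library(janitor)

# Hypothetical one-row tibble with names modeled on the UFO file
raw <- tibble(`duration (seconds)`   = 20,
              `duration (hours/min)` = "20 seconds",
              `date posted`          = "1/21/2008")

# Option 1: janitor cleans all names automatically
via_janitor <- clean_names(raw)

# Option 2: rename() with new_name = `old name`, one column at a time
via_rename <- raw %>%
  rename(duration_seconds   = `duration (seconds)`,
         duration_hours_min = `duration (hours/min)`,
         date_posted        = `date posted`)

names(via_janitor)
# → "duration_seconds" "duration_hours_min" "date_posted"
identical(names(via_janitor), names(via_rename))
# → TRUE
```

`clean_names()` scales better to wide data; `rename()` gives full control when a specific name is wanted.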
  1. Filter for rows where state is “tx”, “nm”, or “ut”. Then use recode() on the state variable to replace “tx” with “Texas”, “nm” with “New_Mexico”, and “ut” with “Utah”. Save the output as South_West. Hint: you will need mutate().
South_West <- ufo_clean %>%
  filter(state %in% c("tx", "nm", "ut")) %>%
  # recode() takes old = new pairs; assigning to `state` overwrites that column
  mutate(state = recode(state,
                        "tx" = "Texas",
                        "nm" = "New_Mexico",
                        "ut" = "Utah"))
South_West
## # A tibble: 5,763 × 11
##    datetime    city  state country shape durat…¹ durat…² comme…³ date_…⁴ latit…⁵
##    <chr>       <chr> <chr> <chr>   <chr>   <dbl> <chr>   <chr>   <chr>   <chr>  
##  1 10/10/1949… san … Texas us      cyli…    2700 45 min… This e… 4/27/2… 29.883…
##  2 10/10/1949… lack… Texas <NA>    light    7200 1-2 hrs 1949 L… 12/16/… 29.384…
##  3 10/10/1956… edna  Texas us      circ…      20 1/2 ho… My old… 1/17/2… 28.978…
##  4 10/10/1977… san … Texas us      other      30 30 sec… i was … 2/24/2… 29.423…
##  5 10/10/1980… hous… Texas us      sphe…     180 3 min   Sphere… 4/16/2… 29.763…
##  6 10/10/1980… dall… Texas us      unkn…     300 5 minu… Strang… 10/28/… 32.783…
##  7 10/10/1984… hous… Texas us      circ…      60 1 minu… 2 expe… 4/18/2… 29.763…
##  8 10/10/1992… staf… Texas us      unkn…      10 10 sec… A man … 4/18/2… 29.615…
##  9 10/10/1992… weat… Texas us      unkn…      30 30 sec… Black … 9/2/20… 32.759…
## 10 10/10/1994… merc… Texas <NA>    cigar    3600 1 hour  ufo ch… 12/12/… 26.149…
## # … with 5,753 more rows, 1 more variable: longitude <chr>, and abbreviated
## #   variable names ¹​duration_seconds, ²​duration_hours_min, ³​comments,
## #   ⁴​date_posted, ⁵​latitude
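The argument order of `recode()` is easy to flip: each pair is written `old = new`. The sketch below checks this on a hypothetical four-row tibble standing in for `ufo_clean`. It also shows that the reversed form (`new = old`) silently returns the column unchanged, because no value in it matches a name such as "Texas":

```r
library(dplyr)

# Hypothetical toy data standing in for ufo_clean
toy_states <- tibble(city  = c("edna", "santa fe", "moab", "boise"),
                     state = c("tx", "nm", "ut", "id"))

# Keep the three states, then overwrite `state` using old = new pairs
south_west_toy <- toy_states %>%
  filter(state %in% c("tx", "nm", "ut")) %>%
  mutate(state = recode(state, tx = "Texas", nm = "New_Mexico", ut = "Utah"))

south_west_toy$state
# → "Texas" "New_Mexico" "Utah"

# Reversed pairs (new = old): nothing matches, so values come back unchanged
reversed <- recode(toy_states$state, Texas = "tx")
reversed
# → "tx" "nm" "ut" "id"
```

Naming the result inside `mutate()` (here `state = ...`) also matters: an unnamed expression is added as a new column whose name is the expression text itself.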
  1. Use case_when() to create a new variable called “continent”. If the country is “ca” or “us”, set it to “North America”; if it is “gb” or “de”, set it to “Europe”; and if it is “au”, set it to “Australia”. There is no need for a `TRUE ~ …` catch-all: rows that match none of the conditions get NA, which is what we want here.
ufo_clean %>%
  mutate(continent = case_when(country %in% c("ca", "us") ~ "North America",
                               country %in% c("gb", "de") ~ "Europe",
                               country == "au" ~ "Australia"))
## # A tibble: 88,875 × 12
##    datetime    city  state country shape durat…¹ durat…² comme…³ date_…⁴ latit…⁵
##    <chr>       <chr> <chr> <chr>   <chr>   <dbl> <chr>   <chr>   <chr>   <chr>  
##  1 10/10/1949… san … tx    us      cyli…    2700 45 min… This e… 4/27/2… 29.883…
##  2 10/10/1949… lack… tx    <NA>    light    7200 1-2 hrs 1949 L… 12/16/… 29.384…
##  3 10/10/1955… ches… <NA>  gb      circ…      20 20 sec… Green/… 1/21/2… 53.2   
##  4 10/10/1956… edna  tx    us      circ…      20 1/2 ho… My old… 1/17/2… 28.978…
##  5 10/10/1960… kane… hi    us      light     900 15 min… AS a M… 1/22/2… 21.418…
##  6 10/10/1961… bris… tn    us      sphe…     300 5 minu… My fat… 4/27/2… 36.595…
##  7 10/10/1965… pena… <NA>  gb      circ…     180 about … penart… 2/14/2… 51.434…
##  8 10/10/1965… norw… ct    us      disk     1200 20 min… A brig… 10/2/1… 41.117…
##  9 10/10/1966… pell… al    us      disk      180 3  min… Strobe… 3/19/2… 33.586…
## 10 10/10/1966… live… fl    us      disk      120 severa… Saucer… 5/11/2… 30.294…
## # … with 88,865 more rows, 2 more variables: longitude <chr>, continent <chr>,
## #   and abbreviated variable names ¹​duration_seconds, ²​duration_hours_min,
## #   ³​comments, ⁴​date_posted, ⁵​latitude
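To see why leaving out the catch-all keeps the NAs, here is a small sketch on a hypothetical vector of country codes that includes one `NA` and one code (`"fr"`) not covered by any condition. A condition that evaluates to `NA` (as `NA == "au"` does) is treated as not matching, so both of those entries fall through to `NA`. Adding `TRUE ~ "Other"` as a last branch would overwrite them instead:

```r
library(dplyr)

# Hypothetical country codes, including NA and an uncovered code ("fr")
country <- c("us", "ca", "gb", "de", "au", NA, "fr")

# No catch-all: unmatched entries become NA
continent <- case_when(
  country %in% c("ca", "us") ~ "North America",
  country %in% c("gb", "de") ~ "Europe",
  country == "au"            ~ "Australia"
)
continent
# → "North America" "North America" "Europe" "Europe" "Australia" NA NA

# With a catch-all, those same entries are labeled "Other" instead
continent_filled <- case_when(
  country %in% c("ca", "us") ~ "North America",
  country %in% c("gb", "de") ~ "Europe",
  country == "au"            ~ "Australia",
  TRUE                       ~ "Other"
)
continent_filled
# → "North America" "North America" "Europe" "Europe" "Australia" "Other" "Other"
```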