SIDS Ch 2 - Data Viz

tidyverse
R
Author

Colin Madland

Published

May 18, 2025

Ismay, C., Kim, A. Y., & Valdivia, A. (2025). Statistical Inference via Data Science: A ModernDive into R and the Tidyverse (2nd ed.). Chapman and Hall/CRC.

library(nycflights23)
library(ggplot2)
library(viridis)
Loading required package: viridisLite
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Wrangling data

  • filter() a data frame’s existing rows to only pick out a subset of them. For example, the alaska_flights data frame.
  • summarize() one or more of its columns/variables with a summary statistic. Examples of summary statistics include the median and interquartile range of temperatures as we saw in Section 2.7 on boxplots.
  • group_by() its rows. In other words, assign different rows to be part of the same group. We can then combine group_by() with summarize() to report summary statistics for each group separately. For example, say you don’t want a single overall average departure delay dep_delay for all three origin airports combined, but rather three separate average departure delays, one computed for each of the three origin airports.
  • mutate() its existing columns/variables to create new ones. For example, convert hourly temperature readings from Fahrenheit to Celsius.
  • arrange() its rows. For example, sort the rows of weather in ascending or descending order of temp.
  • join() it with another data frame by matching along a “key” variable. In other words, merge these two data frames together.

Piper DOWN!! |>r DOWN!

  • the |> was introduced to R in 2021 to replace the clumsy %>%
  • allows users to create a sequence of operations within one set of instructions
  • Take x then
  • Use x as an input to a function f() then
  • Use the output of f(x) as an input to a function g() then
  • Use the output of g(f(x)) as an input to a function h()

One way is to nest parentheses…

h(g(f(x)))

Not hard to read as there are only 3 simple arguments, but this quickly gets out of hand with many complex arguments…so…

x |> 
  f() |> 
  g() |> 
  h()

The pipe takes the output of one function an ‘pipes’ it into the next function as an argument.