discovr_05 - Visualizing Data

ggplot2
visualizing-data
R
discovr
Author: Colin Madland

Published: April 23, 2024

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
wish_tib <- here::here("data/jiminy_cricket.csv") |> readr::read_csv()
Rows: 500 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): strategy, time
dbl (2): id, success

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
notebook_tib <- here::here("data/notebook.csv") |> readr::read_csv()
Rows: 40 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): sex, film
dbl (1): arousal

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
exam_tib <- here::here("data/exam_anxiety.csv") |> readr::read_csv()
Rows: 103 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): sex
dbl (4): id, revise, exam_grade, anxiety

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
wish_tib <- wish_tib |>
  dplyr::mutate(
    strategy = forcats::as_factor(strategy),
    time = forcats::as_factor(time) |> forcats::fct_relevel("Baseline")
  )
notebook_tib <- notebook_tib |>
  dplyr::mutate(
    sex = forcats::as_factor(sex),
    film = forcats::as_factor(film)
  )
exam_tib <- exam_tib |>
  dplyr::mutate(
    id = forcats::as_factor(id),
    sex = forcats::as_factor(sex)
  )

ggplot2

  • part of the tidyverse
aes()
maps variables in the data to aesthetics of the plot (e.g. x and y position, colour, fill)

Geometric objects

  • objects that represent data
geom_point()
plots data by points/dots
geom_boxplot()
plots boxplots
geom_histogram()
plots histograms
geom_errorbar()
plots error bars
geom_smooth()
plots smoothed summary lines (e.g. a fitted regression line)
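As a rough illustration of how aes() and geoms combine, here is a minimal sketch. It assumes the `mpg` data set that ships with ggplot2 as a stand-in for the tutorial's own tibbles; the object name `geom_plot` is made up for this example.

```r
library(ggplot2)

# mpg is bundled with ggplot2; it is only a stand-in for the tutorial data
geom_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +               # one point per car
  geom_smooth(method = "lm")   # fitted straight line with a confidence band

geom_plot
```

Each geom is added with `+`, and both layers inherit the x and y mappings set in aes().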

Objects or ‘stats’

  • in some situations it is easier to compute a summary of the data and display it directly on the plot (usually with stat_summary())
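A small sketch of the idea, again assuming ggplot2's bundled `mpg` data rather than the tutorial's tibbles (the name `summary_plot` is hypothetical): stat_summary() computes the group means itself, so no summary table has to be built first.

```r
library(ggplot2)

# Plot the mean highway mpg for each drive type; the raw data are passed in
# and stat_summary() does the averaging
summary_plot <- ggplot(mpg, aes(x = drv, y = hwy)) +
  stat_summary(fun = mean, geom = "point", size = 3)

summary_plot
```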

Scales

  • control the details of how data are mapped to their visual objects. For example, scale_x_continuous() and scale_y_continuous() control what appears on the x and y axes; axis labels are set with labs()
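One way this might look in practice, assuming the bundled `mpg` data as a stand-in; the break values and label text are chosen purely for illustration.

```r
library(ggplot2)

scale_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # put a tick mark every 10 units on the y axis
  scale_y_continuous(breaks = seq(10, 50, by = 10)) +
  # replace the default axis titles (the variable names)
  labs(x = "Engine displacement (litres)", y = "Highway fuel economy (mpg)")

scale_plot
```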

Coordinate system

  • ggplot2 uses a Cartesian coordinate system by default.
  • coord_cartesian() sets the limits of the x and y axes (it zooms the view without discarding data outside the limits)
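A minimal sketch of zooming with coord_cartesian(), assuming the bundled `mpg` data; the limits are arbitrary. Because the limits are applied to the view rather than to the data, every row is still present in the layer.

```r
library(ggplot2)

zoom_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(ylim = c(20, 40))  # show only y from 20 to 40; no rows are dropped

zoom_plot
```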

Position adjustments

  • position_dodge() places objects side by side so that they do not overlap
  • position_jitter() adds small random adjustments to data points
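The two adjustments could be tried out as follows. This is only a sketch on the bundled `mpg` data; the widths and the seed are arbitrary choices, and the object names are hypothetical.

```r
library(ggplot2)

# Jitter: spread points horizontally so identical values do not sit on top
# of each other; height = 0 leaves the y values untouched
jitter_plot <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 42))

# Dodge: draw the group means for each model year side by side within each
# drive type instead of at the same x position
dodge_plot <- ggplot(mpg, aes(x = drv, y = hwy, colour = factor(year))) +
  stat_summary(fun = mean, geom = "point", position = position_dodge(width = 0.5))

jitter_plot
dodge_plot
```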

Facets

  • used to plot different parts of the data in different panels
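For instance, facet_wrap() can split one plot into a panel per group. The sketch below assumes the bundled `mpg` data and facets by drive type, which has three levels.

```r
library(ggplot2)

facet_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv)  # one panel for each value of drv

facet_plot
```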

Themes

  • various themes to style the output
  • individual theme elements can be overridden with the theme() function
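A short sketch of applying a complete theme and then overriding one element, assuming the bundled `mpg` data; moving the legend is just an example of an override.

```r
library(ggplot2)

theme_plot <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  theme_minimal() +                    # start from a complete built-in theme
  theme(legend.position = "bottom")    # then override a single element

theme_plot
```

The order matters: theme() should come after the complete theme, otherwise the complete theme replaces the override.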

Each of the above is a layer that can be added to a plot, as shown below.

[Figure: Explanation of the layered approach to ggplot2]

Boxplots (box-whisker plots)

  • imaginary data on people's level of success (0-100)
  • one group was told to wish for success; the other group was told to work hard for it
  • success was measured again 5 years later
  • The data are in wish_tib. The variables are id (the person’s id), strategy (hard work or wishing upon a star), time (baseline or 5 years), and success (the rating on my dodgy scale).

Creating a boxplot…

geom_boxplot()
ggplot2::ggplot(my_tib, aes(variable_for_x_axis, variable_for_y_axis))
wish_plot <- ggplot2::ggplot(wish_tib, aes(time, success))  # creates an object called `wish_plot` that holds the base plot (data and mappings, no geoms yet)
# ggplot() specifies that the plot will use `wish_tib`, with time on *x* and success on *y*
wish_plot +
  geom_boxplot() # adds boxplot geom to wish_plot

wish_plot <- ggplot2::ggplot(wish_tib, aes(time, success))
wish_plot +
  geom_boxplot() +
  labs(x = "Time", y = "Success (%)") + # add labels to axes
  theme_minimal() # add minimal theme layer

  • the plot shows a slight increase in success from baseline to 5 years, but it doesn't show the effect of hard work because the two strategies are not separated