The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Wrangling data
filter() a data frame’s existing rows to only pick out a subset of them. For example, the alaska_flights data frame.
summarize() one or more of its columns/variables with a summary statistic. Examples of summary statistics include the median and interquartile range of temperatures as we saw in Section 2.7 on boxplots.
group_by() its rows. In other words, assign different rows to be part of the same group. We can then combine group_by() with summarize() to report summary statistics for each group separately. For example, say you don’t want a single overall average departure delay dep_delay for all three origin airports combined, but rather three separate average departure delays, one computed for each of the three origin airports.
mutate() its existing columns/variables to create new ones. For example, convert hourly temperature readings from Fahrenheit to Celsius.
arrange() its rows. For example, sort the rows of weather in ascending or descending order of temp.
join() it with another data frame by matching along a “key” variable. In other words, merge these two data frames together.
Piper DOWN!! |>r DOWN!
the |> was introduced to R in 2021 to replace the clumsy %>%
allows users to create a sequence of operations within one set of instructions
Take x then
Use x as an input to a function f() then
Use the output of f(x) as an input to a function g() then
Use the output of g(f(x)) as an input to a function h()
One way is to nest parentheses…
h(g(f(x)))
Not hard to read as there are only 3 simple arguments, but this quickly gets out of hand with many complex arguments…so…
x |>
f() |>
g() |>
h()
The pipe takes the output of one function an ‘pipes’ it into the next function as an argument.