Rows: 68 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): id, novice
dbl (2): creativity, position
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 103 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): sex
dbl (4): id, revise, exam_grade, anxiety
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
plot continuous variables, the ggscatmat() function produces a matrix of scatterplots (below the diagonal), distributions (along the diagonal) and the correlation coefficient (above the diagonal)
should be replaced with the name of tibble containing any variables to correlate
method
method of correlation coefficient, default is pearson, but can also accept spearman, kendall, biserial, polychoric, tetrachoric, and percentage
p_adjust
corrects the \(p\)-value for the number of tests you have performed using the Holm-Bonferroni method
applies the Bonferroni criterion in a slightly less strict way that controls the type I error, but with less risk of a type II error
can change to none (bad idea), bonferroni (to apply the standard Bonferroni method) or several other methods.
ci
set the confidence interval width; default is 0.95 for general use
To use the function, - pipe tibble into the select() function from dplyr to select variables to correlate, then pipe that into the correlation function - use the same syntax whether you want to correlate two variables or produce all correlations between pairs of multiple variables]
To calculate Pearson correlation btwn variables exam_grade and revise in exam_tib…
The confidence interval for the association between exam grade and revision is 0.22 to 0.55. What does this tell us?
If this confidence interval is one of the 95% that contains the population value then the population value of r lies between 0.22 and 0.55.
The p-value for the association between exam grade and revision is < 0.001, what does this value mean?
The probability of getting a value of t at least as big as the value we have observed, if the value of r were, in fact, zero is less than 0.001. I’m going to assume, therefore, that the association between exam grade and revision is not zero.
exam grade correlates with revision - \(r\)=0.4
exam grade had a similar strength relationship with exam anxiety \(r\)=-0.44 but in the opposite direction
revision had a strong negative relationship with anxiety - \(r\)=-0.709
the more you revise, the better your performance
the more anxiety you have, the worse your performance
the mopre you revise, the less anxiety you have
all \(p\)-values are less than 0.001 and would be interpreted as the correlation coefficients being significantly different from zero
significance values tell us that the probability of getting correlation coefficients at least as big as this in a sample of 103 people if the null were true (that there was no relationship between the variables) is very low
if we assume the sample is one of the 95% of samples that will produce a confidence interval containing the population value, then the confidence intervals tell us about the uncertainty around \(r\).
Rounding
We can control the number of decimal places using knitr::kable(digits = 3)
We can also specify different columns to contain different rounding using knitr::kable(digits = c(2, 2, 2, 2, 2, 2, 2, 2, 8)) (column 9 to 8 decimal places) or knitr::kable(digits = c(rep(2, 8), 8))
Robust correlation coefficients
Given the skew in the variables, we should use a robust correlation coefficient, like the percentage bend correlation coefficient by setting method = "percentage" within correlation()
All robust correlations (percentage bend) are less than raw, though all are significant at \(p<0.001\)
Spearman’s correlation coefficient
data from World’s Best Liar competition
want to know if creativity impacts lying ability
position data (1st, 2nd, etc) is ordinal, so Spearman’s correlation coefficient should be used
Data are in
liar_tib
# A tibble: 68 × 4
id creativity position novice
<chr> <dbl> <dbl> <fct>
1 lnwe 53 1 First time
2 vxob 36 3 Previous entrant
3 qpli 31 4 First time
4 pwsq 43 2 First time
5 xafq 30 4 Previous entrant
6 njra 41 1 First time
7 lxty 32 4 First time
8 dxcw 54 1 Previous entrant
9 uxgp 47 2 Previous entrant
10 dvew 50 2 First time
# ℹ 58 more rows
output shows \(\tau=-0.3\) -> closer to 0 than Spearman (-.38) therefore Kendall’s value is likely a more accurate guage of what the correlation in the population would be