Let’s GOOOO!!
Data collection has (mostly) closed on my dissertation survey, and I hit a little over 300 responses from faculty and instructors across the province, so I am pleased with that. Now begins the work of making sense of it all in light of my previous work on assessment in higher ed.
To improve reproducibility, share my process, and help future me remember what today me did, I am endeavouring to use this space as an analysis journal. I want to say I lost some work on Friday evening, but that is a bit too passive. I deleted some work: I hadn’t pushed changes in a while, saw a whole bunch of changed files in my directory, and deleted several of them. Unfortunately for me, that included the changes to my analysis.R file, which I had made shortly after figuring out why I was getting errors.
I’ll share R scripts and files here, but will keep the whole repo private until it’s time to publish; at that point the entire dataset will be released under an open license.
General Workflow
I’m composing this paper using Positron, a data science integrated development environment (IDE). For the uninitiated, Positron is an editor that integrates with Git and lets me compose text in Markdown with embedded R code for analysing the data. Positron is created and maintained by the company Posit, who also created Quarto, a scientific and technical publishing system that integrates with Positron.
I thus use Positron to create documents with R code embedded in them. R manages the entire data analysis process: taking the raw CSV exported from SurveyMonkey, making sure it is tidy (each column is a variable, each row is a case, and each cell is a single observation), excluding incomplete submissions (currently anything below 90% complete is excluded), calculating missingness (how many NA responses are in the resulting dataset), and saving the data in a long format (going from 301 rows and 80 columns to 12,636 rows and two columns).
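As a rough sketch, that pipeline can look something like the following tidyverse code; the file path and the `completion_pct` column are placeholders, not the names in my actual export:

```r
library(tidyverse)

# Read the raw SurveyMonkey export (file name is a placeholder)
raw <- read_csv("data/surveymonkey_export.csv")

# Exclude incomplete submissions: anything below 90% complete
# (`completion_pct` is a stand-in for however completeness is recorded)
survey <- raw |>
  filter(completion_pct >= 0.90)

# Missingness: count the NA responses remaining in each item
missingness <- survey |>
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  pivot_longer(everything(), names_to = "item", values_to = "n_missing")

# Reshape to a two-column long format (item, response);
# values_transform coerces mixed column types so they can share one column
survey_long <- survey |>
  pivot_longer(everything(), names_to = "item", values_to = "response",
               values_transform = as.character)

write_rds(survey_long, "output/survey_long.rds")
```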
Wrangling the Data
Following recommended practice, I am splitting the analysis into multiple files. Embedding all of the R code in the text of a paper makes for an unwieldy document, with hundreds (and eventually thousands) of lines of code interrupting the prose. Currently I have a file called analysis.R that contains the data wrangling code for the paper. When I run that code, it exports all of the requested R objects to an output folder. So instead of dozens of lines of code in the paper to generate a single plot or table, I simply call the saved R object and display it inline with one or two lines of code.
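In practice that looks something like this sketch (the object and file names here are illustrative, not the real ones in my project):

```r
# In analysis.R (sketch): save each finished object to the output folder
saveRDS(missingness_table, "output/missingness_table.rds")
saveRDS(response_plot,     "output/response_plot.rds")

# In the paper, a chunk then only needs to load and print the saved object
response_plot <- readRDS("output/response_plot.rds")
response_plot
```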
What’s more, if I get new data, updating every single plot, table, and dataframe I create only requires changing 2-3 characters in two places; rerunning the analysis then regenerates everything with the new data, and the plots and tables automatically display the updated results.
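Concretely, that small edit is along the lines of bumping the date in the input file name before rerunning analysis.R (these paths are illustrative, not the real ones):

```r
# Illustrative only: the raw export is read in one spot, so pointing the
# script at a newer download is a tiny edit, e.g.
# "survey_export_2025-10.csv" -> "survey_export_2025-11.csv"
raw <- readr::read_csv("data/survey_export_2025-11.csv")
```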
Even better, because all the data is in my repository (the data is anonymous, and I have permission to store it on GitHub), someone can fork my repo, rerun the analysis, run new analyses, and check my work. They can also tweak my code and send a pull request back to me, so I have the option of incorporating their changes.