- We want to make predictions about an outcome of interest based on predictor variables that we think are related to the outcome.
- We’ll use a data set that is widely used in the learning analytics field: the Open University Learning Analytics Dataset (OULAD).
- The OULAD was created by learning analytics researchers at the United Kingdom-based Open University.
- It includes data on post-secondary learners’ participation in one of several Massive Open Online Courses (called modules in the OULAD).
- Many students pass these courses, but not all do.
- We have data on students’ initial characteristics and their interactions in the course.
- If we could develop a good prediction model, we could provide additional support to students, and perhaps help some students succeed who otherwise might not.
We’ll be focusing on three files:
- studentInfo, courses, and studentRegistration
These are joined together (see oulad.R) for this module. You’ll be doing more joining later!
# A tibble: 3 × 15
code_module code_presentation id_student gender region highest_education
<chr> <chr> <dbl> <chr> <chr> <chr>
1 AAA 2013J 11391 M East Anglia… HE Qualification
2 AAA 2013J 28400 F Scotland HE Qualification
3 AAA 2013J 30268 F North Weste… A Level or Equiv…
# ℹ 9 more variables: imd_band <chr>, age_band <chr>,
# num_of_prev_attempts <dbl>, studied_credits <dbl>, disability <chr>,
# final_result <chr>, module_presentation_length <dbl>,
# date_registration <dbl>, date_unregistration <dbl>
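The join that produces a table like the one above can be sketched as follows. This is a minimal illustration, not the contents of oulad.R: it assumes the three files share the key columns `code_module`, `code_presentation`, and (for the student-level files) `id_student`, and it assumes left joins are appropriate. The three small tibbles are hand-built stand-ins; the `gender` values come from the printout above, while the length and date values are made-up placeholders.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the three OULAD files (placeholder values, not real data)
student_info <- tibble(
  code_module = c("AAA", "AAA", "AAA"),
  code_presentation = c("2013J", "2013J", "2013J"),
  id_student = c(11391, 28400, 30268),
  gender = c("M", "F", "F")
)

courses <- tibble(
  code_module = "AAA",
  code_presentation = "2013J",
  module_presentation_length = 100 # placeholder value
)

student_registration <- tibble(
  code_module = c("AAA", "AAA", "AAA"),
  code_presentation = c("2013J", "2013J", "2013J"),
  id_student = c(11391, 28400, 30268),
  date_registration = c(-10, -20, -30), # placeholder values
  date_unregistration = c(NA_real_, NA_real_, NA_real_)
)

# Course-level information is matched on module and presentation;
# registration information is matched on module, presentation, and student
students <- student_info %>%
  left_join(courses, by = c("code_module", "code_presentation")) %>%
  left_join(
    student_registration,
    by = c("code_module", "code_presentation", "id_student")
  )

students
```

A left join keeps every row of `student_info` even when a student has no matching registration record, which is one reasonable choice when the student table defines the unit of analysis.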
- Prepare: Prior to analysis, we’ll take a look at the context from which our data came, formulate some questions, and load R packages.
- Wrangle: In the wrangling section, we will learn some basic techniques for manipulating, cleaning, transforming, and merging data.
- Explore: The processes of wrangling and exploring data often go hand in hand.
- Model: In this step, we carry out the analysis, which here is supervised machine learning.
- Communicate: Interpreting and communicating the results of our findings is the last step.
- Split data (Prepare)
- Engineer features and write down the recipe (Wrangle and Explore)
- Specify the model and workflow (Model)
- Fit model (Model)
- Evaluate accuracy (Communicate)
This is the fundamental process we’ll follow for this module and the next two, all of which focus on supervised ML.
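The five steps above might look like this in code. This is a hedged sketch that assumes the tidymodels packages and a logistic regression model; neither choice is confirmed by the text here. The data are randomly generated stand-ins, and the outcome column `pass` is a hypothetical name, so the resulting metrics carry no meaning beyond showing the mechanics.

```r
library(tidymodels)

set.seed(2024)

# Randomly generated stand-in data (not the real OULAD);
# `pass` is a hypothetical binary outcome
n <- 200
students <- tibble(
  studied_credits = sample(c(30, 60, 90, 120), n, replace = TRUE),
  num_of_prev_attempts = sample(0:2, n, replace = TRUE),
  disability = sample(c("N", "Y"), n, replace = TRUE),
  pass = factor(sample(c("pass", "not_pass"), n, replace = TRUE))
)

# 1. Split data into training and testing sets
data_split <- initial_split(students, prop = 0.8, strata = pass)
data_train <- training(data_split)
data_test <- testing(data_split)

# 2. Engineer features and write down the recipe
my_rec <- recipe(
  pass ~ studied_credits + num_of_prev_attempts + disability,
  data = data_train
) %>%
  step_dummy(all_nominal_predictors())

# 3. Specify the model and workflow
my_mod <- logistic_reg() %>%
  set_engine("glm") %>%
  set_mode("classification")

my_wf <- workflow() %>%
  add_recipe(my_rec) %>%
  add_model(my_mod)

# 4. Fit the model: last_fit() trains on the training set
#    and predicts on the held-out testing set
final_fit <- last_fit(my_wf, data_split)

# 5. Evaluate accuracy on the testing set
metrics <- collect_metrics(final_fit)
metrics
```

Keeping the recipe and the model together in one workflow means the same preprocessing is applied to the training and testing data without repeating it by hand.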