Modeling for educational researchers builds on insights from Exploratory Data Analysis (EDA) to develop predictive or explanatory models that guide decision-making and enhance learning outcomes.
This involves selecting appropriate models, preparing and transforming data, training and validating the models, and interpreting results. Effective modeling helps uncover key factors influencing educational success, allowing for targeted interventions and informed policy decisions.
Module Objectives
By the end of this module:
Introduction to Modeling:
Learners will understand the importance of modeling in the learning analytics workflow and how it helps quantify insights from data.
Creating and Interpreting Correlations:
Learners will create and interpret correlation matrices using the {corrr} package and create APA-formatted correlation tables using the {apaTables} package.
Applying Various Modeling Techniques:
Learners will fit and understand linear regression models to prepare for the case study.
Steps in the Modeling Process
Correlation
Corrr package for correlation
#install corrr package if this is your first time#install.packages("corrr")# read in librarylibrary(corrr) #<<data_to_explore %>%select(proportion_earned, time_spent_hours) %>%correlate() #<<
# A tibble: 2 × 3
term proportion_earned time_spent_hours
<chr> <dbl> <dbl>
1 proportion_earned NA 0.438
2 time_spent_hours 0.438 NA
How can we interpret this?
APA TABLE
#install if this is your first timeinstall.packages("apaTables")# read in apatables librarylibrary(apaTables)data_to_explore_subset <- data_to_explore %>%select(time_spent_hours, proportion_earned, int)apa.cor.table(data_to_explore_subset)#<<
In the corresponding script 1. create a linear regression - independent = time_spent_hours - dependent = proportion_earned 2. save as a new object 3. inspect the data
👉 Your Turn⤵ -> Answer
model1 <-lm(proportion_earned ~ time_spent_hours, data = data_to_explore)summary(model1)
Call:
lm(formula = proportion_earned ~ time_spent_hours, data = data_to_explore)
Residuals:
Min 1Q Median 3Q Max
-64.671 -7.841 5.427 15.419 33.743
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 62.43061 1.51073 41.33 <2e-16 ***
time_spent_hours 0.47921 0.04025 11.91 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 22.21 on 596 degrees of freedom
(345 observations deleted due to missingness)
Multiple R-squared: 0.1921, Adjusted R-squared: 0.1908
F-statistic: 141.8 on 1 and 596 DF, p-value: < 2.2e-16
👉 Your Turn⤵
create a ggplot viz for our model
Add + geom_smooth(method = “lm”) after your geom_point
###
👉 Your Turn⤵ -> Answer
data_to_explore %>%ggplot(aes(x = time_spent_hours, y = proportion_earned, color = enrollment_status)) +geom_point() +# add geom_smooth for lmgeom_smooth(method ="lm")+labs(title="How Time Spent on Course LMS is Related to Points Earned in the Course", x="Time Spent (Hours)",y ="Proportion of Points Earned")