Conceptual Overview
How do we interpret a machine learning model? What else can we say, besides how accurate a model is? This learning lab is intended to help you answer these questions by examining output from a classification model and a regression model. We again use the OULAD, but add an assessment file.
Let’s start with accuracy and a simple confusion matrix. What is the accuracy in the table below?
Outcome | Prediction | Correct? |
---|---|---|
1 | 1 | Yes |
0 | 0 | Yes |
0 | 1 | No |
1 | 0 | No |
1 | 1 | Yes |
Use the tabyl() function (from {janitor}) to calculate the accuracy in the code chunk below.
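As a minimal sketch of what such a chunk could look like, the five rows of the table are typed in by hand here. The object and column names (`d`, `outcome`, `prediction`, `correct`) are assumptions made for illustration, not names taken from the lab's own files.

```r
library(janitor)

# The five rows from the table above, entered by hand (illustrative names)
d <- data.frame(
  outcome    = c(1, 0, 0, 1, 1),
  prediction = c(1, 0, 1, 0, 1)
)

# Mark each row as correct when the prediction matches the outcome
d$correct <- ifelse(d$outcome == d$prediction, "Yes", "No")

# tabyl() returns counts (n) and proportions (percent) for each value
acc_table <- tabyl(d, correct)
acc_table

# Accuracy is the proportion of rows marked "Yes"
accuracy <- acc_table$percent[acc_table$correct == "Yes"]
accuracy
```

The `percent` column for "Yes" is the accuracy, because it is the share of rows where the prediction matched the outcome.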
Now, let’s create a confusion matrix based on this data:
Accuracy: Prop. of the sample that is true positive or true negative
True positive (TP): Prop. of the sample that is affected by a condition and correctly tested positive
True negative (TN): Prop. of the sample that is not affected by a condition and correctly tested negative
False positive (FP): Prop. of the sample that is not affected by a condition and incorrectly tested positive
False negative (FN): Prop. of the sample that is affected by a condition and incorrectly tested negative.
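The definitions above can be sketched in base R using the same five rows. This assumes that 1 means "affected by the condition" (for the outcome) or "tested positive" (for the prediction); the vector names are illustrative.

```r
# The five rows from the table, entered by hand (illustrative names)
outcome    <- c(1, 0, 0, 1, 1)
prediction <- c(1, 0, 1, 0, 1)
n <- length(outcome)

# Each quantity is a proportion of the whole sample
tp <- sum(outcome == 1 & prediction == 1) / n  # true positive
tn <- sum(outcome == 0 & prediction == 0) / n  # true negative
fp <- sum(outcome == 0 & prediction == 1) / n  # false positive
fn <- sum(outcome == 1 & prediction == 0) / n  # false negative

# Accuracy is the proportion that is true positive or true negative
accuracy <- tp + tn

# A cross-tabulation of counts gives the confusion matrix itself
conf_matrix <- table(Outcome = outcome, Prediction = prediction)
conf_matrix
```

The four proportions sum to 1, so accuracy can also be read as 1 minus the combined share of false positives and false negatives.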
Let’s consider a very simple data set, d, one with time_point data, var_a, for a single student. How do we add this to our model? Focus on the time element; how could you account for this?
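One possible answer, offered only as a sketch, is to collapse the repeated measurements into a few per-student summary features, such as the mean, the spread, and the trend over time. The values in `d` below are made up, and the `student_id` column and the feature names are hypothetical.

```r
library(dplyr)

# Made-up values for a single student measured at five time points
d <- tibble(
  student_id = "a",
  time_point = 1:5,
  var_a      = c(0.2, 0.4, 0.5, 0.7, 0.9)
)

# Collapse the time series into one row of features per student
features <- d %>%
  group_by(student_id) %>%
  summarize(
    mean_var_a    = mean(var_a),
    sd_var_a      = sd(var_a),
    # Least-squares slope of var_a on time_point: cov(x, y) / var(x)
    slope_var_a   = cov(time_point, var_a) / var(time_point),
    n_time_points = n()
  )

features
```

The result has one row per student, which matches the shape most models expect, while the slope keeps some information about how var_a changed over time.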
How about a different variable, var_b? How could we add this to a model?
step():
step_dummy()
step_normalize()
step_impute_*() (for example, step_impute_mean())
step_date()
step_holiday()
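To illustrate how steps like these fit together, here is a minimal sketch using the {recipes} package. The data frame is made up, and it assumes var_a is numeric and var_b is categorical, which the lab does not state; the `pass` outcome is hypothetical.

```r
library(recipes)

# Made-up data: a hypothetical outcome, one numeric and one categorical predictor
d <- data.frame(
  pass  = factor(c("yes", "no", "yes", "no", "yes", "no")),
  var_a = c(10, 12, 9, 15, 11, 14),
  var_b = factor(c("a", "b", "c", "a", "b", "c"))
)

# Normalize numeric predictors first, then dummy-code the categorical ones,
# so the 0/1 dummy columns are not rescaled
rec <- recipe(pass ~ var_a + var_b, data = d) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

# prep() estimates the means and SDs; bake() applies the steps to the data
baked <- rec %>%
  prep() %>%
  bake(new_data = NULL)

baked
```

The order of steps matters here: placing step_dummy() before step_normalize() would cause the new dummy columns to be treated as numeric predictors and normalized as well.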
Baker, R. S., Berning, A. W., Gowda, S. M., Zhang, S., & Hawn, A. (2020). Predicting K-12 dropout. Journal of Education for Students Placed at Risk (JESPAR), 25(1), 28-54.
Baker, R. S., Bosch, N., Hutt, S., Zambrano, A. F., & Bowers, A. J. (2024). On fixing the right problems in predictive analytics: AUC is not the problem. arXiv preprint. https://arxiv.org/pdf/2404.06989
Maestrales, S., Zhai, X., Touitou, I., Baker, Q., Schneider, B., & Krajcik, J. (2021). Using machine learning to score multi-dimensional assessments of chemistry and physics. Journal of Science Education and Technology, 30(2), 239-254.
metric_set