Explore

LAW Module 2: A Code-a-long

Welcome to the LAW Code-a-long for Module 2

  • Exploratory Data Analysis (EDA) for educational researchers involves investigating and summarizing data sets to uncover patterns, spot anomalies, and test hypotheses, using statistical graphics and other data visualization methods.

  • This process helps researchers understand underlying trends in educational data before applying more complex analytical techniques.

Module Objectives

By the end of this module:

Data Visualization with ggplot2:

  • Learners will understand how to use ggplot2 to create various types of plots and graphs, enabling them to visualize data effectively and identify patterns and trends.

Data Transformation and Preprocessing:

  • Learners will gain proficiency in transforming and preprocessing raw data using R, ensuring the data is clean, structured correctly, and ready for analysis.

Exploratory Data Analysis

  • Data Visualization

  • Data Transformation

  • Data Preprocessing (DP)

  • Feature Engineering (FE)
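
The plots in this module use a time_spent_hours column, while the skimr example refers to time_spent. A minimal sketch of how such a column might be derived and cleaned is shown here. It assumes time_spent is recorded in minutes (the slides do not state the unit) and uses a small made-up tibble, toy_data, in place of data_to_explore.

```r
library(dplyr)

# Hypothetical toy data standing in for data_to_explore
toy_data <- tibble(
  subject    = c("OcnA", "PhysA", "OcnA"),
  time_spent = c(120, 90, NA)   # assumed to be minutes
)

# Data transformation: derive hours from (assumed) minutes
toy_data <- toy_data %>%
  mutate(time_spent_hours = time_spent / 60)

# Data preprocessing: drop rows with a missing time value
toy_clean <- toy_data %>%
  filter(!is.na(time_spent_hours))

toy_clean
```

Whether to drop or keep rows with missing values depends on the research question; dropping them here is only one option.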

About skimr

  • Load the skimr package and use skim() to skim data_to_explore
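# load libraries (dplyr provides %>%, select(), and filter())
library(dplyr)
library(skimr)

# skim the full data set
skim(data_to_explore)

# skim selected columns for two subjects only
data_to_explore %>% 
  select(subject, gender, proportion_earned, time_spent) %>% 
  filter(subject %in% c("OcnA", "PhysA")) %>%
  skim() 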
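
skim() also works on grouped data, which may be a handier way to compare subjects than filtering them one at a time. The sketch below uses a made-up tibble (toy_data and its values are hypothetical) rather than data_to_explore.

```r
library(dplyr)
library(skimr)

# Hypothetical toy data standing in for data_to_explore
toy_data <- tibble(
  subject           = c("OcnA", "OcnA", "PhysA", "PhysA"),
  proportion_earned = c(0.82, 0.67, 0.91, NA),
  time_spent        = c(1500, 900, 2100, 1200)
)

# Group first, then skim only the columns of interest;
# the summary has one row per variable per subject
by_subject <- toy_data %>%
  group_by(subject) %>%
  skim(proportion_earned, time_spent)

by_subject
```

The n_missing and complete_rate columns in the output are a quick way to spot subjects with incomplete grade data.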

About ggplot2

Do you need all of these things to create a graph?

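# Option 1: data and aesthetic mapping both set in ggplot()
ggplot(data_to_explore, aes(x = subject)) + 
  geom_bar()

# Option 2: aesthetic mapping set inside the geom
ggplot(data_to_explore) +
  geom_bar(aes(x = subject))

# Option 3: data piped into ggplot()
data_to_explore %>% 
  ggplot(aes(x = subject)) +
  geom_bar()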
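
The three calls above differ only in where the data and the aesthetic mapping are supplied. One way to check that they draw the same bars is to compare the counts ggplot2 computes for each plot with layer_data(). This is a sketch on a made-up tibble (toy_data is hypothetical).

```r
library(ggplot2)
library(dplyr)

# Hypothetical toy data standing in for data_to_explore
toy_data <- tibble(
  subject = c("OcnA", "OcnA", "PhysA", "BioA", "PhysA", "OcnA")
)

p1 <- ggplot(toy_data, aes(x = subject)) + geom_bar()
p2 <- ggplot(toy_data) + geom_bar(aes(x = subject))
p3 <- toy_data %>% ggplot(aes(x = subject)) + geom_bar()

# layer_data() returns the data frame ggplot2 computed for a layer;
# the count column holds the bar heights
same_12 <- identical(layer_data(p1)$count, layer_data(p2)$count)
same_13 <- identical(layer_data(p1)$count, layer_data(p3)$count)

c(same_12, same_13)
```

A mapping given in ggplot() is inherited by every layer, while a mapping given inside a geom applies to that layer only. That difference starts to matter once a plot has more than one geom.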

Plotting Histograms

  • Load ggplot2
  • Write the code for a basic histogram for time_spent_hours
# load ggplot2 (dplyr supplies the %>% pipe)
library(ggplot2)
library(dplyr)
# layer 1: add data and aesthetic mapping
data_to_explore %>%
  ggplot(aes(x = time_spent_hours)) +
# layer 2: add histogram geom
  geom_histogram()

# layer 1: add data and aesthetic mapping
data_to_explore %>% 
  ggplot(aes(x = time_spent_hours)) +
# layer 2: add histogram geom 
# layer 3a: set the number of bins
  geom_histogram(bins = 10)

# layer 1: add data and aesthetic mapping
data_to_explore %>% 
  ggplot(aes(x = time_spent_hours)) +
# layer 2: add histogram geom 
# layer 3a: set the number of bins
# layer 3b: add fill and outline color
  geom_histogram(bins = 30,
                 fill = "red",
                 colour = "black") 

# layer 1: add data and aesthetic mapping
data_to_explore %>% 
  ggplot(aes(x = time_spent_hours)) +
# layer 2: add histogram geom 
# layer 3a: set the number of bins
# layer 3b: add fill and outline color
  geom_histogram(bins = 30, fill = "red", colour = "black") +
# layer 4: add labels
  labs(title = "Time Spent on LMS Histogram Plot",
       x = "Time Spent (Hours)",
       y = "Count") +
# layer 5: apply the classic theme
  theme_classic()
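
The bins argument sets how many bins the histogram has. geom_histogram() also accepts binwidth, which fixes the width of each bin in the units of the x variable and can be easier to read (for example, one bar per hour). The sketch below uses made-up values; toy_data and the choice of boundary = 0 are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for data_to_explore
toy_data <- data.frame(
  time_spent_hours = c(0.5, 1.2, 1.8, 2.4, 2.6, 3.1, 4.7, 5.3, 7.9)
)

# binwidth = 1 gives one-hour bins; boundary = 0 aligns bin edges to whole hours
p_hist <- ggplot(toy_data, aes(x = time_spent_hours)) +
  geom_histogram(binwidth = 1, boundary = 0,
                 fill = "red", colour = "black") +
  labs(x = "Time Spent (Hours)", y = "Count") +
  theme_classic()

p_hist

# Check the width of each bin that ggplot2 actually computed
bin_widths <- with(layer_data(p_hist), xmax - xmin)
bin_widths
```

It is worth trying a few values of bins or binwidth, since the apparent shape of a distribution can change with the binning.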

Plotting Scatterplots

#layer 1: add data and aesthetics mapping 
ggplot(data_to_explore,
       aes(x = time_spent_hours, 
           y = proportion_earned)) +
#layer 2: +  geom function type
  geom_point()

#layer 1: add data and aesthetics mapping 
#layer 3: add color scale by type
ggplot(data_to_explore, 
       aes(x = time_spent_hours, 
           y = proportion_earned,
           color = enrollment_status)) +
#layer 2: +  geom function type
  geom_point()

#layer 1: add data and aesthetics mapping 
#layer 3: add color scale by type
ggplot(data_to_explore, 
       aes(x = time_spent_hours, 
           y = proportion_earned,
           color = enrollment_status)) +
#layer 2: +  geom function type
  geom_point() +
#layer 4: add labels
  labs(title = "How Time Spent on Course LMS is Related to Points Earned in the Course",
       x = "Time Spent (Hours)",
       y = "Proportion of Points Earned")

#layer 1: add data and aesthetics mapping 
#layer 3: add color scale by type
viz1 <- ggplot(data_to_explore, aes(x = time_spent_hours, y = proportion_earned, color = enrollment_status)) +
#layer 2: +  geom function type
  geom_point() +
#layer 4: add labels
  labs(title = "How Time Spent on Course LMS is Related to Points Earned in the Course", 
       x = "Time Spent (Hours)",
       y = "Proportion of Points Earned") +
#layer 5: add facet wrap
  facet_wrap(~ subject)

# print the saved plot
viz1

How would you interpret this final graph?
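
One hedged way to back up a visual reading of the faceted scatterplot is to compute the correlation between time spent and proportion earned within each subject. The sketch below uses made-up values (toy_data is hypothetical), and a correlation coefficient only describes linear association, so it supplements the plot rather than replacing it.

```r
library(dplyr)

# Hypothetical toy data standing in for data_to_explore
toy_data <- tibble(
  subject           = rep(c("OcnA", "PhysA"), each = 3),
  time_spent_hours  = c(1, 2, 3, 1, 2, 3),
  proportion_earned = c(0.5, 0.6, 0.7, 0.9, 0.7, 0.8)
)

# One row per subject: number of students and Pearson correlation,
# ignoring rows where either value is missing
cor_by_subject <- toy_data %>%
  group_by(subject) %>%
  summarise(
    n = n(),
    r = cor(time_spent_hours, proportion_earned, use = "complete.obs")
  )

cor_by_subject
```

A subject whose points slope upward in the facet should show a positive r, while a flat or scattered cloud should give a value near zero.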

What’s Next?