LA Foundations - Badge - KEY

LASER Institute Foundation Learning Badge 1 - KEY

Author

ADD YOUR NAME HERE

The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.

To earn a badge for each lab, you are required to respond to a set of prompts for two parts:

Part I: Reflect and Plan

Use your institutional library (e.g., the NCSU Library), Google Scholar, or a search engine to locate a research article, presentation, or resource that applies learning analytics to an educational context or topic of interest. More specifically, locate a study that makes use of the Learning Analytics Workflow we learned today. You are also welcome to select one of your own research papers.

  1. Provide an APA citation for your selected study.

  2. What educational issue, “problem of practice,” and/or questions were addressed?

  3. Briefly describe any steps of the data-intensive research workflow that are detailed in your article or presentation.

  4. What were the key findings or conclusions? What value, if any, might education practitioners find in these results?

  5. Finally, how, if at all, were educators in your self-selected article involved prior to data wrangling and analysis?

Draft a new research question guided by the phases of the Learning Analytics Workflow, or use one of your current research questions.

  1. What educational issue, “problem of practice,” and/or questions are addressed?

  2. Briefly describe any steps of the data-intensive research workflow that could be detailed in your article or presentation.

  3. How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”

Part II: Data Product

In our Learning Analytics code-along, we only scratched the surface of the many ways we can wrangle data.

Using one of the data sets provided in the data folder, your goal for this lab is to extend the Learning Analytics Workflow from our code-along by preparing and wrangling different data.

Alternatively, you may use your own data set in the workflow. If you decide to use your own data set, you must include the following (a pandas sketch follows this list for reference):

  • Show two different ways of selecting columns from your data, inspect the result, and save it as a new object.

  • Show one way of filtering rows in your data, inspect the result, and save it as a new object.

  • Show one way of arranging (sorting) your data, inspect the result, and save it as a new object.

  • Use the pipe operator (method chaining in pandas) to bring it all together.
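
For reference, here is a minimal sketch of those four requirements in pandas, mirroring the code-along. It uses the sci_classes data and its column names; if you work with your own data set, treat the object and column names below as placeholders to replace.

# A hedged sketch, not the required solution: adapt the names to your own data
import pandas as pd

sci_classes = pd.read_csv("data/sci-online-classes.csv")

# Select, two different ways: bracket indexing and DataFrame.filter()
cols_a = sci_classes[["student_id", "subject"]]
cols_b = sci_classes.filter(items=["student_id", "semester"])
print(cols_a.head())
print(cols_b.head())

# Filter rows with a boolean mask
s116_only = sci_classes[sci_classes["semester"] == "S116"]
print(s116_only.head())

# Arrange (sort) rows
by_points = sci_classes.sort_values("total_points_earned", ascending=False)
print(by_points.head())

# The pandas analogue of the pipe operator: method chaining
combined = (sci_classes
            .loc[:, ["student_id", "semester", "total_points_earned"]]
            .query("semester == 'S116'")
            .sort_values("total_points_earned", ascending=False))
print(combined.head())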

Feel free to create a new script in your lab 2 project to work through the following problems. Then, when satisfied, add the code to the code chunks below. Don’t forget to run the code to make sure it works.

Instructions:

  1. Add your name to the author field of the document.

  2. Set up the first phase (or first two phases, if including an Introduction) of the LA workflow below. I’ve added the Wrangle section for you. You will need a Prepare section that loads the libraries necessary to wrangle the data; a minimal sketch follows.
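
For reference, a minimal Prepare chunk might look like the following, assuming pandas is the only library you need for wrangling:

# Prepare: load the libraries used to wrangle the data
import pandas as pd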

Wrangle

  3. In the chunk called read-data: Import sci-online-classes.csv from the data folder and save it as a new object called sci_classes. Then inspect your data using a function of your choice.
# YOUR FINAL CODE HERE
import pandas as pd

# Load the CSV file
sci_classes = pd.read_csv("data/sci-online-classes.csv")

# Inspect the DataFrame
print(sci_classes.head())  # Display the first few rows of the DataFrame
   student_id      course_id  total_points_possible  total_points_earned  \
0       43146  FrScA-S216-02                   3280                 2220   
1       44638   OcnA-S116-01                   3531                 2672   
2       47448  FrScA-S216-01                   2870                 1897   
3       47979   OcnA-S216-01                   4562                 3090   
4       48797  PhysA-S116-01                   2207                 1910   

   percentage_earned subject semester  section  \
0           0.676829   FrScA     S216        2   
1           0.756726    OcnA     S116        1   
2           0.660976   FrScA     S216        1   
3           0.677335    OcnA     S216        1   
4           0.865428   PhysA     S116        1   

                        Gradebook_Item  Grade_Category  ...   q7   q8   q9  \
0  POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  5.0  5.0  4.0   
1                            ATTEMPTED             NaN  ...  4.0  5.0  4.0   
2  POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  4.0  5.0  3.0   
3  POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  4.0  5.0  5.0   
4  POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  4.0  4.0  NaN   

   q10  TimeSpent  TimeSpent_hours  TimeSpent_std  int   pc        uv  
0  5.0  1555.1667        25.919445      -0.180515  5.0  4.5  4.333333  
1  4.0  1382.7001        23.045002      -0.307803  4.2  3.5  4.000000  
2  5.0   860.4335        14.340558      -0.693260  5.0  4.0  3.666667  
3  5.0  1598.6166        26.643610      -0.148447  5.0  3.5  5.000000  
4  3.0  1481.8000        24.696667      -0.234663  3.8  3.5  3.500000  

[5 rows x 30 columns]
  4. In the select-1 code chunk: Use the indexing method to select student_id, subject, semester, and FinalGradeCEMS. Assign the result to a new object with a different name.
# YOUR FINAL CODE HERE
# Select specific columns and create a new DataFrame
selected_data = sci_classes[['student_id', 'subject', 'semester', 'FinalGradeCEMS']]

# Inspect the selected data
print(selected_data.describe())  # Summary statistics; the count row hints at missing values
         student_id  FinalGradeCEMS
count    603.000000      573.000000
mean   86069.535655       77.202655
std    10548.597457       22.225076
min    43146.000000        0.000000
25%    85612.500000       71.251142
50%    88340.000000       84.569444
75%    92730.500000       92.099323
max    97441.000000      100.000000

Note on FinalGradeCEMS: You may observe NA values indicating missing data. These require handling, either by imputation or removal, depending on the analysis requirements.
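
If you want to see exactly how many values are missing before choosing an approach, one quick (optional) check is:

# Count missing values in each column of the selected data
print(selected_data.isna().sum())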

  5. In the following code chunk, handle the missing values in the DataFrame you just created:
# YOUR FINAL CODE HERE
# Inspect the selected data for missing values
print(selected_data.describe())  # Provides summary statistics, useful for identifying missing data

# Option 1: Drop rows where 'FinalGradeCEMS' is missing
cleaned_data = selected_data.dropna(subset=['FinalGradeCEMS'])

# Option 2: Fill missing 'FinalGradeCEMS' with a statistical value (mean, median, etc.)
# Work on an explicit copy and assign back to avoid pandas' SettingWithCopyWarning
median_value = selected_data['FinalGradeCEMS'].median()
selected_data = selected_data.copy()
selected_data['FinalGradeCEMS'] = selected_data['FinalGradeCEMS'].fillna(median_value)

# Display the cleaned data
print(cleaned_data.head())   # Option 1: rows with NA dropped
print(selected_data.head())  # Option 2: NA values filled with the median
         student_id  FinalGradeCEMS
count    603.000000      573.000000
mean   86069.535655       77.202655
std    10548.597457       22.225076
min    43146.000000        0.000000
25%    85612.500000       71.251142
50%    88340.000000       84.569444
75%    92730.500000       92.099323
max    97441.000000      100.000000
   student_id subject semester  FinalGradeCEMS
0       43146   FrScA     S216       93.453725
1       44638    OcnA     S116       81.701843
2       47448   FrScA     S216       88.487585
3       47979    OcnA     S216       81.852596
4       48797   PhysA     S116       84.000000
   student_id subject semester  FinalGradeCEMS
0       43146   FrScA     S216       93.453725
1       44638    OcnA     S116       81.701843
2       47448   FrScA     S216       88.487585
3       47979    OcnA     S216       81.852596
4       48797   PhysA     S116       84.000000

  6. In the select-2 code chunk: Select all columns except subject and section. Assign the result to a new object with a different name. Inspect your data frame with a different function.

# YOUR FINAL CODE HERE

# Select all columns except 'subject' and 'section'
reduced_data = sci_classes.drop(columns=['subject', 'section'])

# Inspect the reduced data
print(reduced_data.info())  # Displays information about the DataFrame including data types and non-null counts
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 603 entries, 0 to 602
Data columns (total 28 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   student_id             603 non-null    int64  
 1   course_id              603 non-null    object 
 2   total_points_possible  603 non-null    int64  
 3   total_points_earned    603 non-null    int64  
 4   percentage_earned      603 non-null    float64
 5   semester               603 non-null    object 
 6   Gradebook_Item         603 non-null    object 
 7   Grade_Category         0 non-null      float64
 8   FinalGradeCEMS         573 non-null    float64
 9   Points_Possible        603 non-null    int64  
 10  Points_Earned          511 non-null    float64
 11  Gender                 603 non-null    object 
 12  q1                     480 non-null    float64
 13  q2                     477 non-null    float64
 14  q3                     480 non-null    float64
 15  q4                     478 non-null    float64
 16  q5                     476 non-null    float64
 17  q6                     476 non-null    float64
 18  q7                     474 non-null    float64
 19  q8                     474 non-null    float64
 20  q9                     474 non-null    float64
 21  q10                    474 non-null    float64
 22  TimeSpent              598 non-null    float64
 23  TimeSpent_hours        598 non-null    float64
 24  TimeSpent_std          598 non-null    float64
 25  int                    527 non-null    float64
 26  pc                     528 non-null    float64
 27  uv                     528 non-null    float64
dtypes: float64(20), int64(4), object(4)
memory usage: 132.0+ KB
None

  7. In the filter-1 code chunk: Filter the sci_classes data frame for students in OcnA courses. Assign the result to a new object with a different name. Use the head() function to examine your data frame.

# YOUR FINAL CODE HERE

# Filter the DataFrame for students in 'OcnA' courses
ocna_students = sci_classes[sci_classes['subject'] == 'OcnA']

# Display the first few rows of the filtered data
print(ocna_students.head())  # The head function by default displays the first 5 rows
    student_id     course_id  total_points_possible  total_points_earned  \
1        44638  OcnA-S116-01                   3531                 2672   
3        47979  OcnA-S216-01                   4562                 3090   
11       54066  OcnA-S116-01                   4641                 3429   
12       54282  OcnA-S116-02                   3581                 2777   
13       54342  OcnA-S116-02                   3256                 2876   

    percentage_earned subject semester  section  \
1            0.756726    OcnA     S116        1   
3            0.677335    OcnA     S216        1   
11           0.738849    OcnA     S116        1   
12           0.775482    OcnA     S116        2   
13           0.883292    OcnA     S116        2   

                         Gradebook_Item  Grade_Category  ...   q7   q8   q9  \
1                             ATTEMPTED             NaN  ...  4.0  5.0  4.0   
3   POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  4.0  5.0  5.0   
11                            ATTEMPTED             NaN  ...  5.0  4.0  5.0   
12  POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  3.0  3.0  2.0   
13  POINTS EARNED & TOTAL COURSE POINTS             NaN  ...  5.0  5.0  2.0   

    q10  TimeSpent  TimeSpent_hours  TimeSpent_std  int   pc        uv  
1   4.0  1382.7001        23.045002      -0.307803  4.2  3.5  4.000000  
3   5.0  1598.6166        26.643610      -0.148447  5.0  3.5  5.000000  
11  4.0  2625.5164        43.758607       0.609452  4.4  4.0  5.000000  
12  4.0  2025.1672        33.752787       0.166367  3.4  3.0  2.666667  
13  5.0  1581.0831        26.351385      -0.161387  4.7  4.5  3.833333  

[5 rows x 30 columns]

Question: How many rows does the head() function display? Answer: By default, head() displays the first 5 rows.
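
If you need more (or fewer) rows, head() also accepts a row count; for example:

print(ocna_students.head(10))  # Display the first 10 rows instead of the default 5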

  8. In the filter-2 code chunk: Filter the sci_classes data frame so rows with NA for total_points_possible are removed. Assign the result to a new object with a different name. Use info() to examine all columns of your data frame.

# YOUR FINAL CODE HERE

# Filter out rows with NA in 'total_points_possible'
no_na_points = sci_classes.dropna(subset=['total_points_possible'])

# Inspect the modified DataFrame
print(no_na_points.info())  # Displays information similar to str in R
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 603 entries, 0 to 602
Data columns (total 30 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   student_id             603 non-null    int64  
 1   course_id              603 non-null    object 
 2   total_points_possible  603 non-null    int64  
 3   total_points_earned    603 non-null    int64  
 4   percentage_earned      603 non-null    float64
 5   subject                603 non-null    object 
 6   semester               603 non-null    object 
 7   section                603 non-null    int64  
 8   Gradebook_Item         603 non-null    object 
 9   Grade_Category         0 non-null      float64
 10  FinalGradeCEMS         573 non-null    float64
 11  Points_Possible        603 non-null    int64  
 12  Points_Earned          511 non-null    float64
 13  Gender                 603 non-null    object 
 14  q1                     480 non-null    float64
 15  q2                     477 non-null    float64
 16  q3                     480 non-null    float64
 17  q4                     478 non-null    float64
 18  q5                     476 non-null    float64
 19  q6                     476 non-null    float64
 20  q7                     474 non-null    float64
 21  q8                     474 non-null    float64
 22  q9                     474 non-null    float64
 23  q10                    474 non-null    float64
 24  TimeSpent              598 non-null    float64
 25  TimeSpent_hours        598 non-null    float64
 26  TimeSpent_std          598 non-null    float64
 27  int                    527 non-null    float64
 28  pc                     528 non-null    float64
 29  uv                     528 non-null    float64
dtypes: float64(20), int64(5), object(5)
memory usage: 141.5+ KB
None
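
An equivalent, more filter-like approach is to keep rows where the column is not missing; since total_points_possible has no missing values in this data set, both versions keep all 603 rows:

# Alternative: boolean mask with notna() instead of dropna()
no_na_points_alt = sci_classes[sci_classes['total_points_possible'].notna()]
print(no_na_points_alt.shape)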
  9. In the arrange-1 code chunk: Arrange the sci_classes data by subject, then by percentage_earned in descending order. Assign the result to a new object. Use the dtypes attribute to examine the data type of each column in your data frame.
# YOUR FINAL CODE HERE

# Arrange the DataFrame by 'subject' and 'percentage_earned' in descending order
arranged_classes = sci_classes.sort_values(by=['subject', 'percentage_earned'], ascending=[True, False])

# Inspect the structure of the arranged DataFrame
print(arranged_classes.dtypes)  # Similar to str function in R
student_id                 int64
course_id                 object
total_points_possible      int64
total_points_earned        int64
percentage_earned        float64
subject                   object
semester                  object
section                    int64
Gradebook_Item            object
Grade_Category           float64
FinalGradeCEMS           float64
Points_Possible            int64
Points_Earned            float64
Gender                    object
q1                       float64
q2                       float64
q3                       float64
q4                       float64
q5                       float64
q6                       float64
q7                       float64
q8                       float64
q9                       float64
q10                      float64
TimeSpent                float64
TimeSpent_hours          float64
TimeSpent_std            float64
int                      float64
pc                       float64
uv                       float64
dtype: object
  10. In the code chunk named final-wrangle: Use the pandas library to chain together multiple data manipulation methods on the sci_classes data. Here’s what you need to do step by step:
  • Select the columns student_id, subject, semester, and FinalGradeCEMS.
  • Filter the data to include only rows where the subject is ‘OcnA’.
  • Arrange the data by FinalGradeCEMS in descending order.
  • Assign the result to a new object called final_data.
  • Examine the contents of final_data using a print statement to display the results.
# YOUR FINAL CODE HERE

# Use chaining to select, filter, and arrange data
final_data = (sci_classes
              .loc[:, ['student_id', 'subject', 'semester', 'FinalGradeCEMS']]
              .query("subject == 'OcnA'")
              .sort_values('FinalGradeCEMS', ascending=False))

# Examine the contents of the final data manipulation
print(final_data)
     student_id subject semester  FinalGradeCEMS
44        66740    OcnA     S116       99.329983
400       91163    OcnA     S216       97.370184
494       94744    OcnA     S216       96.797320
431       91818    OcnA     S116       96.462312
357       90090    OcnA     S116       96.298157
..          ...     ...      ...             ...
142       85487    OcnA     S116             NaN
209       86340    OcnA     S216             NaN
250       86836    OcnA     S116             NaN
310       88504    OcnA     S216             NaN
544       95738    OcnA     S216             NaN

[111 rows x 4 columns]
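
Note that sort_values() places missing values last by default, which is why the rows with NaN for FinalGradeCEMS appear at the bottom of final_data. An optional variation, if you prefer to exclude them, is to drop the missing grades inside the chain:

# Optional variation: drop missing final grades before sorting
final_data_complete = (sci_classes
                       .loc[:, ['student_id', 'subject', 'semester', 'FinalGradeCEMS']]
                       .query("subject == 'OcnA'")
                       .dropna(subset=['FinalGradeCEMS'])
                       .sort_values('FinalGradeCEMS', ascending=False))
print(final_data_complete)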

Render & Submit

Congratulations, you’ve completed Foundations Learning Badge 1!

To receive your Foundations Badge, you will need to render this document and publish it via a method designated by your instructor, such as Quarto Pub, Posit Cloud, RPubs, GitHub Pages, or another method. Once you have shared a link to your published document with your instructor and they have reviewed your work, you will be provided a physical or digital version of the badge pictured at the top of this document!

If you have any questions about this badge, or run into any technical issues, don’t hesitate to contact your instructor.


Complete the following steps to submit your work for review:

First, change the author: field in the YAML header at the very top of this document to your name. The YAML header controls the style and feel of the rendered document but doesn’t actually display in the final output.

Next, click the Render button in the toolbar above to render your document to an HTML file that will be saved in your project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pane or in a new browser window. Let us know if you run into any issues with rendering.

Finally, publish.