The final activity for each learning lab provides space to work with data and to reflect on how the concepts and techniques introduced in each lab might apply to your own research.
To earn a badge for each lab, you are required to respond to a set of prompts for two parts:
In Part I, you will reflect on your understanding of key concepts and begin to think about potential next steps for your own study.
In Part II, you will complete a few R exercises that demonstrates your ability to apply the first phases of the LA workflow and data wrangling techniques introduced in this learning lab.
Part I: Reflect and Plan
Use the institutional library (e.g. NCSU Library), Google Scholar or search engine to locate a research article, presentation, or resource that applies learning analytics analysis to an educational context or topic of interest. More specifically, locate a study that makes use of the Learning Analytics Workflow we learned today. You are also welcome to select one of your research papers.
Provide an APA citation for your selected study.
What educational issue, “problem of practice,” and/or questions were addressed?
Briefly describe any steps of the data-intensive research workflow that detailed in your article or presentation.
What were the key findings or conclusions? What value, if any, might education practitioners find in these results?
Finally, how, if at at, were educators in your self-selected article involved prior to wrangling and analysis?
Draft a new research question of guided by the the phases of the Learning Analytics Workflow. Or use one of your current research questions.
What educational issue, “problem of practice,” and/or questions is addressed??
Briefly describe any steps of the data-intensive research workflow that can be detailed in your article or presentation.
How, if at all, will your article touch upon the application(s) of LA to “understand and improve learning and the contexts in which learning occurs?”
Part II: Data Product
In our Learning Analytics code-along, we scratched the surface on the number of ways that we can wrangle the data.
Using one of the data sets provided in the data folder, your goal for this lab is to extend the Learning Analytics Workflow from our code-along by preparing and wrangling different data.
Or alternatively, you may use your own data set to use in the workflow. If you do decide to use your own data set you must include:
Show two different ways using select function with your data, inspect and save as a new object.
Show one way to use filter function with your data, inspect and save as a new object.
Show one way using arrange function with your data, inspect and save as a new object.
Use the pipe operator to bring it all together.
Feel free to create a new script in your lab 2 to work through the following problems. Then when satisfied add the code in the code chunks below. Don’t forget to run the code to make sure it works.
Instructions:
Add your name to the document in author.
Set up the first (or, two if using an Introduction) phases of the LA workflow below. I’ve added the wrangle section for you. You will need to Prepare the libraries necessary to wrangle the data.
Wrangle
In the chunk called read-data:Import the sci-online-classes.csv from the data folder and save as a new object called sci_classes. Then inspect your data using a function of your choice.
# YOUR FINAL CODE HERE# Type your code hereimport pandas as pd# Load the CSV filesci_classes = pd.read_csv("data/sci-online-classes.csv")# Inspect the DataFrameprint(sci_classes.head()) # Display the first few rows of the DataFrame
In the select-1 code chunk: Use the indexing method to select student_id, subject, semester, FinalGradeCEMS. Assign to a new object with a different name.
# YOUR FINAL CODE HERE# Select specific columns and create a new DataFrameselected_data = sci_classes[['student_id', 'subject', 'semester', 'FinalGradeCEMS']]# Inspect the selected dataprint(selected_data.describe()) # Provides summary statistics which can include NAs analysis
student_id FinalGradeCEMS
count 603.000000 573.000000
mean 86069.535655 77.202655
std 10548.597457 22.225076
min 43146.000000 0.000000
25% 85612.500000 71.251142
50% 88340.000000 84.569444
75% 92730.500000 92.099323
max 97441.000000 100.000000
Note on FinalGradeCEMS: You may observe NA values indicating missing data. This requires handling either by imputation or removal depending on the analysis requirements.
In the following code chunk, handle the missing values in the DataFrame you just created:
# YOUR FINAL CODE HERE# Inspect the selected data for missing valuesprint(selected_data.describe()) # Provides summary statistics, useful for identifying missing data# Option 1: Drop rows where 'FinalGradeCEMS' is missingcleaned_data = selected_data.dropna(subset=['FinalGradeCEMS'])# Option 2: Fill missing 'FinalGradeCEMS' with a placeholder or statistical value (mean, median, etc.)median_value = selected_data['FinalGradeCEMS'].median()selected_data['FinalGradeCEMS'].fillna(median_value, inplace=True)# Display the cleaned dataprint(cleaned_data.head()) # Using option 1: dropped NA rowsprint(selected_data.head()) # Using option 2: filled NA values
/var/folders/g9/4cy0xld1441bjvm88t2w19fm0000gq/T/ipykernel_97222/1159102793.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
selected_data['FinalGradeCEMS'].fillna(median_value, inplace=True)
6.In the select-2 code chunk: _Select all columns except subject and section. Assign to a new object with a different name. Inspect your data frame with a different function.
# YOUR FINAL CODE HERE# Select all columns except 'subject' and 'section'reduced_data = sci_classes.drop(columns=['subject', 'section'])# Inspect the reduced dataprint(reduced_data.info()) # Displays information about the DataFrame including data types and non-null counts
7.In the filter-1 code chunk: Filter the sci_classes data frame for students in OcnA courses. Assign to a new object with a different name. Use the head() function to examine your data frame.
# YOUR FINAL CODE HERE# Filter the DataFrame for students in 'OcnA' coursesocna_students = sci_classes[sci_classes['subject'] =='OcnA']# Display the first few rows of the filtered dataprint(ocna_students.head()) # The head function by default displays the first 5 rows
Question: How many rows does the head() function display? Answer: The head function displays 5 rows by default.
8.In the filter-2 code chunk: Filter the sci_classes data frame so rows with NA for ‘total_points_possible’ are removed. Assign to a new object with a different name. Use info() to examine all columns of your data frame.
# YOUR FINAL CODE HERE# Filter out rows with NA in 'total_points_possible'no_na_points = sci_classes.dropna(subset=['total_points_possible'])# Inspect the modified DataFrameprint(no_na_points.info()) # Displays information similar to str in R
In the arrange-1 code chunk: Arrange sci_classes data by subject then percentage_earned in descending order. Assign to a new object. Use the dtypes function to examine the data type of each column in your data frame.
# YOUR FINAL CODE HERE# Arrange the DataFrame by 'subject' and 'percentage_earned' in descending orderarranged_classes = sci_classes.sort_values(by=['subject', 'percentage_earned'], ascending=[True, False])# Inspect the structure of the arranged DataFrameprint(arranged_classes.dtypes) # Similar to str function in R
In the code chunk named final-wrangle: Use the pandas library to chain together multiple data manipulation methods on the sci_classes data. Here’s what you need to do step by step:
Select the columns student_id, subject, semester, and FinalGradeCEMS.
Filter the data to include only rows where the subject is ‘OcnA’.
Arrange the data by FinalGradeCEMS in descending order.
Assign the result to a new object called final_data.
Examine the contents of final_data using a print statement to display the results.
# YOUR FINAL CODE HERE# Use chaining to select, filter, and arrange datafinal_data = (sci_classes .loc[:, ['student_id', 'subject', 'semester', 'FinalGradeCEMS']] .query("subject == 'OcnA'") .sort_values('FinalGradeCEMS', ascending=False))# Examine the contents of the final data manipulationprint(final_data)
To receive your the Foundations Badge, you will need to render this document and publish via a method designated by your instructor such as: Quarto Pub, Posit Cloud, RPubs , GitHub Pages, or other methods. Once you have shared a link to you published document with your instructor and they have reviewed your work, you will be provided a physical or digital version of the badge pictured at the top of this document!
If you have any questions about this badge, or run into any technical issues, don’t hesitate to contact your instructor.
Once your instructor has checked your link, you will be provided a physical version of the badge below!
Complete the following steps to submit your work for review by:
First, change the name of the author: in the YAML header at the very top of this document to your name. The YAML header controls the style and feel for knitted document but doesn’t actually display in the final output.
Next, click the knit button in the toolbar above to “knit” your R Markdown document to a HTML file that will be saved in your R Project folder. You should see a formatted webpage appear in your Viewer tab in the lower right pan or in a new browser window. Let’s us know if you run into any issues with knitting.