Text Mining

The transition to digital learning has made new sources of data available, giving researchers new opportunities to understand and improve STEM learning. Data from digital learning environments and administrative data systems, along with data produced by social media websites and the mass digitization of academic and practitioner publications, hold enormous potential to address a range of pressing problems in STEM education. Collecting and analyzing text-based data, however, also presents unique challenges. The text mining labs address the following critical questions:

  1. What kinds of text data are valuable?
  2. How can we quantify text data?
  3. What kinds of research questions could be addressed with text data?
  4. What opportunities and challenges do large language models bring to the field of mining STEM education data?
  5. How can we set up a research agenda that drives innovations in STEM education research with text data?

GitHub Repository for Instructors
Posit Cloud Workspace for Learners

Module 1: Tidy Text & Word Counts (TM Basics)

This module is a gentle introduction to getting our text “tidy” so we can perform basic word counts, look at words that occur at a higher rate in one group of documents than in another, examine words that are unique to each group, and create visualizations such as word clouds. The Essential Readings and case study in this lab help LASER Scholars gain a general understanding of key text mining concepts and terminology and develop a basic comfort level with quantifying and working with text data. Our Text Mining Case Study: What aspects of online professional development offerings do teachers find most valuable? is guided by work from the Friday Institute and examines teachers’ experiences in professional development (Kellogg et al. 2012). Finally, the Intro to Text Mining Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
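As a rough illustration of the word-count ideas described above, the following Python sketch tokenizes a few documents into one word per token, drops stop words, counts words per group, and finds words unique to one group. The documents, group names, and stop-word list are invented for illustration and are not drawn from the case study data; the module's own R and Python keys may take a different approach.

```python
import re
from collections import Counter

# Hypothetical example documents, grouped by course; not real survey data.
docs = {
    "course_a": [
        "The videos were helpful and the videos were short.",
        "Helpful resources and clear examples.",
    ],
    "course_b": [
        "The discussion forums were helpful.",
        "Forums and peer feedback kept me engaged.",
    ],
}

# A tiny illustrative stop-word list; real analyses use a fuller one.
STOP_WORDS = {"the", "and", "were", "me", "a", "an", "of", "to"}


def tidy_tokens(text):
    """Lowercase the text and return one token per word, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


# Word counts for each group of documents.
counts = {
    group: Counter(tok for doc in texts for tok in tidy_tokens(doc))
    for group, texts in docs.items()
}

# Words that appear in course_a but never in course_b.
unique_to_a = set(counts["course_a"]) - set(counts["course_b"])

print(counts["course_a"].most_common(3))
print(sorted(unique_to_a))
```

Keeping one token per row (or list element) is what makes the later steps simple: counting, filtering, and comparing groups all become ordinary operations on a flat collection of words.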

Conceptual Overview: Text Mining Basics
Code Along: Turning Texts into Numbers
Readings & Reflection: What is Text Mining?
Case Study: What do Teachers Find Most Valuable in Online PD? | R Key | Python Key
Badge: Text Mining Basics Badge
Module Survey: Feedback Form After Finishing Module

Module 2: Public Sentiment and School Reform (Dictionary Methods)

This module moves beyond the basic concepts of text mining and takes a closer look at a dictionary-based technique, sentiment analysis. Our Essential Readings examine opinion mining, also called sentiment analysis, which helps us understand people’s opinions about subjects such as a policy. Our Text Mining Case Study: Do the Public Like NGSS? is guided by the work of Rosenberg et al. (2021) and compares public sentiment expressed toward the Next Generation Science Standards (NGSS) and the Common Core State Standards using data from X (formerly Twitter). Finally, the Sentiment Analysis Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
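To make the dictionary-based idea concrete, here is a minimal Python sketch that scores text by looking up each word in a sentiment lexicon and summing the values. The lexicon, the posts, and the group labels are all invented for illustration; published lexicons are far larger, and the case study may score text differently.

```python
import re

# Hypothetical mini-lexicon mapping words to sentiment values (assumption).
LEXICON = {
    "love": 2,
    "great": 2,
    "helpful": 1,
    "confusing": -1,
    "hate": -2,
    "terrible": -2,
}


def sentiment_score(text):
    """Sum the lexicon values of the words in the text; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)


# Invented posts under neutral labels; not real social media data.
posts = {
    "topic_a": ["I love the new materials, great resources!", "Some parts are confusing."],
    "topic_b": ["I hate these terrible worksheets.", "The rollout was confusing."],
}

# Average sentiment per group of posts.
mean_scores = {
    topic: sum(sentiment_score(p) for p in texts) / len(texts)
    for topic, texts in posts.items()
}

print(mean_scores)
```

Because every score comes from a lookup table, the result depends heavily on which lexicon is used and on how well its vocabulary matches the texts being analyzed.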

Conceptual Overview: Sentiment Analysis
Code Along: Choosing the Right Lexicon for Sentiment Analysis
Readings & Reflection: Dictionary-Based Methods
Case Study: Do the Public Like NGSS? | R Key | Python Key
Badge: Sentiment Analysis Badge
Module Survey: Feedback Form After Finishing Module

Module 3: Large Language Models for Qualitative Analysis

This module wraps up our work with text mining and examines recent advances in using large language models to code qualitative data (e.g., interview transcripts, group discussions, and open-ended responses). Our Essential Readings introduce this technique. Our Text Mining Case Study: What are high school students’ machine learning literacy before and after participating in an AI curriculum? is inspired by the need to assess machine learning literacy and to use automated assessment for real-time intervention in the field of AI education. Finally, the Large Language Model Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
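One common way to use a large language model for qualitative coding is in-context learning: the prompt includes a few hand-coded examples, and the model is asked to code a new response in the same format. The Python sketch below only assembles such a prompt. The codebook, example responses, and function name are hypothetical, and no model is called; sending the prompt to a model depends on whichever client or service you use.

```python
# Hypothetical codebook and hand-coded examples; not taken from the case study.
CODES = ["accurate", "partially accurate", "inaccurate"]

EXAMPLES = [
    ("The model learns patterns from labeled examples.", "accurate"),
    ("The computer just knows the answer.", "inaccurate"),
]


def build_prompt(response, examples=EXAMPLES, codes=CODES):
    """Assemble a few-shot prompt: instructions, worked examples, then the new item."""
    lines = [
        "Assign one code to each student answer.",
        f"Allowed codes: {', '.join(codes)}.",
        "",
    ]
    for text, code in examples:
        lines += [f"Response: {text}", f"Code: {code}", ""]
    # The new response goes last, with the code left blank for the model to fill in.
    lines += [f"Response: {response}", "Code:"]
    return "\n".join(lines)


prompt = build_prompt("It finds patterns in data, but I am not sure how.")
print(prompt)
```

The worked examples act as a compact codebook. In practice, model-assigned codes are usually compared against human coding on a sample before being relied on.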

Conceptual Overview: Large Language Models
Code Along: In-Context Learning
Readings & Discussion: LLM for Qualitative Analysis
Case Study: What are High School Students’ Machine Learning Literacy? | Python Key
Badge: Large Language Models Badge
Module Survey: Feedback Form After Finishing Module

Module 4: Topic Modeling in MOOC-Eds

This module focuses on identifying “topics” by examining how words cohere into latent, or hidden, themes based on patterns of word co-occurrence within documents. Our Essential Readings introduce this unsupervised machine learning technique. Our Text Mining Case Study: What are participants discussing in forums? is guided by work from the Friday Institute and explores the ideas and issues that emerged in the discussion forums of a MOOC-Ed course (Akoglu, Lee, and Kellogg 2019). Finally, the Topic Modeling Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
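The co-occurrence signal that topic models build on can be seen in a small Python sketch. This is not Latent Dirichlet Allocation itself, only a count of how often word pairs appear in the same document. The posts are invented and assumed to be already tokenized with stop words removed.

```python
from collections import Counter
from itertools import combinations

# Hypothetical forum posts, already tokenized and stripped of stop words.
posts = [
    ["fractions", "manipulatives", "lesson"],
    ["fractions", "lesson", "assessment"],
    ["video", "reflection", "coaching"],
    ["coaching", "video", "feedback"],
]

pair_counts = Counter()
for tokens in posts:
    # Count each unordered word pair once per document.
    for pair in combinations(sorted(set(tokens)), 2):
        pair_counts[pair] += 1

# Pairs that share documents most often hint at a common underlying theme.
top_pairs = pair_counts.most_common(2)
print(top_pairs)
```

In this toy data, words about lessons cluster apart from words about coaching. A topic model formalizes that intuition by estimating, for each latent topic, a probability distribution over words, and for each document, a mixture of topics.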

Conceptual Overview: Topic Modeling
Code Along: Latent Dirichlet Allocation
Readings & Discussion: Introduction to Topic Modeling
Case Study: What are Participants Discussing in Forums? | R Key | Python Key
Badge: Topic Modeling Badge
Module Survey: Feedback Form After Finishing Module

Microcredential

The culminating activity for the TM Modules gives you space to independently analyze a data source of your own choosing. To earn your TM Microcredential, you must demonstrate your ability to formulate a relevant research question for text mining, effectively manage and analyze text data, and clearly communicate your key findings. Your primary goal for this analysis is to create a simple data product that illustrates key findings by applying the knowledge and skills acquired from the essential readings and case studies.

Microcredential: Text Mining in Education

References

Akoglu, Kemal, Hollylynne Lee, and Shaun Kellogg. 2019. “Participating in a MOOC and Professional Learning Team: How a Blended Approach to Professional Development Makes a Difference.” Journal of Technology and Teacher Education 27 (2): 129–63.
Kellogg, Shaun, Jenifer Corn, Sherry Booth, Adrian Good, Jennifer Maxfield, Brandy Parker, Sara Pilzer, and Jennifer Tagsold. 2012. “Race to the Top Online Professional Development Evaluation.” https://www-data.fi.ncsu.edu/wp-content/uploads/2021/10/28134953/Race-to-the-Top-Online-Professional-Development-Evaluation-Year-1-Report.pdf.
Rosenberg, Joshua M., Conrad Borchers, Elizabeth B. Dyer, Daniel Anderson, and Christian Fischer. 2021. “Understanding Public Sentiment about Educational Reforms: The Next Generation Science Standards on Twitter.” AERA Open 7: 23328584211024261. https://osf.io/xymsd/.