Text Mining
The transition to digital learning has made new sources of data available, giving researchers new opportunities to understand and improve STEM learning. Data sources such as digital learning environments and administrative data systems, as well as data produced by social media websites and the mass digitization of academic and practitioner publications, hold enormous potential to address a range of pressing problems in STEM education. Collecting and analyzing text-based data, however, also presents unique challenges. The text mining labs address the following critical questions:
- What kinds of text data are valuable?
- How can we quantify text data?
- What kinds of research questions could be addressed with text data?
- What opportunities and challenges do large language models bring to the field of mining STEM education data?
- How can we set up a research agenda that drives innovations in STEM education research with text data?
| GitHub | Repository for Instructors |
| Posit Cloud | Workspace for Learners |
Module 1: Tidy Text & Word Counts (TM Basics)
This module is a gentle introduction to getting our text “tidy” so we can perform some basic word counts, look at words that occur at a higher rate in a group of documents, examine words that are unique to those document groups, and create visualizations such as word clouds. The focus of our Essential Readings and case study in this lab is to help LASER Scholars gain a general understanding of key text mining concepts and terminology, and to develop a basic comfort level with quantifying and working with text data. Our Text Mining Case Study: What aspects of online professional development offerings do teachers find most valuable? is guided by work from the Friday Institute and examines teachers’ experiences in professional development (Kellogg et al. 2012). Finally, the Intro to Text Mining Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
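As a rough illustration of what "tidy" text and basic word counts look like, the sketch below tokenizes two short responses into one token per row and tallies word frequencies. It uses Python's standard library purely for illustration and is not taken from the module's Code Along; the sample responses, the tiny stop-word list, and the `tokenize` helper are all invented for this example.

```python
import re
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
responses = [
    "The videos and the discussion forums were the most valuable resources.",
    "I valued the discussion forums and the shared resources.",
]

# A tiny illustrative stop-word list (assumption); real analyses use a fuller one.
stop_words = {"the", "and", "were", "i", "most"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

# "Tidy" layout: one (document id, token) pair per row, stop words removed.
tidy_tokens = [
    (doc_id, token)
    for doc_id, text in enumerate(responses, start=1)
    for token in tokenize(text)
    if token not in stop_words
]

# Term frequency across all documents.
word_counts = Counter(token for _, token in tidy_tokens)
print(word_counts.most_common(3))
```

Keeping the document id next to each token is what later makes it possible to compare word rates between groups of documents.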
| Conceptual Overview | Text Mining Basics |
| Code Along | Tidy Text, Tokenization & Term Frequency |
| Readings & Reflection | What is Text Mining? |
| Case Study | What do Teachers Find Most Valuable in Online PD? | Answer Key |
| Badge | Text Mining Basics Badge |
| Module Survey | Feedback Form After Finishing Module |
Module 2: Public Sentiment and School Reform (Dictionary Methods)
This module moves beyond the basic concepts of text mining and takes a closer look at a dictionary-based text mining technique, sentiment analysis. Our Essential Readings examine the topic of opinion mining, or sentiment analysis. This technique helps us understand people’s opinions on matters such as a policy. Our Text Mining Case Study: Do the public like NGSS? is guided by the work of Rosenberg et al. (2021) and compares public sentiment expressed toward the Next Generation Science Standards (NGSS) and the Common Core State Standards using data from X (formerly Twitter). Finally, the Sentiment Analysis Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
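The core idea behind a dictionary-based method can be sketched in a few lines: look each word up in a lexicon and sum the scores. The example below is a minimal Python illustration under stated assumptions, not the case study's actual workflow; the six-word `lexicon`, the sample posts, and the `sentiment_score` helper are hypothetical.

```python
import re

# Hypothetical mini-lexicon mapping words to polarity scores (assumption);
# published sentiment lexicons contain thousands of entries.
lexicon = {
    "love": 1, "great": 1, "helpful": 1,
    "confusing": -1, "hate": -1, "bad": -1,
}

# Hypothetical posts, invented for illustration only.
posts = [
    "I love the new standards, they are great",
    "The rollout was confusing and the training was bad",
    "Standards meeting at noon",
]

def sentiment_score(text):
    """Sum the lexicon scores of every token; unknown words count as 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

scores = [sentiment_score(post) for post in posts]
for post, score in zip(posts, scores):
    print(score, post)
```

A word-by-word lookup like this ignores negation and context ("not great" still scores as positive), which is one reason results from dictionary methods need careful interpretation.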
| Conceptual Overview | Intro to Dictionary-Based Methods |
| Code Along | Working with Sentiment Lexicons |
| Readings & Reflection | Dictionary-Based Methods |
| Case Study | Public Sentiment Towards State Standards? | Answer Key |
| Badge | Sentiment Analysis Badge |
| Module Survey | Feedback Form After Finishing Module |
Module 3: Topic Modeling in MOOC-Eds
This module focuses on identifying “topics” by quantifying how words cohere into different latent, or hidden, themes based on patterns of word co-occurrence within documents. Our Essential Readings introduce this unsupervised machine learning technique, while our Module 3 Case Study is guided by work from the Friday Institute and explores ideas and issues that emerged in the discussion forums of a MOOC-Ed course (Akoglu, Lee, and Kellogg 2019). Finally, the Topic Modeling Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.
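Topic models start from counts of which words appear in which documents. The sketch below builds that input, a document-term matrix, and tallies how often word pairs share a document. It does not fit a topic model; it only illustrates the co-occurrence patterns such a model works from. The tokenized `documents` are hypothetical, and Python's standard library is used purely for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical forum posts, already tokenized and stop-word filtered.
documents = [
    ["fractions", "manipulatives", "fractions"],
    ["assessment", "rubric", "feedback"],
    ["fractions", "manipulatives", "assessment"],
]

# Sorted vocabulary gives a stable column order for the matrix.
vocabulary = sorted({word for doc in documents for word in doc})

# Document-term matrix: one row per document, one column per vocabulary word.
doc_term_matrix = [
    [Counter(doc)[word] for word in vocabulary]
    for doc in documents
]

# Number of documents in which each pair of words appears together.
pair_doc_counts = Counter(
    pair
    for doc in documents
    for pair in combinations(sorted(set(doc)), 2)
)

print(vocabulary)
print(doc_term_matrix)
print(pair_doc_counts.most_common(1))
```

Words that repeatedly share documents, such as "fractions" and "manipulatives" here, are the kind of pattern a topic model would tend to group into one theme.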
| Conceptual Overview | An Intro to Topic Modeling |
| Code Along | Latent Dirichlet Allocation |
| Readings & Discussion | Topic Modeling and STEM Ed Research |
| Case Study | Topic Modeling in MOOC-Eds | Answer Key |
| Badge | Topic Modeling Badge |
| Module Survey | Feedback Form After Finishing Module |
Module 4: Quantitative Ethnography
Our final module concludes our exploration of text mining methods by introducing Quantitative Ethnography (QE) and Epistemic Network Analysis (ENA). ENA is a technique within QE that models the relationships between ideas in discourse. While automated topic modeling approaches such as Latent Dirichlet Allocation (LDA) are useful for identifying general themes in large text corpora, they do not capture the structure of meaning, that is, how ideas are connected within a conversation, a learning environment, or a collaborative task. This module explores how QE and ENA fill this gap by providing a framework for analyzing how concepts are linked in human discourse and how those connections reflect knowledge construction and problem-solving.
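One way to picture "how ideas are connected" is to count which coded ideas appear close together in a conversation. The sketch below tallies pairs of codes that co-occur within a sliding window of turns. It is a deliberately simplified illustration and not the rENA package's algorithm: it omits the normalization and dimensional reduction that ENA applies. The `coded_turns` data, the `connection_counts` helper, and the window size are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded turns of talk: each set holds the codes assigned to
# one utterance (invented for illustration only).
coded_turns = [
    {"data"},
    {"design", "client"},
    {"data", "design"},
    set(),
    {"client"},
]

def connection_counts(turns, window=2):
    """Count code pairs that co-occur within a sliding window of turns."""
    counts = Counter()
    for i in range(len(turns)):
        # Window = the current turn plus the (window - 1) turns before it.
        start = max(0, i - window + 1)
        codes_in_window = set().union(*turns[start:i + 1])
        # Each unordered pair of codes in the window counts as one connection.
        for pair in combinations(sorted(codes_in_window), 2):
            counts[pair] += 1
    return counts

connections = connection_counts(coded_turns, window=2)
for pair, count in connections.most_common():
    print(pair, count)
```

Unlike a plain word or topic count, this tally depends on the order and proximity of turns, so the same codes arranged differently in a conversation produce different connection counts.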
| Conceptual Overview | Intro to Quantitative Ethnography |
| Code Along | The rENA Package |
| Readings & Discussion | Quantitative Ethnography and ENA |
| Case Study | ENA and Virtual Internship Conversations | Answer Key |
| Badge | Epistemic Network Analysis Badge |
| Module Survey | Feedback Form After Finishing Module |
Microcredential
The culminating activity for the TM Modules is designed to give you space for independent analysis of a self-identified data source. To earn your TM Microcredential, you must demonstrate your ability to formulate a relevant research question for text mining, effectively manage and analyze text data, and clearly communicate your key findings. Your primary goal for this analysis is to create a simple data product that illustrates key findings by applying the knowledge and skills acquired from the essential readings and case studies.
| Microcredential | Text Mining in Education |