Text Mining

The transition to digital learning has made new sources of data available, giving researchers new opportunities to understand and improve STEM learning. Data from digital learning environments and administrative data systems, along with data produced by social media websites and the mass digitization of academic and practitioner publications, hold enormous potential to address a range of pressing problems in STEM education. Collecting and analyzing text-based data also presents unique challenges. The text mining labs address the following critical questions:

What kinds of text data are valuable?
How can we quantify text data?
What kinds of research questions could be addressed with text data?
What opportunities and challenges do large language models bring to the field of mining STEM education data?
How can we set up a research agenda that drives innovations in STEM education research with text data?

	Github	Repository for Instructors
	Posit Cloud	Workspace for Learners

Module 1: Tidy Text & Word Counts (TM Basics)

This module is a gentle introduction to getting our text “tidy” so we can perform some basic word counts, look at words that occur at a higher rate in a group of documents, examine words that are unique to those document groups, and create visualizations such as word clouds. The focus of our Essential Readings and case study in this lab is to help LASER Scholars gain a general understanding of key text mining concepts and terminology, and to develop a basic comfort level with working with and quantifying text data. Our Text Mining Case Study: What aspects of online professional development offerings do teachers find most valuable? is guided by work from the Friday Institute that examines teachers’ experiences in professional development (Kellogg et al. 2012). Finally, the Intro to Text Mining Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.

	Conceptual Overview	Text Mining Basics
	Code Along	Tidy Text, Tokenization & Term Frequency
	Readings & Reflection	What is Text Mining?
	Case Study	What do Teachers Find Most Valuable in Online PD? | Answer Key
	Badge	Text Mining Basics Badge
	Module Survey	Feedback Form After Finishing Module
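
As a rough illustration of the tidy text and word count ideas in this module, the sketch below breaks text into one word per row, removes common stop words, and counts words within each group of documents. It uses only the Python standard library. The responses, the group labels, the stop word list, and the function names are all hypothetical; they are not the case study data or the course's own code.

```python
import re
from collections import Counter

# Hypothetical survey responses grouped by course; not data from the case study.
responses = [
    ("course_a", "The videos were helpful and the resources were helpful too."),
    ("course_b", "The discussion forums were the most valuable part."),
]

# A tiny illustrative stop word list; real stop word lists are much longer.
STOP_WORDS = {"the", "and", "were", "too", "a", "of", "to"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tidy(rows):
    """Return one (group, word) pair per token, with stop words removed."""
    return [
        (group, word)
        for group, text in rows
        for word in tokenize(text)
        if word not in STOP_WORDS
    ]


def word_counts(tidy_rows):
    """Count how often each word appears within each group."""
    return Counter(tidy_rows)


tidy_rows = tidy(responses)
counts = word_counts(tidy_rows)

# Show the most frequent (group, word) pairs.
for (group, word), n in counts.most_common(3):
    print(group, word, n)
```

Keeping one token per row is what makes the later steps (filtering, counting, comparing groups) simple table operations.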

Module 2: Public Sentiment and School Reform (Dictionary Methods)

This module moves beyond the basic concepts of text mining and takes a closer look at sentiment analysis, a dictionary-based text mining technique. Our Essential Readings examine opinion mining, also known as sentiment analysis, which helps us understand people’s opinions about things such as a policy. Our Text Mining Case Study: Do the public like NGSS? is guided by the work of Rosenberg et al. (2021) and compares public sentiment expressed toward the Next Generation Science Standards (NGSS) and the Common Core State Standards using data from X (formerly Twitter). Finally, the Sentiment Analysis Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.

	Conceptual Overview	Intro to Dictionary-Based Methods
	Code Along	Working with Sentiment Lexicons
	Readings & Reflection	Dictionary-Based Methods
	Case Study	Public Sentiment Towards State Standards? | Answer Key
	Badge	Sentiment Analysis Badge
	Module Survey	Feedback Form After Finishing Module
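
A dictionary-based method can be sketched in a few lines: each word is looked up in a lexicon of scored words, and the scores are summed per document and averaged per group. The mini-lexicon, the posts, and the group labels below are invented for illustration and are not taken from any published lexicon or from Rosenberg et al. (2021).

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon; real sentiment lexicons hold thousands of scored words.
LEXICON = {"love": 2, "great": 2, "good": 1, "confusing": -1, "bad": -2, "hate": -3}

# Made-up posts for illustration only.
posts = [
    ("ngss", "I love the new science standards, great for inquiry"),
    ("ngss", "The rollout was confusing"),
    ("ccss", "I hate the testing, bad idea"),
]


def score(text):
    """Sum the lexicon scores of the words in a text; unknown words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def mean_sentiment(rows):
    """Average the per-post scores within each group."""
    scores = defaultdict(list)
    for group, text in rows:
        scores[group].append(score(text))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


result = mean_sentiment(posts)
print(result)
```

A simple lookup like this ignores negation and context ("not good" still scores as positive), which is one reason dictionary results should be read with care.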

Module 3: Topic Modeling in MOOC-Eds

This module focuses on identifying “topics” by quantifying how words cohere into different latent, or hidden, themes based on patterns of co-occurrence of words within documents. Our Essential Readings introduce this unsupervised machine learning technique, while our Module 3 Case Study is guided by work from the Friday Institute and explores the ideas and issues that emerged in the discussion forums of a MOOC-Ed course (Akoglu, Lee, and Kellogg 2019). Finally, the Topic Modeling Badge provides an opportunity to create your own data product and to reflect on how these concepts and techniques might apply to your own research.

	Conceptual Overview	An Intro to Topic Modeling
	Code Along	Latent Dirichlet Allocation
	Readings & Discussion	Topic Modeling and STEM Ed Research
	Case Study	Topic Modeling in MOOC-Eds | Answer Key
	Badge	Topic Modeling Badge
	Module Survey	Feedback Form After Finishing Module
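
Topic models are built on patterns of words appearing together in the same documents. The sketch below is not a topic model such as Latent Dirichlet Allocation; it only counts how many documents each pair of words shares, which is the kind of co-occurrence signal that topic models summarize. The tokenized forum posts and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already tokenized forum posts (one list of words per document).
docs = [
    ["statistics", "data", "teaching"],
    ["data", "teaching", "students"],
    ["forum", "students", "data"],
]


def cooccurrence(documents):
    """Count, for each pair of words, the number of documents containing both."""
    pairs = Counter()
    for words in documents:
        # Use each distinct word once per document, in a fixed (sorted) order
        # so that (a, b) and (b, a) are counted as the same pair.
        for a, b in combinations(sorted(set(words)), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = cooccurrence(docs)
for pair, n in pairs.most_common(2):
    print(pair, n)
```

Words that repeatedly share documents, such as "data" and "teaching" here, are the ones a topic model would tend to place in the same theme.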

Module 4: Quantitative Ethnography

Our final module concludes our exploration of text mining methods by introducing Quantitative Ethnography (QE) as well as Epistemic Network Analysis (ENA). ENA is a subset of QE and focuses on modeling the relationships between ideas within discourse. While automated topic modeling approaches such as Latent Dirichlet Allocation (LDA) are useful for identifying general themes in large text corpora, they do not capture the structure of meaning, that is, how ideas are connected within a conversation, a learning environment, or a collaborative task. This module will explore how QE and ENA fill this gap by providing a framework to analyze how concepts are linked in human discourse and how those connections reflect knowledge construction and problem-solving.

	Conceptual Overview	Intro to Quantitative Ethnography
	Code Along	The rENA Package
	Readings & Discussion	Quantitative Ethnography and ENA
	Case Study	ENA and Virtual Internship Conversations | Answer Key
	Badge	Epistemic Network Analysis Badge
	Module Survey	Feedback Form After Finishing Module
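
To make the idea of connections between ideas in discourse more concrete, the sketch below counts how often coded ideas appear close together in a conversation, using a sliding window over turns of talk. This is a heavily simplified illustration and not the rENA package's API or the full ENA method, which involves further steps beyond counting. The coded turns, the code names, the window size, and the function name are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical conversation: each set holds the codes assigned to one turn of talk.
coded_turns = [
    {"data"},
    {"design"},
    {"data", "client"},
    set(),
    {"design"},
]


def window_connections(turns, window=2):
    """Count undirected code pairs linking each turn to its recent context.

    For each turn, its codes are paired with every different code found in
    that turn or in the previous (window - 1) turns. A pair is counted at
    most once per turn.
    """
    counts = Counter()
    for i, current in enumerate(turns):
        start = max(0, i - window + 1)
        # All codes seen in the window that ends at the current turn.
        context = set().union(*turns[start:i + 1])
        pairs = {tuple(sorted((a, b))) for a in current for b in context if a != b}
        counts.update(pairs)
    return counts


connections = window_connections(coded_turns, window=2)
for pair, n in sorted(connections.items()):
    print(pair, n)
```

Unlike a plain word count, the result depends on the order of the turns: "data" and "design" are connected because they occur in adjacent turns, not merely because both appear somewhere in the conversation.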

Microcredential

The culminating activity for the TM Modules gives you space for independent analysis of a self-identified data source. To earn your TM Microcredential, you must demonstrate your ability to formulate a relevant research question for text mining, effectively manage and analyze text data, and clearly communicate your key findings. Your primary goal for this analysis is to create a simple data product that illustrates key findings by applying the knowledge and skills acquired from the essential readings and case studies.

	Microcredential	Text Mining in Education

References

Akoglu, Kemal, Hollylynne Lee, and Shaun Kellogg. 2019. “Participating in a MOOC and Professional Learning Team: How a Blended Approach to Professional Development Makes a Difference.” Journal of Technology and Teacher Education 27 (2): 129–63.

Kellogg, Shaun, Jenifer Corn, Sherry Booth, Adrian Good, Jennifer Maxfield, Brandy Parker, Sara Pilzer, and Jennifer Tagsold. 2012. “Race to the Top Online Professional Development Evaluation.” https://www-data.fi.ncsu.edu/wp-content/uploads/2021/10/28134953/Race-to-the-Top-Online-Professional-Development-Evaluation-Year-1-Report.pdf.

Rosenberg, Joshua M., Conrad Borchers, Elizabeth B. Dyer, Daniel Anderson, and Christian Fischer. 2021. “Understanding Public Sentiment about Educational Reforms: The Next Generation Science Standards on Twitter.” AERA Open 7: 23328584211024261. https://osf.io/xymsd/.