Topic Modeling in Education Research

TM Module 3: Essential Readings

Author

Dr. Shaun Kellogg

Published

July 13, 2025

OVERVIEW

For Module 3, our primary goal is to build a basic understanding of topic modeling and how it has been applied to gain insight into text data, particularly in educational contexts. This week’s readings and videos provide a brief introduction to these methods and some general applications across disciplines. Many of the introductory readings come from the Journal of Digital Humanities (Vol. 2, No. 1 Winter 2012). The entire issue on Topic Modeling contains many good reads and is worth checking out, and the article The Digital Humanities Contribution to Topic Modeling provides a nice overview of it.

READINGS

To help address our discussion questions for the week, you’ll be asked to read or view 3 resources, including: 1) a required journal article, 2) an instructor-selected resource, and 3) a self-selected resource such as a journal article, video, news article, podcast, or blog post.

1. Required

  1. Topic Modeling: A Basic Introduction. A great layperson’s introduction to topic modeling by Megan Brett.

2. Instructor-Selected Resources (Choose One)

The resources curated below provide a comprehensive introduction to topic modeling, covering both theoretical foundations and practical implementations. They explore the mathematical principles underlying topic models, including probabilistic approaches like Latent Dirichlet Allocation (LDA) and Structural Topic Modeling (STM), while also emphasizing the importance of text preprocessing for accurate results. Several tutorials and hands-on examples demonstrate how to apply these methods using R, offering insights into different modeling approaches and best practices. Finally, a few resources take a more critical perspective, examining the limitations of topic modeling, its application to figurative language, and the challenges of working with large-scale data.

  1. Topic modeling with R and tidy data principles. Julia Silge demonstrates how to train a topic model in R using the tidytext and stm packages on a collection of Sherlock Holmes stories.

  2. LDA Topic Models. An engaging video and introduction to the subject by Andrius Knispelis.

  3. The “Secret” Recipe for Topic Modeling Themes. Matthew Jockers’s blog post highlights the importance of preprocessing text and provides some very practical guidelines for topic modeling.

  4. Probabilistic Topic Models. Article by David Blei explains some of the basic concepts of topic modeling, including some underlying math and some great visuals.

  5. Introduction to Topic Models. In this video, Duke Professor Chris Bail provides an introduction to topic modeling, including example applications and R code. 

  6. Finding structure in xkcd comics with Latent Dirichlet Allocation. Quick intro and fun example of applying LDA to a favorite comic of mine.

  7. The LDA Buffet is Now Open. Short, whimsical blog post by Matthew Jockers explaining LDA for English majors.

  8. Topic Modeling and Figurative Language. Lisa M. Rhody explores the productive failure of topic modeling. 

  9. Training and Validating Big Models on Big Data. Video of David Mimno’s presentation provides an accessible introduction to the math behind topic modeling. 

  10. “Twitter Archeology” of Learning Analytics and Knowledge Conferences. Paper exploring the conference tweets through multiple methods, including topic modeling and descriptive, network, and hashtag analysis.

  11. Computer-Assisted Reading and Discovery for Student Generated Text in Massive Open Online Courses. This paper introduces the Structural Topic Model with applications to students’ self-reported motivations, discussion themes, and patterns of feedback in course evaluations.

  12. Unsupervised Modeling for Understanding MOOC Discussion Forums. Paper exploring three different approaches to text classification: manual coding, LDA, and the k-medoids clustering algorithm.

  13. Using a Learner-Topic Model for Mining Learner Interests in Open Learning Environments. A study that applies topic modeling to automatically discover learner interests in open learning environments.

3. Self-Selected Resource

Use the NCSU Library, Google Scholar, or a search engine of your choice to locate a journal article, presentation, website, or other scholarly resource. Your selection should discuss some form of topic modeling and address one or more of the discussion topics or questions provided below. You are also welcome to share less formal resources, such as videos or shorter online articles, that help the class better understand this week’s discussion topics.

DISCUSSION

In lieu of the peer interaction and discussion of course materials that normally take place “in class”, you’ll be asked to log in this week and engage with other members of our learning community through the course discussion forum. To help guide our discussions, we will collectively address a set of guiding questions provided in each forum. You are also welcome to add your own topics or questions for the class to discuss.

With the exception of the Self-Selected resource, you are not required to post to every thread or address every question listed below, particularly if you feel others in the class have thoroughly addressed the topic or questions. Our primary goal for these discussions is to collectively build our understanding of this week’s topics through back-and-forth dialogue and avoid a “collective monologue” in which we see 20 variations of the same post.

Guiding Questions

Topic 1: What is Topic Modeling? And other new terms.

Reflecting on the course text and your self-selected reading, answer one or more of the following questions:

  1. What exactly is “topic modeling”?

  2. What is it trying to accomplish and how does it work? 

  3. What are some other new terms, words, concepts that you have come across in the resources that were unfamiliar to you, or that you had come across before but feel you have a better understanding of after this week?

Topic 2: Applications in Education

Reflecting on the course text and your self-selected reading, answer one or more of the following questions:

  1. How has topic modeling been applied to educational contexts, or in other fields that might be relevant to education? 

  2. How have these methods been applied, or how could they be applied, to better understand and improve student learning and the contexts in which learning occurs?

  3. How might text mining be applied in your professional context? 

  4. How has text mining been applied, or how could it be applied, to address systemic issues or persistent problems in education?

Topic 3: Topic Modeling Measures

Reflecting on the course text and your self-selected reading, answer one or more of the following questions:

  1. What measures or statistics associated with topic modeling were described in your readings?

  2. How are these measures interpreted and what are they used for?

Topic 4: Affordances, Limitations, & Ethical Issues

Reflecting on the course text and your self-selected reading, answer one or more of the following questions:

  1. What are some of the advantages of topic-modeling approaches compared to the approaches we examined in Modules 1 and 2?

  2. What are some of the issues, challenges, and limitations of this approach? 

Topic 5: Text-Based Data Sources

Reflecting on the course text and your self-selected reading, answer one or more of the following questions:

  1. What data sources were described in your readings or selections?

  2. Are some data sources more suitable or appropriate for topic modeling than others?

  3. What data sources in education might be particularly suitable for topic-modeling approaches to analysis?

  4. What sources of data are you interested in potentially exploring for an independent analysis in this Module or for a final course project?

Student-Selected Resources

Provide a brief overview of your self-selected resource that includes the following:

  • APA Citation (note: this can be easily retrieved via Google Scholar)

  • What was the purpose of your article?

  • How was Topic Modeling defined and/or characterized?

  • What data source(s) were analyzed or discussed?

  • How, if at all, did your article touch upon the application(s) of text mining to “understand and improve learning and the contexts in which learning occurs?”

  • What were some key findings from the analysis?

  • Did your selection address any ethical or legal considerations of text mining?

ASSESSMENT EXAMPLE

Grading

Grading for this week is fairly lenient, provided your posts make clear that you’ve done the required reading. Readings and discussion for each module are worth 6 points and judged based on three criteria: quantity, quality, and connections to readings.

In terms of quantity (2 points), you’ll be expected to add at least 4 posts over the course of the week, spread across at least two different days. Your initial post should be shared by Friday to help facilitate discussion.

In terms of quality (2 points), your posts over the next week should provide new or insightful contributions to the discussion questions or topics (see Gao’s productive online discussion model summarized below). There is no required length for each post; in fact, short conversational exchanges (1-3 paragraphs) are highly encouraged.

In terms of connections (2 points), your collective posts should help us interpret or elaborate on discussion topics, questions, or ideas others have shared by “making connection to the learning materials” as illustrated in Gao’s Disposition 1: Discuss to Comprehend. Your posts should tie in to at least 3 different resources.

Productive Online Discussion Model

Disposition 1: Discuss to Comprehend

Actively engage in such cognitive processes as interpretation, elaboration, and making connections to prior knowledge.

  • Interpreting or elaborating the ideas by making connection to the learning materials
  • Interpreting or elaborating the ideas by making connection to personal experience
  • Interpreting or elaborating the ideas by making connection to other ideas, sources, or references

Disposition 2: Discuss to Critique

Carefully examine other people’s views, and be sensitive and analytical to conflicting views.

  • Building or adding new insights or ideas to others’ posts
  • Challenging ideas in the texts
  • Challenging ideas in others’ posts

Disposition 3: Discuss to Construct Knowledge

Actively negotiate meanings, and be ready to reconsider, refine and sometimes revise their thinking.

  • Comparing views from the texts or others’ posts
  • Facilitating thinking and discussions by raising questions
  • Refining and revising one’s own view based on the texts or others’ posts

Disposition 4: Discuss to Share Improved Understanding

Actively synthesize knowledge and explicitly express improved understanding based on a review of previous discussions.

  • Summarizing personal learning experiences of online discussions
  • Synthesizing content of discussion
  • Generating new topics based on a review of previous discussions