Lab 4 Readings: Topic Modeling

Author

LASER Institute

Published

July 20, 2024

Overview

Our primary goal is to build a basic understanding of topic modeling and how it has been applied to gain insight into text data, particularly in educational contexts. Many of the introductory readings come from the Journal of Digital Humanities (Vol. 2, No. 1, Winter 2012). That issue is devoted to topic modeling and is worth reading in full. Its article The Digital Humanities Contribution to Topic Modeling provides a nice overview of the whole issue.

Readings

Required
  1. Topic Modeling: A Basic Introduction. A great layperson’s introduction to topic modeling by Megan Brett.

  2. LDA Topic Models. An engaging video and introduction to the subject by Andrius Knispelis.

  3. The “Secret” Recipe for Topic Modeling Themes. Matthew Jockers’s blog post highlights the importance of preprocessing text and provides some very practical guidelines for topic modeling.

Choose One (or more if interested)
  1. Probabilistic Topic Models. Article by David Blei explains some of the basic concepts of topic modeling, including some underlying math and some great visuals. 

  2. Introduction to Topic Models. In this video, Duke Professor Chris Bail provides an introduction to topic modeling, including example applications and R code. 

  3. Finding structure in xkcd comics with Latent Dirichlet Allocation. Quick intro and fun example of applying LDA to a favorite comic of mine.

  4. The LDA Buffet is Now Open. Short, whimsical blog post by Matthew Jockers explaining LDA for English majors.

  5. Topic Modeling and Figurative Language. Lisa M. Rhody explores the productive failure of topic modeling. 

  6. Training and Validating Big Models on Big Data. Video of David Mimno’s presentation provides an accessible introduction to the math behind topic modeling. 

  7. “Twitter Archeology” of Learning Analytics and Knowledge Conferences. Paper exploring conference tweets through multiple methods, including topic modeling and descriptive, network, and hashtag analysis.

  8. Computer-Assisted Reading and Discovery for Student Generated Text in Massive Open Online Courses. This paper introduces the Structural Topic Model and applies it to analyzing students’ self-reported motivations, identifying discussion themes, and finding patterns of feedback in course evaluations.

  9. Unsupervised Modeling for Understanding MOOC Discussion Forums. Paper exploring three different approaches to text classification: manual coding, LDA, and the k-medoids clustering algorithm.

  10. Using a Learner-Topic Model for Mining Learner Interests in Open Learning Environments. A study that applies topic modeling to automatically discover learner interests in open learning environments.