Intro to Quantitative Ethnography

Text Mining Module 4: A Conceptual Overview

Overview of Quantitative Ethnography (QE)

Guiding questions:

What is QE?
Why should we use QE? When is it most appropriate to use?
What are common techniques for QE?

Text Mining at Scale

Having access to massive educational text data via chats, forums, reflections, etc. presents a dilemma:
- Qualitative methods are deep but don’t scale.
- Natural language processing methods such as topic modeling (e.g., LDA) scale well but lose “context.”

What is Quantitative Ethnography (QE)?

Definition: A research paradigm integrating statistical power with “thick description,” the combination of behaviors and the context for those behaviors.
Combines structure + meaning.
Why it matters: QE validates big-data findings with human-coded insights.

Introduction to Epistemic Network Analysis (ENA)

ENA is an analysis technique that maps relationships between concepts.
It focuses on co-occurrence within a specific “window,” or stanza, of text, rather than total frequencies of a given term.
For example: If “pedagogy” and “technology” appear in the same stanza of a predefined length, ENA maps that link. If it continues to find that link over many stanzas, the strength of the link increases between those ideas.
Other parameters (e.g., “units,” “conversations”) can be set to account for dialogues, events, or other structures within an observation.
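To make the difference between co-occurrence and raw frequency concrete, here is a minimal base R sketch. It is not the rENA implementation: the toy data, the third code ("assessment"), and the fixed stanza labels are all assumptions made for illustration.

```r
# Hypothetical toy data: one row per line of text, already coded 0/1,
# with a stanza label assigned to each line.
coded_lines <- data.frame(
  stanza     = c(1, 1, 2, 2, 3, 3),
  pedagogy   = c(1, 0, 0, 1, 0, 0),
  technology = c(0, 1, 1, 0, 0, 1),
  assessment = c(0, 0, 0, 0, 1, 0)
)
codes <- c("pedagogy", "technology", "assessment")

# Collapse each stanza to a 0/1 vector: does the code appear anywhere in it?
by_stanza <- aggregate(coded_lines[codes],
                       by = list(stanza = coded_lines$stanza),
                       FUN = max)
present <- as.matrix(by_stanza[codes])

# crossprod() counts, for every pair of codes, the stanzas containing both.
links <- crossprod(present)
diag(links) <- 0  # a code's link to itself is not of interest here

links["pedagogy", "technology"]  # 2: the pair shares two stanzas
colSums(coded_lines[codes])      # raw frequencies, for comparison
```

In this toy data "technology" is the most frequent code, but the strongest link is between "pedagogy" and "technology" because they keep turning up in the same stanzas.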

Scenario: Middle School Earth Sciences Class

Three groups of 4-5 students each, tasked with discussing a sustainable urban redevelopment problem.
Research Question: How do high-performing groups connect scientific evidence to policy decisions compared to lower-performing groups?

Wrangle

First, record & transcribe group discussions.
Retain each student’s identity and group membership so that individual and group ENAs can be compared.
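One possible layout for the transcribed data, sketched in base R. The column names, the student and group labels, and the utterances are hypothetical placeholders; the point is only that every line keeps its speaker and group.

```r
# Hypothetical transcript excerpt: one row per utterance, in spoken order.
transcript <- data.frame(
  group   = c("A", "A", "A", "B", "B"),
  student = c("S01", "S02", "S01", "S05", "S06"),
  line    = c(1, 2, 3, 1, 2),
  text    = c(
    "This would contribute to runoff and erosion.",
    "Could we build that next to the train station?",
    "The chart shows a 2-degree increase.",
    "That looks like hostile architecture in the plaza.",
    "Would that cause erosion near the river?"
  ),
  stringsAsFactors = FALSE
)

# Quick structural checks: utterances per student and per group.
table(transcript$student)
table(transcript$group)
```

Keeping student and group as separate columns lets the same table be summarized at either level later on.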

Exploring

Develop a coding scheme based on your observations. For example:
- [environmental.issues] (e.g., “This would contribute to runoff and erosion.”)
- [zoning.codes] (e.g., “Could we build that next to the train station?”)
- [social.issues] (e.g., “That looks like hostile architecture in the plaza.”)
- [scientific.thinking] (e.g., “The chart shows a 2-degree increase.”)
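Once a scheme like this is applied, each line of the transcript ends up with one 0/1 column per code. The sketch below shows that shape in base R. The keyword patterns are a hypothetical stand-in for the coding step itself, which in practice would be done by trained human coders or a validated classifier.

```r
utterances <- c(
  "This would contribute to runoff and erosion.",
  "Could we build that next to the train station?",
  "That looks like hostile architecture in the plaza.",
  "The chart shows a 2-degree increase."
)

# Hypothetical keyword patterns, one per code (illustration only).
patterns <- c(
  environmental.issues = "runoff|erosion",
  zoning.codes         = "build|station",
  social.issues        = "hostile|plaza",
  scientific.thinking  = "chart|degree"
)

# One 0/1 column per code: 1 if the utterance matches that code's pattern.
flags <- sapply(patterns, function(p) {
  as.integer(grepl(p, utterances, ignore.case = TRUE))
})
coded <- data.frame(text = utterances, flags, stringsAsFactors = FALSE)

coded[names(patterns)]
```

Each example utterance matches exactly one code here, but a single line can carry several codes at once, and those overlaps are what the model picks up.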

Model

ENA looks for your codes within the stanza, a researcher-chosen “window” of text n lines long.
“The key idea behind a stanza is that (a) codes in lines anywhere within the same stanza are related to one another in the model, and (b) codes in lines that are not in the same stanza are not related to one another in the model” (Shaffer and Ruis 2017).
This window “moves” through the transcript line by line to find co-occurrences and build the model.
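The moving window can be sketched in a few lines of base R. This is a simplified illustration, not rENA's actual algorithm: the coded lines and the window size of 2 are hypothetical, and the rule used here (link each code on the current line to every code found in the window) is an assumption about how such a window might be tallied.

```r
codes <- c("environmental.issues", "zoning.codes", "scientific.thinking")

# Hypothetical coded transcript: one row per line, in order, 0/1 per code.
coded <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 0, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE, dimnames = list(NULL, codes)
)

window_size <- 2  # hypothetical n: the current line plus the one before it

links <- matrix(0, 3, 3, dimnames = list(codes, codes))
for (i in seq_len(nrow(coded))) {
  start   <- max(1, i - window_size + 1)
  window  <- coded[start:i, , drop = FALSE]
  present <- as.integer(colSums(window) > 0)  # codes anywhere in the window
  current <- unname(coded[i, ])               # codes on the current line
  # Link each code on the current line to every code present in the window.
  hit <- (outer(current, present) + outer(present, current)) > 0
  diag(hit) <- FALSE
  links <- links + hit
}

links
```

With these toy lines, environmental.issues and zoning.codes are linked twice, zoning.codes and scientific.thinking twice, and environmental.issues and scientific.thinking once. Changing window_size changes which lines count as related, so n is a modeling decision rather than a technical detail.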

Communicate

Discussion

What are some similarities between the two students’ epistemic networks? What are some differences?
What happens when you have to compare 10 epistemic networks? 20? 100?

Using ENA with R

R can run ENA through the rENA package!

# Install the package once, then load it in each new session
install.packages("rENA")
library(rENA)

Acknowledgements

This work was supported by the National Science Foundation grants DRL-2025090 and DRL-2321128 (ECR:BCSER). Any opinions, findings, and conclusions expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Shaffer, David Williamson, and A. R. Ruis. 2017. Epistemic Network Analysis: A Worked Example of Theory-Based Learning Analytics. Society for Learning Analytics Research (SoLAR). https://doi.org/10.18608/hla17.015.