mining meaning: semantic similarity and the analysis of political text

Working Paper (January 2023 version), Appendix, and Example Code

The degree of similarity in meaning between texts (e.g., manifesto items, speeches) is often of fundamental interest to political scientists. Categorizing texts based on meaning, instead of dictionary-based matching, requires solving the qualitative problem of “what goes with what.” In this note, I show how a pre-trained language model optimized for semantic textual similarity can help provide independent validation for researchers solving this problem. I introduce a new measure of discriminability – relative semantic similarity (RSS) – that captures how coherent any category of texts is in terms of its semantic meaning, relative to another category. Using the pre-trained model’s output, I show that RSS can be used as a test statistic to (1) independently validate the coding scheme of a manually categorized corpus, and (2) test for factors that might affect the distribution of semantic meaning within a corpus. RSS thus complements and extends the text analysis toolkit for social science.
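The intuition behind an RSS-style statistic can be sketched in a few lines. The exact RSS definition is given in the paper; the version below (mean within-category cosine similarity minus mean cross-category cosine similarity) is an illustrative assumption, and the toy vectors stand in for embeddings from a pre-trained semantic-textual-similarity model.

```python
# Illustrative sketch of a relative-semantic-similarity-style statistic.
# ASSUMPTION: "within minus across" mean cosine similarity is one plausible
# operationalization, not necessarily the paper's exact definition.
from itertools import combinations, product
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def rss(cat_a, cat_b):
    """Coherence of category A relative to category B:
    mean pairwise similarity inside A minus mean similarity across A and B."""
    within = mean(cosine(u, v) for u, v in combinations(cat_a, 2))
    across = mean(cosine(u, v) for u, v in product(cat_a, cat_b))
    return within - across

# Toy "embeddings": one tight cluster of texts versus a distinct cluster.
economy = [[0.90, 0.10, 0.00], [0.80, 0.20, 0.10], [0.95, 0.05, 0.00]]
welfare = [[0.10, 0.90, 0.20], [0.00, 0.80, 0.30]]

print(rss(economy, welfare) > 0)  # a coherent category yields a positive value
```

In practice the vectors would come from a sentence-embedding model, and the statistic's distribution under permuted category labels would supply the null for a test.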

persuasive effects of information in story form

In prep

With the aid of a team of undergraduate research assistants, I am currently compiling a dataset of the stories used within political science experiments in order to meta-analytically investigate the causal effect of narrative structures.  While much of the priming literature has uncovered a differential effect of information structured in "story form", the underlying mechanisms, the heterogeneity of the "story effect", and its consequences for belief change and effect duration remain unclear.  We are also using a variety of text analytic tools to study properties of the stories used in these experiments, including their readability and narrativity.
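One of the simpler text properties involved is readability. A minimal sketch of such a check is below, using the standard Flesch reading-ease formula with a crude vowel-group syllable heuristic; this is a simplification for illustration (real analyses would use a dedicated tool, and the example sentences are hypothetical, not drawn from the dataset).

```python
# Sketch of a readability measure: Flesch reading ease.
# ASSUMPTION: the vowel-group syllable counter is a rough heuristic,
# not the validated counter a production readability library would use.
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels, with a crude silent-e fix."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z'-]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

simple = "The cat sat on the mat."
dense = "Comprehensive meta-analytic investigations necessitate systematic categorization."
print(flesch_reading_ease(simple) > flesch_reading_ease(dense))  # True
```

Higher scores indicate easier text, so a plain narrative sentence scores well above jargon-dense prose; scoring each experimental stimulus this way lets readability enter the meta-analysis as a moderator.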

short stories for emotion induction in neuroimaging

In prep (stimulus set available upon request)

This paper introduces a set of 64 short stories that I wrote and used to induce fear (16 stories), disgust (16 stories), and calm (32 stories) during an fMRI study. The paper provides validation of the emotion induction based on human raters from MTurk, as well as story-level fMRI results. Additional features of each story, including measures of narrativity and several other latent properties of the text, are also provided. The stimulus set is standardized for ease of imagining (to avoid differences in task demands), emotion discriminability, and more conventional properties such as reading time.