This Monday I will be presenting a paper at the Big Data workshop organized by Philip Habel and Sarah Birch at the University of Glasgow.
LDA models topics… But what are ‘topics’?
The relation between LDA topics and traditional measures of issue, frame, and valence
Wouter van Atteveldt, Kasper Welbers, Carina Jacobi, Rens Vliegenthart
LDA topic modeling is a popular technique for unsupervised document
clustering. However, the utility of LDA for analysing political communication
depends on being able to interpret the topics in theoretical terms. This
paper explores the relation between LDA topics and content variables
traditionally used in political communication. We generate an LDA model on
a full collection of front-page articles of Dutch newspapers and compare the
resulting LDA topics to a manual coding of the political issues, frames, and valence.
In general, we find that a large number of topics are closely related to a
specific issue, and that the different topics that comprise an issue can be
interpreted as subissues, events, and specific journalistic framings of the issue.
The relation between frames and topics is less direct: a large number
of topics are associated with each of the investigated frames, while no topic
was identified that encoded just a specific frame. Finally, hardly any
topic had a clear sentiment associated with it; the only exceptions were topics whose
sentiment is inherent in the represented issue, such as disasters. These
results validate the use of LDA topics as proxies for political issues, and pave
the way for a more empirical understanding of the substantive interpretation
of LDA topics.
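
To make the comparison between topics and manual codes concrete, here is a minimal sketch of one way it could be set up. It is not the procedure from the paper. It assumes that each article has already been assigned its dominant LDA topic and a manually coded issue, and it reports, per topic, the most frequent issue and the share of that topic's articles carrying it. The function name `topic_issue_purity` and the toy data are invented for illustration only.

```python
from collections import Counter, defaultdict


def topic_issue_purity(dominant_topics, issue_codes):
    """Return {topic: (most common manual issue, share of the topic's documents)}.

    Both arguments are hypothetical inputs: one dominant topic id and one
    manually coded issue label per document, in the same order.
    """
    if len(dominant_topics) != len(issue_codes):
        raise ValueError("need one topic and one issue code per document")

    # Cross-tabulate: for every topic, count how often each issue occurs.
    counts = defaultdict(Counter)
    for topic, issue in zip(dominant_topics, issue_codes):
        counts[topic][issue] += 1

    # For every topic, keep the most frequent issue and its relative share.
    result = {}
    for topic, issue_counts in counts.items():
        issue, n = issue_counts.most_common(1)[0]
        result[topic] = (issue, n / sum(issue_counts.values()))
    return result


# Toy data, made up for this example (not results from the paper).
topics = [0, 0, 0, 1, 1, 2, 2, 2]
issues = ["economy", "economy", "crime", "crime", "crime",
          "economy", "europe", "europe"]

purity = topic_issue_purity(topics, issues)
for topic in sorted(purity):
    issue, share = purity[topic]
    print(f"topic {topic}: {issue} ({share:.2f})")
```

A topic whose share is close to 1 maps almost exclusively onto one manually coded issue; lower shares suggest a topic that cuts across issues. The same cross-tabulation could be repeated with frame or valence codes in place of issue codes.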