Latent Dirichlet Allocation (LDA) topic modeling is a popular technique for unsupervised document clustering. However, the usefulness of LDA for analysing political communication depends on whether its topics can be interpreted in theoretical terms. This paper explores the relation between LDA topics and the content variables traditionally used in political communication research. We fit an LDA model to the full collection of front-page articles of Dutch newspapers and compare the resulting topics to a manual coding of political issues, frames, and sentiment.
In general, we find that many topics are closely related to a specific issue, and that the different topics making up an issue can be interpreted as subissues, events, or specific journalistic framings of that issue. Linear combinations of topics are moderately accurate predictors of hand-coded issues at the article level, and at the aggregate level the two correlate highly. These results validate the use of LDA topics as proxies for political issues and pave the way for a more empirical understanding of the substantive interpretation of LDA topics.
(Wouter van Atteveldt, Kasper Welbers)
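To make the idea of predicting a hand-coded issue from a linear combination of topics more concrete, the following is a minimal sketch on synthetic data only. It assumes a document-topic matrix `theta` (as an LDA model would output), a hypothetical binary hand coding `coded`, ordinary least squares as the linear model, and days as the aggregation unit. All of these are illustrative assumptions rather than the paper's actual estimator or data, and the printed numbers say nothing about the study's results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT data from the study): articles, topics, publication days.
n_docs, n_topics, n_days = 600, 20, 30
theta = rng.dirichlet(np.full(n_topics, 0.1), size=n_docs)  # document-topic proportions
day = rng.integers(0, n_days, size=n_docs)                  # hypothetical publication day

# Hypothetical hand coding: the issue is "present" when topics 0-2 carry enough weight,
# with Gaussian noise standing in for coder disagreement.
signal = theta[:, :3].sum(axis=1)
coded = (signal + rng.normal(0, 0.1, n_docs) > 0.3).astype(float)

# Linear combination of topics: least-squares weights predicting the coding.
# No intercept column, because each row of theta sums to 1 and would make it collinear.
w, *_ = np.linalg.lstsq(theta, coded, rcond=None)
score = theta @ w
predicted = (score > 0.5).astype(float)
accuracy = (predicted == coded).mean()

# Aggregate level: per-day share of coded articles vs. mean predicted score.
days = np.unique(day)
coded_share = np.array([coded[day == d].mean() for d in days])
pred_share = np.array([score[day == d].mean() for d in days])
r = np.corrcoef(coded_share, pred_share)[0, 1]

print(f"article-level accuracy: {accuracy:.2f}")
print(f"aggregate (per-day) correlation: {r:.2f}")
```

The sketch separates the two levels the abstract distinguishes: accuracy is judged per article, while correlation is computed on per-day averages, where individual prediction errors tend to cancel out.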