I’m delighted that my paper on clause analysis has been accepted for publication in Political Analysis. See below for the published article, and see the GitHub page for the rsyntax software.
Clause analysis: Using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008–2009 Gaza War
Wouter van Atteveldt, Tamir Sheafer, Shaul R. Shenhav, and Yair Fogel-Dror
Abstract: This article presents a new method and open source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency based text analysis methods by dividing text into predicates with an identified subject and optional source, extracting the statements and actions of (political) actors as mentioned in the text. The content of these predicates can be analyzed using existing frequency based methods, allowing for the analysis of actions, issue positions and framing by different actors within a single text. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis applied to the results of the clause analysis can show differences in citation and framing patterns between U.S. and English-language Chinese coverage of this war.
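To give a rough feel for what a source–subject–predicate clause looks like, here is a minimal Python sketch of the general idea. It is not the rsyntax package and does not reproduce the patterns from the paper: the hand-written dependency parse, the relation labels (nsubj, ccomp, dobj), the list of speech verbs, and every function name are assumptions made up for illustration. The single rule used here is: if a speech verb has a subject and a clausal complement, the subject is treated as the source of the complement clause.

```python
# Toy dependency parse of "Israel said Hamas fired rockets".
# Each token is (id, word, parent id, relation); parent 0 marks the root.
# The parse is written by hand for illustration, not produced by a parser.
TOKENS = [
    (1, "Israel", 2, "nsubj"),
    (2, "said", 0, "root"),
    (3, "Hamas", 4, "nsubj"),
    (4, "fired", 2, "ccomp"),
    (5, "rockets", 4, "dobj"),
]

# Hypothetical, deliberately tiny list of verbs that introduce a quote.
SPEECH_VERBS = {"said", "says", "claimed", "stated"}


def children(tokens, head, relation=None):
    """Tokens whose parent is `head`, optionally filtered by relation."""
    return [t for t in tokens
            if t[2] == head and (relation is None or t[3] == relation)]


def subtree(tokens, head):
    """Ids of `head` and all of its descendants."""
    ids = {head}
    for child in children(tokens, head):
        ids |= subtree(tokens, child[0])
    return ids


def text(tokens, ids):
    """Words of the selected tokens, in sentence order."""
    return " ".join(t[1] for t in tokens if t[0] in ids)


def extract_clauses(tokens):
    """Return a list of {source, subject, predicate} dicts."""
    quote_sources = {}   # id of quoted verb -> source text
    quote_verbs = set()  # speech verbs that only introduce a quote
    for tid, word, _parent, _rel in tokens:
        if word.lower() not in SPEECH_VERBS:
            continue
        subjects = children(tokens, tid, "nsubj")
        complements = children(tokens, tid, "ccomp")
        if subjects and complements:
            quote_verbs.add(tid)
            source = text(tokens, subtree(tokens, subjects[0][0]))
            for comp in complements:
                quote_sources[comp[0]] = source

    clauses = []
    for tid, _word, _parent, _rel in tokens:
        if tid in quote_verbs:
            continue  # the speech verb itself is not reported as a clause
        subjects = children(tokens, tid, "nsubj")
        if not subjects:
            continue
        subject_ids = subtree(tokens, subjects[0][0])
        predicate_ids = subtree(tokens, tid) - subject_ids
        clauses.append({
            "source": quote_sources.get(tid),  # None if nobody is quoted
            "subject": text(tokens, subject_ids),
            "predicate": text(tokens, predicate_ids),
        })
    return clauses


print(extract_clauses(TOKENS))
```

For the toy sentence this yields one clause with source "Israel", subject "Hamas", and predicate "fired rockets". The predicate text is what could then be passed to the frequency based methods mentioned in the abstract.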