Paper: Using syntactic clauses for social and semantic network analysis
Abstract: This article presents a new method and open source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency-based text analysis methods by dividing text into predicates with an identified subject and optional source. The content of the identified predicates can be analysed by existing frequency-based methods, showing what different actors are described as doing and saying. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis can be applied to the results of the clause analysis to analyse the difference in citation and framing patterns between U.S. and English-language Chinese coverage of this war. [paper under review, mail me if interested] [presentation]
Workshop: Text and Network Analysis with R (hosted on github):
This is the material for the Text and Network Analysis with R course, held as part of the Networks in a Global World (NetGloW) conference. I will use this page to publish slides, handouts, data sets, etc. As the title indicates, the workshop will be taught almost completely using R. If you don’t use R yet, please make sure that you install R and RStudio on your laptop.
This repository hosts the slides (html and source code). The source code for all handouts is published on my learningR page. You may also want to check out a 4-day Text Analysis with R course I taught at City U, Hong Kong.
In this introductory session you will learn how to use R to organize and transform your data and how to obtain data by accessing public APIs such as those of Twitter and Facebook.
- Handout: Accessing APIs
- For those new(ish) to R, these handouts may also be interesting:
This session is the main content of the workshop: analysing text and networks from R. We will look at simple corpus analysis, comparing corpora, topic modeling, analysing (social) networks, and semantic network analysis.
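To give a flavour of what comparing corpora means in practice, here is a minimal sketch in base R. It is not taken from the workshop handouts: the two toy corpora, the `tokenize` helper, and the add-one smoothing are all assumptions made for illustration, and the handouts may well use dedicated packages instead.

```r
# Two toy "corpora", invented purely for illustration
corpus_a <- c("the army attacked the city",
              "the army said the attack was justified")
corpus_b <- c("civilians fled the city",
              "the city was attacked and civilians suffered")

# Hypothetical helper: lowercase and split on whitespace
tokenize <- function(texts) unlist(strsplit(tolower(texts), "\\s+"))

tokens_a <- tokenize(corpus_a)
tokens_b <- tokenize(corpus_b)

# Count every word of the joint vocabulary in both corpora;
# using a factor with fixed levels gives zero counts for absent words
vocab   <- sort(union(tokens_a, tokens_b))
count_a <- as.vector(table(factor(tokens_a, levels = vocab)))
count_b <- as.vector(table(factor(tokens_b, levels = vocab)))

# Smoothed relative frequencies (add-one smoothing avoids division by zero)
rel_a <- (count_a + 1) / (sum(count_a) + length(vocab))
rel_b <- (count_b + 1) / (sum(count_b) + length(vocab))

# Over-representation: values above 1 mean the word is relatively
# more frequent in corpus A, values below 1 more frequent in corpus B
comparison <- data.frame(word = vocab, count_a, count_b,
                         over = rel_a / rel_b,
                         stringsAsFactors = FALSE)
comparison <- comparison[order(-comparison$over), ]
print(head(comparison))
```

Sorting by the `over` column puts the words most typical of corpus A at the top and those most typical of corpus B at the bottom. A real analysis would normally add a significance measure such as chi-squared or log-likelihood rather than rely on a raw ratio.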