CLIN 2015: Using Syntactic Clauses for Social and Semantic Network Analysis

Links: [download presentation][R source code]

After almost 10 years, I’m giving a talk at CLIN (Computational Linguistics in the Netherlands) again. I completely rewrote the clause code from Python to R, which is quite exciting, as it will make it much easier to tweak and add rules “client-side”. I also did a new validation, comparing the results to a new gold standard of manually coded aggressive actions in the 2009 Gaza war. In addition, I compare the results to a “word order co-occurrence” baseline that assumes that the leftmost actor is the agent (subject). The results show convincingly that relying on word order is indeed very fragile in conflict situations:

Method                Precision   Recall   F1-Score
Syntactic rules       .70         .72      .71
Word-order baseline   .36         .35      .35
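
To make the baseline concrete, here is a minimal Python sketch of a word-order heuristic of this kind. It is not the code used for the evaluation above: the `ACTORS` lexicon and the function name are hypothetical, and treating the next distinct actor as the patient is an assumption added for illustration (the post only states that the leftmost actor is taken as the agent).

```python
from typing import List, Optional, Tuple

# Hypothetical actor lexicon; the post does not list the actual actor terms.
ACTORS = {"israel": "Israel", "hamas": "Hamas"}


def word_order_baseline(tokens: List[str]) -> Optional[Tuple[str, str]]:
    """Guess (agent, patient) purely from word order.

    The leftmost actor is taken as the agent. Assumption for this sketch:
    the next distinct actor is taken as the patient.
    """
    found: List[str] = []
    for tok in tokens:
        actor = ACTORS.get(tok.lower().strip(".,"))
        if actor and actor not in found:
            found.append(actor)
    if len(found) < 2:
        return None  # fewer than two actors: no action can be coded
    return found[0], found[1]


# Active voice: word order and syntactic roles agree.
print(word_order_baseline("Israel attacked Hamas positions".split()))
# Passive voice: the leftmost actor is actually the patient, so the guess is wrong.
print(word_order_baseline("Hamas was attacked by Israel".split()))
```

The passive sentence is coded as Hamas acting on Israel, which is the kind of error that syntactic rules are meant to avoid.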

I also re-evaluated the source extraction, comparing it to a baseline that uses the same speech verbs and assumes that an actor to the left of the speech verb is the source and the text to the right of it is the quote. The evaluation shows that recall is practically the same for both methods (both miss more ‘subtle’ ways of expressing quotes), but precision is extremely good for the syntactic method while being mediocre for the baseline:

Method                Precision   Recall   F1-Score
Syntactic rules       .95         .61      .74
Word-order baseline   .5          .62      .55
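
The source baseline can be sketched in the same way. Again, this is only an illustration in Python, not the evaluated implementation: the `SPEECH_VERBS` and `ACTORS` sets are hypothetical stand-ins, and picking the actor nearest to the verb when several appear to its left is an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# Hypothetical lexicons for illustration; the post does not list the real ones.
SPEECH_VERBS = {"said", "says", "stated", "claimed"}
ACTORS = {"Israel", "Hamas"}


def quote_baseline(tokens: List[str]) -> Optional[Tuple[str, str]]:
    """Return (source, quote) for the first speech verb with an actor to its left."""
    for i, tok in enumerate(tokens):
        if tok.lower() not in SPEECH_VERBS:
            continue
        actors_left = [t for t in tokens[:i] if t in ACTORS]
        if actors_left:
            # Assumption: the actor nearest to the verb is taken as the source;
            # everything to the right of the verb is taken as the quote.
            return actors_left[-1], " ".join(tokens[i + 1:])
    return None


# Simple case: the heuristic finds the right source and quote.
print(quote_baseline("Hamas said Israel broke the truce".split()))
# Embedded speech verb: the real source is "officials", but the heuristic picks Hamas.
print(quote_baseline(
    "Israel attacked after Hamas , officials said , broke the truce".split()
))
```

The second sentence shows how a purely positional rule can attribute a quote to the wrong actor, which is consistent with the lower precision of the baseline.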

In my presentation I will show these results as well as a number of substantive results on the differing biases in Chinese and American newspaper coverage of the 2009 Gaza war. The results show that Chinese newspapers quote Hamas much more frequently and also portray Hamas less as an aggressor.

More visually, the following figure shows the actions of Israel according to the US and Chinese media side by side. You can clearly see that US coverage focuses on aggression towards Hamas and emphasises the reasons for the attack (goal discourse), while Chinese coverage focuses on the more civilian Gaza and emphasises the attacks themselves (means discourse).

(Israeli actions. Left: US newspapers; right: Chinese newspapers. Each network is a co-occurrence-based semantic network of all words in predicates with Israel as subject that are overrepresented in the respective country’s coverage.)
