Monday 20 Feb: Research talk @cityu

Don’t you like it? Using CrowdSourcing for Sentiment Analysis of Dutch and English (political) text  

Wouter van Atteveldt, Antske Fokkens, Isa Maks, Kevin van Veenen, and Mariken van der Velden

[Download slides]

Sentiment Analysis is an important technique for many aspects of communication research, with applications from social media analysis and online reviews to negativity in political communication. The subjective and context-specific nature of evaluative language, however, makes it particularly challenging to develop and validate good sentiment analysis tools.

We use crowdsourcing to develop a tool for classifying sentiment expressed in a text about a specific target. Crowdsourcing is especially useful for sentiment analysis because of the subjective nature of the judgment, and the low cost makes it possible to code items multiple times. By comparing crowdsourcing with dictionary analysis and expert coding, we can show the most cost-effective way to conduct accurate targeted sentiment analysis.

CfP: CMM Special Issue on Computational Methods

CT&M’s journal Communication Methods and Measures invites submissions for a special issue on computational methods. Here is the full call for papers:

For this special issue, we invite submissions that further the understanding, development and application of computational methods in communication research. Computational methods include (but are not limited to) methods such as text analysis, topic modeling, social/semantic network analysis, online experiments, machine learning, and agent-based modeling and simulations. Computational methods can be used to build theory about, quantify, analyze, and visualize communication structures and processes. Computational methods can be applied to “big data” and social media data, but can also be used to analyse historical archives (e.g. newspaper archives, proceedings) or to provide a more sophisticated understanding of “small data”.

In particular, we welcome submissions on:

  • Innovative ways to use computational methods for communication research;
  • Evaluation and validation of computational approaches to studying communication research;
  • Application of computational methods to answer substantive communication research questions;
  • Reflections on the role of computational methods in communication research and their link with theory.

The special issue may also include a “teacher’s corner” article with brief descriptions of useful software packages and tools for studying communication. Authors interested in this format are encouraged to contact special issue co-editor Wouter van Atteveldt prior to submission.

The deadline for submission for consideration is July 1, 2017. Submitters should include a statement in the cover letter that the manuscript is being submitted for the special issue on Computational Methods. Articles will be peer reviewed and a decision rendered within 60 days, with a target publication date of March 2018. Instructions for authors and a description of the online submission process can be found on the journal’s home page at

http://www.tandf.co.uk/journals/HCMS

Questions about this special issue can be directed to Wouter van Atteveldt or Winson Peng, Guest Editors, at wouter@vanatteveldt.com and pengtaiq@msu.edu

 

Visiting Hong Kong

As you may have guessed from my new header image, I am currently in Hong Kong as a visiting assistant professor at City University of Hong Kong. Besides hiking, I am looking forward to continuing my research and especially to experimenting with some Chinese (and Cantonese!) text processing.

I will also give a research talk here on the 20th of February on Sentiment Analysis. Stay tuned!

Text Analysis in R @Glasgow

I will be giving a workshop on Text Analysis in R at Glasgow University on 17 November, 2016.

Data: [all data (zip)][tokens.rds][meta.rds][lexicon.rds][reviews.rds]

Data as csv: [tokens_full.csv][tokens.csv][lexicon.csv]

[Source for all slides (contains the R code)]

Schedule:

10:30 – 12:00 [slides][session log]
– Recap: Frequency Based Analysis and the DTM
– Dictionary Analysis with AmCAT and R

13:30 – 15:00 [slides]
– Simple Natural Language Processing
– Corpus Analysis and Visualization
– Topic Modeling and Visualization

15:15 – 17:00 [slides]
– Sentiment Analysis with dictionaries
– Sentiment Analysis with proximity
– [Handout: Obtaining sentiment resources with R]
– If time permits: [machine learning sentiment analysis handout]

Current Issues in communication science: Big Data and Social Analytics

I will give a guest lecture today in the ‘Current Issues in Communication Science’ course of our MSc in Communication Science. The leading question of the lecture is how big data will impact the social sciences: what are the opportunities and pitfalls? Using the famous ‘Facebook studies’ as an example, I will show how ‘big data’ can be used to answer theoretically relevant questions that would otherwise be impossible to answer, but also stress the problems and dangers of relying on such data. [Download slides]

Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D., Marlow, C., Settle, J. E., & Fowler, J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489(7415), 295-298.

Kramer, A. D., Guillory, J. E., & Hancock, J. T. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences, 111(24), 8788-8790.

Clause Analysis accepted for Political Analysis

I’m delighted that my paper on clause analysis has been accepted for publication in Political Analysis:

Clause analysis: Using syntactic information to automatically extract source, subject, and predicate from texts with an application to the 2008-2009 Gaza War
Wouter van Atteveldt, Tamir Sheafer, Shaul R. Shenhav, and Yair Fogel-Dror

Abstract: This article presents a new method and open source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency based text analysis methods by dividing text into predicates with an identified subject and optional source, extracting the statements and actions of (political) actors as mentioned in the text. The content of these predicates can be analyzed using existing frequency based methods, allowing for the analysis of actions, issue positions and framing by different actors within a single text. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis applied to the results of the clause analysis can show differences in citation and framing patterns between U.S. and English-language Chinese coverage of this war.

You can download the [presentation] I gave based on the paper at NetGloW.

NetGloW 2016: Networks in a Global World

Paper: Using syntactic clauses for social and semantic network analysis

Abstract: This article presents a new method and open source R package that uses syntactic information to automatically extract source–subject–predicate clauses. This improves on frequency based text analysis methods by dividing text into predicates with an identified subject and optional source. The content of the identified predicates can be analysed by existing frequency based methods, showing what different actors are described as doing and saying. We show that a small set of syntactic patterns can extract clauses and identify quotes with good accuracy, significantly outperforming a baseline system based on word order. Taking the 2008–2009 Gaza war as an example, we further show how corpus comparison and semantic network analysis can be applied to the results of the clause analysis to analyse the difference in citation and framing patterns between U.S. and English-language Chinese coverage of this war. [paper under review, mail me if interested] [presentation]

Workshop: Text and Network Analysis with R (hosted on github):

This is the material for the Text and Network Analysis with R course as part of the Networks in a Global World (NetGloW) conference. I will use this page to publish slides, hand-outs, data sets etc. As the title indicates, the workshop will be taught almost completely using R. If you don’t use R yet, please make sure that you install R and RStudio on your laptop.

This repository hosts the slides (html and source code). The source code for all handouts is published on my learningR page. You may also want to check out a 4-day Text Analysis with R course I taught at City University of Hong Kong.

Session 1: Managing data and Accessing APIs from R

In this introductory session you will learn how to use R to organize and transform your data and how to obtain data by accessing public APIs such as those of Twitter and Facebook.

Session 2: Corpus and Network Analysis

This session is the main content of the workshop: analysing text and networks from R. We will look at simple corpus analysis, comparing corpora, topic modeling, analysing (social) networks, and semantic network analysis.
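
To give an idea of what comparing corpora can look like, here is a minimal sketch in base R. It is not the workshop code: the two token vectors are made up, and the add-one smoothing is just one simple assumption to avoid dividing by zero for words that occur in only one corpus.

```r
# Hypothetical toy corpora, already tokenized (one element per word)
corpus_a <- c("war", "war", "peace", "talks", "attack", "attack", "attack")
corpus_b <- c("peace", "peace", "talks", "talks", "war", "ceasefire")

# Shared vocabulary, so both frequency vectors have the same order
vocab <- sort(union(corpus_a, corpus_b))
freq_a <- as.vector(table(factor(corpus_a, levels = vocab)))
freq_b <- as.vector(table(factor(corpus_b, levels = vocab)))

# Relative frequencies with add-one smoothing (an assumption of this sketch)
rel_a <- (freq_a + 1) / (sum(freq_a) + length(vocab))
rel_b <- (freq_b + 1) / (sum(freq_b) + length(vocab))

# Over-representation: values above 1 mean the term is relatively
# more frequent in corpus A than in corpus B
comparison <- data.frame(term = vocab, freq_a = freq_a, freq_b = freq_b,
                         over = rel_a / rel_b)
comparison <- comparison[order(-comparison$over), ]
print(comparison)
```

Terms at the top of the table are typical for the first corpus, terms at the bottom for the second. With real data you would normally add a significance test (for example chi-squared) before interpreting the differences.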

Hong Kong Summer School: Advanced Text Analysis with R

I’m very excited to be teaching the course on Advanced Text Analysis with R as part of the City University of Hong Kong Summer School in Social Science Research. I will use this page to publish lecture slides, hand-outs, data sets etc.

As the title indicates, the course will be taught almost completely using R. If you don’t use R yet, please make sure that you install R and RStudio on your laptop. Also, please go through the code on the first two handouts published on my learningR page:

  1. R as a calculator
  2. Playing with data in R 

In general, all slides (including source code) are available from GitHub at vanatteveldt/hk2016, and all handouts are available from vanatteveldt/learningr.

If you have any questions, please don’t hesitate to email me at wouter@vanatteveldt.com. Thanks, and see you all in Hong Kong!


June 2nd (morning): Organizing and Transforming data in R

In this introductory session you will learn how to use R to organize and transform your data: calculating columns, subsetting, transforming and merging data, and computing aggregate statistics. If time permits, we will also cover basic modelling and/or programming in R as desired.
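
As a rough illustration of the four operations mentioned above, here is a small base R sketch. The data frames and column names are invented for this example and are not the course data.

```r
# Hypothetical toy data: one row per article, plus a lookup table of outlets
articles <- data.frame(
  id      = 1:6,
  outlet  = c("A", "A", "B", "B", "C", "C"),
  words   = c(300, 450, 500, 250, 800, 400),
  sources = c(2, 3, 5, 1, 4, 4)
)
outlets <- data.frame(
  outlet = c("A", "B", "C"),
  type   = c("broadsheet", "tabloid", "broadsheet")
)

# Calculating a column: quoted sources per 100 words
articles$sources_per_100 <- articles$sources / articles$words * 100

# Subsetting: keep only the articles longer than 300 words
long_articles <- articles[articles$words > 300, ]

# Merging: add the outlet type to every article
merged <- merge(articles, outlets, by = "outlet")

# Aggregate statistics: mean article length per outlet type
mean_words <- aggregate(words ~ type, data = merged, FUN = mean)
print(mean_words)
```

The same steps can be written with packages such as dplyr, but the base R versions have no dependencies and are a good way to see what each step does.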

June 2nd (afternoon): Visualizing and using APIs from R: Twitter, Facebook, NY Times
In this session we will look briefly at visualizing data in R. The main focus of the session is on using APIs from R. We will be looking at the Twitter, Facebook, and NY Times API, and also see how to access arbitrary web resources from R.
You will also start working on your mini-projects by selecting a topic and gathering data.

June 3rd (morning): Querying text with AmCAT and R
This is the first session that directly deals with text analysis.
The goal of this session is to learn how to use AmCAT as a document management tool, upload data, and perform queries from R.
You will continue working on your topic by uploading your data and conducting exploratory analyses.

June 3rd (afternoon): Corpus Analysis and Text (pre)processing
In this session the focus is on the Document Term Matrix: word clouds, comparison of different corpora, and topic models.
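
If you have not seen a Document Term Matrix (DTM) before, the sketch below builds one by hand in base R. The three documents and the very crude tokenizer are assumptions for illustration only; in practice you would use a text analysis package that also handles stemming, stop words, and sparse storage.

```r
# Hypothetical toy corpus of three short documents
docs <- c(
  d1 = "The government praised the new policy",
  d2 = "The opposition attacked the policy",
  d3 = "Voters praised the opposition"
)

# Very simple preprocessing: lowercase and split on anything that is not a letter
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[tokens != ""]
}
tokens <- lapply(docs, tokenize)

# Long format: one row per token, remembering which document it came from
long <- data.frame(
  doc  = rep(names(tokens), lengths(tokens)),
  term = unlist(tokens, use.names = FALSE)
)

# The DTM: one row per document, one column per term, cells are counts
dtm <- unclass(table(doc = long$doc, term = long$term))
print(dtm)

# Total frequency of each term, the input for e.g. a word cloud
term_freq <- sort(colSums(dtm), decreasing = TRUE)
print(head(term_freq, 5))
```

Word clouds, corpus comparison, and topic models all start from a matrix like this one; only the size changes.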

June 4th: Advanced text analysis: Machine learning and sentiment analysis
In this session we will do sentiment analysis using both a dictionary approach and machine learning. These techniques can also be applied to other forms of automatic content analysis, such as determining the topic or frame of a text.
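
The dictionary half of this session boils down to counting positive and negative words. Below is a minimal base R sketch of that idea; the six-word lexicon and the reviews are made up, and a real analysis would use a proper sentiment lexicon and deal with negation and intensifiers. The machine learning approach is not shown here.

```r
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words
lexicon <- c(good = 1, great = 1, excellent = 1,
             bad = -1, terrible = -1, boring = -1)

# Hypothetical toy reviews
reviews <- c(
  r1 = "A great film with an excellent cast",
  r2 = "Terrible plot and boring dialogue",
  r3 = "Good acting but a bad ending"
)

# Score a text by summing the lexicon values of the words it contains
score_text <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  hits <- lexicon[tokens[tokens %in% names(lexicon)]]
  sum(hits)
}

scores <- vapply(reviews, score_text, numeric(1), lexicon = lexicon)

# Turn the numeric score into a label
labels <- ifelse(scores > 0, "positive",
                 ifelse(scores < 0, "negative", "neutral"))
print(data.frame(score = scores, label = labels))
```

Note that the third review gets a neutral score because one positive and one negative word cancel out, which is exactly the kind of case where a plain dictionary count is too blunt.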

June 5th: Advanced text analysis: Semantic Network Analysis and Visualization
In the last session we will look at semantic network analysis with word-window approaches and more advanced visualization techniques using ggplot2, igraph, and gephi.
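
To make the word-window idea concrete, here is a small base R sketch that counts how often two words occur within a fixed distance of each other and turns the result into an edge list. The function name `window_cooccurrence`, the window size, and the example sentence are all assumptions of this sketch, not the course code.

```r
# Count co-occurrences of distinct words that are at most `window` positions apart
window_cooccurrence <- function(tokens, window = 2) {
  vocab <- sort(unique(tokens))
  m <- matrix(0, nrow = length(vocab), ncol = length(vocab),
              dimnames = list(vocab, vocab))
  n <- length(tokens)
  for (i in seq_len(n)) {
    # Last position to the right of token i that still falls inside the window
    upper <- min(i + window, n)
    if (upper > i) {
      for (j in (i + 1):upper) {
        a <- tokens[i]
        b <- tokens[j]
        if (a != b) {
          # Undirected network, so update both cells
          m[a, b] <- m[a, b] + 1
          m[b, a] <- m[b, a] + 1
        }
      }
    }
  }
  m
}

# Hypothetical tokenized sentence
tokens <- c("government", "cut", "taxes", "opposition",
            "criticized", "government", "spending")
cooc <- window_cooccurrence(tokens, window = 2)

# Convert the upper triangle of the matrix to a weighted edge list
edges <- which(cooc > 0 & upper.tri(cooc), arr.ind = TRUE)
edge_list <- data.frame(
  from   = rownames(cooc)[edges[, "row"]],
  to     = colnames(cooc)[edges[, "col"]],
  weight = cooc[edges]
)
print(edge_list)
```

An edge list in this shape can be read into igraph (for example with `graph_from_data_frame`) or exported to gephi for visualization.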