ICA 2017: Crowd sourcing for sentiment analysis

(Wouter van Atteveldt, Mariken van der Velden, Antske Fokkens)


Because sentiment analysis tools need to be context-specific and political text uses rich language to express sentiment, automatic sentiment analysis suffers heavily from the scarcity of annotated sentiment data. This is especially true for directional sentiment, i.e. annotations indicating that a holder expresses sentiment about a specific target.

In this paper we use crowdsourcing to overcome this data scarcity problem and develop a tool for classifying sentiment expressed in a text about a specific target. Crowdsourcing is especially useful for sentiment analysis because sentiment coding is a simple but essentially subjective judgment, and the low cost of crowdsourcing makes it possible to code items multiple times, showing the spread of sentiment as well as the point estimate.
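As a minimal sketch of what "spread as well as point estimate" could look like in practice, the snippet below summarizes several codes per unit with a mean and a standard deviation. The -1/0/1 coding scale, the unit ids, and the `summarize` helper are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev

def summarize(codes):
    """Return (point estimate, spread) for the sentiment codes of one unit."""
    return mean(codes), pstdev(codes)

# Hypothetical codes from three crowd coders per unit,
# on an assumed scale of -1 (negative), 0 (neutral), 1 (positive).
codes_by_unit = {
    "s1": [1, 1, 1],    # coders agree: zero spread
    "s2": [-1, 0, 1],   # coders disagree: same mean as neutral, large spread
}

summary = {unit: summarize(codes) for unit, codes in codes_by_unit.items()}
```

A unit such as `s2` has a neutral point estimate, but its spread shows that coders read it very differently, which a single expert code would not reveal.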

We show that crowdsourcing can yield directed sentiment codes with reasonable accuracy using as few as 2-3 coders per unit, with accuracy increasing up to 10 coders. By selecting only the sentences on which coders agree, a very high-precision subset of codes can be compiled. It is essential to make the task as simple as possible and to have good ‘gold questions’ for quality control.
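The two quality-control ideas above, gold questions and selecting units on which coders agree, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the paper: the function names, the data layout, the 0.7 gold-accuracy threshold, and the default of full agreement are all hypothetical.

```python
from collections import Counter

def reliable_coders(gold_answers, responses, min_accuracy=0.7):
    """Keep coders whose accuracy on the gold questions meets the threshold."""
    keep = set()
    for coder, answers in responses.items():
        gold = [(unit, a) for unit, a in answers.items() if unit in gold_answers]
        if not gold:
            continue  # a coder who saw no gold questions cannot be vetted
        correct = sum(a == gold_answers[unit] for unit, a in gold)
        if correct / len(gold) >= min_accuracy:
            keep.add(coder)
    return keep

def aggregate(responses, coders, gold_answers, min_agreement=1.0):
    """Majority-vote each non-gold unit; keep it only if agreement is high enough."""
    votes = {}
    for coder in coders:
        for unit, answer in responses[coder].items():
            if unit not in gold_answers:
                votes.setdefault(unit, []).append(answer)
    result = {}
    for unit, answers in votes.items():
        label, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= min_agreement:
            result[unit] = label
    return result

# Hypothetical data: g1 and g2 are gold questions, s1 and s2 are real units.
gold_answers = {"g1": 1, "g2": -1}
responses = {
    "a": {"g1": 1, "g2": -1, "s1": 1, "s2": -1},
    "b": {"g1": 1, "g2": -1, "s1": 1, "s2": 0},
    "c": {"g1": 1, "g2": -1, "s1": 1, "s2": -1},
    "d": {"g1": -1, "g2": 1, "s1": -1, "s2": 1},  # fails both gold questions
}

coders = reliable_coders(gold_answers, responses)
high_precision = aggregate(responses, coders, gold_answers)
```

Lowering `min_agreement` (for example to 0.6) keeps more units at the cost of precision, which mirrors the trade-off between coverage and the high-precision subset described above.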

Our future plans are to gather data on sentiment about specific political parties from Dutch and English tweets and political news. These data will be used to compare crowdsourcing to manual expert coding, to enhance an existing sentiment dictionary, and to train a machine learning model. By comparing the outcomes of these approaches, we can show the most cost-effective way to conduct accurate targeted sentiment analysis.