New project: Rethinking News Algorithms

Great news from NWO to start the holiday season: Our proposal to study news algorithms and news diversity (together with Antske Fokkens and Natali Helberger) was funded!

Rethinking news algorithms: nudging users towards diverse news exposure: We will improve news algorithms to stimulate people to read more diverse news. Algorithms such as those used by Facebook and Google can unwittingly trap people in a “filter bubble”. Nudging people to read about more topics and perspectives makes them more aware of the issues facing the country.

U Mannheim Computational Analysis of Political Communication

The explosion of digital communication and increasing efforts to digitize existing material have produced a deluge of material such as digitized historical news archives, policy and legal documents, political debates, and millions of social media messages by politicians, journalists, and citizens. This makes it possible to subject theoretical predictions about the societal role of information, and about the development and effects of communication, to rigorous quantitative tests that were impossible before. Besides providing an opportunity, the analysis of such “big data” sources also poses methodological challenges. Traditional manual content analysis does not scale to very large data sets due to its high cost and complexity. For this reason, many researchers turn to automatic text analysis using techniques such as dictionary analysis, automatic clustering and scaling of latent traits, and machine learning.

Properly using such techniques, however, requires a very specific skill set. This course aims to give students a basic introduction to text analysis and computational thinking. R will be used as the platform and language of instruction, but the basic principles and methods generalize readily to other languages and tools such as Python.
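The course itself uses R, but since the principles carry over to other languages, here is a minimal sketch of the simplest of these techniques, dictionary analysis, in Python. The documents and the “economy” dictionary are invented for illustration, not course data:

```python
# Minimal sketch of dictionary-based text analysis: count how often
# words from a hand-made "economy" dictionary occur in each document.

docs = [
    "The government announced new taxes on imports",
    "The football match ended in a draw",
    "Rising inflation worries investors and banks",
]

economy_dict = {"taxes", "inflation", "investors", "banks", "imports"}

def dictionary_score(text, lexicon):
    """Return the number of tokens in `text` that appear in `lexicon`."""
    tokens = text.lower().split()
    return sum(token in lexicon for token in tokens)

scores = [dictionary_score(d, economy_dict) for d in docs]
print(scores)  # one "economy" hit count per document: [2, 0, 3]
```

A real analysis would use a validated dictionary and proper tokenization (e.g. quanteda’s `dfm_lookup()` in R), but the core operation is exactly this counting of lexicon hits per document.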

Course Content

Day 1: Introduction to computational thinking and R
(11.11.2019 10:15 – 18:45, 406 Seminarraum; B 6, 30-32 Bauteil E-F)

Preparation:

  • Read Van Atteveldt & Peng (2018) and R for Data Science chapters 1 and 2
    (freely available at https://r4ds.had.co.nz)
  • Install R, RStudio and the packages tidyverse and quanteda on your laptop.

Morning session:

Afternoon session: Data analysis with Tidyverse


Day 2: Automatic Quantitative Text Analysis
(18.11.2019 10:15 – 18:45, 406 Seminarraum; B 6, 30-32 Bauteil E-F)

Preparation:


Morning session:

Afternoon session:

Day 3: Language Processing and Validity
(22.11.2019, 10:15 – 18:45, 406 Seminarraum; B 6, 30-32 Bauteil E-F)

Preparation:

  • Read Grimmer & Stewart (2013)
  • Finish practicals if needed
  • Finalize research proposal and exploration for the research project

Morning session:

Afternoon session:

Assignment:

FSS Summer Festival Workshop: R in Pairs

Kasper Welbers and I will give a workshop on “R in Pairs” at the VU FSS Summer Festival [VU login required] this afternoon: R may sound scary, but it can actually be fun and collaborative. You will team up in pairs to play with R together to gather and visualize data from Canvas or other sources. You’ll be amazed at what you can learn and do in 40 minutes when there is free chocolate!

Have a look at the Workshop instructions, and also have a look at our other R material. Enjoy!

Computational Communication Research: Inaugural Issue

We are very happy to announce the inaugural issue of Computational Communication Research! The articles are currently in production, but you can access the preprints using the links below. Please help us spread the word!

We would like to thank all reviewers, submitters, and editorial board members for contributing to the journal and for their feedback on this introduction. We would also like to thank Amsterdam University Press and especially our gold sponsors (Vrije Universiteit Amsterdam, The Network Institute, the University of Amsterdam / ASCoR) and silver sponsors (The Hebrew University of Jerusalem, The Center for Information Technology and Society at UC Santa Barbara, and the Computational Communication Science Lab of the University of Vienna), for making this journal possible.

Looking forward to your submissions and reviews in the coming months!
CCR Inaugural Issue

Introduction: A Roadmap for Computational Communication Research [draft version]
Wouter van Atteveldt, Drew Margolin, Cuihua Shen, Damian Trilling, René Weber
[https://osf.io/preprints/socarxiv/4dhfk]

GDELT Interface for Communication Research (iCoRe)
Frederic R. Hopp, Jacob T. Fisher, René Weber
[https://osf.io/24n6a/]

An Experimental Study of Recommendation Algorithms for Tailored Health Communication
Hyun Suk Kim, Sijia Yang, Minji Kim, Brett Hemenway, Lyle Ungar, Joseph Cappella
[https://osf.io/preprints/socarxiv/nu6tg]

News Organizations’ Selective Link Sharing as Gatekeeping: A Structural Topic Model Approach
Chankyung Pak
[https://osf.io/preprints/socarxiv/pt7es]

Computational observation: Challenges and opportunities of automated observation within algorithmically curated media environments using a browser plug-in
Mario Haim, Angela Nienierza
[https://osf.io/preprints/socarxiv/xd63n/]

Research talk University of Zurich

This afternoon I will give a research talk at the Institute of Communication and Media Research at the University of Zurich.

Building the Open Computational Communication Science toolchain

Computational Communication Science promises to give new insight into communication and social behavior by using digital methods to study large and heterogeneous data sets consisting of traces left by online activity, ranging from Instagram posts and comments on online news articles to online purchases.
This talk focuses on the tools needed to carry out this research. In particular, we need tools to gather data, such as digital trace data; to analyze the resulting texts, networks, and images in order to measure our theoretical quantities; and to store and share the data and results. In all cases, it is important to focus on the replicability, validity, and transparency of data, analytic processes, and results. In this talk, I will outline the requirements, existing resources, and challenges for “open” Computational Communication Science. For each of these steps, I will discuss the possibilities and limitations of existing tools, and describe the methods and open source tools that we are currently developing. I will call for a turn to “open science” and collaboration on open source software to build the tools we need to develop Computational Communication Science.

Research talk in Vienna

This afternoon I will give a talk at the Research Colloquium of the Institut für Publizistik- und Kommunikationswissenschaft of the University of Vienna.

Title: Building the Open Computational Communication Science toolchain

Abstract: Computational Communication Science promises to give new insight into communication and social behavior by using digital methods to study large and heterogeneous data sets consisting of traces left by online activity, ranging from Instagram posts and comments on online news articles to online purchases.
This talk focuses on the tools needed to carry out this research. In particular, we need tools to gather data, such as digital trace data; to analyze the resulting texts, networks, and images in order to measure our theoretical quantities; and to store and share the data and results. In all cases, it is important to focus on the replicability, validity, and transparency of data, analytic processes, and results. In this talk, I will outline the requirements, existing resources, and challenges for “open” Computational Communication Science. For each of these steps, I will discuss the possibilities and limitations of existing tools, and describe the methods and open source tools that we are currently developing. I will call for a turn to “open science” and collaboration on open source software to build the tools we need to develop Computational Communication Science.

Text Analysis in R workshop at U. Vienna

As part of my Paul Lazarsfeld Guest Professorship I will teach a workshop on text analysis in R at the University of Vienna from 8 – 12 April.

For participants: Please bring your own laptop and make sure you have R and RStudio installed.

Introduction: The explosion of digital communication and increasing efforts to digitize existing material have produced a deluge of material such as digitized historical news archives, policy and legal documents, political debates, and millions of social media messages by politicians, journalists, and citizens. This makes it possible to subject theoretical predictions about the societal role of information, and about the development and effects of communication, to rigorous quantitative tests that were impossible before. Besides providing an opportunity, the analysis of such “big data” sources also poses methodological challenges. Traditional manual content analysis does not scale to very large data sets due to its high cost and complexity. For this reason, many researchers turn to automatic text analysis using techniques such as dictionary analysis, automatic clustering and scaling of latent traits, and machine learning.

Course aims and structure: Properly using such techniques, however, requires a very specific skill set. This course aims to give interested PhD (and advanced Master) students an introduction to text analysis. R will be used as the platform and language of instruction, but the basic principles and methods generalize readily to other languages and tools such as Python. Participants will be given handouts with examples based on pre-existing data to follow along, but are encouraged to work on their own data and problems using the techniques offered.
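Since the principles generalize beyond R, here is an illustrative Python sketch (with an invented two-document mini-corpus) of the document-term matrix, the core data structure behind most of the techniques taught in the course, built by hand:

```python
# Illustrative sketch (not course code): building a small document-term
# matrix by hand, as tools like quanteda's dfm() do at scale.
from collections import Counter

docs = [
    "news about politics and politics only",
    "sports news and more sports",
]

counts = [Counter(d.split()) for d in docs]          # term counts per document
vocab = sorted(set().union(*counts))                 # shared, ordered vocabulary
dtm = [[c[term] for term in vocab] for c in counts]  # rows = docs, cols = terms

print(vocab)
print(dtm)
```

Packages such as quanteda construct the same structure as a sparse matrix, with tokenization, stemming, and stopword removal handled for you.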

Evaluation criteria: Evaluation will be based on two assignments:

  1. (30%) midweek data exercise
    1. Deadline: Wednesday (soft)
    2. Instructions
    3. Data
    4. Submission link
  2. (70%) final assignment on a topic of your choice
    1. Deadline: Friday 19 April
    2. Instructions
    3. Submission link

There’s also an optional, formative quiz to test your tidyverse skills.

Material: The course mostly uses the handouts linked below per session. The source code of the handouts is available on GitHub. Also see the RStudio cheat sheets and the excellent book R for Data Science.

Course outline per day (A=morning, B=afternoon):

  1. Monday: Introduction to R
    1. (9:00-11:00)
      1. R Basics: data and functions (practice template);
      2. Fun with Text
    2. (14:00-16:00)
      1. Tidyverse: Transforming data;
      2. Reading and importing data (external tutorial)
  2. Tuesday: R for data analysis
    1. (9:00-11:15)
      1. Grouping and summarizing data
      2. Merging (joining) data sets
    2. (13:30-16:00)
      1. Visualizing data with ggplot
      2. Reshaping data: wide, long, and tidy
  3. Wednesday: Quantitative text analysis in R
    1. (9:00-13:00)
      1. Basic string handling in R [session log – warning, might be messy!]
      2. Reading, cleaning, and analysing text with quanteda and readtext [messy session log]
  4. Thursday: Topic Modeling and Preprocessing
    1. (9:00-12:00)
      1. Topic Modeling [slides] [handout]
        Optional handouts: [graphical interpretation] [perplexity code]
      2. NLP Preprocessing [slides] [handout]
    2. (14:00-16:00)
      1. Understanding topic modeling (slides)
        optional links: [gibbs sampling in R][understanding alpha]
      2. Structural Topic Model [slides] [handout] [vignette]
  5. Friday: Supervised machine learning
    1. (9:00-12:00) Supervised text classification [slides] [handout]
    2. (14:00-16:00) Work on assignment
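To give a flavor of the Friday session, here is a toy supervised text classifier in Python: a multinomial Naive Bayes with Laplace smoothing, trained on a few invented example sentences. Real work would use an annotated corpus and a library such as quanteda or scikit-learn:

```python
# Toy supervised text classification: multinomial Naive Bayes
# with Laplace smoothing, on an invented four-sentence training set.
import math
from collections import Counter, defaultdict

train = [
    ("the match was won by the home team", "sports"),
    ("the striker scored a late goal", "sports"),
    ("parliament passed the new budget law", "politics"),
    ("the minister defended the policy in debate", "politics"),
]

word_counts = defaultdict(Counter)   # per-class word frequencies
class_counts = Counter()             # documents per class
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Return the most probable class for `text` under Naive Bayes."""
    best_label, best_logp = None, -math.inf
    n_docs = sum(class_counts.values())
    for label in class_counts:
        logp = math.log(class_counts[label] / n_docs)  # log prior
        total = sum(word_counts[label].values())
        for token in text.split():
            # Laplace-smoothed log likelihood; unseen words get count 0 + 1
            logp += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

print(predict("the team scored a goal"))   # sports
print(predict("the new law was debated"))  # politics
```

Working in log probabilities avoids numeric underflow when multiplying many small per-word probabilities; this is standard practice in text classification.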

Course Literature:

Kasper Welbers, Wouter van Atteveldt, and Ken Benoit (2017), Text Analysis in R. Communication Methods and Measures, 11 (4), 245-265, doi:10.1080/19312458.2017.1387238

Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media.

Background literature:
– Wouter van Atteveldt and Tai-Quan Peng (2018), When Communication Meets
Computation: Opportunities, Challenges, and Pitfalls in Computational Communication
Science, Communication Methods and Measures 12 (2-3), pp. 81-92.
– Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Advances in neural information
processing systems (pp. 288-296).
– Denny, M. J., & Spirling, A. (2018). Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Political Analysis, 26(2), 168-189.
– Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political analysis, 21(3), 267-297.
– Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder‐Luis, J., Gadarian, S. K., … & Rand, D. G. (2014). Structural Topic Models for Open‐Ended Survey Responses. American Journal of Political Science, 58(4), 1064-1082.
– Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205-231.

– Goldberg, Y. (2017). Neural network methods for natural language processing. Synthesis Lectures on Human Language Technologies, 10(1), 1-309. If you google “neural network methods for natural language processing pdf” you may be able to find the evaluation sample from the publisher.


Etmaal 2019: Mobile tracking and crowd coding

My presentations for Etmaal (Dutch-Flemish communication science conference) 2019:

Gathering Mobile News Consumption Traces: An Overview of Possibilities and a Prototype Tool (not really based on Google Takeout)
Wouter van Atteveldt, Laurens Bogaardt, Vincent van Hees, Felicia Loecherbach, Judith Moeller and Damian Trilling

Download [Poster][Slides]


Sentiment Analysis: what is great and what sucks?
Wouter van Atteveldt, Mariken van der Velden, Mark Boukes

Download [Slides]