Hong Kong Summer School: Advanced Text Analysis with R

I’m very excited to be teaching the course on Advanced Text Analysis with R in Hong Kong as part of the City University of Hong Kong Summer School in Social Science Research. I will use this page to publish lecture slides, handouts, data sets, etc.

As the title indicates, the course will be taught almost entirely in R. If you don’t use R yet, please make sure that you install R and RStudio on your laptop. Also, please go through the code in the first two handouts published on my learningR page:

  1. R as a calculator
  2. Playing with data in R 

In general, all slides (including source code) are available on GitHub at vanatteveldt/hk2016, and all handouts are available at vanatteveldt/learningr.

If you have any questions, please don’t hesitate to email me at wouter@vanatteveldt.com. Thanks, and see you all in Hong Kong!


June 2nd (morning): Organizing and Transforming Data in R

In this introductory session you will learn how to use R to organize and transform your data: calculating columns, subsetting, transforming and merging data, and computing aggregate statistics. If time permits, we will also cover basic modelling and/or programming in R, depending on what the group prefers.

June 2nd (afternoon): Visualizing and using APIs from R: Twitter, Facebook, NY Times
In this session we will look briefly at visualizing data in R. The main focus of the session is on using APIs from R. We will look at the Twitter, Facebook, and NY Times APIs, and also see how to access arbitrary web resources from R.
You will also start working on your mini-projects by selecting a topic and gathering data.

June 3rd (morning): Querying text with AmCAT and R
This is the first session that directly deals with text analysis.
The goal of this session is to learn how to use AmCAT as a document management tool, upload data, and perform queries from R.
You will continue working on your topic by uploading your data and conducting exploratory analyses.

June 3rd (afternoon): Corpus Analysis and Text (pre)processing

In this session the focus is on the document-term matrix and what you can do with it: word clouds, comparing different corpora, and topic models.

June 4th: Advanced text analysis: Machine learning and sentiment analysis
In this session we will do sentiment analysis using both a dictionary approach and machine learning. These techniques can also be applied to other forms of automatic content analysis, such as topic classification or frame analysis.

June 5th: Advanced text analysis: Semantic Network Analysis and Visualization
In the last session we will look at semantic network analysis using word-window approaches, as well as more advanced visualization techniques using ggplot2, igraph, and Gephi.