R is a very powerful and flexible statistics package and programming language.
This repository contains a number of ‘howto’ files aimed at providing an introduction to R and some of its possibilities.
Some other great sites for learning R are:
- OpenIntro statistics with a number of good statistics ‘labs’ in R
- Quick-R with explanations and sample code for a wide array of applications
- Advanced R Programming for (much) more information on what is really going on.
To install R and RStudio, please see lab 0 of the OpenIntro statistics book.
- Getting started
- Data preparation
- Data analysis
- Advanced modeling
Dealing with textual data
For textual data, we have also developed two R packages to communicate with the AmCAT text analysis framework and to deal with corpus analysis and topic models. We also wrote two relevant howtos:
- Corpus Analysis: Term Document Matrices, frequency analysis, and topic modeling (source)
- Clause Analysis: Using grammatical analysis for semantic network analysis (source)
The handouts below do not depend on AmCAT and are based on a Dutch data set:
- Corpus Analysis: Term Document Matrices (source)
- LDA topic modeling (source)
- Lemmatization (source)
- Machine Learning with RTextTools (source)