AmCAT API howto’s for Python and R (and an extra workshop)

Inspired by the interest generated at the workshops over the last few weeks, I’ve written a number of howto documents for working with the AmCAT API:

  • Python scraping: A demo for a scraper using the AmCAT API that scrapes the (Creative Commons licensed) Wikinews site (thanks to Paul Huygen)
  • Python analysis: A simple demo script that downloads the (scraped) articles and counts all words.
  • R querying: A howto for using R to query AmCAT and retrieve metadata
  • R vocabulary and topic modeling: A howto for downloading term-document matrices, comparing them (to find typical vocabulary or collocates), and doing topic modeling
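The word-counting step of the Python analysis howto can be sketched roughly as follows. This is a minimal illustration, not the demo script itself: the `ARTICLES` list is a hypothetical stand-in for articles retrieved through the API, and the tokenizer is a deliberately simple assumption.

```python
import re
from collections import Counter

# Hypothetical stand-in for articles retrieved from AmCAT; a real script
# would fetch these through the API instead.
ARTICLES = [
    {"id": 1, "text": "The council votes on the new budget. The budget passes."},
    {"id": 2, "text": "Budget talks stall as the council meets again."},
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_words(articles):
    """Return a Counter with the total frequency of each word."""
    counts = Counter()
    for article in articles:
        counts.update(tokenize(article["text"]))
    return counts


if __name__ == "__main__":
    # Print the three most frequent words with their counts.
    for word, n in count_words(ARTICLES).most_common(3):
        print(word, n)
```

A `Counter` keeps the example short; for larger corpora the term-document matrices mentioned in the R howto are probably the more practical representation.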

There will also be an extra workshop on using grammatical analysis in AmCAT, which will be held on the 30th of April. Please let me know if you plan to attend (and haven’t already told me).

Posted in Uncategorized | Leave a comment

AmCAT workshops@VU: 9 April (beginner), 16 April (advanced)

I will be giving two workshops for AmCAT users in April at the VU Amsterdam. The workshops are open to anyone interested, but please mail me if you want to attend either or both workshops.

Using AmCAT: April 9, 13:30 – 16:30, Location Metropolitan Z-009

On the 9th of April, I will help everyone who wants to get started with using AmCAT3. This workshop is aimed at scientists from the social sciences or humanities who want to get started with using digital text analysis methods. Existing users who have used AmCAT2 will also be interested to learn what is new and changed in this version.

Download Slides | Download hands-on exercises

Topics will depend on audience demand, but include:

  • What is AmCAT? Can I use AmCAT?
  • Getting started: project management, adding your texts to AmCAT, users
  • Automatic coding: using keyword queries, improving queries, estimating validity, using codebooks.
  • Manual coding: designing codebooks and coding schemas, coding, extracting results
  • Hands-on session. Please bring a laptop if you want to participate in the hands-on part of the workshop!

Advanced AmCAT: April 16, 13:30 – 16:30, Location Metropolitan Z-007

Download slides

This workshop is intended for people who are interested in the more technologically advanced capabilities of AmCAT, including those who want to use AmCAT in conjunction with R or Python. No specific technical knowledge is required to follow the workshop, but some experience with R will help a lot. Please bring a laptop if you plan to attend this session (I can provide one if needed).

Topics will depend on audience demand, but include:

  • The AmCAT API: how to access your data from python/R
  • Scraping / uploading articles from other data sources
  • Topic modeling and machine learning on AmCAT data
  • Extracting quotes and statements: Grammatical analysis and graph transformations

ISA and MPSA

I’m now in Toronto to attend ISA and will go to MPSA afterwards. Interestingly, both conferences have ‘big data’ panels in which I will be presenting:

ISA: Semantic Network Analysis of Frame Building during war: Mediated Public Diplomacy in Gaza, Georgia, and Iraq (presentation)

This paper is a work-in-progress describing an ongoing effort to automatically analyze the framing of conflict by media in third countries using Semantic Network Analysis. We study three conflicts: the 2003–2011 war in Iraq, the 2008 South Ossetian conflict, and the 2008–2009 Gaza War. For each conflict, we have manually analysed (public or private) messages of at least one of the belligerent parties to determine that party’s preferred framing of the conflict. By analysing these frames from a semantic network perspective, we show that there is a recurrent set of framing functions that are used by the parties in all three conflicts. Using transformation rules on the syntactic structure of sentences, these framing functions can then be automatically identified in newspaper coverage. Once these rules are finalized and evaluated properly, they will allow us to study frame building in international conflict in an automatic and transparent way, while retaining the rich semantics required by framing analysis.

MPSA: Quotes as Data: Extracting Political Statements from Dutch Newspapers by applying Transformation Rules to Syntax Graphs (presentation)

To understand the relation between media and politics, it is necessary to study the content of politicians’ statements in the news. By using syntactic analysis and topic models, this paper looks at how often politicians are quoted, and whether their media statements are similar to their statements in parliament. While media attention simply follows political power, this is quite different for media statements. The frequency of statements is a matter of journalistic demand (e.g. high during scandals) and political supply (e.g. low during closed-door negotiations). Media statements are most similar to political discourse during the campaign, and for limited-issue parties. Some interesting results were found, with the anti-immigration PVV being relatively dissimilar during the campaign, and possible coalition partners being relatively dissimilar during the coalition talks. This paper is a promising first step into the relatively understudied area of mediated politics.

AmCAT 3.3 released

Yesterday evening we released AmCAT 3.3. We are quite excited about this as we think that it is an important step towards making AmCAT more usable and more stable. Below you can find a summary of the improvements, the most obvious of which is the completely restructured UI, which we believe gives a cleaner and more modern look.
Although we have tested this version extensively while developing, this upgrade adds quite a number of features and has a completely refactored system for generating the website (reflected in the new UI), so it is quite possible that there are still some bugs. Please report any bugs or feature requests on the new issue tracker at https://github.com/amcat/amcat/issues.

The ‘amcatbook’ has also been updated and is available from https://www.dropbox.com/s/nnkhzlcuhza9f69/amcatbook.pdf (in Dutch). The manual can be accessed as before at http://amcat.vu.nl/news/index.php/for-users/. It has not been updated to the newest version, but most things should be quite similar.

For an up-to-date list of immediate bugs that we are aware of, please see the issue tracker and especially the page for milestone 3.3.01: https://github.com/amcat/amcat/issues?milestone=2&page=1&state=open. Feel free to browse the other milestones as well: milestone 3.3.1 contains the first plans for improvements to the new version, while milestone 3.4 contains the more long-term plans, first and foremost the query screen (https://github.com/amcat/amcat/issues/18 and https://github.com/amcat/amcat/issues/17).

Thanks for using AmCAT and thanks as always for reporting bugs and suggestions for improvement!

“The AmCAT team”

Improvements in 3.3:

  • Completely overhauled navigator UI. The new UI is a lot more standardized across pages and has less clutter, making it easier to use especially for new users.
  • Complete rewrite of the annotator which greatly improved performance when using large and/or many codebooks. Keyboard shortcuts and supported browsers have also changed, so please consult the “help” link in the annotator before coding. You can also indicate which part of a sentence you are coding if ‘subsentence’ is selected in the coding schema.
  • Queries and some actions are now run in the ‘background’, and the website displays a progress dialog. This makes the server less likely to become too busy and gives the user an indication of what is happening.
  • If you click on an article after querying, the matches for the query will be highlighted in the article text.
  • Authentication (rights and permissions) is now handled better than before. It is still quite possible that users are allowed to do things they shouldn’t, but most cases should be handled now. If you don’t see a button or you get a ‘permission denied’ error, please check whether you have sufficient access to the project you are working in. If there are any permissions-related problems, please open an issue as normal.
  • Codebook handling. You can now export codebooks and all labels to excel and various formats, and import them from csv. You can also update an existing codebook with new labels or structure from a csv file.
  • Improved export. All tables now have excel and SPSS export. Aggregations and Associations now have correct field type for SPSS.
  • Performance improvements. Performance of complicated queries has improved a lot (this was backported to production around Christmas). Summary now makes a single call to get both total #hits and the top 10 hits.
  • Minor usability improvements: articles can be opened in a new page from the query screen, the association interval and other options have been revamped, the plain text uploader has a “text” field to bypass uploading a text, scripts and uploaders have better help text, …
  • API improvements, especially token based authentication and full support for query search and aggregate. See also https://github.com/amcat/amcat-r and http://amcat.nl/R/amcatr.pdf
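As a rough illustration of the token-based authentication mentioned in the last item, the sketch below builds (but does not send) an authenticated request. The `Authorization: Token <key>` header follows the scheme used by Django REST framework's TokenAuthentication; that AmCAT expects exactly this header is an assumption here, and the host and token are hypothetical.

```python
import urllib.request


def authorized_request(url, token):
    """Build (but do not send) a GET request carrying an API token."""
    request = urllib.request.Request(url)
    # Assumption: the server expects "Authorization: Token <key>", the scheme
    # used by Django REST framework's TokenAuthentication.
    request.add_header("Authorization", "Token " + token)
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and token, for illustration only.
req = authorized_request(
    "https://amcat.example.org/api/v4/projects/", "my-secret-token"
)
print(req.get_header("Authorization"))
```

Passing the result to `urllib.request.urlopen` would perform the actual call; the amcat-r package linked above is the place to check how the real clients authenticate.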

Plans for 3.4

  • The biggest priority is the query screen. It is confusing that some options are only accessible once you have performed a specific query (https://github.com/amcat/amcat/issues/18). It is also annoying that changing e.g. association settings requires first asking a new summary. We are also eager to add new functionality to the query screen, such as word clouds, new visualizations, links with coding jobs, etc.
  • Storing more state, especially storing queries and offering a list of recent queries, recent projects, etc. (https://github.com/amcat/amcat/issues/16 and https://github.com/amcat/amcat/issues/17)
  • Improving the API. There is a new ‘hierarchical’ API in place (i.e. where an articleset is located under its project at api/v4/projects/X/articlesets/Y) which also allows creation/modification. In 3.4 this new API will replace the old API, meaning that all old resources should be present in the new system and that security should be checked thoroughly, especially as the API also allows anonymous access. (https://github.com/amcat/amcat/issues/15)
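To make the hierarchical layout concrete, here is a small sketch that builds the address of an articleset nested under its project, following the api/v4/projects/X/articlesets/Y pattern described above. The host name and the helper function are hypothetical and not part of AmCAT.

```python
from urllib.parse import urljoin


def articleset_url(host, project_id, articleset_id):
    """Return the URL of an articleset nested under its project."""
    # int() rejects anything that is not a plain numeric id.
    path = "api/v4/projects/{}/articlesets/{}".format(
        int(project_id), int(articleset_id)
    )
    # Normalise the host so it always ends in exactly one slash.
    return urljoin(host.rstrip("/") + "/", path)


# Hypothetical host, for illustration only.
print(articleset_url("https://amcat.example.org", 12, 345))
```

The point of the nesting is that the project id is part of the address, so access can be checked against the project before the articleset is touched.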

If you have any suggestions for 3.4, please mail us or create an issue!

AmCAT workshop at HUJI

On Monday 15 July and Wednesday 17 July, Christian Baden and I will give a workshop on Automatic Coding and Semantic Network Analysis using AmCAT.

Monday, 15 July 2013

09:30 – 09:45 Introduction
09:45 – 10:30 Quantitative analysis of discourse
10:30 – 11:30 Automatic & manual coding using AmCAT
Coffee Break
12:00 – 13:00 Analytic opportunities within the AmCAT framework
13:00 – 13:30 Examples & Applications
Lunch Break
14:30 – 15:30 Hands-on session (Computer Lab, optional)

Wednesday, 17 July 2013

09:30 – 11:15 Vocabulary, Grammar, and Semantic Networks
Coffee Break
11:45 – 13:30 Context, Patterns, and Associative Coherence
Lunch Break
14:30 – 15:30 Hands-on session (Computer Lab, optional) (download files, download rstudio)

Also see the ‘for users’ and ‘for developers’ pages for more links and resources.

AmCAT 3.2.2 released

We released AmCAT 3.2.2 last night. This is a minor release with some urgent fixes and a few added features.

One important change is that the development server, amcat-dev, now has its own database that is copied from the production server every night. This reduces the chance that a problem in the development version of the software can corrupt the production database, and it allows us to test database changes on the full production database without affecting normal AmCAT use. This has an important consequence for users: changes made on the development server will not be visible on the main server and these changes will be reverted every night. You can still use query or export functions that are under development as the data is still available. You can also use the development site to test changes, e.g. uploading articles or changing coding schemas, before doing it “live”.

New features:

  • ‘Show Associations’ can now create object-object tables and network graphs again
  • You can now import and export codebooks from/to a csv file (and hence excel)
  • You can now upload zip files containing multiple files to an article set
  • We made the coding schema editor more intuitive and easier to use
  • Added ‘delete’ buttons to most resources like schemas, codebooks etc.

Fixes:

  • Fixed the bug where running webscripts on the search page using the radio buttons (i.e. the second time you run the same script) would give an error
  • Fixed the problem causing very long load times in the codebook editor for large codebooks
  • … and various smaller fixes

As always, please report any bugs you encounter, or any suggestions you have, on the issue tracker.

This summer we will be working hard on AmCAT 3.3, so stay tuned!

AmCAT 3.2 released

On Sunday the production server at amcat.vu.nl was migrated to version 3.2. The focus for this version was on usability, especially for larger projects.

What’s new in 3.2?

Check the issue tracker for the full list of features added and issues resolved for 3.2. Here is a short summary:

  • Favourites. Projects and article sets can now be marked as “favourite” using the star icons in the projects and article set lists. By default, only your favourites will be listed in these screens, with an option to select other projects or sets. This is intended to reduce the amount of clutter in larger projects.
  • Set filtering for queries. In the query screen, you can now filter the listed article sets to show only favourites, only fully indexed sets, or only coding job sets. This should make it easier to select and combine the sets you want to analyse.
  • Coding job extraction. The way to extract coding jobs was completely overhauled. In the new version, the coding jobs overview screen in a project has an export button, which allows you to select one or more jobs for export, and specify the fields and field options you want to export.
  • Project buttons. A number of buttons were added to the project details and article set details screens to make it easier to perform common actions such as querying a specific set or importing it to another project. The article set screen also has a new button to draw a random sample from an existing set.
  • Preprocessing. A new tab in the project screen accesses the linguistic preprocessing features. This is still a work in progress, but in the current version you can assign sets to be preprocessed by parsers such as Alpino and Stanford CoreNLP, and check parse progress.
  • Various bug fixes and layout changes. In particular, the tabs and action buttons were changed over to jQuery Bootstrap for a clearer and more uniform look and feel.

Log on to AmCAT or create a new account to check out the new version!

Plans for 3.3

So what are the plans for 3.3? We need to do some work on the internals of AmCAT, especially to move to class-based views and to check the permissions system, which is probably too liberal at the moment.

We also want to overhaul the query tab, to make it more intuitive for first time users and to make it possible to filter and tabulate on all possible fields, not just the ones we thought were useful. Also, we want to make it easy to add and access new analyses in the query tab through the plugin system.

The third focus for 3.3 will be the API. AmCAT has a REST API to allow power users to access all data in AmCAT without having to use the web interface, for example directly from R or Python. We want to make some changes and additions to this API, and then make sure it is stable so people can rely on their API-based scripts continuing to function. One of the big additions we want to make is to allow data to be changed through the API as well, so you can, for example, write an R script to do a keyword query, draw a stratified sample from the results, and create a new codingjob based on the outcome; or combine the results of a keyword search with existing codingjobs and create a new set based on a specific analysis. Such actions currently require a number of export / import steps, so it will be a lot easier if everything can be done through the API.
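The stratified sampling step in that workflow could look roughly like the sketch below. It is written in Python rather than R and works on an in-memory list; the `HITS` data, the `medium` field, and the function name are hypothetical, and the API calls for querying and creating the codingjob are left out because they are not specified here.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, per_stratum, seed=0):
    """Draw up to per_stratum items at random from each stratum."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Never ask for more items than the stratum contains.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical query results: 8 hits from "paper A" and 4 from "paper B".
HITS = [
    {"id": i, "medium": "paper A" if i % 3 else "paper B"}
    for i in range(1, 13)
]

sample = stratified_sample(HITS, key=lambda hit: hit["medium"], per_stratum=2)
print(sorted(hit["id"] for hit in sample))
```

Sampling a fixed number per stratum gives small sources the same weight as large ones; sampling proportionally to stratum size would be the alternative if the codingjob should mirror the distribution of the query results.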

We hope to release 3.3 around September 2013. Stay tuned!

 
