On Friday, December 13, the “Patterns in Narrative Texts” symposium took place at the Meertens Institute in Amsterdam. The symposium was part of a series of meetings organised by different projects within the Continuous Access to Cultural Heritage framework funded by NWO.

Narratives play a crucial role in the NewsReader project as they provide the framework for connecting the events, news stories and entities in our domain. We were thus very interested in seeing what our colleagues are working on.

The host, the FACT project, is trying to populate a folktale database automatically, as manual input is just too time-consuming. They employ techniques from information retrieval (IR) to extract keywords that help index their data. What is interesting is that visualisations of these keywords, such as word clouds, can already give quite a good overview of a domain and its most prominent topics.

Mike Kestemont from Antwerp University and Folgert Karsdorp from the Meertens Institute also presented work on the automatic analysis of TIME magazine, whose searchable archive goes back to 1923. One of the things they tried was to predict this year’s TIME Person of the Year. Unfortunately, they didn’t quite succeed, but it does present an interesting challenge: ranking a set of ~5000 entities when you are actually only interested in number one. For NewsReader (NWR), this may provide an interesting way of looking at some of the data points we want to single out, i.e. the ‘interesting needles in the haystack’.

The keynote speaker of the day, Prof. Dr. Timothy R. Tangherlini, is an expert on Scandinavian folktales, and he presented research on a large data set of folktales that were collected around Denmark. By clustering stories by word usage or georeferencing stories by their source, he uncovered some really interesting links between different stories that help him come up with research questions he said he could never have thought of investigating before. I think his talk once again showed how important it is to make information available to experts in many different (visual) ways, so that they can get an overview of the corpus as well as drill down into the data.

We came home with a whole bunch of fresh ideas, papers to read and tools to look into. It’s going to be an exciting winter for NWR.