The MEANTIME Corpus (the NewsReader Multilingual Event ANd TIME Corpus) consists of a total of 480 news articles: 120 English Wikinews (http://en.wikinews.org/) articles on four topics (i.e., Airbus and Boeing; Apple Inc.; Stock market; and General Motors, Chrysler and Ford) and their translations into Spanish, Italian, and Dutch.
It has been annotated manually at multiple levels, including entities, events, temporal information, semantic roles, and intra-document and cross-document event and entity coreference.
The NewsReader MEANTIME corpus is licensed under a Creative Commons Attribution 4.0 International License.
If you use this corpus, please cite the following paper:
Anne-Lyse Minard, Manuela Speranza, Rubén Urizar, Begoña Altuna, Marieke van Erp, Anneleen Schoen, and Chantal van Son. 2016. MEANTIME, the NewsReader Multilingual Event and Time Corpus. In Proceedings of LREC 2016. TO APPEAR.
Manually annotated data (version 1.0)
- MEANTIME – English section: meantime_newsreader_english.zip
- MEANTIME – Spanish section: meantime_newsreader_spanish.zip
- MEANTIME – Dutch section: meantime_newsreader_dutch.zip
- MEANTIME – Italian section: will be used as test data for the FactA task at Evalita 2016, and will be made publicly available in October 2016
- MEANTIME – agreement data: agreement-MEANTIME-corpus.zip
Raw texts (NAF format)
- English articles: meantime_newsreader_english_raw_NAF.zip
- Spanish articles: meantime_newsreader_spanish_raw_NAF.zip
- Italian articles: will be available in October 2016
- Dutch articles: meantime_newsreader_dutch_raw_NAF.zip
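NAF (NLP Annotation Format) is an XML format in which the plain article text is stored in a `<raw>` layer directly under the `<NAF>` root. The following is a minimal sketch, not an official tool, for pulling that raw text out of one of the raw-text archives listed above. It assumes that each article is a separate NAF file inside the zip and that member names end in `.naf` or `.xml`; the function names and the suffix list are hypothetical, so check them against the actual archive contents.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple, Union

# Hypothetical: file suffixes assumed to identify NAF documents inside the archive.
NAF_SUFFIXES = (".naf", ".xml")


def naf_raw_text(naf_bytes: bytes) -> str:
    """Return the text of the <raw> layer of one NAF document, or "" if absent."""
    root = ET.fromstring(naf_bytes)
    raw = root.find("raw")  # <raw> is looked up as a direct child of the <NAF> root
    if raw is None or raw.text is None:
        return ""
    return raw.text


def iter_raw_texts(zip_source: Union[str, BinaryIO]) -> Iterator[Tuple[str, str]]:
    """Yield (member name, raw text) for every file in the zip that looks like NAF."""
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            # Directory entries end in "/" and are skipped by the suffix check.
            if name.lower().endswith(NAF_SUFFIXES):
                yield name, naf_raw_text(zf.read(name))
```

For example, `for name, text in iter_raw_texts("meantime_newsreader_english_raw_NAF.zip"): print(name, len(text))` would list each article with the length of its raw text, provided the assumptions above hold. Reading members straight from the zip avoids unpacking the archive to disk.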
The English section has been used as trial and evaluation data for SemEval-2015 Task 4, “TimeLine: Cross-Document Event Ordering”.
For this task, timelines were created from the annotated articles.
For more information please visit the task’s website: http://alt.qcri.org/semeval2015/task4/.
The Dutch section of the MEANTIME corpus has been used for the CLIN26 Shared Task, the first co-located Shared Task for Dutch.
For more information please visit the task’s website: http://wordpress.let.vupr.nl/clin26/shared-task/.
- Sara Tonelli, Rachele Sprugnoli, Manuela Speranza, and Anne-Lyse Minard. 2014. NewsReader Guidelines for Annotation at Document Level. NWR-2014-2-2. Version FINAL (Aug 2014). Fondazione Bruno Kessler.
- Manuela Speranza, Rubén Urizar and Anne-Lyse Minard. NewsReader Italian and Spanish specific Guidelines for Annotation at Document Level. NWR-2014-6. DRAFT version. Fondazione Bruno Kessler.
- Anneleen Schoen, Chantal van Son, Marieke van Erp and Hennie van der Vliet. NewsReader Document-Level Annotation Guidelines – Dutch. NWR-2014-08. VU University Amsterdam.
- Manuela Speranza and Anne-Lyse Minard. Cross-Document Annotation Guidelines. NWR-2014-9. Fondazione Bruno Kessler.
- Anne-Lyse Minard, Manuela Speranza, Rubén Urizar, Begoña Altuna, Marieke van Erp, Anneleen Schoen, and Chantal van Son. 2016. MEANTIME, the NewsReader Multilingual Event and Time Corpus. In Proceedings of LREC 2016. TO APPEAR.
- Manuela Speranza and Anne-Lyse Minard. 2015. Cross-language projection of multilayer semantic annotation in the NewsReader Wikinews Italian corpus (WItaC). In Proceedings of the Second Italian Conference on Computational Linguistics (CLiC-it 2015).
- Anne-Lyse Minard, Manuela Speranza, Eneko Agirre, Itziar Aldabe, Marieke van Erp, Bernardo Magnini, Germán Rigau, and Rubén Urizar. 2015. SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). http://www.aclweb.org/anthology/S15-2132
- Marieke van Erp, Piek Vossen, Rodrigo Agerri, Anne-Lyse Minard, Manuela Speranza, Rubén Urizar, Egoitz Laparra, Itziar Aldabe, and Germán Rigau. 2015. Annotated Data, version 2. Technical Report D3-3-2, VU Amsterdam. http://www.newsreader-project.eu/files/2012/12/NWR-D3-3-2.pdf.