DaDoEval @ EVALITA 2020: Same-genre and cross-genre dating of historical documents

Rachele Sprugnoli, Stefano Menini, Sara Tonelli

Research output: Contribution to book › Conference contribution

Abstract

In this paper we introduce the DaDoEval shared task at EVALITA 2020, aimed at automatically assigning temporal information to documents written in Italian. The evaluation exercise comprises three levels of temporal granularity, from coarse-grained to year-based, and includes two types of test sets: one of the same genre as the training set and one of a different genre. More specifically, DaDoEval deals with the corpus of Alcide De Gasperi's documents, providing both public documents and letters as test sets. Two systems participated in the competition, both achieving results above the baseline in all subtasks. As expected, coarse-grained classification into five periods is rather easy to perform automatically, while year-based classification remains an unsolved problem, partly due to the lack of sufficient training data for some years. Results also showed that, although De Gasperi's letters in our test set were written in standard Italian and in a style that was not too colloquial, cross-genre classification yields remarkably lower results than the same-genre setting.
Original language: English
Host publication title: Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020)
Pages: 391-397
Number of pages: 7
Publication status: Published - 2020
Event: 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. EVALITA 2020 - Online
Duration: 17 Dec 2020 to 17 Dec 2020

Workshop

Workshop: 7th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. EVALITA 2020
City: Online
Period: 17/12/20 to 17/12/20

Keywords

  • Natural Language Processing
  • computational linguistics
  • evaluation
