Join Together? Combining Data to Parse Italian Texts

Claudia Corbetta*, Giovanni Moretti, Marco Carlo Passarotti

*Corresponding author for this work

Research output: Contribution to book (conference contribution)

Abstract

In this paper, we create and evaluate non-combined and combined models using Old and Contemporary Italian data to determine whether increasing the size of the training data with a combined model could improve parsing accuracy to facilitate manual annotation. We find that, despite the increased size of the training data, in-domain parsing performs better. Additionally, we discover that models trained on Old Italian data perform better on Contemporary Italian data than the reverse. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian text exhibits higher sentence length and non-projectivity rate.
Original language: English
Title of host publication: Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)
Publisher: CEUR Workshop Proceedings
Pages: 251-257
Number of pages: 7
ISBN (print): 979-12-210-7060-6
Publication status: Published - 2024

All Science Journal Classification (ASJC) codes

  • General Computer Science

Keywords

  • Old Italian
  • Syntactic Parsing
