Abstract
In this paper, we train and evaluate non-combined and combined models on Old and Contemporary Italian data to determine whether enlarging the training data with a combined model improves parsing accuracy and thereby facilitates manual annotation. We find that, despite the increased size of the training data, in-domain parsing still performs better. Additionally, we find that models trained on Old Italian data perform better on Contemporary Italian data than the reverse. We attempt to explain this result in terms of syntactic complexity, finding that Old Italian text has longer sentences and a higher non-projectivity rate.
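The two complexity measures named in the abstract, sentence length and non-projectivity rate, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sentence is given as a list of 1-based head indices (0 for the root), as in the HEAD column of a CoNLL-U file, and it assumes the rate is counted per sentence (the share of sentences with at least one non-projective arc). The abstract does not say which definition the paper uses. The function names are hypothetical.

```python
def is_dominated_by(heads, node, ancestor):
    """Return True if `ancestor` is `node` itself or lies on its path to the root.

    heads[i] is the 1-based head of token i + 1; 0 marks the root.
    Assumes the heads form a valid tree (no cycles).
    """
    while node != 0:
        if node == ancestor:
            return True
        node = heads[node - 1]
    return False


def non_projective_arcs(heads):
    """Count arcs that span at least one token not dominated by the arc's head."""
    count = 0
    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the arc from the artificial root is skipped
        lo, hi = sorted((head, dep))
        if any(not is_dominated_by(heads, k, head) for k in range(lo + 1, hi)):
            count += 1
    return count


def corpus_stats(treebank):
    """Return (mean sentence length, share of non-projective sentences)."""
    n_sentences = len(treebank)
    mean_length = sum(len(heads) for heads in treebank) / n_sentences
    n_non_projective = sum(1 for heads in treebank if non_projective_arcs(heads) > 0)
    return mean_length, n_non_projective / n_sentences


# Toy data (invented for illustration, not taken from the paper's treebanks).
projective = [2, 0, 2]         # tokens 1 and 3 both attach to token 2
non_projective = [3, 4, 0, 3]  # the arc 4 -> 2 spans token 3, which 4 does not dominate
mean_length, rate = corpus_stats([projective, non_projective])
```

Computing both numbers separately for an Old Italian and a Contemporary Italian treebank would give the kind of comparison the abstract describes.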
Original language | English |
---|---|
Title of host publication | Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024) |
Publisher | CEUR Workshop Proceedings |
Pages | 251-257 |
Number of pages | 7 |
ISBN (print) | 979-12-210-7060-6 |
Publication status | Published - 2024 |
All Science Journal Classification (ASJC) codes
- General Computer Science
Keywords
- Old Italian
- Syntactic Parsing