Abstract
We present an overview of the Index Thomisticus Treebank project (IT-TB). The IT-TB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11-million-token Latin corpus of the texts of Thomas Aquinas. We briefly describe the annotation guidelines, which are shared with the Latin Dependency Treebank (LDT). We report on the application of data-driven dependency parsers to IT-TB and LDT data, presenting training and parsing results on several datasets and evaluating learning algorithms and techniques. Furthermore, we introduce the IT-TB valency lexicon extracted from the treebank, report quantitative data on the lexicon, and provide some statistical measures on subcategorisation structures.
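The abstract does not describe how the valency lexicon is extracted. One way to picture the general idea of deriving subcategorisation frames from a dependency treebank is the minimal sketch below. It is not the authors' procedure. The toy sentences, the tuple layout, the Prague-style labels (`Sb`, `Obj`, `Pnom`, `OComp`), and the helpers `extract_frames` and `frame_distribution` are all hypothetical. A real extractor would also need to handle phenomena such as coordination, which this sketch ignores.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each token is (id, lemma, pos, head, deprel).
# Labels imitate Prague-style analytical functions (an assumption here).
SENTENCES = [
    [(1, "deus", "NOUN", 2, "Sb"),
     (2, "creo", "VERB", 0, "Pred"),
     (3, "mundus", "NOUN", 2, "Obj")],
    [(1, "Petrus", "PROPN", 2, "Sb"),
     (2, "dormio", "VERB", 0, "Pred")],
]

# Relations treated as arguments (hypothetical choice; adjuncts such as Adv are excluded).
ARGUMENT_LABELS = {"Sb", "Obj", "Pnom", "OComp"}


def extract_frames(sentences):
    """Count, for each verb lemma, how often each argument frame occurs."""
    lexicon = defaultdict(Counter)
    for sentence in sentences:
        # Collect the relation labels of every head's dependents.
        dependents = defaultdict(list)
        for _tok_id, _lemma, _pos, head, rel in sentence:
            dependents[head].append(rel)
        for tok_id, lemma, pos, _head, _rel in sentence:
            if pos != "VERB":
                continue
            # A frame is the sorted tuple of argument labels under the verb.
            frame = tuple(sorted(r for r in dependents[tok_id] if r in ARGUMENT_LABELS))
            lexicon[lemma][frame] += 1
    return lexicon


def frame_distribution(lexicon, lemma):
    """Relative frequency of each frame observed for one lemma."""
    counts = lexicon[lemma]
    total = sum(counts.values())
    return {frame: n / total for frame, n in counts.items()}


lexicon = extract_frames(SENTENCES)
print(frame_distribution(lexicon, "creo"))  # {('Obj', 'Sb'): 1.0}
```

Representing a frame as a sorted tuple of labels makes frames order-insensitive. This suits Latin's free word order but discards any information carried by argument position.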
Original language | English |
---|---|
Pages (from-to) | 103-127 |
Number of pages | 25 |
Journal | REVUE TAL |
Volume | 50(2) |
Publication status | Published - 2009 |
Keywords
- Latin
- lexicon
- computational linguistics
- parsing
- treebank
- valency