Abstract
The paper introduces the project of the Index Thomisticus Treebank (IT-TB). The IT-TB is a dependency-based treebank built on the corpus of the Index Thomisticus (IT) by Father Roberto Busa, which includes the opera omnia of Thomas Aquinas, for a total of approximately 11 million words. Currently, the IT-TB is the largest Latin treebank available, with more than 350,000 nodes in around 17,000 sentences. The annotation covers Books 1, 2 and 3 of Summa contra Gentiles in their entirety, plus excerpts from Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae. The paper details the multi-layer annotation style of the IT-TB and its theoretical motivations. The conversion process to the now widely used Universal Dependencies style is described as well. Over more than a decade, the project has developed a number of linguistic resources and NLP tools for Latin connected to the IT-TB. As for the resources, the paper presents the syntax-based subcategorization lexicon IT-VaLex and the valency lexicon Latin Vallex. As for the tools, the automatic dependency parsing process is described, highlighting the core issue of the portability of NLP tools across the wide diachronic and diatopic span of Latin texts. A section is dedicated to the automatic morphological analysis of Latin, introducing the analyzer Lemlat and its recent enhancement with information on derivational morphology and a new set of lexical entries covering a large Onomasticon (from the Forcellini dictionary) and Medieval Latin (from the Du Cange glossary).
| Original language | English |
|---|---|
| Title of host publication | Digital Classical Philology. Ancient Greek and Latin in the Digital Revolution |
| Publisher | de Gruyter |
| Pages | 299-319 |
| Number of pages | 21 |
| Volume | 10 |
| ISBN (print) | 978-3-11-059678-6 |
| DOI | |
| Publication status | Published - 2019 |
All Science Journal Classification (ASJC) codes
- General Arts and Humanities
- General Computer Science
- General Social Sciences
Keywords
- Index Thomisticus
- Latin
- Syntax
- Treebank