The paper introduces the project of the Index Thomisticus Treebank (IT-TB). The IT-TB is a dependency-based treebank based on the corpus of the Index Thomisticus by father Roberto Busa (IT), which includes the opera omnia of Thomas Aquinas, for a total of approximately 11 million words. Currently, the IT-TB is the largest Latin treebank available, with more than 350,000 nodes in around 17,000 sentences. The annotation covers the entire books 1, 2 and 3 of Summa contra Gentiles, plus excerpts from Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae. The paper details the multi-layer annotation style of the IT-TB and its background theoretical motivations. The conversion process to the now widely used Universal Dependencies style is described as well. Across more than a decade, the proj- ect has developed a number of linguistic resources and NLP tools for Latin connected to the IT-TB. As for the resources, the paper presents the syntax- based subcategorization lexicon IT-VaLex and the valency lexicon Latin Vallex. As for the tools, the automatic dependency parsing process is de- scribed, highlighting the core issue of portability of NLP tools across the wide diachronic and diatopic span of Latin texts. A section is dedicated to auto- matic morphological analysis of Latin, introducing the analyzer Lemlat and its recent enhancement with information on derivational morphology and a new set of lexical entries covering a large Onomasticon (from Forcellini dic- tionary) and Medieval Latin (from Du Cange glossary).
|Title of host publication||Digital Classical Philology. Ancient Greek and Latin in the Digital Revolution|
|Number of pages||21|
|Publication status||Published - 2019|
|Name||AGE OF ACCESS? GRUNDFRAGEN DER INFORMATIONSGESELLSCHAFT|
- Index Thomisticus