The Project of the Index Thomisticus Treebank

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

The paper introduces the project of the Index Thomisticus Treebank (IT-TB). The IT-TB is a dependency-based treebank built on the corpus of the Index Thomisticus (IT) by Father Roberto Busa, which includes the opera omnia of Thomas Aquinas, for a total of approximately 11 million words. Currently, the IT-TB is the largest Latin treebank available, with more than 350,000 nodes in around 17,000 sentences. The annotation covers the entirety of books 1, 2 and 3 of the Summa contra Gentiles, plus excerpts from the Scriptum super Sententiis Magistri Petri Lombardi and the Summa Theologiae. The paper details the multi-layer annotation style of the IT-TB and the theoretical motivations behind it. It also describes the conversion of the treebank to the now widely used Universal Dependencies style. Over more than a decade, the project has developed a number of linguistic resources and NLP tools for Latin connected to the IT-TB. As for the resources, the paper presents the syntax-based subcategorization lexicon IT-VaLex and the valency lexicon Latin Vallex. As for the tools, it describes the automatic dependency parsing process, highlighting the core issue of portability of NLP tools across the wide diachronic and diatopic span of Latin texts. A section is dedicated to automatic morphological analysis of Latin, introducing the analyzer Lemlat and its recent enhancement with information on derivational morphology and a new set of lexical entries covering a large Onomasticon (from the Forcellini dictionary) and Medieval Latin (from the Du Cange glossary).
Original language: English
Title of host publication: Digital Classical Philology. Ancient Greek and Latin in the Digital Revolution
Editors: Monica Berti
Pages: 299-319
Number of pages: 21
Volume: 10
Publication status: Published, 2019

Publication series

Name: AGE OF ACCESS? GRUNDFRAGEN DER INFORMATIONSGESELLSCHAFT

Keywords

  • Index Thomisticus
  • Latin
  • Syntax
  • Treebank
