The Index Thomisticus Treebank Project: Annotation, Parsing and Valency Lexicon

Marco Carlo Passarotti, Barbara McGillivray, Paolo Ruffolo

Research output: Contribution to journal, journal article, peer-reviewed

Abstract

We present an overview of the Index Thomisticus Treebank project (IT-TB). The IT-TB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11-million-token Latin corpus of the texts of Thomas Aquinas. We briefly describe the annotation guidelines, which are shared with the Latin Dependency Treebank (LDT). We report on the application of data-driven dependency parsers to IT-TB and LDT data, present training and parsing results on several datasets, and evaluate learning algorithms and techniques. Furthermore, we introduce the IT-TB valency lexicon extracted from the treebank. We report quantitative data on the lexicon and provide some statistical measures of subcategorisation structures.
Original language: English
Pages (from-to): 103-127
Number of pages: 25
Journal: REVUE TAL
Volume: 50(2)
Publication status: Published - 2009

Keywords

  • Latin
  • lexicon
  • computational linguistics
  • parsing
  • treebank
  • valency
