Theory and Practice of Corpus Annotation in the Index Thomisticus Treebank

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Corpus linguistics is now a well-established field of research that requires collaboration with both computational and theoretical linguistics. Computational linguistics uses corpus data to train probabilistic Natural Language Processing (NLP) tools, such as taggers and parsers, while theoretical linguistics, in empirical approaches to the study of language, draws on corpus evidence. For its part, corpus linguistics, as a discipline in its own right, uses NLP tools to build annotated corpora (semi-)automatically and relies on linguistic theory as the backbone for the design of annotation guidelines. The creation of a linguistically annotated corpus is therefore an excellent opportunity to apply linguistic theories designed in a pre-corpus era to real data, and potentially to revise them. The challenge is even more attractive when a language like Latin is involved: while the language-dependent computational processing of Latin is today limited to automatic morphological tagging, a number of available language-independent methods and tools of analysis can be applied to it.
Original language: English
Pages (from-to): 5-23
Number of pages: 19
Journal: LEXIS
Volume: 27
Publication status: Published - 2009

Keywords

  • annotation
  • corpora
  • Latin
  • syntax
  • treebank
