Abstract
Corpus linguistics is now a well-established field of research, one that requires
collaboration with both computational and theoretical linguistics.
Computational linguistics uses corpus data to train
probabilistic Natural Language Processing (NLP) tools, such as taggers and parsers,
while theoretical linguistics, in empirical approaches to the study of language,
draws on corpus evidence. For its part, corpus linguistics, as a discipline in
its own right, uses NLP tools to build annotated corpora (semi-)automatically and relies on
linguistic theory as the backbone for the design of annotation guidelines.
The creation of a linguistically annotated corpus is, therefore, an excellent
opportunity to apply linguistic theories that were designed in a pre-corpus era
to real data, and potentially to revise them. The challenge is even more attractive when a
language like Latin is involved. Indeed, while the language-dependent
computational processing of Latin is today limited to automatic morphological
tagging, a number of available language-independent methods and tools of analysis
can be applied to it.
Original language | English |
---|---|
Pages (from-to) | 5-23 |
Number of pages | 19 |
Journal | LEXIS |
Volume | 27 |
Publication status | Published - 2009 |
Keywords
- annotation
- corpora
- Latin
- syntax
- treebank