Abstract
Language resources (LRs) such as corpora, lexica, grammars and ontologies are closely related to each other at both the
development and the exploitation stage. In particular, a strong relation holds between lexical resources and annotated corpora.
Recent years have seen a large growth in the number of projects aimed at building LRs for Classical languages. Among these new LRs are
syntactically annotated corpora (treebanks), which can be exploited to provide empirical evidence for testing and refining the lexical
resources developed over the centuries by Ancient Greek and Latin lexicography.
This paper describes the application of clustering techniques to the Index Thomisticus Treebank corpus in order to organise the meanings
of the lemma forma in Thomas Aquinas’ works according to its textual and syntactic behaviour.
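As a minimal sketch of how treebank occurrences of a lemma might be turned into data for clustering, the Python below builds, for each occurrence of forma, a bag of textual features (lemmas in a small window) and syntactic features (its dependency relation, its head lemma, its dependents). The abstract does not say which features were actually used: the CoNLL-like tuple format, the toy sentences, the relation labels, `WINDOW`, and the helpers `occurrence_features` and `extract` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: each sentence is a list of (id, lemma, head_id, deprel)
# tuples in a CoNLL-like style; head_id 0 marks the root. Sentences and labels
# are invented for illustration, not taken from the Index Thomisticus Treebank.
SENTENCES = [
    [(1, "anima", 3, "Sb"), (2, "forma", 3, "Pnom"), (3, "sum", 0, "Pred"), (4, "corpus", 2, "Atr")],
    [(1, "forma", 2, "Sb"), (2, "do", 0, "Pred"), (3, "esse", 2, "Obj")],
]

TARGET = "forma"
WINDOW = 2  # hypothetical: tokens considered on each side (textual context)


def occurrence_features(sentence, index):
    """Return a Counter of textual and syntactic features for sentence[index]."""
    tok_id, _lemma, head_id, deprel = sentence[index]
    by_id = {t[0]: t for t in sentence}
    feats = Counter()
    # Textual features: lemmas within a fixed window around the target.
    for j in range(max(0, index - WINDOW), min(len(sentence), index + WINDOW + 1)):
        if j != index:
            feats["ctx=" + sentence[j][1]] += 1
    # Syntactic features: the target's own relation and its head's lemma.
    feats["rel=" + deprel] += 1
    if head_id in by_id:
        feats["head=" + by_id[head_id][1]] += 1
    # Syntactic features: lemma and relation of each dependent of the target.
    for other in sentence:
        if other[2] == tok_id:
            feats["dep=" + other[1] + "/" + other[3]] += 1
    return feats


def extract(sentences, target):
    """Collect one feature Counter per occurrence of the target lemma."""
    return [occurrence_features(s, i)
            for s in sentences
            for i, tok in enumerate(s) if tok[1] == target]


if __name__ == "__main__":
    for vec in extract(SENTENCES, TARGET):
        print(sorted(vec.items()))
```

Keeping textual and syntactic features in one sparse vector, with distinct prefixes, makes it easy to drop either group and compare clusterings with and without syntactic information.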
Clustering is an unsupervised learning method that finds structure in a collection of unlabelled data. Applying clustering
techniques to textual data rests on the theoretical assumption that words used in similar contexts tend to have the
same or related meanings (the Distributional Hypothesis; Harris, 1954).
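The Distributional Hypothesis can be made concrete by treating the context of each occurrence as a sparse vector: occurrences that share context features receive a high cosine similarity and tend to fall into the same cluster. The sketch below assumes each occurrence is already a feature-count dictionary and uses average-linkage agglomerative clustering; the toy vectors, the feature names, and the choice of algorithm are illustrative assumptions, not the method reported in the paper.

```python
import math
from itertools import combinations

# Hypothetical toy occurrences of one lemma, each a sparse feature vector
# (feature -> count). Feature names and values are invented for illustration.
OCCURRENCES = {
    "occ1": {"ctx=corpus": 1, "rel=Pnom": 1, "head=sum": 1},
    "occ2": {"ctx=corpus": 1, "ctx=anima": 1, "rel=Pnom": 1},
    "occ3": {"ctx=esse": 1, "rel=Sb": 1, "head=do": 1},
    "occ4": {"ctx=esse": 2, "rel=Sb": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_linkage(vectors, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest mean pairwise cosine similarity until only k clusters remain."""
    clusters = [[name] for name in vectors]

    def link(pair):
        a, b = pair
        sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        a, b = max(combinations(clusters, 2), key=link)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


if __name__ == "__main__":
    print(average_linkage(OCCURRENCES, 2))
```

With k = 2, the two occurrences sharing the corpus/Pnom features end up in one cluster and the two sharing the esse/Sb features in the other. Average linkage is used here only because it needs no initial centroids; k-means or other algorithms would be equally plausible choices.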
Our results show that syntactic metadata are indeed helpful for clustering.
Original language | English |
---|---|
Host publication title | Actes du 31e Colloque International sur le Lexique et la Grammaire |
Pages | 143-147 |
Number of pages | 5 |
Publication status | Published - 2012 |
Event | 31e Colloque International sur le Lexique et la Grammaire - Nové Hrady, Duration: 19 Sep 2012 → 22 Sep 2012 |
Conference
Conference | 31e Colloque International sur le Lexique et la Grammaire |
---|---|
City | Nové Hrady |
Period | 19 Sep 2012 → 22 Sep 2012 |
Keywords
- Lexicon
- Treebanks