From Treebanks to Lexical Entries. Clustering the Index Thomisticus

Research output: Contribution to book › Conference contribution

Abstract

Language resources (LRs) such as corpora, lexica, grammars and ontologies are closely related to each other at both the development and the exploitation stages. In particular, a strong relation holds between lexical resources and annotated corpora. Recent years have seen a large growth of projects aimed at building LRs for Classical languages. Among these new LRs are syntactically annotated corpora (treebanks), which can be exploited to provide empirical evidence for testing and refining the lexical resources developed over the centuries by Ancient Greek and Latin lexicography. This paper describes the application of clustering techniques to the Index Thomisticus Treebank corpus in order to organise the meanings of the lemma forma in Thomas Aquinas’ works according to its textual and syntactic behaviour. Clustering is an unsupervised learning method that seeks structure in a collection of data. Applying clustering techniques to textual data rests on the theoretical assumption that words used in similar contexts tend to have the same or related meanings (the Distributional Hypothesis of Harris (1954)). Our results show that syntactic metadata are indeed helpful for clustering purposes.
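The idea of grouping occurrences of a lemma by the contexts they appear in can be illustrated with a minimal sketch. This is not the method used in the paper: the occurrence names, the feature labels (`rel=`, `head=`, `dep=`) and the similarity threshold are all invented for illustration, and average-link agglomerative clustering over cosine similarity is just one common choice.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy data: each occurrence of a target lemma is described by
# features of its context (syntactic relation, head lemma, dependent lemma).
# All names and values here are invented for illustration only.
occurrences = {
    "occ1": ["rel=Sb", "head=esse", "dep=substantialis"],
    "occ2": ["rel=Sb", "head=esse", "dep=materia"],
    "occ3": ["rel=Obj", "head=recipere", "dep=intellectus"],
    "occ4": ["rel=Obj", "head=recipere", "dep=intelligibilis"],
}


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold):
    """Average-link agglomerative clustering.

    Repeatedly merges the most similar pair of clusters and stops when
    no pair reaches the (assumed) similarity threshold.
    """
    clusters = [[name] for name in sorted(vectors)]

    def sim(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(cosine(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        best, i, j = max(
            (sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


vectors = {name: Counter(feats) for name, feats in occurrences.items()}
result = cluster(vectors, threshold=0.5)
print(result)  # [['occ1', 'occ2'], ['occ3', 'occ4']]
```

In this toy setting, occurrences that share a syntactic relation and a head lemma end up in the same group, which is the kind of evidence syntactic annotation can add to purely lexical context.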
Original language: English
Host publication title: Actes du 31e Colloque International sur le Lexique et la Grammaire
Pages: 143-147
Number of pages: 5
Publication status: Published - 2012
Event: 31e Colloque International sur le Lexique et la Grammaire - Nové Hrady
Duration: 19 Sep 2012 to 22 Sep 2012

Conference

Conference: 31e Colloque International sur le Lexique et la Grammaire
City: Nové Hrady
Period: 19/9/12 to 22/9/12

Keywords

  • Lexicon
  • Treebanks
