Using Pseudowords for Algorithm Comparison: An Evaluation Framework for Graph-based Word Sense Induction

Flavio Massimiliano Cecchini, Martin Riedl, Chris Biemann

Research output: Contribution in book, Conference contribution

Abstract

In this paper we define two parallel data sets based on pseudowords, extracted from the same corpus. They both consist of word-centered graphs for each of 1225 different pseudowords, and use respectively first-order co-occurrences and second-order semantic similarities. We propose an evaluation framework on these data sets for graph-based Word Sense Induction (WSI) focused on the case of coarse-grained homonymy: We compare different WSI clustering algorithms by measuring how well their outputs agree with the a priori known ground-truth decomposition of a pseudoword. We perform this evaluation for four different clustering algorithms: the Markov cluster algorithm, Chinese Whispers, MaxMax and a gangplank-based clustering algorithm. To further improve the comparison between these algorithms and the analysis of their behaviours, we also define a new specific evaluation measure. As far as we know, this is the first large-scale systematic pseudoword evaluation dedicated to the induction of coarse-grained homonymous word senses.
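
The evaluation idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy neighbour words and edge weights are invented for illustration, the function names are hypothetical, the two source graphs are assumed to have disjoint node sets, and plain cluster purity stands in for the paper's own evaluation measure, which is not described here. Chinese Whispers is used because it is one of the four algorithms named above.

```python
import random
from collections import Counter, defaultdict


def merge_ego_graphs(graph_a, graph_b):
    """Union two word-centred graphs into one pseudoword graph.

    Each graph maps node -> {neighbour: weight}. Returns the merged graph
    and the gold labelling (which source word each node came from).
    Assumption: the two node sets are disjoint.
    """
    merged, gold = {}, {}
    for sense, graph in (("a", graph_a), ("b", graph_b)):
        for node, nbrs in graph.items():
            merged[node] = dict(nbrs)
            gold[node] = sense
    return merged, gold


def chinese_whispers(graph, iterations=20, seed=0):
    """Label propagation: each node adopts the strongest label among its neighbours."""
    rng = random.Random(seed)
    labels = {node: node for node in graph}  # every node starts in its own class
    nodes = list(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            scores = defaultdict(float)
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            if scores:
                # Ties are broken by label name so a seeded run is repeatable.
                labels[node] = max(scores, key=lambda lab: (scores[lab], lab))
    return labels


def purity(labels, gold):
    """Fraction of nodes that carry the majority gold label of their cluster."""
    by_cluster = defaultdict(Counter)
    for node, cluster in labels.items():
        by_cluster[cluster][gold[node]] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(labels)


# Hypothetical neighbourhoods of two unrelated words, e.g. "banana" and "door".
graph_a = {
    "fruit": {"peel": 2.0, "yellow": 1.0},
    "peel": {"fruit": 2.0, "yellow": 1.5},
    "yellow": {"fruit": 1.0, "peel": 1.5},
}
graph_b = {
    "hinge": {"knob": 2.0, "lock": 1.0},
    "knob": {"hinge": 2.0, "lock": 1.5},
    "lock": {"hinge": 1.0, "knob": 1.5},
}

merged, gold = merge_ego_graphs(graph_a, graph_b)
labels = chinese_whispers(merged)
print(purity(labels, gold))
```

Because the pseudoword's two components are known in advance, any clustering of the merged graph can be scored against that decomposition without manual sense annotation, which is the point of using pseudowords.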
Original language: English
Title of host publication: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa
Pages: 105-114
Number of pages: 10
Volume: 131
Publication status: Published - 2017
Event: Nordic Conference on Computational Linguistics, NoDaLiDa - Gothenburg, Sweden
Duration: 22 May 2017 to 24 May 2017

Publication series

Name: Linköping Electronic Conference Proceedings

Conference

Conference: Nordic Conference on Computational Linguistics, NoDaLiDa
City: Gothenburg, Sweden
Period: 22/5/17 to 24/5/17

Keywords

  • Graphs
  • Pseudowords
  • Word Sense Induction
