A comparison of graph-based word sense induction clustering algorithms in a pseudoword evaluation framework

Flavio Massimiliano Cecchini*, Martin Riedl, Elisabetta Fersini, Chris Biemann

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

This article presents a comparison of different Word Sense Induction (WSI) clustering algorithms on two novel pseudoword data sets of semantic-similarity and co-occurrence-based word graphs, with a special focus on the detection of homonymic polysemy. We follow the original definition of a pseudoword as the combination of two monosemous terms and their contexts to simulate a polysemous word. The evaluation is performed by comparing each algorithm’s output on a pseudoword’s ego word graph (i.e., a graph that represents the pseudoword’s context in the corpus) with the known subdivision given by the components corresponding to the monosemous source words forming the pseudoword. The main contribution of this article is a self-sufficient pseudoword-based evaluation framework for graph-based WSI clustering algorithms, which also defines a new evaluation measure (top2) and a secondary clustering process (hyperclustering). To our knowledge, we are the first to conduct and discuss a large-scale systematic pseudoword evaluation targeting the induction of coarse-grained homonymous word senses across a large number of graph clustering algorithms.
Original language: English
Pages (from-to): 733-770
Number of pages: 38
Journal: Language Resources and Evaluation
Volume: 52
Publication status: Published - 2018

Keywords

  • Evaluation
  • Graph clustering
  • Pseudowords
  • Word sense induction
