Abstract
We propose a methodology for automatically discovering characterizing knowledge from textual sources, in order to categorize them semantically and cluster them by subject. The methodology rests on several challenging steps: terminology extraction and disambiguation, semantic similarity identification via ontology alignment, and a core pattern-based strategy for automatic ontology building. It was originally devised as an extension of PRAISED, our abbreviation identification and resolution proposal, to resolve previously unresolvable abbreviations, namely those whose explanation either escapes the system's proximity-based approach or does not appear in the source text that features them. Moving from a paper-by-paper, mainly syntactical process to a corpus-based, semantic approach will in fact make it possible to dramatically enhance the system's resolution capabilities. Nevertheless, the strategy we present here is not tied to this specific task; it is relevant to a variety of contexts and might therefore find far wider applicability in other advanced knowledge extraction and discovery systems. Copyright (c) 2012 - Edizioni Libreria Progetto and the authors.
Original language | English |
---|---|
Title of host publication | Proceedings of the 20th Italian Symposium on Advanced Database Systems, SEBD 2012 |
Pages | 213-220 |
Number of pages | 8 |
Publication status | Published - 2012 |
Event | 20th Italian Symposium on Advanced Database Systems, SEBD 2012 - Venice, Italy. Duration: 24 Jun 2012 → 27 Jun 2012 |
Conference
Conference | 20th Italian Symposium on Advanced Database Systems, SEBD 2012 |
---|---|
City | Venice, Italy |
Period | 24 Jun 2012 → 27 Jun 2012 |
Keywords
- knowledge discovery
- semantic similarity