A knowledge discovery methodology for semantic categorization of unstructured textual sources

Daniele Toti, Paolo Atzeni, Fabio Polticelli

Research output: Chapter in book › Conference contribution

4 Citations (Scopus)

Abstract

We describe a methodology for identifying characterizing terms from a source text or paper and automatically building an ontology around them, with the purpose of semantically categorizing a paper corpus where documents sharing similar subjects may be subsequently clustered together by means of ontology alignment. We first employ a Natural Language Processing pipeline to extract relevant terms from the source text, and then use a combination of a pattern-based and machine-learning approach to establish semantic relationships among those terms, with some user feedback required in between. This methodology for discovering characterizing knowledge from textual sources finds its inception as an extension of PRAISED, our abbreviation discovery framework, in order to enhance its resolution capabilities. By moving from a paper-by-paper, mainly syntactical process to a corpus-based, semantic approach, it was in fact possible to overcome earlier limits of the system related to abbreviations whose explanation could not be found within the same paper they were cited in. At the same time, though, the methodology we present is not tied to this specific task, but is instead of relevance for a variety of contexts, and might therefore be used to build a stand-alone system for advanced knowledge extraction and semantic categorization. © 2012 IEEE.
Original language: English
Title of host publication: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
Pages: 944-951
Number of pages: 8
DOI
Publication status: Published - 2012
Event: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012 - Sorrento, Italy
Duration: 25 Nov 2012 to 29 Nov 2012

Conference

Conference: 8th International Conference on Signal Image Technology and Internet Based Systems, SITIS 2012
City: Sorrento, Italy
Period: 25/11/12 to 29/11/12

Keywords

  • knowledge discovery
  • semantic categorization
