Automatic Protein Abbreviations Discovery and Resolution from Full-Text Scientific Papers: The PRAISED Framework

Daniele Toti, Paolo Atzeni, Fabio Polticelli

Research output: Contribution to journal › Journal article

11 Citations (Scopus)

Abstract

This paper describes a methodology for discovering and resolving protein name abbreviations in the full-text versions of scientific articles. The methodology is implemented in the PRAISED framework, whose ultimate purpose is to build a publicly available abbreviation repository. Three processing steps lie at the core of the framework: i) an abbreviation identification phase, carried out via domain-independent metrics, which identifies all possible abbreviations within a scientific text; ii) an abbreviation resolution phase, which applies a number of syntactic and semantic criteria to match each abbreviation with its potential explanation; and iii) a dictionary-based protein name identification phase, which selects only those abbreviations that belong to the protein science domain. A local copy of the UniProt database serves as the source repository for all known proteins. © 2012, Jagiellonian University, Medical College, Kraków, Poland. All rights reserved.
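The three phases named in the abstract can be illustrated with a minimal sketch. It is not the PRAISED implementation: the abstract does not specify the metrics or criteria used, so the parenthesis-based candidate detection, the initial-letter matching, and every name below (`KNOWN_PROTEINS`, `find_candidates`, `resolve`, `protein_abbreviations`) are assumptions chosen only to show how such a pipeline fits together. A tiny in-memory set stands in for the local UniProt copy.

```python
import re

# Hypothetical stand-in for a local UniProt copy: lower-cased protein names.
KNOWN_PROTEINS = {"epidermal growth factor receptor", "tumor necrosis factor"}

# Phase i (assumed heuristic): a candidate is a short parenthesised token.
CANDIDATE = re.compile(r"\(([A-Za-z0-9-]{2,10})\)")


def find_candidates(text):
    """Return (abbreviation, offset of its opening parenthesis) pairs."""
    return [
        (m.group(1), m.start())
        for m in CANDIDATE.finditer(text)
        if any(c.isupper() for c in m.group(1))  # require a capital letter
    ]


def resolve(abbr, text, paren_pos):
    """Phase ii (assumed heuristic): match the abbreviation's characters
    against the initials of the words just before the parenthesis."""
    words = re.findall(r"[A-Za-z0-9]+", text[:paren_pos])
    letters = [c.lower() for c in abbr if c.isalnum()]
    n = len(letters)
    if n == 0 or len(words) < n:
        return None
    window = words[-n:]  # the n words immediately preceding "("
    if all(w[0].lower() == l for w, l in zip(window, letters)):
        return " ".join(window)
    return None


def protein_abbreviations(text, proteins=KNOWN_PROTEINS):
    """Phase iii: keep only abbreviations whose explanation is a known protein."""
    result = {}
    for abbr, pos in find_candidates(text):
        expansion = resolve(abbr, text, pos)
        if expansion is not None and expansion.lower() in proteins:
            result[abbr] = expansion
    return result


if __name__ == "__main__":
    sample = ("The epidermal growth factor receptor (EGFR) was measured "
              "by polymerase chain reaction (PCR).")
    print(protein_abbreviations(sample))
```

In this sketch both `EGFR` and `PCR` are detected and resolved, but only `EGFR` survives the dictionary filter, because "polymerase chain reaction" is not in the toy protein set. A real system would need far richer resolution criteria than first-letter matching, for example to handle abbreviations that skip words or use internal letters.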
Original language: English
Pages (from-to): 13-51
Number of pages: 39
Journal: Bio-Algorithms and Med-Systems
Volume: 8
DOI
Publication status: Published - 2012

Keywords

  • abbreviations
  • data mining
  • extraction
  • proteins
  • resolution
