A framework for semi-automatic identification, disambiguation and storage of protein-related abbreviations in scientific literature

Daniele Toti, Paolo Atzeni, Fabio Polticelli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

We propose a framework for identifying, disambiguating and storing protein-related abbreviations as found in the full texts of scientific papers, in order to build and maintain a publicly available abbreviation repository via a semi-automatic process. This process involves information extraction methods and techniques for acronym identification and resolution, based on lexical clues and syntactical, largely domain-independent criteria. A dictionary and an ontology for proteins provide the means for matching and disambiguating the biological entities. User feedback is gathered at the end of the process and the confirmed entries are then stored and made available to the scientific community for further reviewing. © 2011 IEEE.
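The abstract does not give the extraction rules themselves. As an illustration only, here is a minimal sketch of one widely used lexical heuristic for parenthetical abbreviations of the form "long form (SF)". It uses right-to-left character matching in the style of the Schwartz and Hearst algorithm, which is not necessarily the method used in this paper. The names `extract_pairs`, `best_long_form`, `match_proteins` and `PROTEIN_DICT` are hypothetical. The two-entry dictionary is a toy stand-in for the protein dictionary and ontology mentioned in the abstract.

```python
import re
from typing import Optional

# Assumption: short forms are single parenthesised tokens of 2-10 characters.
PAREN_RE = re.compile(r"\(([^()\s]{2,10})\)")

# Hypothetical toy stand-in for a protein dictionary; the paper's actual
# dictionary and ontology are not reproduced here.
PROTEIN_DICT = {"tumor necrosis factor", "epidermal growth factor receptor"}


def is_valid_short_form(sf: str) -> bool:
    """Heuristic: starts with an alphanumeric char and contains a letter."""
    return sf[0].isalnum() and any(c.isalpha() for c in sf)


def best_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Match short-form characters right to left inside the candidate text."""
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():  # skip punctuation inside the short form
            s -= 1
            continue
        # Move left until the character matches; the first short-form
        # character must additionally sit at the start of a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None  # no consistent long form found
        l -= 1
        s -= 1
    # Extend left to the beginning of the word where matching stopped.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(sentence: str) -> list[tuple[str, str]]:
    """Return (short form, long form) pairs found in one sentence."""
    pairs = []
    for m in PAREN_RE.finditer(sentence):
        sf = m.group(1)
        if not is_valid_short_form(sf):
            continue
        # Look back at most min(|SF| + 5, 2 * |SF|) words before "(".
        words = sentence[: m.start()].split()
        window = min(len(sf) + 5, 2 * len(sf))
        lf = best_long_form(sf, " ".join(words[-window:]))
        if lf and len(lf) > len(sf):
            pairs.append((sf, lf))
    return pairs


def match_proteins(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep only pairs whose long form is a known protein name."""
    return [(sf, lf) for sf, lf in pairs if lf.lower() in PROTEIN_DICT]


if __name__ == "__main__":
    found = extract_pairs("Tumor necrosis factor (TNF) is a cytokine.")
    print(found, match_proteins(found))
```

In the workflow the abstract describes, pairs that survive dictionary matching would then go to users for confirmation before storage. In this sketch, a pair such as "Magnetic resonance imaging (MRI)" is extracted but filtered out because its long form is not in the toy dictionary.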
Original language: English
Title of host publication: Proceedings - International Conference on Data Engineering
Pages: 59-61
Number of pages: 3
Publication status: Published - 2011
Event: 2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011 - Hannover, Germany
Duration: 11 Apr 2011 to 16 Apr 2011


Keywords

  • abbreviations
