I-CAB: the Italian Content Annotation Bank

Rachele Sprugnoli, B. Magnini, E. Pianta, C. Girardi, M. Negri, L. Romano, M. Speranza, V. Bartalesi Lenzi

Research output: Contribution to book; Conference contribution

31 Citations (Scopus)

Abstract

In this paper we present work in progress for the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. The first level is represented by temporal expressions, the second level is represented by different types of entities (i.e. persons, organizations, locations and geo-political entities), and the third level is represented by relations between entities (e.g. the affiliation relation connecting a person to an organization). So far, I-CAB has been manually annotated with temporal expressions, person entities and organization entities. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. As the ACE guidelines were originally developed for English, part of the effort consisted of adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to include a wider range of entities, such as conjunctions.
Original language: English
Title of host publication: 5th International Conference on Language Resources and Evaluation (LREC 2006)
Pages: 963-968
Number of pages: 6
Publication status: Published - 2006
Event: 5th International Conference on Language Resources and Evaluation (LREC 2006) - Genova, Italy
Duration: 22 May 2006 to 28 May 2006

Conference

Conference: 5th International Conference on Language Resources and Evaluation (LREC 2006)
City: Genova, Italy
Period: 22/5/06 to 28/5/06

Keywords

  • Content Processing
  • corpora
  • information extraction
  • semantic annotation
