Experiments with Wikipedia cross-language data fusion

Eugenio Tacchini, Andreas Schultz, Christian Bizer

Research output: Contribution in book › Conference contribution

Abstract

There are currently Wikipedia editions in 264 different languages. Each of these editions contains infoboxes that provide structured data about the topic of the article in which an infobox is contained. The content of infoboxes about the same topic in different Wikipedia editions varies in completeness, coverage and quality. This paper examines the hypothesis that by extracting infobox data from multiple Wikipedia editions and by fusing the extracted data among editions it should be possible to complement data from one edition with previously missing values from other editions and to increase the overall quality of the extracted dataset by choosing property values that are most likely correct in case of inconsistencies among editions. We will present a software framework for fusing RDF datasets based on different conflict resolution strategies. We will apply the framework to fuse infobox data that has been extracted from the English, German, Italian and French editions of Wikipedia and will discuss the accuracy of the conflict resolution strategies that were used in this experiment.
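
As a rough illustration of the kind of conflict resolution the abstract describes, the sketch below fuses per-edition property values by complementing missing values and resolving conflicts with a majority vote. This is a hypothetical simplification: the function name `fuse_editions`, the subject→property→value dictionaries, and the majority-vote rule are illustrative assumptions, not the paper's actual RDF-based framework, which supports several resolution strategies.

```python
# Hypothetical sketch: majority-vote fusion of infobox data from several
# Wikipedia editions. Not the paper's framework; illustrative only.

from collections import Counter

def fuse_editions(editions):
    """Fuse a list of per-edition {subject: {property: value}} dicts.

    Missing values are complemented from any edition that has them;
    conflicting values are resolved by majority vote (ties keep the
    value from the first edition that supplied one).
    """
    collected = {}
    for edition in editions:
        for subject, props in edition.items():
            target = collected.setdefault(subject, {})
            for prop, value in props.items():
                target.setdefault(prop, []).append(value)

    return {
        subject: {
            prop: Counter(values).most_common(1)[0][0]
            for prop, values in props.items()
        }
        for subject, props in collected.items()
    }

# Example: the French edition fills a property missing elsewhere, and the
# conflicting population value is resolved two-against-one.
en = {"Berlin": {"population": "3400000", "country": "Germany"}}
de = {"Berlin": {"population": "3416255", "country": "Germany"}}
fr = {"Berlin": {"population": "3416255", "mayor": "Klaus Wowereit"}}

print(fuse_editions([en, de, fr]))
# {'Berlin': {'population': '3416255', 'country': 'Germany',
#             'mayor': 'Klaus Wowereit'}}
```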
Original language: English
Host publication title: Proceedings of the 5th International Workshop on Scripting and Development for the Semantic Web (SFSW 2009)
Publisher: CEUR-WS.org
Pages: 28-39
Number of pages: 12
ISSN (print): 1613-0073
Publication status: Published - 2009
Published externally

Keywords

  • Wikipedia
  • DBpedia
  • Web of data
  • data fusion
  • information quality evaluation
