Abstract
There are currently Wikipedia editions in 264 different languages. Each of these editions contains infoboxes that provide structured data about the topic of the article in which an infobox is contained. The content of infoboxes about the same topic in different Wikipedia editions varies in completeness, coverage and quality. This paper examines the hypothesis that by extracting infobox data from multiple Wikipedia editions and by fusing the extracted data among editions it should be possible to complement data from one edition with previously missing values from other editions and to increase the overall quality of the extracted dataset by choosing property values that are most likely correct in case of inconsistencies among editions. We will present a software framework for fusing RDF datasets based on different conflict resolution strategies. We will apply the framework to fuse infobox data that has been extracted from the English, German, Italian and French editions of Wikipedia and will discuss the accuracy of the conflict resolution strategies that were used in this experiment.
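To make the fusion idea concrete, here is a minimal, hypothetical Python sketch, not the framework presented in the paper: each edition's extraction is reduced to a map of (subject, property) pairs, and a pluggable conflict resolution strategy, here a simple majority vote, picks a value when editions disagree. All identifiers and numeric values below are illustrative, not taken from the actual DBpedia extractions.

```python
from collections import Counter

# Hypothetical per-edition extractions: each maps (subject, property) -> value.
# The real framework fuses RDF triples; plain dicts keep this sketch minimal.
editions = {
    "en": {("dbpedia:Berlin", "populationTotal"): "3431473",
           ("dbpedia:Berlin", "areaTotal"): "891.85"},
    "de": {("dbpedia:Berlin", "populationTotal"): "3431675",
           ("dbpedia:Berlin", "areaTotal"): "891.85"},
    "it": {("dbpedia:Berlin", "areaTotal"): "891.85"},
    "fr": {("dbpedia:Berlin", "populationTotal"): "3431675"},
}

def fuse(editions, resolve):
    """Union all (subject, property) keys, then resolve conflicts per key."""
    keys = {k for triples in editions.values() for k in triples}
    fused = {}
    for key in keys:
        candidates = [t[key] for t in editions.values() if key in t]
        fused[key] = resolve(candidates)
    return fused

def majority_vote(candidates):
    """One simple conflict resolution strategy: keep the most frequent value."""
    return Counter(candidates).most_common(1)[0][0]

print(fuse(editions, majority_vote))
# Values missing from one edition are complemented by the others;
# inconsistent values (here, populationTotal) are settled by the strategy.
```

Other strategies described in the paper, such as preferring values from a more trusted edition, would slot in as alternative `resolve` functions.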
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 5th International Workshop on Scripting and Development for the Semantic Web (SFSW 2009) |
| Publisher | CEUR-WS.org |
| Pages | 28-39 |
| Number of pages | 12 |
| ISSN (print) | 1613-0073 |
| Publication status | Published - 2009 |
| Published externally | Yes |
Keywords
- Wikipedia
- DBpedia
- Web of data
- data fusion
- information quality evaluation