Scaling historical text re-use

Marco Buchler, Greta Franzini, Emily Franzini, Maria Moritz

Research output: Contribution in book, Conference contribution

1 citation (Scopus)

Abstract

Text re-use describes the spoken and written repetition of information. Historical text re-use, with its longer time span, embraces a larger set of morphological, linguistic, syntactic, semantic and copying variations, thus adding complication to text-reuse detection. Furthermore, it increases the chances of redundancy in a digital library. In Natural Language Processing it is crucial to remove these redundancies before we can apply any kind of machine learning techniques to the text. In Humanities, these redundancies foreground textual criticism and allow scholars to identify lines of transmission. Identification can be accomplished by way of automatic or semi-automatic methods. Text re-use algorithms, however, are of squared complexity and call for higher computational power. The present paper addresses this issue of complexity, with a particular focus on its algorithmic implications and solutions.
Original language: English
Title of host publication: Proceedings of the 2014 IEEE International Conference on Big Data (Big Data)
Pages: 23-31
Number of pages: 9
DOI
Publication status: Published - 2014
Event: 2014 IEEE International Conference on Big Data (Big Data) - Washington, DC
Duration: 27 Oct 2014 to 30 Oct 2014

Conference

Conference: 2014 IEEE International Conference on Big Data (Big Data)
City: Washington, DC
Period: 27/10/14 to 30/10/14

Keywords

  • humanities
  • natural language processing
  • performance
  • scalability
  • text analysis
  • text reuse
