Hate speech online: detection methodologies between algorithmic and qualitative evaluations. A case study on antiSemitism on Twitter

Stefano Pasta*

*Corresponding author for this work

Research output: Contribution to book (Conference contribution)

Abstract

Research on the forms of contemporary hatred (Siegel, 2020; Santerini, 2021), and in particular studies on the changes that have taken place on the social Web (Pasta, 2018, 2019), agree that this phenomenon requires a multidisciplinary approach. At the international level, the field of Hate Studies, which combines the legal and IT fields with the humanities (sociology, pedagogy, anthropology, philosophy, linguistics, semiotics) and brings together scholars, researchers, politicians, communication experts, human rights advocates and NGO leaders, is marked by a significant body of research aimed at automating detection processes and building an algorithm capable of identifying online hatred. The corpus is almost always taken from Twitter because, among the main social networks, it is the only one that offers easy automated access to its data through APIs (application programming interfaces). This field of research is marked by a tension between human and non-human, between technology and human action, with a tendency to delegate the work to artificial intelligence at the expense of more interpretative approaches. At the macro level, international studies fall into two groups: the first includes studies that rely solely on machine learning methods, while the second includes studies that combine automatic search with human classification (Pasta, 2021; 2023). This contribution presents an analysis that combines a socio-educational approach with automatic computer processing. The methodology is applied to various target groups and aims, alongside detection, at a more in-depth study of the characteristics of hatred, in order to design coherent educational interventions. The case presented here concerns the classification of antisemitic hate speech on Twitter in Italian from 1 March 2019 to 28 February 2023.
The research question is whether antisemitic hatred shows monthly spikes. It is addressed through temporal analyses of samples manually classified by experts, followed by an identification of the prevalent rhetoric and forms of hatred. The methodology falls within the techniques of social network analysis (SNA). The data were collected with the open-source Python library GetOldTweets3, which retrieves tweets matching a search query. Using a search string that combined a lemma identifying the target group with (AND) a reference to elements typical of antisemitism according to the literature, all tweets published over the four years considered were extracted. Then, using simple random sampling without replacement, 100 tweets per month were selected, yielding a sample dataset of 4,800 posts in total (Gareth et al., 2017). This sample was manually classified by domain experts ("annotators"), who determined whether each tweet contained hate. When it did, they assigned the rhetoric and the corresponding form of antisemitism, the latter according to the Working Definition of Antisemitism of the International Holocaust Remembrance Alliance (IHRA). The rhetorical categories were derived from a psychosocial and historical-literary analysis of the linguistic forms of hostility and had already been tested on other target groups by the same interdisciplinary team: insults, derision/irony, exclusion/separation, prejudice, dehumanization, humiliation/contempt, fear, competition, incitement/violence. After the main results are presented (although the contribution focuses on the methodological approach), the final step is to evaluate them with a confusion matrix, i.e. a tool for analyzing the errors made by a machine learning model (Gareth, Witten, Tibshirani, 2017).
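The query and sampling design described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: retrieval with GetOldTweets3 is not shown (tweets are assumed to be already collected as `(date, text)` pairs), and the term sets `TARGET_LEMMAS` and `ANTISEMITIC_MARKERS`, like all function names here, are hypothetical placeholders, since the abstract does not report the actual search string.

```python
import random
import re
from collections import defaultdict
from datetime import date

# HYPOTHETICAL placeholder term lists: the abstract does not report the real
# search string, only that it pairs a target-group lemma AND an antisemitic element.
TARGET_LEMMAS = {"ebreo", "ebrei", "ebraico"}
ANTISEMITIC_MARKERS = {"complotto", "lobby", "usura"}


def matches_query(text: str) -> bool:
    """Boolean AND: at least one target lemma and at least one marker."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return bool(tokens & TARGET_LEMMAS) and bool(tokens & ANTISEMITIC_MARKERS)


def month_keys(start: date, end: date) -> list:
    """(year, month) keys from start's month to end's month, inclusive."""
    keys, y, m = [], start.year, start.month
    while (y, m) <= (end.year, end.month):
        keys.append((y, m))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return keys


def sample_per_month(tweets, per_month=100, seed=0):
    """Simple random sampling without replacement, drawn separately per month.

    tweets: iterable of (date, text) pairs. A month with fewer than per_month
    matching tweets contributes all of them.
    """
    by_month = defaultdict(list)
    for day, text in tweets:
        if matches_query(text):
            by_month[(day.year, day.month)].append((day, text))
    rng = random.Random(seed)  # fixed seed makes the draw reproducible
    return {
        month: rng.sample(items, min(per_month, len(items)))  # no tweet drawn twice
        for month, items in sorted(by_month.items())
    }


def monthly_hate_share(labels_by_month):
    """Share of tweets labelled hateful (1) among annotated tweets, per month."""
    return {month: sum(labels) / len(labels)
            for month, labels in labels_by_month.items() if labels}
```

For the study period, `month_keys(date(2019, 3, 1), date(2023, 2, 28))` yields 48 months, which at 100 tweets per month gives the 4,800-post sample. Drawing separately within each month, rather than from the whole corpus at once, gives every month the same sample size, so the monthly shares returned by `monthly_hate_share` can be compared directly when looking for spikes.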
All the texts classified by the annotators are then also evaluated by an algorithm that establishes whether each tweet contains hate, after a series of standard Natural Language Processing (NLP) procedures has been applied to "clean" the texts, such as the removal of superfluous characters, the conversion of text
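A minimal Python sketch of the cleaning and evaluation steps, under stated assumptions: the steps in the hypothetical `clean_tweet` (Unicode normalization, URL and @mention removal, lowercasing, symbol stripping) are assumed typical choices, since the abstract's list of procedures is cut off, and the classifier itself is not shown. The annotators' labels are treated as the reference against which the algorithm's labels are tallied in a binary confusion matrix; all function names are illustrative.

```python
import re
import unicodedata

# ASSUMED cleaning steps: the abstract mentions removing superfluous characters
# and converting text, but the exact pipeline is not reported.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")  # punctuation, emoji and other symbols
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize Unicode, drop URLs and @mentions, lowercase, strip symbols."""
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text.lower())
    return SPACES_RE.sub(" ", text).strip()


def confusion_matrix(gold, predicted):
    """Binary confusion matrix, 1 = hate, 0 = no hate.

    gold: annotator labels (the reference); predicted: algorithm labels.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for g, p in zip(gold, predicted):
        if g and p:
            counts["TP"] += 1  # both say hate
        elif p:
            counts["FP"] += 1  # algorithm flags hate the annotators did not
        elif g:
            counts["FN"] += 1  # hate the algorithm missed
        else:
            counts["TN"] += 1  # both say no hate
    return counts


def summary_metrics(c):
    """Accuracy, precision and recall derived from the confusion counts."""
    total = sum(c.values())
    predicted_pos = c["TP"] + c["FP"]
    actual_pos = c["TP"] + c["FN"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total if total else 0.0,
        "precision": c["TP"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["TP"] / actual_pos if actual_pos else 0.0,
    }
```

For example, with annotator labels `[1, 1, 0, 0, 1]` and algorithm labels `[1, 0, 0, 1, 1]`, `confusion_matrix` returns `{"TP": 2, "FP": 1, "FN": 1, "TN": 1}`, i.e. an accuracy of 0.6 and a precision and recall of 2/3 each. Keeping false positives and false negatives apart, rather than reporting accuracy alone, shows whether the algorithm tends to over-flag or to miss hateful tweets.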
Original language: English
Host publication title: Innovating Teaching & Learning. Inclusion and Wellbeing for the Data Society. Book of Abstracts and Proceedings, ISYDE2023, Italian Symposium on DIGITAL EDUCATION
Pages: 90-92
Number of pages: 3
Publication status: Published - 2023
Event: ISYDE2023, Italian Symposium on DIGITAL EDUCATION - Reggio Emilia
Duration: 13 Sep 2023 to 15 Sep 2023

Conference

Conference: ISYDE2023, Italian Symposium on DIGITAL EDUCATION
City: Reggio Emilia
Period: 13/9/23 to 15/9/23

Keywords

  • Hate speech online
  • AntiSemitism online
  • Artificial Intelligence
  • social web
  • detection
