
Semantic Quantum Correlations in Hate Speeches

Francesco Galofaro
2020-01-01

Abstract

This contribution presents the first results of research conducted on a corpus of 7,000 posts collected from the social network Reddit during the 2016 United States presidential campaign. The research is a collaboration between Berkeley D-Lab, which shared the corpus, LSI-CentraleSupélec, and CUBE. Thanks to funding from the Anti-Defamation League, the corpus was labeled so that Machine Learning techniques could be applied: human analysts labeled 400 posts as “hate speech”. Galofaro, Toffano and Doan applied to both sub-corpora (hate speech and non-hate speech) an analysis technique inspired by Greimas’s structural semantics, Eco’s semiotics, and Quantum Information Retrieval (van Rijsbergen). Each text was formalized as a semantic network using the HAL technique. We then measured the semantic similarity between two keywords, formalized as two word vectors, with the classical cosine-similarity measure, and compared it with the degree of quantum correlation between them, measured with the Born rule. This correlation, linked to the co-occurrence of the word vectors in the same contexts, extracts from those contexts information useful for characterizing the semantic relationships under consideration (“presence of correlation”, “absence of correlation”, or “presence of anti-correlation”). In this way, the new technique makes it possible to overcome some critical limitations of the Machine Learning techniques currently in use, since it is based on the meaning of the text and not on the way in which the human analyst labels the corpus.
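The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that HAL refers to the Hyperspace Analogue to Language co-occurrence model, uses a simplified variant that sums the "preceding" and "following" weights into one vector per word, and takes the Born-rule probability between two normalized real word vectors to be the squared inner product. The function names, the window size, and the toy sentence are hypothetical, and the sketch does not reproduce the paper's three-way classification into correlation, absence of correlation, and anti-correlation.

```python
import math


def hal_vectors(tokens, window=5):
    """Simplified HAL-style vectors (an assumption, not the paper's exact setup).

    A sliding window assigns each preceding word a weight that decreases
    with distance; row and column weights are summed per word.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    size = len(vocab)
    matrix = [[0.0] * size for _ in range(size)]
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            previous = tokens[pos - dist]
            # Closer words receive higher weights: window, window - 1, ..., 1.
            matrix[index[word]][index[previous]] += window - dist + 1
    # Combine "preceded by" (row) and "followed by" (column) weights.
    return {
        word: [matrix[i][j] + matrix[j][i] for j in range(size)]
        for word, i in index.items()
    }


def cosine_similarity(u, v):
    """Classical cosine similarity between two real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def born_probability(u, v):
    """Born rule for normalized real state vectors: |<u|v>|^2."""
    return cosine_similarity(u, v) ** 2


# Toy input, purely illustrative; not taken from the Reddit corpus.
tokens = "the cat sat on the mat while the dog sat on the rug".split()
vectors = hal_vectors(tokens, window=2)
similarity = cosine_similarity(vectors["cat"], vectors["dog"])
probability = born_probability(vectors["cat"], vectors["dog"])
print(f"cosine similarity: {similarity:.3f}")
print(f"Born probability:  {probability:.3f}")
```

With non-negative co-occurrence weights, the cosine similarity lies between 0 and 1, so in this simplified setting the Born probability is just its square. A measure able to distinguish correlation from anti-correlation needs more structure than this sketch provides.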
2020
AUG 2020
pp. 370-383
http://rifl.unical.it/index.php/rifl/article/view/594/583
quantum information retrieval, semantics, semiotics
Francesco Galofaro, Zeno Toffano, Bich-Lien Doan

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1762586