CINECA IRIS Institutional Research Information System

This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.

The Vulnerable Identities Recognition Corpus (VIRC) for Hate Speech Analysis

Guillen-Pacho I.;Longo A.;Stranisci M. A.;Patti V.;Badenes-Olmedo C.

2024-01-01

Abstract

This paper presents the Vulnerable Identities Recognition Corpus (VIRC), a novel resource designed to enhance hate speech analysis in Italian and Spanish news headlines. VIRC comprises 880 headlines, manually annotated for vulnerable identities, dangerous discourse, derogatory expressions, and entities. Our experiments reveal that recent large language models (LLMs) struggle with the fine-grained identification of these elements, underscoring the complexity of detecting hate speech. VIRC stands out as the first resource of its kind in these languages, offering a richer annotation scheme compared to existing corpora. The insights derived from VIRC can inform the development of sophisticated detection tools and the creation of policies and regulations to combat hate speech on social media, promoting a safer online environment. Future work will focus on expanding the corpus and refining annotation guidelines to further enhance its comprehensiveness and reliability.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2024
			
	Titolo dell'evento
	
				10th Italian Conference on Computational Linguistics, CLiC-it 2024
			
	Luogo dell'evento
	
				Pisa, Italia
			
	Data dell'evento
	
				2024
			
	Titolo del volume
	
				Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), Pisa, Italy, December 4-6, 2024
			
	Nome editore
	
				CEUR-WS
			
	N. Volume
	
				3878
			
	Pagine (da)
	
				1
			
	Pagine (a)
	
				8
			
	URL del prodotto (archivi open access, fulltext su sito editore, etc.)
	
				https://ceur-ws.org/Vol-3878/49_main_long.pdf
			
	Parole Chiave
	
				annotated corpora; hate speech; vulnerable identities
			
	Tutti gli autori
	
						Guillen-Pacho I.; Longo A.; Stranisci M.A.; Patti V.; Badenes-Olmedo C.
					
	Appare nelle tipologie:
	
				04A-Conference paper in volume

File in questo prodotto:

File	Dimensione	Formato
49_main_long.pdf Accesso aperto Tipo di file: PDF EDITORIALE Dimensione 1.16 MB Formato Adobe PDF Visualizza/Apri	1.16 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2059277

Citazioni

ND

1

ND

social impact