CINECA IRIS Institutional Research Information System


Cross-domain and Cross-lingual abusive language detection: A hybrid approach with deep learning and a multilingual lexicon

Pamungkas E.;Patti V.

2019-01-01

Abstract

The development of computational methods to detect abusive language in social media within variable and multilingual contexts has recently gained significant traction. The growing interest is confirmed by the large number of benchmark corpora for different languages developed in recent years. However, abusive language behaviour is multifaceted, and the available datasets have different topical focuses. This makes abusive language detection a domain-dependent task, and building a robust system to detect general abusive content a first challenge. Moreover, most resources are available for English, which makes detecting abusive language in low-resource languages a further challenge. We address both challenges by considering ten publicly available datasets across different domains and languages. We propose a hybrid approach that combines deep learning with a multilingual lexicon for cross-domain and cross-lingual detection of abusive content, and compare it with simpler models. We show that training a system on general abusive language datasets produces a robust cross-domain system, which can be used to detect other, more specific types of abusive content. We also find that the domain-independent lexicon HurtLex is useful for transferring knowledge between domains and languages. In the cross-lingual experiment, we demonstrate the effectiveness of our joint learning model in out-of-domain scenarios as well.
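
The abstract does not say how the lexicon is combined with the neural model, so the following is only a minimal sketch of one plausible reading: lexicon hits are counted per category and the resulting vector is concatenated with a text representation. The toy lexicon, its category names, and the functions `lexicon_features` and `hybrid_input` are all hypothetical and are not taken from the paper or from HurtLex itself.

```python
from collections import Counter

# Toy stand-in for a multilingual lexicon: (language, word) -> category.
# Entries and category names are invented for illustration only.
TOY_LEXICON = {
    ("en", "idiot"): "insult",
    ("en", "vermin"): "animal",
    ("it", "idiota"): "insult",
    ("it", "parassita"): "animal",
}
CATEGORIES = sorted(set(TOY_LEXICON.values()))


def lexicon_features(text, lang):
    """Return per-category counts of lexicon hits, in CATEGORIES order."""
    tokens = text.lower().split()
    hits = Counter(
        TOY_LEXICON[(lang, tok)] for tok in tokens if (lang, tok) in TOY_LEXICON
    )
    return [hits[cat] for cat in CATEGORIES]


def hybrid_input(text_embedding, text, lang):
    """Concatenate a text representation with the lexicon feature vector."""
    return list(text_embedding) + lexicon_features(text, lang)
```

Because the categories are shared across languages, an English and an Italian message that hit the same category map to the same lexicon vector, which is one way such a lexicon could help transfer knowledge between languages.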


Year	2019
Event title	57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Student Research Workshop, SRW 2019
Event location	Florence, Italy
Event date	2019
Volume title	Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Publisher	Association for Computational Linguistics (ACL)
Pages (from)	363
Pages (to)	370
ISBN	9781950737475
DOI	https://dx.doi.org/10.18653/v1/P19-2051
Product URL (open-access archives, full text on publisher's site, etc.)	https://aclanthology.org/P19-2051/
Keywords	abusive language detection, multilinguality, social media, hate lexicons, deep learning, cross-domain experiments
All authors	Pamungkas E.; Patti V.
Appears in types:	04A-Conference paper in volume

Files in this item:

File	Access	File type	Size	Format
P19-2051.pdf	Restricted access	Publisher's PDF	303.78 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1757917
