Transformer-based models for multimodal irony detection
Schifanella R.
2023-01-01
Abstract
Irony is a pervasive phenomenon in today's social networks. The multimodal functionalities of these platforms (i.e., the possibility of attaching audio, video, and images to textual information) are increasingly leading users to combine information in different formats to express their ironic thoughts. The present work focuses on irony detection in social media posts involving image and text. To this end, a transformer architecture for the fusion of textual and image information is proposed. The model leverages disentangled text attention with visual transformers, improving F1-score by up to 9% over previous work in the field and current state-of-the-art visio-linguistic transformers. The proposed architecture was evaluated on three multimodal datasets gathered from Twitter and Tumblr. The results revealed that, in many situations, the text-only version of the architecture was able to capture the ironic nature of the message without using visual information. This phenomenon was further analysed, leading to the identification of linguistic patterns that could provide the context necessary for irony detection without additional visual information.
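For illustration, the sketch below shows one way such a text–image fusion model could be assembled from off-the-shelf components: a DeBERTa-style text encoder (disentangled attention) and a ViT image encoder whose pooled representations are concatenated and fed to a classification head. The checkpoints, pooling, and fusion head are assumptions for the sketch, not the authors' released implementation.

```python
# Minimal late-fusion sketch for multimodal irony detection (illustrative only).
# Checkpoint names and the fusion head are assumptions, not the paper's exact model.
import torch
import torch.nn as nn
from transformers import AutoModel


class MultimodalIronyClassifier(nn.Module):
    def __init__(self,
                 text_model="microsoft/deberta-base",            # disentangled text attention
                 image_model="google/vit-base-patch16-224-in21k"  # visual transformer
                 ):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)
        self.image_encoder = AutoModel.from_pretrained(image_model)
        fused_dim = (self.text_encoder.config.hidden_size
                     + self.image_encoder.config.hidden_size)
        # Simple late-fusion head over the concatenated text and image summaries.
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 2),  # ironic vs. non-ironic
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        # First-token summary of the text sequence.
        text_vec = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # ViT [CLS] token as the image summary.
        image_vec = self.image_encoder(
            pixel_values=pixel_values
        ).last_hidden_state[:, 0]
        return self.classifier(torch.cat([text_vec, image_vec], dim=-1))
```

In the text-only setting discussed in the abstract, the image branch would simply be dropped and the classification head applied to the text representation alone.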
File | Type | Size | Format | Access
---|---|---|---|---
s12652-022-04447-y.pdf | Publisher's PDF | 903.17 kB | Adobe PDF | Open access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.