CINECA IRIS Institutional Research Information System

AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets

Polignano M.; Basile P.; de Gemmis M.; Semeraro G.; Basile V.

2019-01-01

Abstract

Recent studies in natural language processing (NLP) report that context-dependent, task-independent language understanding models such as ELMo, GPT, and BERT are highly effective. Specifically, they achieve state-of-the-art performance in numerous complex NLP tasks, such as question answering and sentiment analysis, in English. Given the growing popularity and effectiveness of these models in the scientific community, we trained a BERT language understanding model for Italian (AlBERTo). AlBERTo focuses on the language used in social networks, specifically on Twitter. To demonstrate its robustness, we evaluated AlBERTo on the EVALITA 2016 SENTIPOLC (SENTIment POLarity Classification) task, obtaining state-of-the-art results in subjectivity, polarity, and irony detection on Italian tweets. To facilitate future research, the pre-trained AlBERTo model will be publicly distributed on GitHub at https://github.com/marcopoli/AlBERTo-it.

Year: 2019
Event title: 6th Italian Conference on Computational Linguistics, CLiC-it 2019
Event location: Bari
Event date: 2019
Volume title: Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)
Publisher: CEUR
Volume no.: 2481
Pages (from): 1
Pages (to): 6
All authors: Polignano M.; Basile P.; de Gemmis M.; Semeraro G.; Basile V.
Publication type: 04A-Conference paper in volume

Files in this item:

File	Size	Format
paper57.pdf (Open access; Description: Main article; File type: Publisher's PDF)	513.87 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1759767
