Improving DRS-to-Text Generation Through Delexicalization and Data Augmentation

Amin, Muhammad Saad; Anselma, Luca; Mazzei, Alessandro

doi:10.1007/978-3-031-70239-6_9

Text generation from Discourse Representation Structure (DRS) is a complex logic-to-text generation task in which lexical information, expressed as logical concepts, is translated into its corresponding textual representation. Delexicalization is the process of removing lexical information from the data; it helps the model produce textual sequences more robustly by focusing on the semantic structure of the input rather than on its exact lexical content. Implementing delexicalization is harder still for DRS-to-Text generation, where the lexical entities are anchored to WordNet synsets and the thematic roles are sourced from VerbNet. In this paper, we introduce novel procedures to selectively delexicalize proper nouns and common nouns. For the data transformations, we propose two types of lexical abstraction: (1) a WordNet supersense-based, contextually categorized abstraction; and (2) an abstraction based on the lexical category associated with named entities and nouns. We present a range of experiments that evaluate the delexicalization hypotheses in the DRS-to-Text generation task using state-of-the-art neural sequence-to-sequence models. We also explore data augmentation through delexicalization, evaluating test sets under different abstraction methodologies, i.e., with and without supersenses. Our experimental results show that delexicalization improves model generalizability compared with fully lexicalized DRS-to-Text generation. Delexicalization resulted in improved translation quality, with a significant increase in evaluation scores.
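To make the two abstraction types in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the token-list input format, the `SUPERSENSES` lookup table, the placeholder syntax, and the function names `delexicalize` and `relexicalize` are all assumptions made for illustration. Noun synsets and quoted names are replaced either by a supersense-style label or by a plain lexical-category label, and the mapping is kept so the original items can be restored.

```python
import re

# Hypothetical mini-lexicon mapping WordNet-style noun synset ids to
# supersense labels. These entries are illustrative assumptions only.
SUPERSENSES = {
    "dog.n.01": "noun.animal",
    "park.n.01": "noun.location",
}

SYNSET_RE = re.compile(r"^[a-z_]+\.n\.\d{2}$")  # noun synsets, e.g. dog.n.01
NAME_RE = re.compile(r'^".+"$')                  # quoted names, e.g. "Rex"


def delexicalize(tokens, use_supersenses=True):
    """Replace noun synsets and quoted names with indexed placeholders.

    Returns the delexicalized token list and a placeholder -> original map.
    """
    out, mapping, seen, counts = [], {}, {}, {}
    for tok in tokens:
        if SYNSET_RE.match(tok):
            # Supersense label if requested and known, else the bare category.
            label = SUPERSENSES.get(tok, "noun") if use_supersenses else "noun"
        elif NAME_RE.match(tok):
            label = "name"
        else:
            out.append(tok)  # everything else stays lexicalized
            continue
        if tok not in seen:  # reuse the placeholder for repeated items
            counts[label] = counts.get(label, 0) + 1
            seen[tok] = f"<{label}_{counts[label]}>"
            mapping[seen[tok]] = tok
        out.append(seen[tok])
    return out, mapping


def relexicalize(tokens, mapping):
    """Put the original items back in place of their placeholders."""
    return [mapping.get(tok, tok) for tok in tokens]


example = ["dog.n.01", "Name", '"Rex"', "run.v.01", "Agent", "-1", "park.n.01"]
with_ss, ss_map = delexicalize(example, use_supersenses=True)
without_ss, plain_map = delexicalize(example, use_supersenses=False)
```

In this sketch, verbs and role labels are left untouched, mirroring the abstract's statement that delexicalization is applied selectively to proper and common nouns. Under the same assumptions, training on the original input together with both delexicalized variants would be one way to read the data augmentation setting.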

2024-01-01

Year: 2024
Event title: Applications of Natural Language to Data Bases
Event location: Turin, Italy
Event dates: 25-27 June 2024
Volume title: Natural Language Processing and Information Systems
Publisher: Springer
First page: 121
Last page: 136
ISBN: 9783031702389; 9783031702396
DOI: https://dx.doi.org/10.1007/978-3-031-70239-6_9
Keywords: Delexicalization, Data augmentation, Discourse representation structure, Formal meaning representation, Neural DRS-to-Text generation, Super senses
All authors: Amin, Muhammad Saad; Anselma, Luca; Mazzei, Alessandro
Publication type: 04A-Conference paper in volume

Files in this item:

File	Size	Format
Delexicalization_for_DRS_to_Text_Generation__NLDB_2024_.pdf (open access; file type: postprint, author's final version)	570.42 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2014270

CINECA IRIS Institutional Research Information System
