Text generation from Discourse Representation Structure (DRS), is a complex logic-to-text generation task where lexical information in the form of logical concepts is translated into its corresponding textual representation. Delexicalization is the process of removing lexical information from the data which helps the model be more robust in producing textual sequences by focusing on the semantic structure of the input rather than the exact lexical content. Implementation of delexicalization is even harder in the case of the DRS-to-Text generation task where the lexical entities are anchored using WordNet synsets and thematic roles are sourced from VerbNet. In this paper, we have introduced novel procedures to selectively delexicalize proper nouns and common nouns. For data transformations, we propose to use two types of lexical abstractions (1): WordNet supersense-based contextually categorized abstraction; and (2): abstraction based on the lexical category associated with named entities and nouns. We present many experiments for evaluating the hypotheses of delexicalization in the DRS-to-Text generation task by using state-of-the-art neural sequence-to-sequence models. Furthermore, we also explored data augmentation through delexicalization while evaluating test sets with different abstraction methodologies i.e., with and without supersenses. Our experimental results proved the effectiveness of model generalizability through delexicalization while comparing it with the results of fully lexicalized DRS-to-Text generation. Delexicalization resulted in an improved translation quality with a significant increase in evaluation scores.

Improving DRS-to-Text Generation Through Delexicalization and Data Augmentation

Amin, Muhammad Saad
;
Anselma, Luca;Mazzei, Alessandro
2024-01-01

Abstract

Text generation from Discourse Representation Structure (DRS), is a complex logic-to-text generation task where lexical information in the form of logical concepts is translated into its corresponding textual representation. Delexicalization is the process of removing lexical information from the data which helps the model be more robust in producing textual sequences by focusing on the semantic structure of the input rather than the exact lexical content. Implementation of delexicalization is even harder in the case of the DRS-to-Text generation task where the lexical entities are anchored using WordNet synsets and thematic roles are sourced from VerbNet. In this paper, we have introduced novel procedures to selectively delexicalize proper nouns and common nouns. For data transformations, we propose to use two types of lexical abstractions (1): WordNet supersense-based contextually categorized abstraction; and (2): abstraction based on the lexical category associated with named entities and nouns. We present many experiments for evaluating the hypotheses of delexicalization in the DRS-to-Text generation task by using state-of-the-art neural sequence-to-sequence models. Furthermore, we also explored data augmentation through delexicalization while evaluating test sets with different abstraction methodologies i.e., with and without supersenses. Our experimental results proved the effectiveness of model generalizability through delexicalization while comparing it with the results of fully lexicalized DRS-to-Text generation. Delexicalization resulted in an improved translation quality with a significant increase in evaluation scores.
2024
Inglese
contributo
1 - Conferenza
The 29th International Conference on Natural Language & Information Systems
Turin, Italy
25-27 June 2024
Internazionale
Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V.
Natural Language Processing and Information Systems
Esperti anonimi
Springer
Cham
SVIZZERA
121
136
16
9783031702389
9783031702396
Delexicalization, Data augmentation, Discourse representation structure, Formal meaning representation, Neural DRS-to-Text generation, Super senses
no
1 – prodotto con file in versione Open Access (allegherò il file al passo 6 - Carica)
3
info:eu-repo/semantics/conferenceObject
04-CONTRIBUTO IN ATTI DI CONVEGNO::04A-Conference paper in volume
Amin, Muhammad Saad; Anselma, Luca; Mazzei, Alessandro
273
open
File in questo prodotto:
File Dimensione Formato  
Delexicalization_for_DRS_to_Text_Generation__NLDB_2024_.pdf

Accesso aperto

Tipo di file: POSTPRINT (VERSIONE FINALE DELL’AUTORE)
Dimensione 570.42 kB
Formato Adobe PDF
570.42 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2014270
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact