Improving DRS-to-Text Generation Through Delexicalization and Data Augmentation
Amin, Muhammad Saad; Anselma, Luca; Mazzei, Alessandro
2024-01-01
Abstract
Text generation from Discourse Representation Structures (DRS) is a complex logic-to-text task in which lexical information, encoded as logical concepts, is translated into its corresponding textual representation. Delexicalization is the process of removing lexical information from the data; it makes the model more robust in producing textual sequences by focusing on the semantic structure of the input rather than its exact lexical content. Delexicalization is even harder to implement in DRS-to-Text generation, where lexical entities are anchored to WordNet synsets and thematic roles are sourced from VerbNet. In this paper, we introduce novel procedures to selectively delexicalize proper nouns and common nouns. For data transformation, we propose two types of lexical abstraction: (1) WordNet supersense-based, contextually categorized abstraction; and (2) abstraction based on the lexical category associated with named entities and nouns. We present extensive experiments evaluating the delexicalization hypotheses in DRS-to-Text generation using state-of-the-art neural sequence-to-sequence models. Furthermore, we explore data augmentation through delexicalization, evaluating test sets under different abstraction methodologies, i.e., with and without supersenses. Our experimental results demonstrate the improved generalizability achieved through delexicalization when compared with fully lexicalized DRS-to-Text generation: delexicalization improves translation quality, with a significant increase in evaluation scores.
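The supersense-based abstraction described in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the paper derives supersenses from the WordNet synsets anchored in the DRS clauses, whereas here a hand-made lookup table and simple token-level replacement stand in for that process.

```python
# Illustrative supersense lookup (assumption: in the paper these tags come
# from the WordNet synset anchored in the DRS, e.g. the lexicographer file
# name such as "noun.person" or "noun.food").
SUPERSENSE = {
    "tom": "noun.person",   # proper noun -> person supersense
    "pizza": "noun.food",   # common noun -> food supersense
}

def delexicalize(tokens):
    """Replace nouns found in the lookup with their supersense tag,
    leaving all other tokens unchanged."""
    return [SUPERSENSE.get(t.lower(), t) for t in tokens]

print(delexicalize(["Tom", "ate", "a", "pizza"]))
# → ['noun.person', 'ate', 'a', 'noun.food']
```

At generation time the model produces sequences over these abstract tags, and the concrete lexical items are restored in a post-processing (relexicalization) step.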
File | Type | Size | Format | Access
---|---|---|---|---
Delexicalization_for_DRS_to_Text_Generation__NLDB_2024_.pdf | Postprint (author's final version) | 570.42 kB | Adobe PDF | Open access
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.