Improving Semantic Parsing and Text Generation Through Multi-Faceted Data Augmentation

Amin, Muhammad Saad; Anselma, Luca; Mazzei, Alessandro
2025-01-01

Abstract

The increasing use of large language models has heightened the demand for larger datasets in natural language processing (NLP). Although various augmentation techniques are used to increase data quantity, many introduce noise or struggle with structurally complex inputs such as Discourse Representation Structures (DRS). This study introduces novel data augmentation techniques for both semantic parsing (Text-to-DRS) and text generation (DRS-to-Text), centered on named entity augmentation, lexical substitution using WordNet, and grammatical transformation through changes of tense. The proposed methods considerably expanded the Parallel Meaning Bank (PMB) dataset while ensuring semantic accuracy and contextual relevance. The augmentation increased both gold and silver instances by a factor of 9, yielding over 1.3 million new examples. We evaluated four transformer models (byT5, mT5, T5, and mBART) on the augmented dataset and observed substantial improvements across multiple performance metrics. Notably, semantic parsing showed a 17.65% increase in SMATCH (F1) score, while text generation improved by 14.38% in BLEU and 6.43% in METEOR, among other evaluation measures. These improvements highlight the effectiveness of the proposed augmentation methods in boosting model capabilities on complex neural semantic parsing and generation tasks.
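
The central constraint in the abstract is that every augmented sentence must stay semantically aligned with its DRS. What follows is a minimal, hypothetical sketch of that idea for two of the three techniques (named-entity swaps and lexical substitution); it is not the authors' implementation. It assumes a simplified clause notation loosely modelled on the PMB clausal format, hand-picked replacement lists standing in for real lookups (for instance, WordNet queries), and replacement nouns that share the original's sense tag. The names `Pair`, `swap_name`, `swap_concept` and `augment` are illustrative only.

```python
import re
from collections.abc import Callable
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Pair:
    """A Text-DRS pair; clauses use a simplified, hypothetical notation."""
    text: str
    clauses: tuple[str, ...]


def _swap_word(sentence: str, old: str, new: str) -> str:
    # Whole-word replacement, so swapping "Tom" leaves "Tomorrow" untouched.
    return re.sub(rf"\b{re.escape(old)}\b", lambda _m: new, sentence)


def _edit_clauses(
    clauses: tuple[str, ...],
    match: Callable[[list[str]], bool],
    rewrite: Callable[[list[str]], list[str]],
) -> tuple[str, ...]:
    # Rewrite only the clauses whose whitespace-separated tokens satisfy `match`.
    out = []
    for clause in clauses:
        toks = clause.split()
        out.append(" ".join(rewrite(toks)) if match(toks) else clause)
    return tuple(out)


def swap_name(pair: Pair, old: str, new: str) -> Pair:
    """Named-entity augmentation: rename the entity in the text and its Name clause."""
    old_sym, new_sym = f'"{old.lower()}"', f'"{new.lower()}"'
    clauses = _edit_clauses(
        pair.clauses,
        lambda t: len(t) == 4 and t[1] == "Name" and t[3] == old_sym,
        lambda t: [t[0], t[1], t[2], new_sym],
    )
    return Pair(_swap_word(pair.text, old, new), clauses)


def swap_concept(pair: Pair, old: str, new: str, sense: str = '"n.01"') -> Pair:
    """Lexical substitution: replace a noun and its concept clause in lockstep.

    Assumes the replacement keeps the same sense tag (a simplification)."""
    clauses = _edit_clauses(
        pair.clauses,
        lambda t: len(t) == 4 and t[1] == old and t[2] == sense,
        lambda t: [t[0], new, t[2], t[3]],
    )
    return Pair(_swap_word(pair.text, old, new), clauses)


def augment(pair: Pair, name: str, names: list[str], noun: str, nouns: list[str]) -> list[Pair]:
    """Cross every replacement name with every replacement noun."""
    return [swap_concept(swap_name(pair, name, n), noun, w) for n, w in product(names, nouns)]


# Hypothetical seed pair in the simplified clause notation.
seed = Pair(
    "Tom adopted a dog.",
    (
        'b1 Name x1 "tom"',
        'b2 dog "n.01" x2',
        'b3 adopt "v.01" e1',
        "b3 Agent e1 x1",
        "b3 Theme e1 x2",
    ),
)

for p in augment(seed, "Tom", ["Sam", "Max"], "dog", ["cat", "horse"]):
    print(p.text, "|", p.clauses[0], "|", p.clauses[1])
```

Because every combination of candidates yields a new pair, lists of size m and n turn one seed into m × n examples. The sketch deliberately ignores clauses tied to the swapped entity (such as gender concepts) and article agreement ("a"/"an"), which a real pipeline would need to keep consistent.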
Year: 2025
Volume: 13
Pages: 150145–150167
URL: https://ieeexplore.ieee.org/document/11104098
Keywords: Discourse representation structure (DRS); DRS semantic parsing; Formal meaning representation; Multi-faceted DRS augmentation; Text generation from DRS
File in this item: Improving_Semantic_Parsing_and_Text_Generation_Through_Multi-Faceted_Data_Augmentation.pdf (open access, Adobe PDF, 3.21 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2092871