Towards Data Augmentation for DRS-to-Text Generation

Amin M. S.; Mazzei A.; Anselma L.
2022-01-01

Abstract

Data augmentation is becoming increasingly popular in Natural Language Generation (NLG). Various approaches have been used in NLP and NLG to augment data and increase the number of training examples for neural models. Yet no studies have performed augmentation on logical input, i.e., Discourse Representation Structures (DRSs). We present data augmentation for DRSs taken from the Parallel Meaning Bank (PMB) corpus, applied to the DRS-to-Text generation task. We conducted our experiments on a standard bi-LSTM sequence-to-sequence model, thus creating an end-to-end neural approach for generating English sentences from DRSs. We evaluated the outputs of word-level and character-level decoders with reference-based evaluation metrics: BLEU, ROUGE, METEOR, NIST, and CIDEr. Models trained on augmented DRSs achieved better results than models trained on DRSs without augmentation. To assess whether this improvement is statistically significant, we conducted two tests: the Shapiro-Wilk test (to check data normality) and the Wilcoxon test (to test model significance). The Wilcoxon results show that our model is significantly better, with p = 2.37e-05 for the character-level model and p = 7.78e-07 for the word-level model.
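The significance check described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code. It assumes the paired observations are per-sentence metric scores (e.g., sentence-level BLEU) for the same test sentences, produced with and without augmentation. It also assumes a two-sided test using the large-sample normal approximation with a tie correction. The function name `wilcoxon_signed_rank` is ours. The preceding normality check is usually delegated to a library, for example SciPy's `scipy.stats.shapiro`. A non-normal result is what motivates a non-parametric test such as Wilcoxon instead of a paired t-test.

```python
import math
from typing import Sequence


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed-rank test (normal approximation, tie-corrected).

    Hypothetical helper: x and y are assumed to be paired scores for the same
    test items, e.g. per-sentence scores with and without augmentation.
    Returns (W_plus, p_value). Zero differences are discarded (Wilcoxon's method).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    # Note: ties are detected by exact float equality.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value


if __name__ == "__main__":
    # Hypothetical per-sentence scores, for illustration only.
    augmented = [0.42, 0.55, 0.61, 0.38, 0.70, 0.66, 0.59, 0.48]
    baseline = [0.40, 0.50, 0.52, 0.39, 0.61, 0.60, 0.51, 0.44]
    w, p = wilcoxon_signed_rank(augmented, baseline)
    print(f"W+ = {w}, p = {p:.4g}")
```

For small samples the normal approximation is rough; SciPy's `scipy.stats.wilcoxon` can compute exact p-values and would be the more practical choice in real experiments.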
Year: 2022
Conference: 6th Workshop on Natural Language for Artificial Intelligence, NL4AI 2022
Location: Udine
Conference date: November 30, 2022
Proceedings: Proceedings of the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI 2022) co-located with 21st International Conference of the Italian Association for Artificial Intelligence (AI*IA 2022)
Series: CEUR-WS, vol. 3287
Pages: 141-152
URL: https://ceur-ws.org/Vol-3287/paper14.pdf
Keywords: Bi-LSTM; Data Augmentation; DRS-to-Text Generation; Neural Network; Parallel Meaning Bank (PMB); Shapiro-Wilk Test; Statistical Significance Test; Wilcoxon Test
File: paper14.pdf (open access, Adobe PDF, 851.69 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1887628