DipInfo-UniTo at the GEM'24 Data-to-Text Task: Augmenting LLMs with the Split-Generate-Aggregate Pipeline

Oliverio, Michael (Member of the Collaboration Group);
Balestrucci, Pier Felice (Member of the Collaboration Group);
Mazzei, Alessandro (Member of the Collaboration Group);
Basile, Valerio (Member of the Collaboration Group)
2024-01-01

Abstract

This paper describes the DipInfo-UniTo system, our submission to the GEM 2024 shared task. We participate only in the Data-to-Text (D2T) task. The DipInfo-UniTo system is based on Mistral (Jiang et al., 2023), a recent Large Language Model (LLM). Most LLMs can generate high-quality text for D2T tasks but, crucially, they often fall short in terms of adequacy and sometimes exhibit "hallucinations". To mitigate this issue, we have implemented a generation pipeline that combines LLMs with techniques from the traditional Natural Language Generation (NLG) pipeline. In particular, we use a three-step process, Split-Generate-Aggregate (SGA), consisting of (1) Splitting the original set of triples, (2) Generating verbalizations from the resulting split data units, and (3) Aggregating the verbalizations produced in the previous step.
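
The three SGA steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how triples are split, how prompts are worded, or how aggregation is performed. The sketch assumes, purely for illustration, that triples are grouped by subject, that the LLM is any callable mapping a prompt string to a text string (standing in for Mistral), and that aggregation is either plain concatenation or one extra LLM call. All function names and prompt strings are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
LLM = Callable[[str], str]      # hypothetical stand-in for a Mistral call


def split_triples(triples: List[Triple]) -> List[List[Triple]]:
    """Split step. Assumption: one data unit per subject, in first-seen order."""
    groups: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        groups[triple[0]].append(triple)
    return list(groups.values())


def generate(unit: List[Triple], llm: LLM) -> str:
    """Generate step: verbalize one data unit with a single LLM call."""
    facts = "; ".join(" | ".join(triple) for triple in unit)
    # Hypothetical prompt wording, not taken from the paper.
    return llm("Verbalize these triples: " + facts).strip()


def aggregate(texts: List[str], llm: LLM, use_llm: bool = False) -> str:
    """Aggregate step: concatenate, or optionally ask the LLM to merge."""
    joined = " ".join(texts)
    if not use_llm:
        return joined
    # Hypothetical prompt wording, not taken from the paper.
    return llm("Merge into one fluent text: " + joined).strip()


def sga(triples: List[Triple], llm: LLM, llm_aggregate: bool = False) -> str:
    """Run Split, Generate, and Aggregate in sequence."""
    units = split_triples(triples)
    texts = [generate(unit, llm) for unit in units]
    return aggregate(texts, llm, use_llm=llm_aggregate)
```

Under these assumptions, each LLM call sees only a small data unit rather than the full triple set, which is one plausible way a split-first design could limit omissions and hallucinated content; whether the submitted system works this way is not stated in the abstract.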
2024
GEM Shared Task at the Generation Challenges (INLG'24)
Tokyo
December 2024
Proceedings of the 17th International Natural Language Generation Conference: Generation Challenges
Association for Computational Linguistics
pp. 59-65
https://aclanthology.org/2024.inlg-genchal.6
Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2037951