The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.

WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques

Oliverio Michael;Balestrucci Pier Felice;Mazzei Alessandro;Basile Valerio
2025-01-01

Abstract

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.
2025
ACL 2025: The 63rd Annual Meeting of the Association for Computational Linguistics
Vienna, Austria
2025
Findings of the Association for Computational Linguistics: ACL 2025
Association for Computational Linguistics
12073
12083
979-8-89176-256-5
https://aclanthology.org/2025.findings-acl.625/
Oliverio Michael, Balestrucci Pier Felice, Mazzei Alessandro, Basile Valerio
File in questo prodotto:
File Dimensione Formato  
2025.findings-acl.625.pdf

Accesso aperto

Tipo di file: PDF EDITORIALE
Dimensione 296.02 kB
Formato Adobe PDF
296.02 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2103841
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact