
A Modular LLM-based Dialog System for Accessible Exploration of Finite State Automata

Stefano Vittorio Porta; Pier Felice Balestrucci; Michael Oliverio; Luca Anselma; Alessandro Mazzei
2025-01-01

Abstract

In the field of assistive technologies, making complex visual content, such as graphs or conceptual maps, accessible to visually impaired users remains a significant challenge. This work proposes a modular dialog system that combines neural Natural Language Understanding (NLU) and Retrieval-Augmented Generation (RAG) to translate graphical structures into meaningful text-based interactions. The NLU module pairs a fine-tuned BERT classifier for intent recognition with a spaCy-based Named Entity Recognition (NER) model to extract user intents and parameters. The RAG pipeline retrieves relevant subgraphs and contextual information from a knowledge base, then reranks and summarizes them via a language model. We evaluate the system across multiple specific tasks, achieving over 92% F1 in intent classification and NER, and demonstrate that open-weight models, such as DeepSeek-R1 or LLaMA-3.1, can offer performance competitive with GPT-4o in specific domains. Our approach enhances accessibility while maintaining modularity, interpretability, and performance on par with modern LLM architectures.
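To make the architecture outlined in the abstract concrete, the following Python snippet illustrates one plausible way to wire the two stages together: a fine-tuned BERT intent classifier plus a spaCy NER model for the NLU step, and an embed-retrieve-rerank loop over textual subgraph descriptions for the RAG step. This is a minimal sketch, not the authors' implementation: all model checkpoints, entity labels, and the toy knowledge base are illustrative assumptions, and the final LLM summarization call is omitted.

```python
import spacy
from transformers import pipeline
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# --- NLU: intent classification + NER --------------------------------------
# Hypothetical checkpoints standing in for the system's actual fine-tuned models.
intent_classifier = pipeline("text-classification", model="path/to/bert-intent-classifier")
ner = spacy.load("path/to/automata-ner-model")

def parse_utterance(utterance: str) -> dict:
    """Predict the user's intent and extract entity parameters (states, symbols, ...)."""
    intent = intent_classifier(utterance)[0]["label"]
    doc = ner(utterance)
    return {"intent": intent, "params": {ent.label_: ent.text for ent in doc.ents}}

# --- RAG: retrieve candidate subgraphs, then rerank ------------------------
# Toy knowledge base of textual subgraph descriptions for a small automaton.
knowledge_base = [
    "State q0 is the initial state; on symbol 'a' it moves to q1.",
    "State q1 is an accepting state; on symbol 'b' it loops to itself.",
    "State q2 is a sink state reached from q1 on symbol 'a'.",
]

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
kb_embeddings = embedder.encode(knowledge_base, convert_to_tensor=True)

def retrieve_and_rerank(query: str, top_k: int = 2) -> list[str]:
    """Embed the query, retrieve the nearest descriptions, and rerank with a cross-encoder."""
    query_emb = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, kb_embeddings, top_k=top_k)[0]
    candidates = [knowledge_base[hit["corpus_id"]] for hit in hits]
    scores = reranker.predict([(query, cand) for cand in candidates])
    return [cand for _, cand in sorted(zip(scores, candidates), reverse=True)]

# Example turn: the parsed intent/params drive the query, and the reranked
# passages would then be summarized by a language model (omitted here).
parsed = parse_utterance("Which transitions leave state q0 on symbol a?")
context = retrieve_and_rerank("transitions from state q0 on symbol a")
```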
2025
Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025)
Cagliari, Italy
2025, September
CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24–26, 2025, Cagliari, Italy
CEUR Workshop Proceedings
Pages 1–9
https://ceur-ws.org/Vol-4112/85_main_long.pdf
Dialogue Systems, Retrieval-Augmented Generation, Large Language Models, Education
Files in this record:
85_main_long.pdf (open access, Adobe PDF, 1.43 MB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/2104812