Multimodal Strategies for Robot-to-Human Communication

Massimo Donini, Cristina Gena, Alessandro Mazzei
2024-01-01

Abstract

In this paper we introduced the opportunities offered by multimodal coordination and integration of multimedia elements with the robot’s speech, showing examples of their use in the context of robot-to-human communication. In particular, we focused on Pepper, a humanoid robot equipped with a tablet on its chest. The goal of this research was to formalise, implement, and experimentally evaluate various multimodal integration and coordination strategies, such as: the coordination of the images to be displayed on the tablet’s screen within a spoken sentence, the modification of the spoken sentence’s pronunciation depending on the multimedia elements to be displayed, and the amount and size of these elements. Our main goal is to use multimodal communication to make the robot’s message more effective and comprehensible and to augment its communication possibilities by combining voice, written text, and correlated images and animations. This approach was tested by means of an online evaluation with 41 users, in which we simulated robot-to-human communication using prerecorded videos. This preliminary experiment gave some significant results regarding strategies related to the coordination between the robot’s speech and the appearance of multimedia elements, the pronunciation of a word and its relation to the display of the related image, and the display of images depending on modifiers.
Year: 2024
Conference: ACM/IEEE International Conference on Human-Robot Interaction
Location: Boulder, Colorado, USA
Dates: 11-14 January 2024
Proceedings: 19th Annual ACM/IEEE International Conference on Human Robot Interaction (HRI), Boulder, Colorado, USA
Publisher: ACM
Pages: 1-5
ISBN: 9798400703232
URL: https://dl.acm.org/doi/10.1145/3610978.3640686
Keywords: human robot interaction, social robotics, multimodal interaction, multimedia elements coordination, natural language interaction
Files in this item:
File: 3610978.3640686.pdf
Access: Open access
File type: POSTPRINT (author’s final version)
Size: 1.95 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1952695