The creation of artificial polyglot voices remains a challenging task, despite considerable progress in recent years. This paper investigates self-supervised learning for voice conversion to create native-sounding polyglot voices. We introduce a novel cross-lingual any-to-one voice conversion system that is able to preserve the source accent without the need for multilingual data from the target speaker. In addition, we show a novel cross-lingual fine-tuning strategy that further improves the accent and reduces the training data requirements. Objective and subjective evaluations with English, Spanish, French and Mandarin Chinese confirm that our approach improves on state-of-the-art methods, enhancing the speech intelligibility and overall quality of the converted speech, especially in cross-lingual scenarios. Audio samples are available at: https://giuseppe-ruggiero.github.io/a2o-vc-demo/

Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion

Ruggiero, Giuseppe
First
;
Di Caro, Luigi
Last
2024-01-01

Abstract

The creation of artificial polyglot voices remains a challenging task, despite considerable progress in recent years. This paper investigates self-supervised learning for voice conversion to create native-sounding polyglot voices. We introduce a novel cross-lingual any-to-one voice conversion system that is able to preserve the source accent without the need for multilingual data from the target speaker. In addition, we show a novel cross-lingual fine-tuning strategy that further improves the accent and reduces the training data requirements. Objective and subjective evaluations with English, Spanish, French and Mandarin Chinese confirm that our approach improves on state-of-the-art methods, enhancing the speech intelligibility and overall quality of the converted speech, especially in cross-lingual scenarios. Audio samples are available at: https://giuseppe-ruggiero.github.io/a2o-vc-demo/
2024
Empirical Methods in Natural Language Processing
Miami, Florida
November
Findings of the Association for Computational Linguistics: EMNLP 2024
Association for Computational Linguistics
2237
2246
979-8-89176-168-1
https://aclanthology.org/2024.findings-emnlp.122/
Natural Language Understanding, Speech
Ruggiero, Giuseppe; Testa, Matteo; Walle, Jurgen Van De; Di Caro, Luigi
File in questo prodotto:
File Dimensione Formato  
2024.findings-emnlp.122.pdf

Accesso aperto

Tipo di file: PDF EDITORIALE
Dimensione 243.63 kB
Formato Adobe PDF
243.63 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2318/2049890
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact