Vocal Tract Modeling Techniques: From Human Voice to Non-Human Primates Vocalizations

Gamba, Marco; Torti, Valeria; Colombo, Camilla Marta Pedina; Giacoma, Cristina

The Source-Filter Theory (Fant, 1960) offered a powerful framework for interpreting human vocal production. Two main events take place in the human vocal apparatus during vocal production. The first occurs at the glottal level: vocal fold vibration generates the fundamental frequency of the voice and its harmonics. These characteristics are known collectively as the voice source. The source is then modified by the vocal tract, which acts as a resonator, adjusting the relative intensities of the source frequencies. The column of air vibrates in a complex manner that is influenced by the length and shape of the vocal tract. One or several resonant frequencies of the vocal tract correspond to prominent spectral peaks called “formants”. The position and variation of the formants have been found to have a significant impact on the way humans recognize speech sounds. Although a model of vocal production based on the relationship between the vocal tract area function and the formant output has been the most common framework for understanding speech production in humans, the study of vocal tract resonance in non-human primates has not developed comparably. There are probably several reasons for this. First, the study of formants in non-human primates started as an attempt to demonstrate that primates could not produce human speech sounds. Once the pioneering studies of Lieberman and colleagues (Lieberman, 1968; 1969; Lieberman et al., 1969; 1972) showed that the anatomy and morphology of the non-human primate vocal tract prevented the production of human-like sounds, this field of investigation promptly came to a halt. Some years later, the work of Andrew (1976) and then Hauser (Hauser et al., 1993; Hauser, 1996) brought some attention back to the meaning of formants in primate intra-specific communication.
In more recent years, a number of studies have shown that formant-based semantic communication is also present in non-human primates, for instance in Diana monkey alarm calls (Riede and Zuberbühler, 2003a, b; Rendall et al., 2005). These findings were strengthened when it was found that macaques could, without training, discriminate differences in the formant structure of conspecific calls (Fitch and Fritz, 2006). Thus, it is now widely accepted that the calls of many non-human primates, and of mammals in general (Taylor and Reby, 2010), possess formants. A further extension of the importance of vocal tract filtering in primates is the application of computational models to describe their phonation processes. From an acoustic and physiological point of view, human vocal communication is far better known than that of any other mammal, and techniques from speech science have often been applied to the study of vocal production in other mammals, especially non-human primates (Riede et al., 2005; Gamba and Giacoma, 2006). The purpose of this paper is to introduce a framework for future studies of the relation between vocal-tract shape and acoustics in human and non-human primates. In showing the potential of vocal tract modeling in non-human primates, we highlight its differences from and similarities to vocal tract modeling in humans.
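
As a rough illustration of how vocal tract length shapes formant output, the sketch below uses the simplest textbook approximation: a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / (4L). This is not the modeling approach of the paper, which is based on vocal tract area functions. The function name, the speed of sound (35,000 cm/s) and the 17.5 cm tract length are assumptions chosen only for illustration.

```python
# Minimal sketch: resonances of a uniform tube closed at one end (glottis)
# and open at the other (lips). This is a textbook simplification, not the
# area-function models discussed in the paper.

SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air


def uniform_tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Return the first n_formants resonances (Hz) of a quarter-wave tube.

    F_k = (2k - 1) * c / (4 * L), for k = 1, 2, 3, ...
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * k - 1) * c / (4.0 * length_cm)
            for k in range(1, n_formants + 1)]


if __name__ == "__main__":
    # Hypothetical tract lengths, for comparison only.
    for length in (17.5, 8.75):
        formants = uniform_tube_formants(length)
        print(length, "cm:", [round(f) for f in formants], "Hz")
```

Under these assumptions, halving the tube length doubles every resonance, which is why a shorter vocal tract is expected to produce higher and more widely spaced formants. Real vocal tracts are not uniform tubes, so this is only a first approximation of the relation between shape and acoustics.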
