These peaks are known as formants. Thus, speech signal processors would say that the sound "oh" has a higher first formant frequency than the sound "ee," with F2 being much higher during "ee." F2 and F3 (the second and third formants) have more energy in "ee" than in "oh." Rather than serving as a filter, rejecting high or low frequencies, the vocal tract serves to shape the spectrum of the vocal cords. In the time domain, we have a periodic signal, the pitch, serving as the input to a linear system. We know that the output (the speech signal we utter and that is heard by others and ourselves) will also be periodic. Example time-domain speech signals are shown in [link], where the periodicity is quite apparent.
From the waveform plots shown in [link], determine the pitch period and the pitch frequency.
In the bottom-left panel, the period is about 0.009 s, which equals a frequency of 111 Hz. The bottom-right panel has a period of about 0.0065 s, a frequency of 154 Hz.
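The arithmetic in this answer is just the reciprocal relation between period and frequency. A minimal Python sketch follows; the function name `pitch_frequency` is a hypothetical helper, and the two periods are the approximate values quoted above, not new measurements.

```python
def pitch_frequency(period_s: float) -> float:
    """Return the pitch frequency in Hz for a pitch period given in seconds."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


# Approximate periods read off the waveform plots, as quoted in the text.
for label, period in [("bottom-left", 0.009), ("bottom-right", 0.0065)]:
    print(f"{label}: {pitch_frequency(period):.0f} Hz")
```

Rounded to the nearest hertz, this gives the 111 Hz and 154 Hz figures stated in the answer.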
Since speech signals are periodic, speech has a Fourier series representation given by a linear circuit's response to a periodic signal. Because the acoustics of the vocal tract are linear, we know that the spectrum of the output equals the product of the pitch signal's spectrum and the vocal tract's frequency response. We thus obtain the fundamental model of speech production.
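Written out, the product relation can be sketched as follows. The notation here is an assumption chosen for illustration, since the symbols used elsewhere in the text are not visible in this section: $p(t)$ is the periodic pitch signal with period $T$ and Fourier coefficients $c_k^{(p)}$, $H(f)$ is the vocal tract's frequency response, and $s(t)$ is the resulting speech signal.

```latex
p(t) = \sum_{k=-\infty}^{\infty} c_k^{(p)} \, e^{j 2\pi k t / T}
\quad\Longrightarrow\quad
s(t) = \sum_{k=-\infty}^{\infty} c_k^{(s)} \, e^{j 2\pi k t / T},
\qquad
c_k^{(s)} = c_k^{(p)} \, H\!\left(\frac{k}{T}\right)
```

Each output coefficient is the corresponding pitch coefficient scaled by the frequency response evaluated at that harmonic, so the vocal tract is observed only at multiples of the pitch frequency $1/T$.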
The model spectrum idealizes the measured spectrum, and captures all the important features. The measured spectrum certainly demonstrates what are known as pitch lines, and we realize from our model that they are due to the vocal cords' periodic excitation of the vocal tract. The vocal tract's shaping of the line spectrum is clearly evident, but difficult to discern exactly, especially at the higher frequencies. The model transfer function for the vocal tract makes the formants much more readily evident.
The Fourier series coefficients for speech are related to the vocal tract's transfer function only at the harmonics of the pitch frequency, that is, at the frequencies k/T for k = 1, 2, ..., where T is the pitch period; see previous result. Would male or female speech tend to have a more clearly identifiable formant structure when its spectrum is computed? Consider, for example, how the spectrum shown on the right in [link] would change if the pitch were twice as high (about 300 Hz).
Because males have a lower pitch frequency, the spacing between spectral lines is smaller. This closer spacing more accurately reveals the formant structure. Doubling the pitch frequency to 300 Hz for [link] would amount to removing every other spectral line.
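The "every other spectral line" claim can be checked numerically. The sketch below lists the harmonic frequencies for two pitch values; the 150 Hz starting pitch is inferred from the doubling to 300 Hz, and the 3000 Hz upper limit and the helper name `harmonic_frequencies` are assumptions made only for illustration.

```python
def harmonic_frequencies(pitch_hz: float, max_hz: float) -> list:
    """Return the frequencies k * pitch_hz (k = 1, 2, ...) not exceeding max_hz."""
    count = int(max_hz // pitch_hz)
    return [k * pitch_hz for k in range(1, count + 1)]


low_pitch = harmonic_frequencies(150.0, 3000.0)   # inferred original pitch
high_pitch = harmonic_frequencies(300.0, 3000.0)  # pitch doubled

# The doubled-pitch lines coincide with the 2nd, 4th, 6th, ... lines
# of the original spectrum.
print(high_pitch == low_pitch[1::2])
print(len(low_pitch), len(high_pitch))
```

With half as many lines in the same band, the vocal tract's frequency response is sampled more coarsely, which is why the formant peaks are harder to locate at higher pitch.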