American English can be described in terms of a set of about 42 distinct sounds called phonemes, illustrated in [link]. They can be classified in many ways according to their distinguishing properties. Vowels are formed by exciting a fixed vocal tract with quasi-periodic pulses of air. Fricatives are produced by forcing air through a constriction (usually toward the mouth end of the vocal tract), which causes turbulent airflow. Fricatives may be voiced or unvoiced. Plosive sounds are created by making a complete closure, typically at the front of the vocal tract, building up pressure behind the closure, and abruptly releasing it. A diphthong is a gliding monosyllabic sound that starts at or near the articulatory position for one vowel and moves toward the position of another. It can be a very insightful exercise to recite the phonemes shown in [link] and note the movements you make to produce them.
It is worth noting at this point that classifying speech sounds as voiced or unvoiced is not equivalent to the vowel/consonant distinction. Vowels and consonants are letters, whereas voiced and unvoiced refer to types of speech sounds. Several consonants, such as /m/ and /n/, are actually voiced sounds when spoken.
As we have seen in previous sections, the properties of speech signals change continuously, but they may be considered stationary within an appropriate time frame. If analysis is performed on a “segment-by-segment” basis, useful information about the construction of an utterance may be obtained. The average energy and zero-crossing rate, as previously discussed, are examples of short-time feature extraction in the time domain. In this section, we will learn how to obtain short-time frequency information from generally non-stationary signals.
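To recall what segment-by-segment analysis looks like in the time domain, here is a minimal Python sketch that splits a signal into frames and computes a per-frame average energy and zero-crossing rate. The helper name `frame_energy_zcr`, the frame/hop parameters, the normalizations (mean square, and crossings per adjacent sample pair), and the synthetic test signal are all illustrative assumptions. They are not the definitions from the earlier sections.

```python
import math

def frame_energy_zcr(x, frame_len, hop):
    """Return per-frame (average energy, zero-crossing rate) for the list x.

    Hypothetical helper: energy is the mean of the squared samples, and the
    zero-crossing rate is the fraction of adjacent sample pairs whose signs
    differ (a zero sample is counted as positive).
    """
    energies, zcrs = [], []
    # Slide a frame of frame_len samples across x, advancing by hop samples.
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len]
        energies.append(sum(v * v for v in seg) / frame_len)
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))
        zcrs.append(crossings / (frame_len - 1))
    return energies, zcrs

# Synthetic test signal (an assumption, not speech): 400 samples of a loud
# 100 Hz tone sampled at 8 kHz, then 400 samples of a weak alternating sequence.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(400)]
buzz = [0.1 * (-1) ** t for t in range(400)]
energy, zcr = frame_energy_zcr(tone + buzz, frame_len=200, hop=200)
for e, z in zip(energy, zcr):
    print(f"energy={e:.4f}  zcr={z:.3f}")
```

The tone frames show high energy and a low zero-crossing rate, while the alternating frames show the reverse. This is the kind of contrast these features are used to pick out, for example between voiced and unvoiced segments.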
Download the file go.au for the following section.
A useful tool for analyzing the spectral characteristics of a non-stationary signal is the short-time discrete-time Fourier transform, or stDTFT, which we define as

$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m}$$

Here, $x(m)$ is our speech signal, and $w(m)$ is a window of length $L$. Notice that if we fix $n$, the stDTFT is simply the DTFT of $x(m)$ multiplied by the shifted, time-reversed window $w(n-m)$. Therefore, $X_n(e^{j\omega})$ is a collection of DTFTs of windowed segments of $x(m)$, one for each value of $n$.
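To make the definition concrete, the following minimal Python sketch evaluates the stDTFT sum directly at a single frequency for a fixed $n$. The function name `st_dtft` and the two-tone test signal are illustrative assumptions. A rectangular window is used here only so the results are easy to check by hand, even though a raised cosine is usually preferred, for the reason discussed next.

```python
import cmath
import math

def st_dtft(x, w, n, omega):
    """Evaluate X_n(e^{j*omega}) = sum_m x(m) w(n-m) e^{-j*omega*m}.

    Assumes w[k] holds the window for k = 0..L-1 (zero elsewhere) and that x
    is zero outside its stored range, so only n-L+1 <= m <= n contributes.
    """
    L = len(w)
    total = 0j
    for m in range(max(0, n - L + 1), min(len(x) - 1, n) + 1):
        total += x[m] * w[n - m] * cmath.exp(-1j * omega * m)
    return total

# Non-stationary test signal (an assumption): a cosine at 0.2*pi rad/sample
# for m < 200, switching to 0.6*pi rad/sample for 200 <= m < 400.
x = [math.cos((0.2 if m < 200 else 0.6) * math.pi * m) for m in range(400)]
w = [1.0] * 50  # rectangular window, L = 50, to keep the arithmetic simple

for n in (100, 300):
    low = abs(st_dtft(x, w, n, 0.2 * math.pi))
    high = abs(st_dtft(x, w, n, 0.6 * math.pi))
    print(f"n={n}: |X_n| at 0.2pi = {low:.3f}, at 0.6pi = {high:.3f}")
```

With these choices, at $n = 100$ the magnitude is $L/2 = 25$ at $0.2\pi$ and essentially zero at $0.6\pi$, and the roles swap at $n = 300$. Sliding $n$ therefore shows which frequency is present in each windowed segment, which a single DTFT of the whole signal would not.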
As we examined in the Digital Filter Design lab, windowing in the time domain causes undesirable ringing in the frequency domain. This effect can be reduced by using some form of raised cosine for the window $w(m)$.
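One common raised cosine is the Hamming window, $w(m) = 0.54 - 0.46\cos\bigl(2\pi m/(L-1)\bigr)$ for $0 \le m \le L-1$. The minimal Python sketch below uses hypothetical helper names. It compares how much of each window's DC gain appears at a frequency far from the main lobe, for a rectangular window and a Hamming window of the same length. The comparison frequency, chosen midway between two nulls of the rectangular window's transform, is an arbitrary illustrative choice.

```python
import cmath
import math

def hamming(L):
    """Hamming window (a raised cosine) of length L >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (L - 1)) for m in range(L)]

def dtft(w, omega):
    """DTFT of a finite sequence w, evaluated at one frequency omega."""
    return sum(v * cmath.exp(-1j * omega * m) for m, v in enumerate(w))

def relative_leakage_db(w, omega):
    """|W(e^{j*omega})| relative to |W(e^{j0})|, in dB."""
    return 20 * math.log10(abs(dtft(w, omega)) / abs(dtft(w, 0.0)))

L = 64
omega = 0.5 * math.pi + math.pi / L  # midway between two rectangular-window nulls
rect_db = relative_leakage_db([1.0] * L, omega)
hamm_db = relative_leakage_db(hamming(L), omega)
print(f"rectangular: {rect_db:.1f} dB, Hamming: {hamm_db:.1f} dB")
```

For these parameters the Hamming value comes out more than 10 dB lower than the rectangular one, which is the reduction in ringing (leakage) described above. The trade-off, not measured here, is a wider main lobe.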
Write a function

X = DFTwin(x,L,m,N)

that will compute the $N$-point DFT of a windowed length-$L$ segment of the vector $x$.
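For reference, here is one possible Python analogue of such a function. It is a minimal sketch, not the lab's intended MATLAB solution. The name `dft_win` and several behaviors are assumptions not stated in the text: the window is a Hamming window, `m` is a 0-based start index (a MATLAB version would index from 1), the segment must lie entirely inside `x`, and `N` must be at least `L` so that the windowed segment can be zero-padded to `N` points. The DFT is computed directly from its definition for clarity. In practice an FFT would be used.

```python
import cmath
import math

def dft_win(x, L, m, N):
    """Hypothetical Python analogue of DFTwin(x, L, m, N).

    Assumptions: Hamming window, segment x[m:m+L] with a 0-based start
    index m, and N >= L so the windowed segment is zero-padded to N points.
    """
    if L < 2 or N < L:
        raise ValueError("need L >= 2 and N >= L")
    if m < 0 or m + L > len(x):
        raise ValueError("segment x[m:m+L] falls outside x")
    # Hamming window of length L.
    w = [0.54 - 0.46 * math.cos(2 * math.pi * k / (L - 1)) for k in range(L)]
    # Window the segment, then zero-pad it to N samples.
    seg = [x[m + k] * w[k] for k in range(L)] + [0.0] * (N - L)
    # Direct N-point DFT: X[r] = sum_k seg[k] e^{-j 2 pi r k / N}.
    return [sum(seg[k] * cmath.exp(-2j * math.pi * r * k / N) for k in range(N))
            for r in range(N)]

# Usage: a cosine at 1/8 cycle per sample should peak at bin N/8.
x = [math.cos(2 * math.pi * k / 8) for k in range(256)]
X = dft_win(x, 64, 10, 256)
peak = max(range(len(X) // 2 + 1), key=lambda r: abs(X[r]))
print(peak)  # → 32
```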