In contrast, unvoiced speech has more of a noise-like quality. Unvoiced sounds are usually much smaller in amplitude, and oscillate much faster, than voiced speech. These sounds are generally produced by turbulence, as air is forced through a constriction at some point in the vocal tract. For example, an h sound comes from a constriction at the vocal cords, and an f is generated by a constriction at the lips.
An illustrative example of voiced and unvoiced sounds contained in the word “erase” is shown in [link]. The original utterance is shown in (2.1). The voiced segment in (2.2) is a time magnification of the “a” portion of the word. Notice the highly periodic nature of this segment. The fundamental period of this waveform, which is about 8.5 ms here, is called the pitch period. The unvoiced segment in (2.3) comes from the “s” sound at the end of the word. This waveform is much more noise-like than the voiced segment, and is much smaller in magnitude.
Download the file start.au for the following sections. Click here for help on how to load and play audio signals.
For many speech processing algorithms, a very important step is to determine the type of sound that is being uttered in a given time frame. In this section, we will introduce two simple methods for discriminating between voiced and unvoiced speech.
Download the file start.au, and use the auread() function to load it into the Matlab workspace (newer Matlab releases provide audioread() in place of auread()).
Do the following:
Plot the signal and identify one voiced segment and one unvoiced segment (the zoom xon command is useful for this). Circle the regions of the plot corresponding to these two segments and label them as voiced or unvoiced.
Use the subplot() command to plot the two signals, VoicedSig and UnvoicedSig (samples taken from the voiced and unvoiced segments), on a single figure.
Estimate the pitch period for the voiced segment. Keep in mind that these speech signals are sampled at 8 kHz, which means that the time between samples is 0.125 milliseconds (ms). Typical values for the pitch period are 8 ms for male speakers, and 4 ms for female speakers. Based on this, would you predict that the speaker is male or female?
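The conversion from a period measured in samples to a period in milliseconds can be sketched as follows. This is only an illustration under stated assumptions: the lab most likely intends the period to be read off the plot by eye, whereas this sketch swaps in a simple autocorrelation search. The signal x is a synthetic stand-in for VoicedSig (not data from start.au), and the lag range of 20 to 100 samples is an assumed search window.

```matlab
fs = 8000;                 % sampling rate stated in the lab (Hz)
Ts_ms = 1000 / fs;         % time between samples: 0.125 ms

% Hypothetical stand-in for VoicedSig: a periodic signal that repeats
% every 64 samples (8 ms at 8 kHz). Replace with the real segment.
n = 0:319;
x = sin(2*pi*n/64) + 0.5*sin(4*pi*n/64);

% Assumed search window: lags of 20 to 100 samples (2.5 ms to 12.5 ms)
lags = 20:100;
r = zeros(size(lags));
for i = 1:numel(lags)
    k = lags(i);
    % Average product of the signal with a copy of itself delayed by k samples
    r(i) = mean(x(1:end-k) .* x(1+k:end));
end

% The lag with the largest average product is taken as the pitch period
[~, iBest] = max(r);
period_samples = lags(iBest);
period_ms = period_samples * Ts_ms;
fprintf('Estimated pitch period: %d samples = %.3f ms\n', period_samples, period_ms);
```

For a period counted directly from the plot, only the last step is needed: multiply the number of samples between two successive peaks by 0.125 ms.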
One way to categorize speech segments is to compute the average energy, or power. Recall this is defined by the following:
$$P = \frac{1}{L} \sum_{n=1}^{L} x^2(n)$$
where $L$ is the length of the frame $x(n)$. Use [link] to compute the average energy of the voiced and unvoiced segments that you plotted above. For which segment is the average energy greater?
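The average energy of a frame is the mean of its squared samples, which might be computed along these lines. This is a minimal sketch: the helper name avg_power is hypothetical, and the two vectors are synthetic stand-ins (a large, slowly varying sinusoid and a small, rapidly varying one), not samples from start.au.

```matlab
% Hypothetical helper: mean of the squared samples over the frame
avg_power = @(x) sum(x.^2) / numel(x);

% Synthetic stand-ins for the two segments (replace with the real vectors)
n = 0:299;
VoicedSig   = 0.8  * sin(2*pi*n/68 + 0.3);     % large amplitude, slow oscillation
UnvoicedSig = 0.05 * sin(2*pi*0.31*n + 0.3);   % small amplitude, fast oscillation

Pv = avg_power(VoicedSig);
Pu = avg_power(UnvoicedSig);
fprintf('Voiced power: %g, unvoiced power: %g\n', Pv, Pu);
```

With real speech the gap between the two values is what matters; the absolute numbers depend on recording level.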
Another method for discriminating between voiced and unvoiced segments is to determine the rate at which the waveform oscillates by counting the number of zero-crossings that occur within a frame (the number of times the signal changes sign). Write a function zero_cross that will compute the number of zero-crossings that occur within a vector, and apply this to the two vectors VoicedSig and UnvoicedSig. Which segment has more zero-crossings?
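One possible shape for zero_cross is sketched below as an anonymous function, so that it runs as a plain script; the same body could be saved in a file zero_cross.m instead. The choice to discard samples that are exactly zero before comparing signs is an assumption, made so that a sequence such as 1, 0, -1 counts as a single crossing. The test vectors are synthetic stand-ins, not samples from start.au.

```matlab
% Count sign changes: drop exact zeros, take the sign of each remaining
% sample, and count the places where consecutive signs differ.
zero_cross = @(x) sum(abs(diff(sign(x(x ~= 0)))) > 0);

% Synthetic stand-ins for the two segments (replace with the real vectors)
n = 0:299;
VoicedSig   = 0.8  * sin(2*pi*n/68 + 0.3);     % slow oscillation
UnvoicedSig = 0.05 * sin(2*pi*0.31*n + 0.3);   % fast oscillation

zv = zero_cross(VoicedSig);
zu = zero_cross(UnvoicedSig);
fprintf('Zero-crossings: voiced %d, unvoiced %d\n', zv, zu);
```

Because the count depends on the frame length, comparisons are only meaningful between frames of equal length, or after dividing by the number of samples.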