This helps get rid of some of the potential inaccuracy in our program, but how might we deal with inevitable background noise? We determined experimentally that the mean (absolute value of the) amplitude of any partition containing speech content is orders of magnitude higher than that of parts that are just background noise. With that, we set two conditions that automatically exclude such noise from consideration. As we analyze the signal in progressive chunks, we first check whether the chunk has a maximum amplitude of at least 0.1 of the maximum amplitude of the signal. Then, we check that the mean magnitude of the chunk is greater than or equal to the mean magnitude of the signal. If either condition is not met, we immediately discard the current chunk and move on to the next chunk of the signal.
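To make these two conditions concrete, here is a minimal Python/NumPy sketch of the noise gate as described; the function name and the max_ratio argument are ours, while the 0.1 ratio and the mean-magnitude test come from the description above.

```python
import numpy as np

def is_speech_chunk(chunk, signal, max_ratio=0.1):
    """Heuristic noise gate: keep a chunk only if its peak amplitude is at
    least max_ratio of the whole signal's peak AND its mean magnitude is at
    least the whole signal's mean magnitude."""
    loud_enough = np.max(np.abs(chunk)) >= max_ratio * np.max(np.abs(signal))
    dense_enough = np.mean(np.abs(chunk)) >= np.mean(np.abs(signal))
    return loud_enough and dense_enough
```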
We built the prototype by dividing our problem into five subproblems: initializing our data, reading an audio signal, determining the frequency response, cleaning up formant data and displaying information relevant to the vowels.
2) The user will speak slowly and clearly so that we can look for consistency in determining the vowels said.
2) Record an audio signal of n seconds from the user.
3) Split the data into non-overlapping chunks of 4000 samples.
4) Preprocess the data by detrending with a low-order polynomial and applying a low-order Butterworth lowpass filter (see the preprocessing sketch after this list).
5) Determine the transfer function of the vocal tract associated with the current chunk using an AR (autoregressive) model.
6) Determine the peaks of the transfer function (the formants) and match them with the closest reference formants using a least mean-squares matched filter (a formant-estimation sketch follows this list).
7) If the amplitude of the chunk exceeds 0.1 of the max amplitude of the overall signal, we assume it is potentially a vowel. Push its formants onto the "stack" of recent_formant_pairs.
8) If the last four formant pairs have been consistent (the same value), then we assume the vowel estimation has worked. Add the formant pair to the "stack" of vowels identified in track_times_and_formants.
9) Now we have our guesses for which vowels were said and when. We are going to have repeats, so run through a for-loop to clean up the track_times_and_formants vector into a new, workable vector called track_begin_end_formants (see the cleanup sketch after this list).
10) Output data and guesses.
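The chunking and preprocessing in steps 3 and 4 can be sketched in Python with NumPy and SciPy as follows; the polynomial order, filter order, and cutoff frequency are illustrative placeholders, not values specified above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

CHUNK_LEN = 4000  # non-overlapping chunk length in samples (step 3)

def chunks(signal):
    """Yield non-overlapping 4000-sample chunks of the recorded signal."""
    for start in range(0, len(signal) - CHUNK_LEN + 1, CHUNK_LEN):
        yield signal[start:start + CHUNK_LEN]

def preprocess_chunk(chunk, fs, poly_order=2, cutoff_hz=4000.0, filt_order=4):
    """Step 4: remove a low-order polynomial trend, then lowpass filter
    with a low-order Butterworth filter (zero-phase via filtfilt)."""
    t = np.arange(len(chunk))
    trend = np.polyval(np.polyfit(t, chunk, poly_order), t)
    detrended = chunk - trend
    b, a = butter(filt_order, cutoff_hz / (fs / 2.0), btype="low")
    return filtfilt(b, a, detrended)
```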
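Steps 5 and 6 amount to fitting an all-pole (AR) model to each chunk, reading candidate formants off the peaks of the resulting transfer function, and matching against a reference table. A minimal sketch, assuming librosa's LPC routine as a stand-in for the AR fit and a hypothetical reference_formants table of per-vowel (F1, F2) pairs:

```python
import numpy as np
import librosa
from scipy.signal import freqz, find_peaks

def estimate_formants(chunk, fs, ar_order=12, n_freqs=2048):
    """Steps 5-6: fit an AR (all-pole) model to the chunk, evaluate the
    vocal-tract transfer function 1/A(z), and take its spectral peaks as
    candidate formants. The AR order of 12 is a common choice for speech,
    not a value taken from the text."""
    a = librosa.lpc(np.asarray(chunk, dtype=float), order=ar_order)
    freqs, h = freqz(1.0, a, worN=n_freqs, fs=fs)  # transfer-function response
    mag_db = 20.0 * np.log10(np.abs(h) + 1e-12)
    peak_idx, _ = find_peaks(mag_db)
    return freqs[peak_idx]  # candidate formant frequencies in Hz

def match_vowel(formants, reference_formants):
    """Pick the vowel whose reference (F1, F2) pair has the smallest
    squared error against the first two estimated formants."""
    f1, f2 = formants[0], formants[1]
    errors = {vowel: (f1 - r1) ** 2 + (f2 - r2) ** 2
              for vowel, (r1, r2) in reference_formants.items()}
    return min(errors, key=errors.get)
```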
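Steps 7 through 9 only need simple bookkeeping. Below is a sketch of the consistency check and the repeat-collapsing pass; the four-pair window follows step 8, while the exact layout of the vectors is our assumption.

```python
def is_consistent(recent_formant_pairs, window=4):
    """Step 8: accept a vowel only if the last four formant pairs agree."""
    if len(recent_formant_pairs) < window:
        return False
    last = recent_formant_pairs[-window:]
    return all(pair == last[0] for pair in last)

def collapse_repeats(track_times_and_formants):
    """Step 9: merge consecutive identical formant pairs into
    (begin_time, end_time, formant_pair) entries."""
    track_begin_end_formants = []
    for time, pair in track_times_and_formants:
        if track_begin_end_formants and track_begin_end_formants[-1][2] == pair:
            begin, _, p = track_begin_end_formants[-1]
            track_begin_end_formants[-1] = (begin, time, p)  # extend the run
        else:
            track_begin_end_formants.append((time, time, pair))
    return track_begin_end_formants
```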
2) The user has to say each vowel for a moderate duration. Too long, and a wavering voice affects the success of the program; too short, and there is not enough data to assign formants.
3) This program is calibrated with formant values averaged across all ages and genders. A male with an exceptionally deep voice, or a child or female with an exceptionally high voice, may not get accurate vowel readings from the program. Accents may also affect the accuracy of the results.