This section presents the results of our vowel detection method.
Our project produced largely successful results. We achieved flawless output for a variety of two-syllable words that, taken together, contained all of the vowels in our database. We were also successful with some three- and four-syllable words.
C represents a string of one or more non-vowels, and a, e, i, o, and u are the actual vowels detected. Also, "Me gusta Rich B" had to be parsed together.
'Biblioteca' and 'Santiago' demonstrate superfluous consonant placement between vowels.
'Una' illustrates difficulty in vowel detection because the second formant in the vowel sound was not present.
'Loteria' and 'Alejandro' demonstrate the errors caused by 'R' and 'L', respectively.
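The output notation described above (C for any run of non-vowels, lowercase letters for detected vowels) can be illustrated with a short sketch. It assumes the detector produces one label per analysis frame and that consecutive identical labels are merged; neither detail is stated in this report, and the function name `collapse_labels` and the sample frame labels are hypothetical.

```python
from itertools import groupby

VOWELS = {"a", "e", "i", "o", "u"}


def collapse_labels(frame_labels):
    """Collapse per-frame labels into a C/vowel output string.

    Assumption: each frame already carries a label, and any label that is
    not one of the five vowels counts as a non-vowel.
    """
    # Map every non-vowel label to the single symbol "C".
    symbols = [lab if lab in VOWELS else "C" for lab in frame_labels]
    # Merge runs of identical symbols so each run is reported once.
    return "".join(symbol for symbol, _ in groupby(symbols))


# Hypothetical frame labels for a word like 'una' with dead space at both ends.
frames = ["-", "-", "u", "u", "u", "n", "n", "a", "a", "-"]
print(collapse_labels(frames))  # prints CuCaC
```

Note that the leading and trailing C in this example come from the silent frames, not from any consonant in the word.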
Problems
A relatively minor problem we encountered was the placement of consonants at the beginning and end of a word, regardless of whether the beginning or ending sound was a consonant or a vowel. A good example is the word Arturo, which begins and ends with a vowel sound, though our program returns a consonant at the beginning and end. This is because of the dead space inherent at the start and end of the file, caused by the delay between the start of recording and the start of the speech sample (and similarly at the end). The simplest fix would have been to crop the files manually so that no dead space remained.
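Manual cropping is the fix suggested above. As an automated alternative, which the project did not use, dead space could be trimmed with a simple energy threshold. The sketch below is one way to do that; the function name `trim_dead_space`, the frame length, and the relative threshold are all assumptions chosen for illustration.

```python
def trim_dead_space(samples, frame_len=256, rel_threshold=0.05):
    """Return samples with low-energy leading and trailing frames removed.

    frame_len and rel_threshold are illustrative values, not tuned ones.
    """
    if not samples:
        return []
    # Mean absolute amplitude of each non-overlapping frame.
    n_frames = (len(samples) + frame_len - 1) // frame_len
    energies = []
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        energies.append(sum(abs(x) for x in frame) / len(frame))
    # A frame is "active" if its energy exceeds a fraction of the loudest frame.
    threshold = rel_threshold * max(energies)
    active = [k for k, e in enumerate(energies) if e > threshold]
    if not active:
        return []
    start = active[0] * frame_len
    end = min(len(samples), (active[-1] + 1) * frame_len)
    return samples[start:end]


# Synthetic example: silence, then a loud segment, then silence again.
silence = [0.0] * 512
speech = [0.5, -0.5] * 256
trimmed = trim_dead_space(silence + speech + silence)
print(len(trimmed))  # prints 512
```

A real recording has background noise rather than exact zeros, so the threshold would need adjusting for the recording conditions.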
Occasionally our vocal tract model did not sufficiently emphasize the second formant in 'I' at a frequency far enough from the third for a peak to appear at the frequency value we associated with the second one. As a result, the third formant was sometimes detected as the second. We never resolved this problem, and it caused confusion between I's and U's in our filter. A possible correction would be to apply a differentiator to adjacent frequency values of our frequency response. Where the difference levels off or goes negative at a sufficiently high magnitude, we could add that point as a formant peak. In the image below, one can see that there is likely a peak around 1950 Hz, but because no distinct peak is expressed, our detection program passed over it.
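The differentiator idea described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `find_formant_candidates`, the tolerance `flat_tol`, the magnitude floor `min_mag`, and the synthetic response values are all assumptions. A point is reported when the response was rising into it and the next difference levels off or turns negative.

```python
def find_formant_candidates(freqs, mags, flat_tol=0.5, min_mag=10.0):
    """Return frequencies where a rising response levels off or turns down.

    flat_tol: largest difference still treated as "level" (assumed value).
    min_mag:  magnitude floor below which points are ignored (assumed value).
    """
    # First difference between adjacent frequency bins (the differentiator).
    diffs = [mags[k + 1] - mags[k] for k in range(len(mags) - 1)]
    candidates = []
    for k in range(1, len(diffs)):
        rising_before = diffs[k - 1] > flat_tol   # slope into point k
        levels_or_falls = diffs[k] <= flat_tol    # slope out of point k
        if rising_before and levels_or_falls and mags[k] >= min_mag:
            candidates.append(freqs[k])
    return candidates


# Synthetic response: a shoulder near 1950 Hz merged into a peak at 2400 Hz.
freqs = [1500, 1650, 1800, 1950, 2100, 2250, 2400, 2550, 2700]
mags = [12.0, 16.0, 20.0, 23.0, 23.2, 26.0, 30.0, 27.0, 22.0]
print(find_formant_candidates(freqs, mags))  # prints [1950, 2400]
```

A detector that accepts only strict local maxima would report just 2400 Hz for this data. This sketch catches shoulders on the rising side of a larger peak only; a shoulder on the falling side would need a mirrored test.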
· L's and R's were occasionally detected as vowels. This is because their pronunciation differs little from that of vowels: they rely primarily on resonant frequencies of the vocal tract rather than on restriction of airflow, as other consonants do. Below are some frequency responses of the vocal tract as L's and R's were being pronounced. As you can see, they are highly similar to the frequency response of the vocal tract when vowels are being produced. Without drastically changing the focus of our project, the only way to address this would be to use more intricate threshold values.
· Often, in a direct transition from one vowel to another with no consonant in between, a consonant value was returned between the two vowels. This can be seen below in the three images showing the transition from the second 'I' in biblioteca to the 'O'. The first image is the 'I', the third is the 'O', and the second is the transition between them. The transitional frequency response is not sufficiently similar to either the 'I' or the 'O', so it gets classified as a consonant. Currently, anything that does not match one of our five vowels gets classified as a consonant. One way around this would be to add a transitional character to our database, in this case an 'IO' database character. Alternatively, we could add direct consonant recognition (as a broad class, not specific consonants) and then classify sounds that match neither the consonant class nor our vowel database as unknowns, rather than pooling them with consonants. <i-bib.fig>, <between.fig>, <o-bib.fig>
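Both remedies proposed in the last item can be sketched together. The code below assumes nearest-template matching on the first two formants, which this report does not confirm as the project's matching method. The formant values in `VOWEL_DB` are rough placeholders, not the project's database, and the `is_consonant` flag stands in for a hypothetical broad-class consonant detector.

```python
import math

# Placeholder (F1, F2) values in Hz, for illustration only.
VOWEL_DB = {
    "a": (700.0, 1300.0),
    "e": (450.0, 1900.0),
    "i": (300.0, 2300.0),
    "o": (450.0, 900.0),
    "u": (300.0, 800.0),
}


def classify_frame(formants, db, max_dist=150.0, is_consonant=False):
    """Label a frame as a database entry, 'C' (consonant), or '?' (unknown)."""
    # Hypothetical consonant detector result takes priority.
    if is_consonant:
        return "C"
    # Find the closest database entry by Euclidean distance in (F1, F2).
    best_label, best_dist = None, float("inf")
    for label, ref in db.items():
        dist = math.dist(formants, ref)
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Accept the match only if it is close enough; otherwise report unknown.
    return best_label if best_dist <= max_dist else "?"


# A frame midway through an I-to-O transition (hypothetical values).
transition = (380.0, 1580.0)
print(classify_frame(transition, VOWEL_DB))  # prints ?

# Adding a transitional 'IO' entry at the midpoint of 'i' and 'o'.
db_with_io = dict(VOWEL_DB, IO=(375.0, 1600.0))
print(classify_frame(transition, db_with_io))  # prints IO
```

With the unknown class, a transitional frame no longer inserts a spurious C between two vowels, and with the 'IO' entry it is labeled explicitly as a transition.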