
Results:

Figures showing the efficacy of steps 1-4 (see Methods) are displayed below.

  1. The signal shows clear segmentation of the spoken numbers, each delimited by the red and green markers.
  2. Normalized spectrogram of an utterance of the number “1”. The formants are visible but not distinct.
  3. Filtering the signal with a filter of the fifth or sixth exponential order distinctly emphasizes the difference between the formants and the background.
  4. Weighted scatter plot overlaid with the contours of the maximum-likelihood GMM, showing the formants.
  5. Filtered spectrum on the mel scale, using a corner frequency of 700 Hz.
  6. GMM fitted to the mel-scale spectrum. It differs greatly from the linear-frequency GMM.
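
The mel-scale mapping behind figures 5 and 6 can be sketched as follows. This is a minimal illustration in Python, not the project's Matlab code. The 700 Hz corner frequency comes from the caption above; the scaling constant 2595 is an assumption (it is the conventional choice, but the text does not state which constant was used), and the function names are hypothetical.

```python
import math

MEL_CORNER_HZ = 700.0  # corner frequency given in the figure caption
MEL_SCALE = 2595.0     # assumption: conventional constant, not stated in the text


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels: roughly linear below the corner, logarithmic above."""
    return MEL_SCALE * math.log10(1.0 + f_hz / MEL_CORNER_HZ)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return MEL_CORNER_HZ * (10.0 ** (m / MEL_SCALE) - 1.0)


def mel_spaced_frequencies(f_min_hz, f_max_hz, count):
    """Return `count` frequencies in Hz that are equally spaced on the mel scale."""
    if count < 2:
        raise ValueError("count must be at least 2")
    m_min, m_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (m_max - m_min) / (count - 1)
    return [mel_to_hz(m_min + i * step) for i in range(count)]
```

Calling, for example, `mel_spaced_frequencies(0.0, 4000.0, 5)` gives points that crowd together at low frequencies and spread out at high ones. Because the mapping warps the frequency axis in this way, a GMM fitted on the mel axis would be expected to look different from one fitted on the linear axis, as the last caption notes.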

Testing this algorithm in Matlab on the generated input data of ten numbers yielded 70% recognition accuracy, vastly better than our attempt at linear predictive coding (LPC). While 70% is admittedly a decent result in the speech recognition field, the system faces several important limitations, which applied to the LPC approach as well.

First, the system is trained on a limited sample. While it is expected to hold to similar accuracy when tested against other male voices, it is expected to be highly inaccurate on female voices. Second, segmentation has been shown to work perfectly well with calm, enunciated speech, and recognition works to a large degree. The same cannot be said of more casual speech, where numbers might be slurred or stuttered, or non-numerical noises inserted (e.g. “um” or “ah”). Similarly, some speakers might prefer to speak in terms of multiple digits, saying “seventy” instead of “seven-oh”, for instance. A more robust system would take these issues into account.





Source:  OpenStax, Elec 301 project: voice recognition. OpenStax CNX. Dec 19, 2011 Download for free at http://cnx.org/content/col11396/1.3
