We have found the pitch contour, which is essentially the amplitude envelope of the Fourier transform. The peaks in this envelope (the formants) correspond to resonant modes of the human vocal tract. We want to analyze how the location and magnitude of these peaks relate to emotional content, so we need to find the peaks and valleys in each LPC slice of every sample.
Because the LPC produces a smooth pitch contour, a simple first-derivative test is enough to locate its maxima and minima. Implemented with a for-loop, this test proved exceedingly slow: even a short phrase is divided into about 40 time windows, each with several thousand frequency bins. We therefore vectorized the operation with offset matrices and logical indexing.
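Maximum: the first difference of the contour changes sign from positive to negative.
Minimum: the first difference of the contour changes sign from negative to positive.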
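The vectorized first-derivative test can be sketched roughly as follows. This is a minimal NumPy illustration, not the original implementation: the function name `find_extrema`, the array layout (one LPC slice per row), and the toy data are all assumptions made for the example. The two offset views of the first difference play the role of the offset matrices, and the boolean masks play the role of logical indexing.

```python
import numpy as np

def find_extrema(envelope):
    """Return boolean masks (is_peak, is_valley) with the same shape as envelope.

    envelope: array of shape (n_windows, n_bins), one smooth spectral
    slice per row (hypothetical layout chosen for this sketch).
    """
    envelope = np.asarray(envelope, dtype=float)
    # First difference along frequency: d[:, k] = envelope[:, k+1] - envelope[:, k]
    d = np.diff(envelope, axis=1)
    left = d[:, :-1]   # slope entering each interior bin
    right = d[:, 1:]   # slope leaving each interior bin

    is_peak = np.zeros(envelope.shape, dtype=bool)
    is_valley = np.zeros(envelope.shape, dtype=bool)
    # Interior bins only: the first and last bins have no two-sided slope.
    is_peak[:, 1:-1] = (left > 0) & (right < 0)
    is_valley[:, 1:-1] = (left < 0) & (right > 0)
    return is_peak, is_valley

# Toy data: two "slices" of five bins each.
env = np.array([[0.0, 2.0, 1.0, 3.0, 0.5],
                [1.0, 0.0, 4.0, 2.0, 5.0]])
is_peak, is_valley = find_extrema(env)

# Logical indexing recovers the window index and bin index of every peak.
peak_rows, peak_bins = np.nonzero(is_peak)
peak_mags = env[is_peak]
```

Strict inequalities mean that flat plateaus are not flagged. On a smooth LPC envelope that is rarely a concern, but a noisier contour would need a more tolerant test.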
We recorded the frequency location and magnitude of the first four formants and the first three valleys (more than 90 percent of pitch contours contained at least this many). We also calculated the mean and variance of the power in the sample from the L2 norm. Finally, since all phrases were four syllables long, we used the length of each phrase as a measure of speaking speed. These features were all passed to the neural network.
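One way the feature vector for a single slice might be assembled is sketched below. Several details are assumptions, because the text does not specify them: the helper names `first_k` and `frame_power_stats`, the NaN padding for slices with too few extrema, the definition of per-frame power as the squared L2 norm divided by the frame length, and the use of the sample count as the length feature. How the features from the roughly 40 slices of a phrase were combined is also not stated, so the sketch stops at one slice.

```python
import numpy as np

def first_k(freqs, slice_mag, mask, k):
    """Frequency and magnitude of the first k flagged bins, NaN-padded to length 2*k."""
    idx = np.flatnonzero(mask)[:k]
    out = np.full((k, 2), np.nan)
    out[:len(idx), 0] = freqs[idx]
    out[:len(idx), 1] = slice_mag[idx]
    return out.ravel()

def frame_power_stats(signal, frame_len):
    """Mean and variance of per-frame power (squared L2 norm / frame length)."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    power = np.linalg.norm(frames, axis=1) ** 2 / frame_len
    return power.mean(), power.var()

# Toy slice: five frequency bins with hand-written peak and valley masks.
freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
slice_mag = np.array([0.0, 2.0, 1.0, 3.0, 0.5])
peak_mask = np.array([False, True, False, True, False])
valley_mask = np.array([False, False, True, False, False])

formants = first_k(freqs, slice_mag, peak_mask, 4)   # first four formants
valleys = first_k(freqs, slice_mag, valley_mask, 3)  # first three valleys

# Toy time-domain signal: two frames of four samples each.
signal = np.array([1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0])
mean_power, var_power = frame_power_stats(signal, frame_len=4)

# Length in samples stands in for speaking speed.
features = np.concatenate([formants, valleys, [mean_power, var_power, len(signal)]])
```

NaN padding keeps the vector a fixed length when a slice has fewer than four formants or three valleys. A real pipeline would have to drop or impute those entries before training the network.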