This database was selected because it contained labeled emotional samples. The annotation files followed the general format:
Start Stop A: Emotion,Phrase
Consequently, a parsing script was written that scanned each line for the marker 'A:'. If all sample descriptors could be identified successfully, the sample was added to the database providing that
With the database assembled, the files were read individually and each sample extracted one at a time.
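The parsing and extraction steps described above can be sketched roughly as follows. This is only an illustration, not the original parsing script: the example annotation line, the field names in `sample`, the sampling rate, and the assumption that Start and Stop are given in seconds are all hypothetical.

```matlab
% Hypothetical annotation line in the format "Start Stop A: Emotion,Phrase"
line = '12.50 14.25 A: anger,I told you already';

isValid = false;
idx = strfind(line, 'A:');                    % marker the parser searches for
if ~isempty(idx)
    times = sscanf(line(1:idx(1)-1), '%f');   % numbers before the marker
    rest  = strtrim(line(idx(1)+2:end));      % remaining "Emotion,Phrase" text
    comma = find(rest == ',', 1);             % first comma separates the fields
    if numel(times) == 2 && ~isempty(comma)
        sample.start   = times(1);
        sample.stop    = times(2);
        sample.emotion = strtrim(rest(1:comma-1));
        sample.phrase  = strtrim(rest(comma+1:end));
        isValid = true;                       % all descriptors were identified
    end
end

% Extract the sample from a recording (assumes times are in seconds).
fs    = 16000;                 % assumed sampling rate
audio = zeros(20*fs, 1);       % stand-in for a recording loaded from disk
if isValid
    first = floor(sample.start*fs) + 1;
    last  = min(floor(sample.stop*fs), numel(audio));
    clip  = audio(first:last); % one labeled sample
end
```

Lines that lack the marker, the two time stamps, or the comma leave `isValid` false, which mirrors the rule that a sample is added only when every descriptor is identified.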
Although the annotations were always accurate, their boundaries were not precise: a padding of silence surrounded each sample and biased our feature vectors. Consequently, we developed a method to detect and remove this padding. The idea is to apply a threshold above which the speaker is deemed to be active. Applying such a threshold directly to the raw signal is risky because of the inherent background noise, so we first computed the amplitude envelope. MATLAB's hilbert function was applied to each sample; it returns the analytic signal, whose magnitude is the envelope of the sample. To smooth that magnitude, a lowpass FIR filter with a cutoff frequency of 20 Hz was designed in MATLAB and applied with a zero-phase filtering technique, which filters the signal in the forward direction and then again in the reverse direction. The reverse pass cancels the phase shift introduced by the forward pass (for a linear-phase FIR filter, a constant delay), so the result is a smooth amplitude envelope with no offset in the time domain.
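analyt = hilbert(trace);             % analytic signal of the sample
b = fir1(200, 20/(fs/2), 'low');     % order-200 lowpass FIR, 20 Hz cutoff
env = filtfilt(b, 1, abs(analyt));   % zero-phase smoothing of the envelope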
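The absence of a time offset can be checked with a short derivation. The notation is introduced here for illustration and is not from the original text: $x[n]$ is the sample, $\hat{x}[n]$ its Hilbert transform, $a[n]$ the magnitude of the analytic signal, $H(e^{j\omega})$ the frequency response of the real-coefficient FIR filter, and $A$, $E$ the spectra of $a[n]$ and of the smoothed envelope.

```latex
a[n] = \left| x[n] + j\,\hat{x}[n] \right|
\qquad
E(e^{j\omega}) = H(e^{j\omega})\,H^{*}(e^{j\omega})\,A(e^{j\omega})
             = \left| H(e^{j\omega}) \right|^{2} A(e^{j\omega})
```

The combined response $|H(e^{j\omega})|^{2}$ is real and non-negative, so its phase is zero at every frequency. The price is that the magnitude response is applied twice, which doubles the attenuation in decibels.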
Finally, we applied a generous threshold, found the first and last crossings of that threshold, and deleted the silence padding.
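The thresholding step can be sketched as below. The text does not state the threshold value, so setting it to 10% of the envelope peak is purely an assumption, and the test signal is synthetic (a 200 Hz tone between 0.3 s and 0.7 s buried in low-level noise) rather than data from the database. The calls to `hilbert`, `fir1`, and `filtfilt` require the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for one padded sample (not the original data)
fs     = 8000;                              % assumed sampling rate
t      = (0:fs-1)'/fs;                      % 1 s time axis
trace  = 0.01*randn(fs, 1);                 % background noise (silence padding)
active = t >= 0.3 & t < 0.7;                % interval in which the "speaker" is active
trace(active) = trace(active) + sin(2*pi*200*t(active));

% Amplitude envelope, computed as in the listing above
analyt = hilbert(trace);
b      = fir1(200, 20/(fs/2), 'low');
env    = filtfilt(b, 1, abs(analyt));

% Generous threshold: 10% of the envelope peak (hypothetical choice)
thresh  = 0.1*max(env);
above   = find(env > thresh);               % indices where the speaker is deemed active
trimmed = trace(above(1):above(end));       % keep first to last crossing only
```

Everything between the first and last crossing is kept, so brief pauses inside an utterance are not cut out; only the leading and trailing silence is removed.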