How do we decide which parts of the spectrum are important? The CUIDADO project (2) provided a set of 72 audio features, and research (1) has shown that some of these features are more important than others in capturing the signal characteristics. We therefore decided to implement a small subset of these features:
Cepstral Features
Spectral Features
Cepstral coefficients have received a great deal of attention in the speech processing community, as they try to extract the characteristics of the filter and model it independently of the signal being produced. This is ideal, as the filter in our case is the instrument that we are trying to recognize. We work on a Mel scale because it more accurately models how the human auditory system perceives different frequencies, i.e. it gives more weight to changes at low frequencies, as humans are more adept at distinguishing low-frequency changes.
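To make the low-frequency emphasis of the Mel scale concrete, here is a minimal sketch of a Hz-to-Mel conversion. The text does not say which Mel formula was used, so the 2595 and 700 constants below are an assumption (one widely used variant), and the function names are hypothetical.

```python
import math

# Assumption: one common Mel formula; the project's exact variant is not stated.
def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz onto the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# The same 100 Hz step covers more Mels at low frequencies than at high ones.
low_step = hz_to_mel(200.0) - hz_to_mel(100.0)
high_step = hz_to_mel(5100.0) - hz_to_mel(5000.0)
print(low_step > high_step)
```

Under this formula, a 100 Hz change near the bottom of the spectrum spans a larger Mel distance than the same change near 5 kHz, which is the weighting described above.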
The centroid correlates with the “brightness” of the sound and is often higher than expected due to the energy from harmonics above the fundamental frequency. The spread, skew, and kurtosis are based on the 2nd, 3rd, and 4th moments and, along with the slope, help portray spectral shape.
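The spectral shape features can be sketched by treating the normalized magnitude spectrum as a probability distribution over frequency. This is an illustrative sketch, not the project's implementation: the function name `spectral_shape` is hypothetical, the spread is reported here as a standard deviation, and the slope is assumed to be a least-squares line fit of magnitude against frequency. See (1) for the exact definitions.

```python
import math
from typing import Dict, Sequence

def spectral_shape(freqs: Sequence[float], mags: Sequence[float]) -> Dict[str, float]:
    """Centroid, spread, skewness, kurtosis and slope of a magnitude spectrum.

    Assumes at least two bins with non-zero magnitude at different frequencies.
    """
    total = sum(mags)
    if total <= 0.0:
        raise ValueError("spectrum has no energy")
    weights = [m / total for m in mags]  # normalized spectrum, sums to 1

    centroid = sum(f * w for f, w in zip(freqs, weights))  # 1st moment

    def central_moment(order: int) -> float:
        return sum(((f - centroid) ** order) * w for f, w in zip(freqs, weights))

    spread = math.sqrt(central_moment(2))         # from the 2nd moment
    skewness = central_moment(3) / spread ** 3    # from the 3rd moment
    kurtosis = central_moment(4) / spread ** 4    # from the 4th moment

    # Assumption: slope is the least-squares fit of magnitude versus frequency.
    n = len(freqs)
    mean_f = sum(freqs) / n
    mean_m = sum(mags) / n
    slope = (sum((f - mean_f) * (m - mean_m) for f, m in zip(freqs, mags))
             / sum((f - mean_f) ** 2 for f in freqs))

    return {"centroid": centroid, "spread": spread, "skewness": skewness,
            "kurtosis": kurtosis, "slope": slope}

# Hypothetical tone: 220 Hz fundamental with three decaying harmonics.
tone = spectral_shape([220.0, 440.0, 660.0, 880.0], [1.0, 0.5, 0.25, 0.125])
print(tone["centroid"] > 220.0)
```

For this made-up tone the centroid lands well above the 220 Hz fundamental, which illustrates why the centroid is often higher than one might expect.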
The odd-to-even harmonic energy ratio (OER) simply determines whether a sound consists primarily of odd harmonic energy, of even harmonic energy, or whether the harmonic energy is equally spread.
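As a rough illustration, the ratio can be computed from a list of harmonic amplitudes once the harmonics have been estimated. The sketch below assumes energy is the squared amplitude and that the fundamental counts as harmonic 1; the function name and the example amplitudes are hypothetical, not measured data.

```python
from typing import Sequence

def odd_to_even_ratio(harmonic_amps: Sequence[float]) -> float:
    """Ratio of odd-harmonic energy to even-harmonic energy.

    harmonic_amps[0] is harmonic 1 (the fundamental), harmonic_amps[1] is
    harmonic 2, and so on. Energy is taken as amplitude squared (assumption).
    """
    odd = sum(a * a for h, a in enumerate(harmonic_amps, start=1) if h % 2 == 1)
    even = sum(a * a for h, a in enumerate(harmonic_amps, start=1) if h % 2 == 0)
    # With no even-harmonic energy at all, the ratio is unbounded.
    return float("inf") if even == 0.0 else odd / even

balanced = odd_to_even_ratio([1.0, 1.0, 1.0, 1.0])
odd_heavy = odd_to_even_ratio([1.0, 0.05, 0.6, 0.05, 0.4, 0.05])  # made-up values
print(balanced, odd_heavy > 10.0)
```

A ratio near 1 indicates evenly spread harmonic energy, while a large ratio indicates a spectrum dominated by odd harmonics.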
The tristimulus values measure energy as well and were introduced as the timbre equivalent to the color attributes of vision. Like the OER, they provide clues regarding the distribution of harmonic energy, this time focusing on low, mid, and high harmonics rather than odd and even harmonics. This gives more weight to the first few harmonics, which are perceptually more important.
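A minimal sketch of the three tristimulus values follows. It assumes the commonly cited banding (harmonic 1, harmonics 2 to 4, and harmonics 5 and above) and works on harmonic amplitudes; the exact weighting used in this project is given in (1), and the function name is hypothetical.

```python
from typing import Sequence, Tuple

def tristimulus(harmonic_amps: Sequence[float]) -> Tuple[float, float, float]:
    """Relative weight of low, mid and high harmonics.

    Assumed banding: T1 = harmonic 1, T2 = harmonics 2-4, T3 = harmonics 5+.
    harmonic_amps[0] is the fundamental.
    """
    total = sum(harmonic_amps)
    if total <= 0.0:
        raise ValueError("no harmonic energy")
    t1 = harmonic_amps[0] / total
    t2 = sum(harmonic_amps[1:4]) / total
    t3 = sum(harmonic_amps[4:]) / total
    return t1, t2, t3

# Ten equal-amplitude harmonics (made-up example).
t1, t2, t3 = tristimulus([1.0] * 10)
print(t1, t2, t3)
```

Because the three values sum to one, the first harmonic alone gets its own coordinate while all harmonics from the fifth upward share one, which is how the measure emphasizes the first few harmonics.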
MFCCs have been shown to work very well in monophonic environments, as they capture the shape of the spectrum very effectively. Unfortunately, they are of less use in polyphonic recordings, as the MFCCs then capture the shape of a spectrum calculated from multiple sources. Most of the work we have seen on this subject uses MFCCs regardless, however. They are particularly useful if only one instrument is playing or if one instrument is relatively salient.
Most wind instruments have their harmonics evenly spread among the odd and even indices, but the clarinet is distinct in that it produces spectra consisting predominantly of odd harmonics, with very few even harmonics appearing at all. This makes sense from a physics standpoint: when played, the clarinet behaves as a cylinder closed at one end, therefore allowing only the odd harmonics to resonate. The odd-to-even harmonic energy ratio was thus chosen primarily with clarinet classification in mind.
We chose the roll-off and tristimulus as our energy measures, as they were both easy to implement and judged to be important (1). Finally, the first four spectral moments and the spectral slope, in both perceptual and spectral models, were shown to be the ten most important features in the same study and were therefore some of the first features added to our classification system. We note that we had hoped to implement a perceptual model and thereby nearly double our features, but we could not find an accurate filter model for the middle ear and thus decided to forgo any features based on perceptual modeling.
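The roll-off is usually described as the frequency below which a fixed fraction of the spectral energy lies. The sketch below assumes a 95% threshold and squared-magnitude energy; neither value is stated in the text, so both are illustrative, as is the function name.

```python
from typing import Sequence

def spectral_rolloff(freqs: Sequence[float], mags: Sequence[float],
                     fraction: float = 0.95) -> float:
    """Lowest bin frequency at which cumulative energy reaches `fraction`.

    Assumptions: energy is magnitude squared, freqs are in ascending order,
    and the 0.95 default is a conventional choice rather than the project's.
    """
    energies = [m * m for m in mags]
    target = fraction * sum(energies)
    cumulative = 0.0
    for f, e in zip(freqs, energies):
        cumulative += e
        if cumulative >= target:
            return f
    return freqs[-1]  # only reached through floating-point rounding

flat = spectral_rolloff([100.0, 200.0, 300.0, 400.0], [1.0, 1.0, 1.0, 1.0])
dark = spectral_rolloff([100.0, 200.0, 300.0, 400.0], [1.0, 0.1, 0.1, 0.1])
print(flat, dark)
```

A flat spectrum pushes the roll-off to the top bin, while a spectrum with nearly all of its energy in the lowest bin keeps it low.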
For further discussion of these features, along with explicit mathematical formulas, please refer to (1).