Our program breaks the time-domain signal into windows and computes the squared magnitude of the FFT of each window. It then averages these squared magnitudes across the windows and expresses the result in decibels. The result is a vector of approximately 100 elements that represents the power in the frequency domain: a measure of which frequencies are present and how strongly. Rather than using a single number to characterize the whole signal, our power spectral density program returns a vector that captures more subtle changes in the spectrum. The decibel scale fans out the differences between genres, making them easier to distinguish.
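The windowing-and-averaging procedure described above can be sketched as follows. This is a minimal illustration, not our original program: the function name `power_spectral_density_db`, the non-overlapping rectangular windows, the window length of 198 samples (chosen here only so that the real FFT yields 100 coefficients), and the small `eps` offset are all assumptions made for the example.

```python
import numpy as np

def power_spectral_density_db(signal, window_len=198, eps=1e-12):
    """Average the squared FFT magnitude over windows and return it in dB.

    Assumptions (not taken from the original program): windows do not
    overlap, no tapering window is applied, and window_len=198 is picked
    so that the one-sided FFT has 198 // 2 + 1 = 100 coefficients.
    """
    signal = np.asarray(signal, dtype=float)
    n_windows = len(signal) // window_len
    if n_windows == 0:
        raise ValueError("signal is shorter than one window")

    # Drop any trailing samples that do not fill a whole window,
    # then stack the windows as rows of a 2-D array.
    frames = signal[: n_windows * window_len].reshape(n_windows, window_len)

    # One-sided FFT of every window, then squared magnitude per coefficient.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Average across windows so each frequency bin gets one power value.
    mean_power = power.mean(axis=0)

    # Convert to decibels; eps guards against log10(0) in empty bins.
    return 10.0 * np.log10(mean_power + eps)
```

Calling the function on a one-dimensional array of audio samples returns one decibel value per frequency bin, so signals from different genres can be compared bin by bin.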
The power spectral density was effective at showing patterns between genres. Rap has the most distinct pattern, with a sudden downward slope (red). Classical also has a distinctive pattern, with the smallest power at all frequencies. Jazz, punk, and country lie close to each other but begin to fan out at higher frequencies. Looking closely at the envelopes, techno spans the largest area, encapsulating almost all of jazz, punk, and country. This is one reason why techno could not be distinguished well from those genres.