Using the FFT approach, we “teach” a matrix to differentiate between genres. We do this by feeding the database matrix many samples of each genre and telling it which genre each sample belongs to. We then ask the matrix to make a smart decision about a given sample’s genre.
To create our database, we collected 10 songs for each genre (Rap, Rock, Jazz, Pop, and Classical). We first convert these songs from stereo to mono, which makes them easier for Matlab to handle and gives us one long vector per song. We divide each song into samples 15 ms long. These short samples give us a clear picture of the frequencies present in each one. We then take the FFT (Fast Fourier Transform) of each sample and stack the results side by side as the columns of our database matrix, which has 661 rows. Each column represents the frequency spectrum of one 15 ms sample. After that, we normalize each column.
We found that Matlab runs out of memory very quickly with this much information. An average 5-minute song, for example, has 20,000 samples of 15 ms, each with 661 rows, which gives 13,220,000 numbers to represent a single song. We solved this problem by averaging columns: a 5-minute song is averaged down to 75 columns. Although this reduces our algorithm’s accuracy to some degree, it lets us add more songs and build a bigger database matrix, which compensates for the lost accuracy.
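The steps above can be sketched roughly as follows. This is an illustrative Python sketch using only the standard library, not the project's Matlab code. Several details are assumptions: a 44.1 kHz sample rate (which would explain the 661 rows), averaging the two channels for the mono conversion, unit Euclidean length for the normalization, and averaging consecutive groups of columns. All function names are hypothetical, and a naive DFT stands in for the FFT so the example needs no extra libraries.

```python
import cmath
import math

# Size arithmetic from the text (44.1 kHz sample rate is an assumption).
SAMPLE_RATE = 44100
FRAME_MS = 15
frame_len = SAMPLE_RATE * FRAME_MS // 1000       # 661 values per 15 ms sample
frames_per_song = 5 * 60 * 1000 // FRAME_MS      # 20000 samples in a 5-minute song
values_per_song = frame_len * frames_per_song    # 13220000 numbers per song


def to_mono(left, right):
    """Average the two stereo channels (assumed conversion method)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def split_frames(signal, length):
    """Cut the signal into consecutive non-overlapping frames, dropping any remainder."""
    count = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(count)]


def dft_magnitude(frame):
    """Magnitude spectrum via a naive O(N^2) DFT; an FFT computes the same values faster."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n)
    ]


def normalize(column):
    """Scale a column to unit Euclidean length (assumed choice of norm)."""
    norm = math.sqrt(sum(v * v for v in column))
    return [v / norm for v in column] if norm > 0 else list(column)


def average_columns(columns, n_out):
    """Average consecutive groups of columns so that n_out columns remain."""
    n = len(columns)
    if n < n_out:
        raise ValueError("need at least n_out columns")
    out = []
    for g in range(n_out):
        group = columns[g * n // n_out:(g + 1) * n // n_out]
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out


def build_columns(left, right, length, n_out):
    """Mono conversion, framing, spectrum, normalization, then column averaging."""
    mono = to_mono(left, right)
    columns = [normalize(dft_magnitude(f)) for f in split_frames(mono, length)]
    return average_columns(columns, n_out)


# Tiny demo: a pure tone, 8-value frames, 4 frames averaged down to 2 columns.
tone = [math.cos(2 * math.pi * t / 8) for t in range(32)]
demo = build_columns(tone, tone, length=8, n_out=2)
```

For a real song the call would look like `build_columns(left, right, frame_len, 75)`, which reduces roughly 20,000 columns of 661 values to 75 columns, or 49,575 numbers per song instead of 13,220,000.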
Now we label each column with a number from 1 to 5 that represents the genre the column stands for, so every FFT column has a corresponding genre label.
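As a rough illustration of the labeling, the Python sketch below builds one label per column. It assumes that every song contributes exactly 75 columns, that columns are stored grouped by genre, and that the numbers 1 to 5 follow the order in which the genres are listed in the text. None of these details is stated in the original, and the names are hypothetical.

```python
# Genre order and the 1-5 numbering are assumptions for illustration.
GENRES = ["Rap", "Rock", "Jazz", "Pop", "Classical"]
SONGS_PER_GENRE = 10
COLUMNS_PER_SONG = 75  # assumes every song is averaged down to 75 columns

# One label per database column, with columns grouped genre by genre.
labels = []
for genre_id, _name in enumerate(GENRES, start=1):
    labels.extend([genre_id] * (SONGS_PER_GENRE * COLUMNS_PER_SONG))

# Look up the genre name for any column index.
genre_of_column = [GENRES[label - 1] for label in labels]
```

Under these assumptions the database has 5 × 10 × 75 = 3,750 columns, and `labels[i]` gives the genre number of column `i`.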