The matrix described is pictured graphically in the top graph of the figure above (cyan is zero, dark blue is negative, red is positive). Below that is a matrix that shows the two periods around each of these pitch markers (found by this path), with the pitch marker itself in the center of each column. As you can see, the peaks seem to move across the matrix in a straight line, meaning that when we overlap and add these segments, the peaks will be added on top of one another. This reduces the phase problems caused by constructive and destructive interference between the peaks (which is why the algorithm is pitch-synchronous).
Having marked the boundaries of the regions to extract from the original signal, we need to define their new locations (where they will end up in the output signal). A vector of new pitch markers is created: it begins at the first old pitch marker (found above), which sets the phase offset, and continues at equal intervals of the desired fundamental period. For each new marker, the closest marker in the original signal is found, and the two periods centered around that marker are Hanning windowed and copied to the output signal, centered about the new marker. Depending on whether the frequency is being raised or lowered, some pitch markers in the original signal may be used more than once, or not at all. The result is a signal whose waveform retains the shape of the original but has a shorter or longer period (depending on the amount and direction of the shift). Hence, the pitch is shifted without altering the qualities of the voice that produced the sound.
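The overlap-add step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's actual code: the function names `hann` and `psola_shift`, the use of integer sample periods, and the assumption that a single detected period `old_period` sets the window width for every marker are all choices made here for the example.

```python
import math

def hann(n):
    """Symmetric Hann (Hanning) window of length n; endpoints are zero."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def psola_shift(signal, markers, old_period, new_period):
    """Overlap-add two-period segments at new, equally spaced pitch markers.

    signal     -- list of samples
    markers    -- sample indices of the pitch markers in the original signal
    old_period -- detected period in samples (assumed constant here)
    new_period -- desired period in samples
    Returns (output, used), where used lists the original marker chosen
    for each new marker.
    """
    out = [0.0] * len(signal)
    window = hann(2 * old_period + 1)      # two periods, centered on a marker
    used = []
    new_marker = markers[0]                # first old marker is the phase offset
    while new_marker < len(signal):
        # closest original marker to this new marker
        src = min(markers, key=lambda m: abs(m - new_marker))
        used.append(src)
        for k in range(-old_period, old_period + 1):
            i, j = src + k, new_marker + k
            if 0 <= i < len(signal) and 0 <= j < len(out):
                out[j] += window[k + old_period] * signal[i]
        new_marker += new_period
    return out, used

# Example: a sine with a 100-sample period, markers placed on its peaks.
signal = [math.sin(2.0 * math.pi * n / 100) for n in range(1000)]
markers = list(range(25, 1000, 100))
raised, used_up = psola_shift(signal, markers, 100, 80)     # shorter period
lowered, used_down = psola_shift(signal, markers, 100, 130) # longer period
```

In this example, raising the pitch reuses some original markers, while lowering it skips some, which matches the behavior described in the text.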
This algorithm is based loosely on a paper written by Keith Lent from the University of Texas. As our project already had a separate component for pitch detection, many of the topics in the paper did not apply.
First, the first two periods of the original signal are located (using our knowledge of the detected frequency for the window). We then apply a Hanning window to these two periods and copy them at intervals of the new desired period. This is very similar to PSOLA, except that we do not place pitch markers throughout the original signal and locate the one closest to each output marker. Instead, we always use the first two periods in the window and copy them centered on each new pitch marker, under the assumption that each period of the signal will be largely the same in a window that covers only a few milliseconds. Again, the result is a waveform with much the same shape as the original (at least in general) but a different period, and thus a modified fundamental frequency.
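A minimal sketch of this simpler time-shifting approach follows, under assumptions that the text does not spell out: periods are whole numbers of samples, the first output marker sits at the center of the extracted two-period segment, and the name `time_shift` is hypothetical.

```python
import math

def hann(n):
    """Symmetric Hann (Hanning) window of length n; endpoints are zero."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def time_shift(frame, old_period, new_period):
    """Repeat the windowed first two periods of frame every new_period samples."""
    seg_len = 2 * old_period + 1
    if len(frame) < seg_len:
        raise ValueError("frame must contain at least two periods")
    window = hann(seg_len)
    # Windowed copy of the first two periods; reused for every output marker.
    grain = [window[i] * frame[i] for i in range(seg_len)]
    out = [0.0] * len(frame)
    center = old_period                    # assumed position of the first marker
    while center < len(out):
        for k in range(-old_period, old_period + 1):
            j = center + k
            if 0 <= j < len(out):
                out[j] += grain[k + old_period]
        center += new_period
    return out

# Example: one analysis window of a sine with a 100-sample period,
# shifted up to an 80-sample period.
frame = [math.sin(2.0 * math.pi * n / 100) for n in range(600)]
shifted = time_shift(frame, 100, 80)
```

Because the same windowed segment is reused at every marker, the interior of the output is exactly periodic with the new period; unlike the PSOLA sketch, it cannot follow changes in the waveform within the window.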
The figure below offers a visual comparison of these two algorithms. The graph on the left shows about two periods of the original signal, whereas the graph on the right shows the output signal during the same time interval for both the PSOLA (red) and time-shifting (blue) algorithms. By inspection, it should be clear that while both algorithms produce similar output, the PSOLA output more closely resembles the shape of the original signal. An informal listening test confirms that the PSOLA algorithm sounds better.