<< Chapter < Page | Chapter >> Page > |
There are a number of sound signals that are of importance in the environment. In this project, we identify human speech as an important signal, that we would like to focus on. We would like to separate the environment sound signals into a signal containing only human speech and a signal containing all other speech. By making the above simplifications, the problem is reduced to finding the speech content in the surrounding environment, and forwarding the speech content to the listener while suppressing other signals. If there are no speech signals, or speech signals are weak compared to other signals, all sounds from the environment will be attenuated.
We approach the problem stated above by first separating the source signals into human speech content and non-human speech with a blind source separation algorithm. After that, a classification algorithm determines which signal contains human speech, if any, and outputs that signal. A detailed overview is shown below with the system block diagram in Figure 1.
The audio input, a mixture of human speech and some other noise such as instrumental music, is passed to the blind source separation block. Inside this block, short time Fourier transform is first performed to change the time domain input signal into frequency domain, since all the other operations on the signals reside in frequency domain. After that a preprocessing filter is applied, which cleans the input signal by removing reflection and ambient noises. Then for each frequency, the independent component analysis algorithm separates the signal into two parts such that independence between the two signals is maximized. Scaling and permutation filters minimizes distortions caused by using the independent component analysis method. The blind source separation process outputs two signals, where at most, one signal contains speech.
We then implement a binary artificial neural network (ANN) for classification.The two identical ANN take the two signals separated from the source as inputs respectively and outputs a weight for each signal. We train the neural network in the way that a signal containing more human speech will output a higher weight. Based on the weights of the separated signals, the output selection multiplexer chooses the proper signal to output. That is, if one signal, referred to as A, has a higher weight than the other signal, referred to as B, and its weight is above a certain threshold (0.5), the multiplexer outputs signal A. If both the weights of signal A and signal B are below that threshold, then the multiplexer outputs nothing. We set a threshold to deal with situation where both signal contain no human speech.
Notification Switch
Would you like to follow the 'Selective transparent headphone' conversation and receive update notifications?