To implement our neural network we used the Neural Network Toolbox in MATLAB. The neural network is built up of layers of neurons. Each neuron accepts either a vector or a scalar input (p) and gives a scalar output (a). The inputs are weighted by W and offset by a bias b, giving the net input Wp + b. The neuron's transfer function operates on this value to generate the final scalar output a.
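The single-neuron computation described above can be sketched in a few lines. This is a plain Python illustration, not the toolbox code we ran; the function name `neuron` and the example weights, inputs, and bias are made up for demonstration.

```python
import math

def neuron(p, w, b, transfer=math.tanh):
    """Return the scalar output a = f(Wp + b) of a single neuron."""
    # A scalar input is treated as a one-element vector.
    if isinstance(p, (int, float)):
        p, w = [p], [w]
    n = sum(wi * pi for wi, pi in zip(w, p)) + b  # net input Wp + b
    return transfer(n)

# Illustrative values only (not taken from the trained network).
a_linear = neuron([1.0, 2.0], [0.5, -0.25], 0.1, transfer=lambda n: n)
a_tanh = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
```

Passing a different `transfer` argument changes only the last step, which is how the same weighted sum can feed either a linear or a non-linear neuron.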
Our network used three layers of neurons, one of which is required by the toolbox. The final layer, the output layer, must have as many neurons as there are outputs. We tested five accents, so our final layer has 5 neurons. We also added two "hidden" layers, which operate on the inputs before they are prepared as outputs, each of which has 20 neurons.
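As a rough sketch of this layer structure, the following Python code pushes one input vector through two hidden layers of 20 neurons and an output layer of 5. It makes several assumptions: the input length of 28 is inferred from the training matrix dimensions, the weights are random placeholders rather than trained values, and the same transfer function (tanh) is applied to every layer. All function names are hypothetical.

```python
import math
import random

def make_layer(n_inputs, n_neurons, rng):
    """Random placeholder weights and biases for one (untrained) layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_neurons)]
    biases = [rng.uniform(-1, 1) for _ in range(n_neurons)]
    return weights, biases

def layer_forward(p, layer, transfer):
    """Each neuron in the layer computes transfer(w . p + b)."""
    weights, biases = layer
    return [transfer(sum(w * x for w, x in zip(row, p)) + b)
            for row, b in zip(weights, biases)]

def network_forward(p, layers, transfer=math.tanh):
    """Feed the output of each layer into the next one."""
    a = p
    for layer in layers:
        a = layer_forward(a, layer, transfer)
    return a

rng = random.Random(0)
sizes = [28, 20, 20, 5]  # assumed input length, two hidden layers, 5 outputs
layers = [make_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes, sizes[1:])]
profile = [rng.uniform(-1, 1) for _ in range(sizes[0])]  # stand-in accent profile
output = network_forward(profile, layers)  # one value per accent
```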
In addition to configuring the network parameters, we had to build the network training set. In our training set we had 42 speakers: 8 Northern, 9 Texan, 9 Russian, 9 Farsi, and 7 Mandarin. An accent profile was created for each of these speakers as discussed earlier, and the profiles were compacted into a matrix. Each profile was a 28-element column vector, so the matrix was 28 x 42, with one column per speaker. For each speaker we also generated an answer vector. For example, the desired answer for a Texan accent is [0 1 0 0 0]. These answer vectors were also combined into an answer matrix. The training matrix and the desired answer matrix were given to the neural network, which trained using traingda (gradient descent with adaptive learning rate backpropagation). We set the goal for the training function to be a mean square error of 0.005.
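To make the answer vectors and the training goal concrete, here is a minimal Python sketch. The Texan vector and the speaker counts come from the text; the positions of the other four accents and the ordering of speakers are assumptions, and the helper names are hypothetical. The error is averaged over every element of the answer matrix.

```python
ACCENTS = ["Northern", "Texan", "Russian", "Farsi", "Mandarin"]  # assumed order
COUNTS = [8, 9, 9, 9, 7]  # speakers per accent, 42 in total
GOAL = 0.005              # target mean square error

def answer_vector(accent):
    """Put a 1 in the slot for the given accent and 0 everywhere else."""
    return [1 if name == accent else 0 for name in ACCENTS]

# One answer vector per speaker, grouped by accent (ordering is an assumption).
answers = [answer_vector(name)
           for name, count in zip(ACCENTS, COUNTS)
           for _ in range(count)]

def mean_square_error(outputs, targets):
    """Average squared difference over every element of every vector."""
    errors = [(o - t) ** 2
              for out, tgt in zip(outputs, targets)
              for o, t in zip(out, tgt)]
    return sum(errors) / len(errors)
```

Training would stop once `mean_square_error(network_outputs, answers)` falls to `GOAL` or below, where `network_outputs` stands for whatever the network produces for the 42 training profiles.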
We originally configured our neural network to use neurons with a linear transfer function (purelin). However, when using more than three accents at a time, we could not reduce the mean square error to 0.005. The error approached a limit, and that limit rose as we included more accents.
So, at this point we redesigned our network to use non-linear neurons (tansig).
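One likely reason the linear network stalled is that stacked linear layers are equivalent to a single linear layer, so adding hidden layers gives no extra modelling power. The Python sketch below checks this for a small made-up example and also defines tansig, which is mathematically the same as the hyperbolic tangent. The matrices and helper names are illustrative assumptions, not values from our network.

```python
import math

def tansig(n):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2n)) - 1, equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def matvec(W, p):
    return [sum(w * x for w, x in zip(row, p)) for row in W]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(u, v):
    return [x + y for x, y in zip(u, v)]

# Two small linear (purelin-style) layers with made-up weights.
W1, b1 = [[1.0, 2.0], [0.5, -1.0]], [0.5, -0.5]
W2, b2 = [[2.0, 1.0]], [0.25]
p = [3.0, -1.0]

# Output of the two-layer linear network: W2(W1 p + b1) + b2.
two_layer = add(matvec(W2, add(matvec(W1, p), b1)), b2)

# The same mapping collapsed into one layer: (W2 W1) p + (W2 b1 + b2).
W_eq = matmul(W2, W1)
b_eq = add(matvec(W2, b1), b2)
one_layer = add(matvec(W_eq, p), b_eq)
```

Here `two_layer` and `one_layer` agree, whereas inserting `tansig` between the layers breaks the collapse and lets the hidden layers contribute.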
After the network was trained, we refined our set of training samples by looking at the network's output when given the training matrix again. We removed a handful of speakers, arriving at our present number of 42, because their speech included an accent we weren't explicitly testing for. These were speakers who sounded as if they had learned British English rather than American English.
These final two figures show an image representation of the answer matrix and of the answers given by the trained network. In the images, grey is 0 and white is 1. Colors darker than grey represent negative numbers.