A detail worth spending time on is how the speech data was packed into the ROM. The following four figures (Figures 1 - 4) show an example. Figure 1 is an overview of the packing algorithm that I put together for a presentation on this topic. Figures 2 and 3 show the coded data for the word "cage". The coded data was taken from the information at the top of Figure 4. Once the packing process shown in Figures 2 and 3 is complete, the resulting data matches the information in the bottom part of Figure 4. I am relatively certain that the four figures alone won't help much in understanding the process, so I will pull excerpts from Figures 2 and 3 and use them to explain it.
The top set of data in Figure 4 is the parametric data for the word "cage". The first column is the frame number, the second is the energy level, the third is the pitch period, and the remaining columns are the reflection coefficients, running from K1 to K10 from left to right. The bottom set of data is the final packed data for the encoded word.
I have taken six frames of data (frames 8 through 13) from Figures 2 and 3 and put them in Table 1. It will be easier to see the data and explain the process using this table than to work through the handwritten figures.
Frame | Energy | Rpt | Pitch | K1 | K2 | K3 | K4 | K5 | K6 | K7 | K8 | K9 | K10 |
8 | 1001 | 0 | 00000 | 10101 | 10110 | 0110 | 0110 | - | - | - | - | - | - |
9 | 0110 | 1 | 00000 | - | - | - | - | - | - | - | - | - | - |
10 | 0110 | 1 | 00000 | - | - | - | - | - | - | - | - | - | - |
11 | 1101 | 0 | 01010 | 10010 | 10000 | 0101 | 0101 | 0110 | 1011 | 1010 | 101 | 011 | 010 |
12 | 1101 | 1 | 01011 | - | - | - | - | - | - | - | - | - | - |
13 | 1101 | 0 | 01100 | 10110 | 10001 | 0111 | 0100 | 0000 | 1010 | 1011 | 110 | 100 | 011 |
Notice that frames 8 - 10 are unvoiced, with frames 9 and 10 being repeated copies of frame 8. The "1" in the Rpt column of frames 9 and 10 indicates that they are repeat frames. Frames 11 - 13 are voiced frames, and frame 12 is a repeat frame. Referring back to Figure 1, you can see that an unvoiced frame (frame 8) has only the first four reflection coefficients (K1 - K4), whereas a voiced frame has all ten coefficients (frames 11 and 13). In all cases the repeat frame has no coefficients and the repeat flag is set to "1".
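The three frame layouts described above can be summarized in a few lines of Python. This is only a minimal sketch, not the original tooling: the names `FIELD_WIDTHS`, `frame_fields`, and `frame_bit_length` are hypothetical, the bit widths are simply read off the bit patterns in Table 1, and only the three frame types that appear in Table 1 are covered.

```python
# Field widths in bits, read off the bit patterns in Table 1 (an assumption
# drawn from the table, not from a data sheet).
FIELD_WIDTHS = {
    "Energy": 4, "Rpt": 1, "Pitch": 5,
    "K1": 5, "K2": 5, "K3": 4, "K4": 4, "K5": 4,
    "K6": 4, "K7": 4, "K8": 3, "K9": 3, "K10": 3,
}

def frame_fields(repeat, voiced):
    """Return the names of the fields stored for one frame."""
    header = ["Energy", "Rpt", "Pitch"]
    if repeat:
        return header                    # repeat frames carry no coefficients
    count = 10 if voiced else 4          # unvoiced frames stop at K4
    return header + [f"K{i}" for i in range(1, count + 1)]

def frame_bit_length(repeat, voiced):
    """Total number of bits one frame contributes to the packed stream."""
    return sum(FIELD_WIDTHS[name] for name in frame_fields(repeat, voiced))

print(frame_bit_length(repeat=False, voiced=False))  # unvoiced, like frame 8
print(frame_bit_length(repeat=True, voiced=False))   # repeat, like frames 9, 10, 12
print(frame_bit_length(repeat=False, voiced=True))   # voiced, like frames 11, 13
```

None of these lengths is a multiple of eight, which is why the frame boundaries end up falling in the middle of nibbles once the bits are regrouped.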
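The process consists of several steps: concatenating the frame bits, regrouping them into four-bit nibbles, converting each nibble to hexadecimal, bit-reversing each nibble, and finally switching the two nibbles in each pair.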
If I take the binary sequence for Frames 8 through 13 I get this sequence of bits:
1001 0 00000 10101 10110 0110 0110 . 0110 1 00000 . 0110 1 00000 . 1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010 . 1101 1 01011 . 1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011
Notice that I have inserted a "." to separate each of the frame sequences and have used a blank to separate the 13 parameters within each frame. The next task is to reformat the bits into hexadecimal. The bits for each hexadecimal number are shown in parentheses below:
(1001) (0 000)(00 10)(101 1)(0110) (0110) (0110) . (0110) (1 000)(00 . 01)(10 1 0)(0000) . (1101) (0 010)(10 10)(010 1)(0000) (0101) (0101) (0110) (1011) (1010) (101 0)(11 01)(0 . 110)(1 1 01)(011 . 1)(101 0) (0110)(0 101)(10 10)(001 0)(111 0)(100 0)(000 1)(010 1)(011 1)(10 10)(0 011) [1101]
I have put brackets around the last nibble to indicate that it came from frame 14. It was necessary to create an even number of nibbles so that the process could be completed on this example. Now that the binary sequence has been organized into nibbles, I can use Table 2 to convert the nibbles into hexadecimal.
Decimal | Binary | Hexadecimal | Bit Reversed |
0 | 0000 | 0 | 0 |
1 | 0001 | 1 | 8 |
2 | 0010 | 2 | 4 |
3 | 0011 | 3 | C |
4 | 0100 | 4 | 2 |
5 | 0101 | 5 | A |
6 | 0110 | 6 | 6 |
7 | 0111 | 7 | E |
8 | 1000 | 8 | 1 |
9 | 1001 | 9 | 9 |
10 | 1010 | A | 5 |
11 | 1011 | B | D |
12 | 1100 | C | 3 |
13 | 1101 | D | B |
14 | 1110 | E | 7 |
15 | 1111 | F | F |
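The "Bit Reversed" column of Table 2 is just each four-bit pattern read from right to left. As a small sketch (the function name `reverse_nibble` is my own, not something from the original tools), the table can be regenerated like this:

```python
def reverse_nibble(value):
    """Reverse the order of the four bits in a value from 0 to 15."""
    bits = format(value, "04b")      # e.g. 13 -> "1101"
    return int(bits[::-1], 2)        # "1101" reversed is "1011" -> 11

# Print the same four columns as Table 2.
for n in range(16):
    print(n, format(n, "04b"), format(n, "X"), format(reverse_nibble(n), "X"))
```

Patterns that read the same in both directions (0000, 0110, 1001, 1111) map to themselves, which is why 0, 6, 9, and F are unchanged in the table.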
In hexadecimal it would look like: 90 2B 66 66 81 A0 D2 A5 05 56 BA AD 6D 7A 65 A2 E8 15 7A 3D
Bit reversed would look like: 90 4D 66 66 18 50 B4 5A 0A A6 D5 5B 6B E5 6A 54 71 8A E5 CB
Finally, after a pairwise nibble switch, it would look like: 09 D4 66 66 81 05 4B A5 A0 6A 5D B5 B6 5E A6 45 17 A8 5E BC
If this sequence is compared to the bottom data set of Figure 4, it is comforting to see that they are identical. Obviously we could have completed the whole word to verify that all of it works. But, then, that is what Figures 2 and 3 attempted to do.
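For readers who would rather check the arithmetic by machine, the walk-through for frames 8 through 13 can be reproduced with the short Python sketch below. It is not the original packing program; the names `FRAMES`, `PADDING`, and `pairs` are hypothetical, and the padding nibble borrowed from frame 14 is taken to be 1101, the value that corresponds to the final "D" in the hexadecimal listing.

```python
# Frames 8 through 13 from Table 1, with a blank between parameters.
FRAMES = [
    "1001 0 00000 10101 10110 0110 0110",
    "0110 1 00000",
    "0110 1 00000",
    "1101 0 01010 10010 10000 0101 0101 0110 1011 1010 101 011 010",
    "1101 1 01011",
    "1101 0 01100 10110 10001 0111 0100 0000 1010 1011 110 100 011",
]
# Assumed first nibble of frame 14, needed to end on a whole byte.
PADDING = "1101"

def pairs(nibbles):
    """Format a list of nibble values as space-separated hex bytes."""
    digits = "".join(format(n, "X") for n in nibbles)
    return " ".join(digits[i:i + 2] for i in range(0, len(digits), 2))

# Step 1: concatenate all frame bits, dropping the separators.
bits = "".join(FRAMES).replace(" ", "") + PADDING

# Step 2: regroup into four-bit nibbles.
nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]

# Step 3: plain hexadecimal value of each nibble.
plain = [int(n, 2) for n in nibbles]

# Step 4: bit-reverse each nibble (read it right to left).
flipped = [int(n[::-1], 2) for n in nibbles]

# Step 5: switch the two nibbles in every pair (index 0<->1, 2<->3, ...).
swapped = [flipped[i ^ 1] for i in range(len(flipped))]

print(pairs(plain))
print(pairs(flipped))
print(pairs(swapped))
```

The three printed lines correspond to the hexadecimal, bit-reversed, and nibble-switched sequences. Reversing the bits inside each nibble and then switching the two nibbles amounts to reversing all eight bits of each byte, so the last two steps could equally be written as a single byte-wise bit reversal.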
You may notice that I have ignored the creation and use of the encode and decode tables. These tables were created based on a specific professional speaker. For each of the coefficients, a test data set was used to reduce all of the variations to a set of buckets. For example, K1 has five bits to define the value of the coefficient, so its data set was split into 32 buckets ranging from the largest to the smallest. A median point in each bucket was selected to be the value used for the decoder. As this was specific to each professional speaker, and therefore to each version of the TMS028x, it will not be presented. That part of the process is left to the student to figure out. And, yes, you may have noted that I didn't disclose how the spelling of the words was packed into the ROM along with the speech data. That is another aspect left to the student to figure out.
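To give a feel for the bucket idea without pretending to reproduce the real tables, here is a rough Python sketch. Everything in it is an assumption made for illustration: the sample data is random, the buckets are taken to hold equal numbers of samples, encoding is done by picking the nearest table entry, and the names `build_decode_table` and `encode` are hypothetical.

```python
import random
import statistics

def build_decode_table(samples, bits):
    """Split sorted samples into 2**bits equal-count buckets, one median each."""
    ordered = sorted(samples, reverse=True)       # largest to smallest
    buckets = 2 ** bits
    size = len(ordered) / buckets                 # samples per bucket
    table = []
    for b in range(buckets):
        chunk = ordered[int(b * size):int((b + 1) * size)]
        table.append(statistics.median(chunk))    # decoder value for code b
    return table

def encode(value, table):
    """Return the index of the decode-table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))

# Made-up stand-in data; the real tables came from recorded speech.
random.seed(0)
samples = [random.uniform(-1.0, 1.0) for _ in range(3200)]

k1_table = build_decode_table(samples, bits=5)    # five bits -> 32 entries
code = encode(0.25, k1_table)
print(len(k1_table), format(code, "05b"), k1_table[code])
```

The five-bit code is what would be stored in the K1 column of a frame, and the table entry is what the decoder would use in its place.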