In the C6x CPU, it takes exactly one CPU clock cycle to execute each instruction. However, instructions such as LDW need to access the slow external memory, so the result of the load is not available immediately at the end of the execution. The extra cycles that must pass before the result becomes available are called delay slots.
For example, consider loading the content of the memory location pointed to by A10 into A1 and then moving the loaded data to A2. You might be tempted to write the following simple two-line assembly code:
LDW .D1 *A10, A1
MV .D1 A1, A2
What is wrong with the above code? The result of the LDW instruction is not available immediately after LDW is executed. As a consequence, the MV instruction does not copy the desired value of A1 to A2. To prevent this, we need to make the CPU wait until the result of the LDW instruction has been loaded into A1 before executing the MV instruction. For load instructions, 4 extra clock cycles are needed before the load result is valid. To make the CPU wait for 4 clock cycles, we insert 4 NOP (no operation) instructions between LDW and MV. Each NOP instruction makes the CPU idle for one clock cycle. The resulting code looks like this:
LDW .D1 *A10, A1
NOP
NOP
NOP
NOP
MV .D1 A1, A2
or, more simply, you can write
LDW .D1 *A10, A1
NOP 4
MV .D1 A1, A2
Why didn't the designers of the CPU make the LDW instruction take 5 clock cycles to begin with, rather than have the programmer insert 4 NOPs? The answer is that you can insert instructions other than NOPs, as long as those instructions do not use the result of the LDW instruction above. By doing this, the CPU can execute additional instructions while waiting for the result of the LDW instruction to become valid, greatly reducing the total execution time of the entire program.
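As a minimal sketch of this idea, the sequence below fills three of the four delay slots with instructions that do not read A1. The registers A3 through A9 and the particular ADD and SUB operations are hypothetical, chosen only for illustration; any instructions that avoid the pending load result would serve the same purpose.

```asm
        LDW   .D1   *A10, A1     ; start the load; A1 is valid after 4 delay slots
        ADD   .L1   A3, A4, A5   ; delay slot 1: does not use A1
        SUB   .L1   A6, A7, A8   ; delay slot 2: does not use A1
        ADD   .S1   A5, A8, A9   ; delay slot 3: uses A5 and A8, not A1
        NOP                      ; delay slot 4: no useful work left, so idle
        MV    .D1   A1, A2       ; A1 now holds the loaded value
```

This version still takes 6 cycles, the same as the version with NOP 4, but it completes three extra operations in that time.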
Description | Instructions | Delay slots |
Single cycle | All instructions except the following | 0 |
Multiply | MPY, SMPY, etc. | 1 |
Load | LDB, LDH, LDW | 4 |
Branch | B | 5 |
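The delay slot counts in the table can be applied in the same way as for LDW. The sketch below is only an illustration: the register choices are arbitrary, and the label loop is hypothetical and assumed to be defined elsewhere in the program.

```asm
        MPY   .M1   A1, A2, A3   ; multiply: 1 delay slot
        NOP                      ; wait 1 cycle for the product
        ADD   .L1   A3, A4, A5   ; A3 now holds the product

        B     .S1   loop         ; branch: 5 delay slots ('loop' is a hypothetical label)
        NOP   5                  ; these 5 cycles pass before the branch takes effect
```

As with loads, the NOPs after MPY and B can be replaced with useful instructions, provided that they do not depend on the result that is still pending.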
The functional unit latency indicates how many clock cycles each instruction actually occupies a functional unit. All C6x instructions have a functional unit latency of 1, meaning that each functional unit is ready to execute the next instruction after 1 clock cycle, regardless of the delay slots of the instruction. Therefore, the following instructions are valid:
LDW .D1 *A10, A4
ADD .D1 A1, A2, A3
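The same reasoning suggests that two loads can be issued back to back on the same unit, with their delay slots overlapping. The following is a sketch under that assumption; the address register A11 and the registers A5 through A9 are hypothetical.

```asm
        LDW   .D1   *A10, A4     ; first load issued on .D1
        LDW   .D1   *A11, A5     ; second load issued one cycle later on the same unit
        NOP   3                  ; with the second LDW, this gives 4 delay slots for A4
        ADD   .L1   A4, A6, A7   ; A4 is valid here; also the 4th delay slot of the second load
        ADD   .L1   A5, A8, A9   ; A5 is valid one cycle after A4
```

Here the second LDW itself occupies the first delay slot of the first load, so only 3 NOPs are needed instead of 8.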