Then why didn't the designer of the CPU simply make the LDW instruction take 5 clock cycles to begin with, rather than have the programmer insert 4 NOPs? The answer is that you can insert instructions other than NOPs, as long as those instructions do not use the result of the LDW instruction above. By doing this, the CPU can execute additional instructions while waiting for the result of the LDW instruction to become valid, greatly reducing the total execution time of the entire program.
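To make the saving concrete, here is a toy cycle count in Python. It assumes that every instruction, including each NOP, occupies one issue cycle, and that the program is one LDW, some number of independent single-cycle instructions, and one final instruction that consumes the loaded value. These are simplifying assumptions rather than a model of the full C62x pipeline, and the function names are hypothetical.

```python
LDW_DELAY_SLOTS = 4  # LDW has 4 delay slots on the C62x


def cycles_with_nops(independent: int) -> int:
    """LDW, 4 NOPs, the independent instructions, then the consumer."""
    return 1 + LDW_DELAY_SLOTS + independent + 1


def cycles_filled(independent: int) -> int:
    """LDW, independent instructions placed in its delay slots (NOPs pad
    only the slots left unfilled), then the consumer."""
    return 1 + max(LDW_DELAY_SLOTS, independent) + 1


for k in (0, 3, 4, 6):
    print(k, cycles_with_nops(k), cycles_filled(k))
```

With 3 independent instructions, the NOP version takes 9 cycles and the filled version 6; once there are at least 4 independent instructions, no NOPs are needed at all.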
Table 3-5 in TI's instruction set description shows the execution of the instructions with delay slots in more detail. The instructions with delay slots are the multiply (MPY, 1 delay slot), the load (LDB, LDW, 4 delay slots), and the branch (B, 5 delay slots) instructions.
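One way to read these numbers: an instruction issued in cycle c with d delay slots produces a result that the instruction issued in cycle c + d + 1 can use (for B, that is the cycle in which the branch target starts executing). The Python sketch below encodes that rule with the delay-slot counts listed above; the dictionary and function names are hypothetical.

```python
# Delay slots of the C62x instructions listed in the text.
DELAY_SLOTS = {"MPY": 1, "LDB": 4, "LDW": 4, "B": 5}


def first_usable_cycle(op: str, issue_cycle: int) -> int:
    """First cycle whose instruction sees the result of `op`
    (for B: the first cycle that executes at the branch target)."""
    return issue_cycle + DELAY_SLOTS[op] + 1


def sees_old_value(op: str, issue_cycle: int, reader_cycle: int) -> bool:
    """For MPY and loads: True if an instruction issued in reader_cycle
    still reads the destination register's previous contents."""
    return reader_cycle < first_usable_cycle(op, issue_cycle)


print(first_usable_cycle("MPY", 1), first_usable_cycle("LDW", 1), first_usable_cycle("B", 1))
```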
The functional unit latency indicates for how many clock cycles each instruction actually uses a functional unit. All C62x instructions have a functional unit latency of 1, meaning that each functional unit is ready to execute the next instruction after 1 clock cycle, regardless of the delay slots of the instructions. Therefore, the following instructions are valid:
LDW .D1 *A10, A4
ADD .D1 A1, A2, A3
Although the LDW instruction has not yet loaded the A4 register when the ADD is executed, the .D1 functional unit becomes available in the clock cycle right after the one in which LDW is executed, so the ADD can use it.
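Because the functional unit latency is 1, a unit such as .D1 is occupied only in the cycle in which an instruction is issued, even though the LDW result arrives later. The following Python sketch checks a schedule of (cycle, unit) pairs under that assumption alone; it ignores every other C62x scheduling rule, and the function name is hypothetical.

```python
from collections import Counter


def units_ok(schedule):
    """schedule: list of (issue_cycle, unit, mnemonic) tuples.

    With a functional unit latency of 1, a unit is busy only in its
    issue cycle, so the schedule is valid if no (cycle, unit) pair
    appears more than once. Delay slots play no role in this check.
    """
    uses = Counter((cycle, unit) for cycle, unit, _ in schedule)
    return all(n == 1 for n in uses.values())


# The two-instruction example from the text: LDW then ADD, both on .D1.
print(units_ok([(1, ".D1", "LDW"), (2, ".D1", "ADD")]))  # True
# Two instructions on .D1 in the same cycle would conflict.
print(units_ok([(1, ".D1", "LDW"), (1, ".D1", "ADD")]))  # False
```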
To clarify the execution of instructions with delay slots, let's consider the following example of the LDW instruction. Let's assume A10 = 0x0100 and A2 = 1, and that your intent is to load A9 with the 32-bit word at address 0x0100 (the post-increment operand *A10++[A2] uses A10 as the address before advancing it by A2 words, to 0x0104). The 3 MV instructions are not related to the LDW instruction; they do something else.
1 LDW .D1 *A10++[A2], A9
2 MV .L1 A10, A8
3 MV .L1 A1, A10
4 MV .L1 A1, A2
5 ...
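The operand *A10++[A2] is the C6000 post-increment form: the load uses A10 as it is, then A10 is advanced by A2 words (A2 scaled by the 4-byte word size). The pre-increment form *++A10[A2] advances first and loads from the new address. A small Python sketch of both address calculations, with hypothetical function names:

```python
WORD_BYTES = 4  # LDW accesses 32-bit words, so the offset is scaled by 4


def post_increment(base: int, offset: int) -> tuple[int, int]:
    """*R++[offset]: return (load address, updated R)."""
    return base, base + WORD_BYTES * offset


def pre_increment(base: int, offset: int) -> tuple[int, int]:
    """*++R[offset]: return (load address, updated R)."""
    updated = base + WORD_BYTES * offset
    return updated, updated


# A10 = 0x0100, A2 = 1 as in the example.
print([hex(v) for v in post_increment(0x0100, 1)])  # ['0x100', '0x104']
print([hex(v) for v in pre_increment(0x0100, 1)])   # ['0x104', '0x104']
```

So with the post-increment form the word comes from 0x0100 and A10 ends up at 0x0104; a pre-increment operand would be needed to load from 0x0104 itself.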
We can ask several interesting questions at this point:
1. What is loaded into A8? That is, in which clock cycle is the address pointer updated?
2. Can we change the offset register A2 before the LDW instruction finishes the actual loading?
3. Can we change A10 before the first LDW finishes loading the memory content into A9? That is, can we change the address pointer before the 4 delay slots elapse?

Although it takes 4 additional clock cycles for the LDW instruction to load the memory content into A9, the address pointer and offset registers (A10 and A2) are read and updated in the clock cycle in which the LDW instruction is issued. Therefore, in line 2, A8 is loaded with the updated A10, that is, A10 = A8 = 0x104. Since LDW reads the A10 and A2 registers in the first clock cycle, you are free to change these registers without affecting the operation of the first LDW.

Similar reasoning holds for the MPY and B (when using a register as the branch address) instructions. MPY reads its source values in the first clock cycle and writes the multiplication result after the 2nd clock cycle. For B, the address register is read in the first clock cycle, and the actual branch takes effect only after its 5 delay slots have elapsed. Thus, after the first clock cycle, you are free to modify the source or address registers. For more details, refer to Table 3-5 in the instruction set description or read the description of the individual instruction.
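The answers for LDW can be checked with a toy cycle-by-cycle model of the four-line example in Python. It assumes that the address and offset are read and the pointer is updated in the LDW's issue cycle, that each MV takes effect in its own cycle, and that the loaded word is written into A9 at the end of the 4th delay slot (cycle 5). The memory contents and A1's value are made-up test data, and simulate_example is a hypothetical name.

```python
LDW_DELAY_SLOTS = 4


def simulate_example(memory, a1=0x0200):
    """Toy model of:
        1 LDW .D1 *A10++[A2], A9
        2 MV  .L1 A10, A8
        3 MV  .L1 A1, A10
        4 MV  .L1 A1, A2
    Returns the registers after cycle 5 and A9 as seen in cycle 4.
    """
    regs = {"A1": a1, "A2": 1, "A8": 0, "A9": 0, "A10": 0x0100}
    pending = []  # delayed writes: (end-of-cycle, register, value)

    def end_of_cycle(cycle):
        # Apply any delayed write that lands at the end of this cycle.
        for when, reg, value in list(pending):
            if when == cycle:
                regs[reg] = value
                pending.remove((when, reg, value))

    # Cycle 1: LDW reads A10 and A2 now and updates A10 now;
    # only the loaded data is delayed.
    addr = regs["A10"]
    regs["A10"] = addr + 4 * regs["A2"]
    pending.append((1 + LDW_DELAY_SLOTS, "A9", memory[addr]))
    end_of_cycle(1)

    regs["A8"] = regs["A10"]  # cycle 2: sees the updated pointer
    end_of_cycle(2)
    regs["A10"] = regs["A1"]  # cycle 3: overwriting A10 does not disturb the load
    end_of_cycle(3)
    regs["A2"] = regs["A1"]   # cycle 4: neither does overwriting A2
    a9_in_cycle_4 = regs["A9"]
    end_of_cycle(4)
    end_of_cycle(5)           # cycle 5 (the "..." line): the load lands in A9
    return regs, a9_in_cycle_4


memory = {0x0100: 0x11111111, 0x0104: 0x22222222}
regs, early = simulate_example(memory)
print(hex(regs["A8"]), hex(regs["A9"]), hex(early))  # 0x104 0x11111111 0x0
```

In this model A8 receives the already-updated pointer, the later writes to A10 and A2 leave the pending load untouched, and A9 still holds its old value in cycle 4.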