Although the first LDW instruction does not load the A4 register correctly while the ADD is executed, the D1 functional unit becomes available in the clock cycle right after the one in which the LDW is executed.
To clarify the execution of instructions with delay slots, let's think of the following example of the LDW instruction. Let's assume A10 = 0x0100 and A2 = 1, and your intent is loading A9 with the 32-bit word at the address 0x0104. The 3 MV instructions are not related to the LDW instruction. They do something else.
    1    LDW    .D1    *A10++[A2], A9
    2    MV     .L1    A10, A8
    3    MV     .L1    A1, A10
    4    MV     .L1    A1, A2
    5    ...
We can ask several interesting questions at this point:

1. What value is loaded into A8? That is, in which clock cycle is the address pointer updated?
2. Can we modify A2 before the LDW instruction finishes the actual loading?
3. Can we modify A10 before the first LDW finishes loading the memory content to A9? That is, can we change the address pointer before the 4 delay slots elapse?

Although the LDW instruction takes 4 additional clock cycles to load the memory content to A9, the address pointer and offset registers (A10 and A2) are read and updated in the clock cycle in which the LDW instruction is issued. Therefore, in line 2, A8 is loaded with the updated A10, that is, A10 = A8 = 0x0104.

Because the LDW reads the A10 and A2 registers in the first clock cycle, you are free to change these registers afterward without affecting the operation of the first LDW.

A similar rule holds for the MPY and B (when using a register as the branch address) instructions. MPY reads its source values in the first clock cycle and writes the multiplication result after the 2nd clock cycle. For B, the address pointer is read in the first clock cycle, and the actual branching occurs once the 5 delay slots have elapsed. Thus, after the first clock cycle, you are free to modify the source or address pointer registers. For more details, refer to Table 3-5 in the instruction set description or read the description of the individual instruction.
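The timing rule for the LDW example can be sketched with a toy model in Python. This is only an illustration, not a cycle-accurate C6x simulator: the class `ToyC6x` and the helpers `mv` and `ldw_postinc` are hypothetical names, the initial value of A1 (0x2000) and the word fetched from memory (0xCAFE0001) are made-up values, and the model tracks only register timing, not the memory access itself. The single idea it encodes is that source registers are read in the issue cycle, while the destination is written only after the instruction's delay slots.

```python
class ToyC6x:
    """Toy register-timing model (an assumption-laden sketch, not real hardware)."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.cycle = 0
        self.pending = []  # list of (commit_cycle, register, value)

    def schedule_write(self, reg, value, delay_slots):
        # The write lands at the end of cycle (issue cycle + delay slots).
        self.pending.append((self.cycle + delay_slots, reg, value))

    def step(self, instr=None):
        """Advance one clock cycle, issuing at most one instruction."""
        self.cycle += 1
        if instr is not None:
            instr(self)  # sources are read now, in the issue cycle
        due = [p for p in self.pending if p[0] <= self.cycle]
        self.pending = [p for p in self.pending if p[0] > self.cycle]
        for _, reg, value in due:
            self.regs[reg] = value & 0xFFFFFFFF


def mv(src, dst):
    # MV has 0 delay slots: the result is visible in the next cycle.
    return lambda cpu: cpu.schedule_write(dst, cpu.regs[src], 0)


def ldw_postinc(base, offset, dst, loaded_word):
    # Pointer and offset are read and the pointer is updated at issue;
    # the destination register is written after 4 delay slots.
    def run(cpu):
        new_ptr = cpu.regs[base] + (cpu.regs[offset] << 2)  # word offset
        cpu.schedule_write(base, new_ptr, 0)
        cpu.schedule_write(dst, loaded_word, 4)
    return run


cpu = ToyC6x({"A1": 0x2000, "A2": 1, "A8": 0, "A9": 0, "A10": 0x0100})
program = [
    ldw_postinc("A10", "A2", "A9", loaded_word=0xCAFE0001),  # line 1
    mv("A10", "A8"),                                         # line 2
    mv("A1", "A10"),                                         # line 3
    mv("A1", "A2"),                                          # line 4
]

trace = []  # register snapshot at the end of each cycle
for i in range(6):
    cpu.step(program[i] if i < len(program) else None)
    trace.append(dict(cpu.regs))

for n, snap in enumerate(trace, start=1):
    print(n, {k: hex(v) for k, v in snap.items()})
```

In this model A8 picks up the updated pointer in cycle 2, lines 3 and 4 overwrite A10 and A2 without disturbing the load, and A9 changes only at the end of cycle 5, so it can be used from cycle 6 onward.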
There are several instructions for addition, subtraction and multiplication on the C6x CPU. The basic instructions are ADD, SUB, and MPY. ADD and SUB have 0 delay slots (meaning the results of the operation are immediately available), but MPY has 1 delay slot (the result of the multiplication is valid after 1 additional clock cycle).
(Add, subtract, and multiply): Write an assembly program to compute (0000 ef35h + 0000 33dch - 0000 1234h) * 0000 0007h.
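To check the result of your assembly program, the expected value can be worked out on a host machine. The Python sketch below does the arithmetic in 32 bits. The helper `mpy16` is a hypothetical model that assumes MPY multiplies only the signed lower 16 bits of each operand; verify this against the MPY description in the instruction set reference before relying on it.

```python
MASK32 = 0xFFFFFFFF


def to_signed16(x):
    """Interpret the low 16 bits of x as a signed 16-bit value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mpy16(a, b):
    """Assumed MPY model: signed 16 LSBs x signed 16 LSBs -> 32-bit result."""
    return (to_signed16(a) * to_signed16(b)) & MASK32


# Full 32-bit arithmetic for the expression in the exercise.
partial = (0x0000EF35 + 0x000033DC - 0x00001234) & MASK32
expected = (partial * 0x00000007) & MASK32

print(hex(partial))            # 0x110dd
print(hex(expected))           # 0x7760b
print(hex(mpy16(partial, 7)))  # 0x760b under the 16-bit assumption
```

The intermediate sum 0x110dd does not fit in 16 bits, so under the stated assumption a single MPY on it would not give the full product. Keep this in mind when comparing your result with the expected value.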
Often you need to control the flow of the program execution by branching to another block of code. The B instruction does the job in the C6x CPU. The address of the branch can be specified either by displacement or stored in a register to be used by the B instruction. The B instruction has 5 delay slots, meaning that the actual branch occurs in the 5th clock cycle after the instruction is executed.
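The effect of the 5 delay slots on program flow can be illustrated with a small Python sketch. It is a toy model under simplifying assumptions: one instruction issues per cycle, no second branch sits inside the delay slots, and the function name `run_with_branch_delay` and the instruction labels are hypothetical. It shows that the five instructions following B still execute before control reaches the branch target.

```python
def run_with_branch_delay(program, delay_slots=5, max_cycles=100):
    """Return the labels executed, in order.

    program is a list of (label, target) pairs, where target is the index
    to branch to, or None for an ordinary instruction.
    """
    executed = []
    pc = 0
    pending = None  # (delay slots remaining, target index) for a branch in flight

    for _ in range(max_cycles):
        if pc >= len(program):
            break
        label, target = program[pc]
        executed.append(label)
        next_pc = pc + 1

        if pending is not None:
            remaining, dest = pending
            remaining -= 1
            if remaining == 0:
                next_pc = dest  # last delay slot done: the branch takes effect
                pending = None
            else:
                pending = (remaining, dest)
        elif target is not None:
            pending = (delay_slots, target)  # branch issued this cycle

        pc = next_pc
    return executed


program = [
    ("B", 7),            # branch to index 7
    ("slot1", None),
    ("slot2", None),
    ("slot3", None),
    ("slot4", None),
    ("slot5", None),
    ("skipped", None),   # never reached: the branch has taken effect by now
    ("target", None),
]

order = run_with_branch_delay(program)
print(order)
```

In real code the delay slots are either filled with useful instructions or padded with NOPs. Which is appropriate depends on the surrounding program.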