<< Chapter < Page | Chapter >> Page > |
The
elaborate
function modifies the class member variable `
ns
' at lines 10, 14, 18 and 21, where it appends a new node to the back of the list. After the function returns, the
ns
list represents a topological ordering of the computation with
CNodeLoad
and
CNodeBfly
nodes. The nodes of type
CNodeLoad
represent leaf computations: these computations load elements from the input array and perform a small amount of leaf computation, leaving the result in a set of registers. The
CNodeBfly
nodes represent computations in the body of the transform: these use a twiddle factor to perform a butterfly computation on a vector of registers, leaving the result in the same registers.
Type | Size | Addresses | Registers | Twiddle |
CNodeLoad |
4 | {0,4,2,6} | {0,1,2,3} | |
CNodeLoad |
2(x2) | {1,5,7,3} | {4,5,6,7} | |
CNodeBfly |
4 | {0,2,4,6} | ||
CNodeBfly |
4 | {1,3,5,7} |
The constructor for a
CNodeLoad
object computes input array addresses for the load operations using the input array offset (
ioffset
), the input array
stride
, the size of the node (the nodes instantiated at lines 9 and 17 are size-4, and the node instantiated at line 20 is size-2) and a final parameter that is non-zero if the node is a single node (the nodes instantiated at lines 17 and 20 are single nodes, while the node instantiated at line 9 is composed of two size-2 nodes).
As the newly instantiated
CNodeLoad
objects are appended to the back of
ns
at lines 10, 14 and 21, the
assign_leaf_registers
function assigns registers to the outputs of each instance. Registers are identified with integers beginning at zero, and when each register is created it is assigned an identifier from an auto-incrementing counter (
). This function also maintains a map of registers to node pointers, referred to as
rmap
, where the node for a given register is the last node to reference that register.
The constructor for a
CNodeBfly
object uses
k
and
stride
to compute a twiddle factor for the new instance of a butterfly computation node. When the new instance of
CNodeBfly
is appended to the end of
ns
at line 14, the
assign_body_registers
function assigns registers
to a node of size
with the following logic:
where
and
is the auto-incrementing register counter.
The
assign_body_registers
functions also updates the map of registers to node pointers by
setting
to point to the new instance of
CNodeBfly
.
void sfft_dcf8_hc(sfft_plan_t *p, const void *vin, void *vout) {
const SFFT_D *in = vin; SFFT_D *out = vout;
SFFT_R r0,r1,r2,r3,r4,r5,r6,r7;
L_4(in+0,in+8,in+4,in+12,&r0,&r1,&r2,&r3);
L_2(in+2,in+10,in+14,in+6,&r4,&r5,&r6,&r7);
K_0(&r0,&r2,&r4,&r6);
S_4(r0,r2,r4,r6,out+0,out+4,out+8,out+12); K_N(VLIT2(0.7071,0.7071),VLIT2(0.7071,-0.7071),&r1,&r3,&r5,&r7);
S_4(r1,r3,r5,r7,out+2,out+6,out+10,out+14); }
Hard-coded VL-1 size-8 FFT
Given a list of nodes, it is a simple process to emit C code that can be compiled to actually compute the transform.
The example in [link] would be emitted from the list of four nodes in [link] . Lines 1–4 are emitted from a function that generates a header, and line 13 is emitted from a function that generates a footer. Lines 6–11 are generated based on the list of nodes.
Notification Switch
Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?