0.4 Streaming fft (Page 17/19)

Page 17 / 19

Sorted size-16 leaf nodes in size-128 hard-coded leaf FFT, grouped for VL-2

Size	Input array addresses
16	{0, 64, 32, 96, 16, 80, 112, 48, 8, 72, 40, 104, 120, 56, 24, 88}
16	{1, 65, 33, 97, 17, 81, 113, 49, 9, 73, 41, 105, 121, 57, 25, 89}
16	{2, 66, 34, 98, 18, 82, 114, 50, 10, 74, 42, 106, 122, 58, 26, 90}
8(x2)	{3, 67, 35, 99, 19, 83, 115, 51, 123, 59, 27, 91, 11, 75, 107, 43}
8(x2)	{4, 68, 36, 100, 20, 84, 116, 52, 124, 60, 28, 92, 12, 76, 108, 44}
8(x2)	{5, 69, 37, 101, 21, 85, 117, 53, 125, 61, 29, 93, 13, 77, 109, 45}
16	{126, 62, 30, 94, 14, 78, 110, 46, 6, 70, 38, 102, 118, 54, 22, 86}
16	{127, 63, 31, 95, 15, 79, 111, 47, 7, 71, 39, 103, 119, 55, 23, 87}

[link] shows the sorted size-16 leaf sub-transforms for a size-128 transform with the rows divided into VL-2 groups. Because each group of two leaf sub-transforms loads data from adjacent memory locations, the group of sub-transforms can be loaded in parallel with vector memory operations, and all (or some) of the computation done in parallel. The first, third and fourth groups in [link] contain leaf nodes of the same size/type; these are the easiest vector leaf sub-transforms to compute, as described in "Homogeneous leaf sub-transform vectors" . The second group of rows contains leaf sub-transforms of differing size/type, and computing these sub-transforms is covered separately in "Heterogeneous leaf sub-transform vectors" .

Homogeneous leaf sub-transform vectors

void
sfft_fcf128_shl16_8_ee(offset_t *is,const SFFT_D *in,SFFT_D *out){  SFFT_R r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15;
  L_4(in+is[0],in+is[1],in+is[2],in+is[3],&r0,&r1,&r2,&r3);
  L_2(in+is[4],in+is[5],in+is[6],in+is[7],&r4,&r5,&r6,&r7);
  K_0(&r0,&r2,&r4,&r6);
  K_N(VLIT4(0.7071,0.7071,0.7071,0.7071),      VLIT4(0.7071,-0.7071,0.7071,-0.7071),
      &r1,&r3,&r5,&r7);
  L_4(in+is[8],in+is[9],in+is[10],in+is[11],&r8,&r9,&r10,&r11);
  L_4(in+is[12],in+is[13],in+is[14],in+is[15],&r12,&r13,&r14,&r15);
  K_0(&r0,&r4,&r8,&r12);
  K_N(VLIT4(0.9239,0.9239,0.9239,0.9239),      VLIT4(0.3827,-0.3827,0.3827,-0.3827),
      &r1,&r5,&r9,&r13);
  TX2(&r0,&r1); TX2(&r4,&r5); TX2(&r8,&r9); TX2(&r12,&r13);
  S_4(r0,r4,r8,r12,out0+0,out0+8,out0+16,out0+24);  S_4(r1,r5,r9,r13,out1+0,out1+8,out1+16,out1+24);
  K_N(VLIT4(0.7071,0.7071,0.7071,0.7071),      VLIT4(0.7071,-0.7071,0.7071,-0.7071),
      &r2,&r6,&r10,&r14);
  K_N(VLIT4(0.3827,0.3827,0.3827,0.3827),      VLIT4(0.9239,-0.9239,0.9239,-0.9239),
      &r3,&r7,&r11,&r15);
  TX2(&r2,&r3); TX2(&r6,&r7); TX2(&r10,&r11); TX2(&r14,&r15);
  S_4(r2,r6,r10,r14,out0+4,out0+12,out0+20,out0+28);  S_4(r3,r7,r11,r15,out1+4,out1+12,out1+20,out1+28);
}
Homogeneous size-16 leaf sub-transform for VL-2 size-128 hard-coded leaf FFT

The vector leaf sub-transforms of a single size/type are handled in the same way as a VL-1 sub-transform, with one difference: the vector registers must be transposed before the data is stored to memory in the output array. In the example shown in [link] , the transposes take place at lines 16 and 25.

Prior to the store operations, each position of the vector register (each position being a whole complex word) contains an element belonging to each of the leaf sub-transforms composing the vectorized sub-transform. Because each leaf sub-transform is stored sequentially to different locations in memory with aligned vector store operations, sets of registers are transposed such that each vector register contains elements from only one leaf sub-transform.

<< Chapter < Page Page > Chapter >>

Read also:

Get Jobilize Job Search Mobile App in your pocket Now!

100% Free Mobile Applications
Receive real-time job alerts and never miss the right job again

Source: OpenStax, Computing the fast fourier transform on simd microprocessors. OpenStax CNX. Jul 15, 2012 Download for free at http://cnx.org/content/col11438/1.2

Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?

Ask

	11 Sociology 11 Race and Ethnicity MCQ By OpenStax Start Quiz
	3 Psychology MCQ 2009 2 Exam By John Gabrieli Start Exam
	10 AP 10 Muscle Tissue MCQ Quiz By OpenStax Start Quiz
	Real Estate Finance Exam 2003 By Tod McGrath Start Exam
	English Composition 2 Final Practice By Madison Christian Start Test
	46 Biology 46 Ecosystems MCQ By OpenStax Start Quiz
©flickr:	Vocabulary Practice Quiz! By Katie Montrose Start Quiz
	Chemistry Practice Test By Sandhills MLT Start Test
	MAPEH Test G9 (Physical Education) By Angelica Lito Start Quiz
	Autonomic Nervous System By Marriyam Rana Start Quiz