<< Chapter < Page | Chapter >> Page > |
Because the input array references between consecutive leaves are now linear, and like types of leaf sub-transforms are grouped together, it is now possible to compute several leaf sub-transforms in parallel, which is fully described in "Other vector lengths" .
The radix of the body sub-transforms can be increased in order to reduce the number of passes over the data and make better use of the cache. In practice, the body sub-transform radix is limited by the associativity of the cache as the size of the transform increases. If the radix is greater than the associativity of the nearest level of cache in which a sub-transform cannot fit, there will be cache misses for every iteration of the sub-transform's loop, resulting in severely degraded performance.
All Intel SIMD microprocessors since the Netburst micro-architecture have had at least 8-way associativity in all levels of cache, and thus increasing the radix from 4 to 8 is a sensible decision when targeting Intel machines.
Just as the split-radix 2/4 algorithm requires two different types of leaf sub-transforms, a split-radix 2/8 algorithm would require three, which increases the complexity of statically elaborating and generating code. There is an alternative that does not require implementing three types of leaf sub-transform: where a size- body sub-transform divides into a size body sub-transform and two size sub-transforms, the size and size sub-transforms may be collected together and computed as a size-8 sub-transform. Thus the transform is computed with two types of leaf sub-transform and two types of body sub-transform, instead of three types of leaf sub-transform and one type of body sub-transform, as with the standard split-radix 2/8 algorithm.
For the size-128 tranform in [link] , either the sub-transform at line 19 can be subsumed into the sub-transform at line 20, or the sub-transform at line 20 can be subsumed into the sub-transform at line 23 – but not both. The latter choice is better because it involves larger transforms.
CBody *CHardCodedLeaf::find_subsumable_sub_transform(
vector<CNode *>::reverse_iterator i) {
CBody *first = (CBody *)(*i); i++; while(i != bs.rend()) {
if(!((*i)->type().compare("body"))) {
CBody *second = (CBody *)(*i); if(first->N == second->N*2 && first->offset == second->offset){
bs.erase((++i).base()); return second;
} }
++i; }
return NULL;}
void CHardCodedLeaf::increase_body_radix(void) { vector<CNode *>::reverse_iterator ri;
for(ri=bs.rbegin(); ri!=bs.rend(); ++ri) { if(!((*ri)->type().compare("body"))) {
CBody *n1 = (CBody *)(*ri); CBody *n2 = find_subsumable_sub_transform(ri);
if(n2) n1->size *= 2;
} }
}
Doubling the radix of body sub-transforms
The code in
[link] iterates in reverse over a list of sub-transforms and doubles the radix of the body sub-transforms. Because the list may include multiple types, type introspection at lines 6 and 20 filters out all types that are not body sub-transforms. For each body sub-transform, the
increase_body_radix
function searches upwards through the list for a subsumable body sub-transform (using
find_subsumable_sub_transform
) and if a match is found, the smaller sub-transform is removed from the list, and the size of the larger sub-transform is doubled.
Notification Switch
Would you like to follow the 'Computing the fast fourier transform on simd microprocessors' conversation and receive update notifications?