c - Bit reverse reorder optimization using Intel intrinsics -
a while ago, optimized radix2 function using sse intrinsics , i'm close fftw performance, so, next step optimize bit reverse reorder function found original code, honest, , want optimize it. original code so:
void bit_reverse_reorder(float *x,float *y, int n) { int bits=0; int i, j, k; float tempr, tempi; //maxpow = 12 (i=0; i<maxpow; i++) if (pow_2[i]==n) bits=i; (i=0; i<n; i++) { j=0; (k=0; k<bits; k++) if (i&pow_2[k]) j+=pow_2[bits-k-1]; if (j>i) { tempr=x[i]; tempi=y[i]; x[i]=x[j]; y[i]=y[j]; x[j]=tempr; y[j]=tempi; } } } int main() { radix2(re,im,n); bit_reverse_reorder(re,im,n); }
ps: pow_2[] precomputed array containing powers of 2 (1,2,4,8,16,32,...), n number of element = 4096, *x , *y represents ,respectively, real , imaginary parts of each element of input data.
the radix2 generate results not ordered stated function reorder result.
first of all, did not understand how bit reversal works! so, think if gave me hints how function works.
second, intend use sse intrinsics enhance function performance there vector of 2 instructions used in swapping loop?
thanks @nwellnhof comment made modification swapping function following:
void bit_reverse_reorder(float *x,float *y, int n) { unsigned i,j; (i = 0, j = 0; < n; i++) { if (i < j) { float tmpx = x[i]; x[i] = x[j]; x[j] = tmpx; float tmpy = y[i]; y[i] = y[j]; y[j] = tmpy; } unsigned bit = ~i & (i + 1); unsigned rev = (n / 2) / bit; j ^= (n - 1) & ~(rev - 1); } }
and performance of 54 900 cycles for loop inside function :)
Comments
Post a Comment