Register
Unoptimized GPU FFT Print E-mail
User Rating: / 2
PoorBest 
Written by Stuart   
Friday, 30 April 2010 14:28

This is yet another FFT implementation for GPU. It's a basic building block for procedural textures and special effects.

Right now it is horribly unoptimized. I'm forcing myself to leave it alone until I get GPU profiles on the consoles, because I've got too much other work to do, and it's useful for visualization in its current state. It's hard though, because there are a lot of fun improvements to make:

  • The bit reversal can be folded into the body shader
  • The transpose passes can be removed
  • The butterfly passes can be doubled up, so that each shader call does two passes
  • The scalar real and imaginary textures can be swizzled into RGBA, so that each pass can work on 4 independent transforms
  • The first two butterfly passes are trivial and can be moved into a simpler shader
  • Some ALU improvements are certainly possible

But for now, it's good enough. So working backwards, the implementation looks like this:

    1 --- Perform a 2D FFT on a texture.

    2 --

    3 --  UNOPTIMIZED! Many of these shader passes can be combined, and there's

    4 --  no need to transpose everything just to do the vertical pass.

    5 --

    6 --  @param dest     Output image, complex values will be placed in RG

    7 --  @param source   Input image, complex values will be taken from RG

    8 --  @param dir      Direction, either 1 for FFT or -1 for inverse FFT

    9 --

   10 function DoFourierTransform2D( dest, source, dir )

   11 

   12     local width  = source.Width

   13     local height = source.Height

   14     local prec   = source.Precision

   15 

   16     if prec < 16 then

   17         prec = 16

   18     end

   19 

   20     local real = TextureCache:Alloc( height, width, 1, 1, prec )

   21     local imag = TextureCache:Alloc( height, width, 1, 1, prec )

   22 

   23     DoSplit2( real, imag, source );

   24 

   25     DoFourierBitReverse( real, imag, real, imag )

   26     DoFourierTransformHoriz( real, imag, real, imag, dir )

   27 

   28     DoTranspose( real, real )

   29     DoTranspose( imag, imag )

   30 

   31     DoFourierBitReverse( real, imag, real, imag )

   32     DoFourierTransformHoriz( real, imag, real, imag, dir )

   33 

   34     DoTranspose( real, real )

   35     DoTranspose( imag, imag )

   36 

   37     DoCombine2( dest, real, imag )

   38 

   39     TextureCache:Free( imag )

   40     TextureCache:Free( real )

   41 

   42 end

That routine splits a complex image into real/imaginary layers, does an FFT along the rows, does it again down the columns, and then recombines them. The butterfly passes are done here:

    1 --- Perform a 1D FFT on each row of a texture.

    2 --

    3 --  The real and imaginary values are held in separate textures. There is

    4 --  currently no benefit to that, but it will make optimization easier later.

    5 --

    6 --  @param destReal     Output real values

    7 --  @param destImag     Output imaginary values

    8 --  @param sourceReal   Output real values

    9 --  @param sourceImag   Output imaginary values

   10 --  @param dir          Direction, either 1 for FFT or -1 for inverse FFT

   11 --

   12 function DoFourierTransformHoriz( destReal, destImag, sourceReal, sourceImag, dir )

   13 

   14     local width     = sourceReal.Width

   15     local height    = sourceReal.Height

   16     local prec      = sourceReal.Precision

   17     local tempReal  = TextureCache:Alloc( width, height, 1, 1, prec )

   18     local tempImag  = TextureCache:Alloc( width, height, 1, 1, prec )

   19     local inReal    = sourceReal

   20     local inImag    = sourceImag

   21     local outReal   = tempReal

   22     local outImag   = tempImag

   23     local size      = 2

   24     local scale     = 1.0

   25 

   26     while size <= width do

   27 

   28         if size == width then

   29 

   30             outReal = destReal

   31             outImag = destImag

   32 

   33             if dir == 1 then

   34                 scale = 1.0 / width

   35             end

   36 

   37         end

   38 

   39         DoFourierBody( outReal, outImag, inReal, inImag, dir, size, scale )

   40 

   41         inReal = tempReal

   42         inImag = tempImag

   43 

   44         size = size * 2

   45 

   46     end

   47 

   48     TextureCache:Free( tempImag )

   49     TextureCache:Free( tempReal )

   50 

   51 end

Permuting into bit-reversed order is done using a lookup texture. It turns out that one texture can be used for any size FFT, as long as you're careful about exactly where you sample. So this shader is hardcoded to use a 1024-pixel wide lookup texture.

    1 SAMPLER_DECLARE( Real ) {};

    2 SAMPLER_DECLARE( Imag ) {};

    3 SAMPLER_DECLARE( BitRev ) {};

    4 

    5 extern float Width;

    6 extern float InvWidth;

    7 extern float Height;

    8 extern float InvHeight;

    9 

   10 float BitReverse( float x, float size )

   11 {

   12     float   scale   = exp2( 10 - log2( size ) );

   13     float   redir   = TEX2D( BitRev, float2( (x  * scale + 0.5) / 1024.0, 0 ) ).x;

   14 

   15     return( redir );

   16 }

   17 

   18 PS_BLIT_OUTPUT_2 PS( PS_BLIT_INPUT input )

   19 {

   20     float   x       = floor( input.mUV.x * Width );

   21     float   y       = floor( input.mUV.y * Height );

   22     float   u       = BitReverse( x, Width );

   23     float   v       = BitReverse( y, Height );

   24     float2  sampPos = float2( (u + 0.5) * InvWidth, input.mUV.y );

   25     float4  real    = TEX2D( Real, sampPos );

   26     float4  imag    = TEX2D( Imag, sampPos );

   27 

   28     RETURN_PS_BLIT_OUTPUT_2( real, imag );

   29 }

The texture itself is generated by script. It's not pretty, but it only has to run once.

    1 function ReverseBits( n, bitCount )

    2 

    3     local result    = 0

    4     local bitToSet  = 1

    5     local bitToTest = 2 ^ (bitCount - 1)

    6 

    7     for i = 1, bitCount do

    8 

    9         if n >= bitToTest then

   10             result = result + bitToSet

   11             n = n - bitToTest

   12         end

   13 

   14         bitToSet  = bitToSet * 2

   15         bitToTest = bitToTest / 2

   16 

   17     end

   18 

   19     return result

   20 

   21 end

   22 

   23 

   24 function CreateReverseBitsTexture( bitCount )

   25 

   26     local width     = 2 ^ bitCount

   27     local height    = 1

   28     local channels  = 1

   29     local t         = {}

   30 

   31     for y = 1, height do

   32         for x = 1, width do

   33             t[#t + 1] = ReverseBits( x - 1, bitCount )

   34         end

   35     end

   36 

   37     local tex = CTextureValue:New()

   38     tex:InitImmediate( width, height, channels, t )

   39 

   40     return tex

   41 

   42 end

The meat of the work is done below in the body shader.

    1 SAMPLER_DECLARE( Real ) {};

    2 SAMPLER_DECLARE( Imag ) {};

    3 

    4 extern float Dir;

    5 extern float PassSize;

    6 extern float Scale;

    7 extern float Width;

    8 extern float InvWidth;

    9 

   10 PS_BLIT_OUTPUT_2 PS( PS_BLIT_INPUT input )

   11 {

   12     int  x        = floor( input.mUV.xxxx * Width );

   13     int  size     = PassSize;

   14     int  half     = size / 2;

   15     int  size_ofs = x % size;

   16     int  half_ofs = x % half;

   17     bool is_even  = (half_ofs == size_ofs);

   18 

   19     float2 even_uv;

   20     float2 odd_uv;

   21 

   22     if( is_even )

   23     {

   24         even_uv = input.mUV.xy;

   25         odd_uv  = float2( input.mUV.x + (half * InvWidth), input.mUV.y );

   26     }

   27     else

   28     {

   29         even_uv = float2( input.mUV.x - (half * InvWidth), input.mUV.y );

   30         odd_uv  = input.mUV.xy;

   31     }

   32 

   33     float4 even_r = TEX2D( Real, even_uv );

   34     float4 odd_r  = TEX2D( Real, odd_uv );

   35     float4 even_i = TEX2D( Imag, even_uv );

   36     float4 odd_i  = TEX2D( Imag, odd_uv );

   37 

   38     float4 angle     = 2 * M_PI * Dir * half_ofs / size;

   39     float4 sin_angle = sin( angle );

   40     float4 cos_angle = cos( angle );

   41 

   42     float4 delta_r = (odd_r * cos_angle) - (odd_i * sin_angle);

   43     float4 delta_i = (odd_r * sin_angle) + (odd_i * cos_angle);

   44 

   45     odd_r = even_r - delta_r;

   46     odd_i = even_i - delta_i;

   47 

   48     even_r = even_r + delta_r;

   49     even_i = even_i + delta_i;

   50 

   51     float4 result_r = is_even? even_r : odd_r;

   52     float4 result_i = is_even? even_i : odd_i;

   53 

   54     RETURN_PS_BLIT_OUTPUT_2( result_r * Scale, result_i * Scale );

   55 }

And that's it. It's not efficient yet, but at least it's not much code. And there are a lot of cool things you can do in the frequency domain.

FFT example

I'll be posting some design information for the first game within the next week or so, and you can see where all this stuff is going...

 

Add comment

Security code
Refresh

Licensed developer

Licensed PS3 developer

Subscribe

Follow us on Twitter!

S5 Box

Register

*
*
*
*
*

Fields marked with an asterisk (*) are required.

Copyright © 2011 Pure Energy Games, Inc. All rights reserved.