Electronics > FPGA

FPGA to accelerate fractal calculations


Nominal Animal:
Because two of the three operations are actually squarings, you can do some useful optimizations.

For example, consider the Karatsuba algorithm, for two-limb numbers \$x = x_0 + x_1 B\$ and \$y = y_0 + y_1 B\$, whose product is
\begin{aligned}
x y &= (x_0 + x_1 B)(y_0 + y_1 B) \\
&= x_0 y_0 + (x_0 y_1 + x_1 y_0) B + x_1 y_1 B^2 \\
x y &= z_0 + z_1 B + z_2 B^2 \\
z_0 &= x_0 y_0 \\
z_1 &= x_0 y_1 + x_1 y_0 = (x_0 + x_1)(y_0 + y_1) - z_2 - z_0 \\
z_2 &= x_1 y_1 \\
\end{aligned}
where \$B\$ is the radix of each limb, typically some power of two.  The Karatsuba trick is the right-hand-side form of \$z_1\$, which trades one multiplication for three additions or subtractions.  Note that each \$z_k\$ is double the width of \$x_k\$ and \$y_k\$.

Applying to \$x^2\$, we obviously have
\begin{aligned}
x^2 &= X_0 + X_1 B + X_2 B^2 \\
X_0 &= x_0^2 \\
X_1 &= 2 x_0 x_1 \\
X_2 &= x_1^2 \\
\end{aligned}
which means we don't need to do the Karatsuba trade, and all three products can be calculated in parallel.
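As a quick illustration (a software sketch, not HDL; the 32-bit limb width is just an assumption for the example), the three independent products of a two-limb squaring look like this:

```python
# Two-limb squaring: x = x0 + x1*B with radix B = 2**32 (illustrative
# choice; an FPGA design would pick B to match its DSP multiplier width).
B_BITS = 32
MASK = (1 << B_BITS) - 1

def square_2limb(x: int) -> int:
    """Square a two-limb number via three independent products."""
    x0 = x & MASK        # low limb
    x1 = x >> B_BITS     # high limb
    X0 = x0 * x0         # these three products have no data dependency
    X1 = 2 * x0 * x1     # on one another, so in hardware they can all
    X2 = x1 * x1         # be computed in parallel
    return X0 + (X1 << B_BITS) + (X2 << (2 * B_BITS))
```

The doubling in \$X_1\$ is just a left shift, so no Karatsuba-style trade is needed here.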

Karatsuba isn't useful when multiplications are as fast as additions or subtractions, though, so it only pays off when you have more than two limbs per number.

Again, for both the Mandelbrot and Julia fractals, each iteration needs two squarings and one product, plus some additions and subtractions, so there are quite a lot of mathematical optimizations possible.
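For reference, here is a minimal Python sketch of one such iteration, \$z \to z^2 + c\$, written so that the only general product is \$x y\$ and the other two products are squarings (the escape radius of 2 and the variable names are the standard conventions, not anything specific to this thread):

```python
# One Mandelbrot/Julia iteration: z -> z^2 + c with z = x + iy, c = cx + i*cy.
# (x + iy)^2 = (x^2 - y^2) + i*(2xy): two squares plus one general product.

def escape_count(cx: float, cy: float, max_iter: int = 100) -> int:
    """Iterations until |z| > 2, or max_iter if the point never escapes."""
    x = y = 0.0
    for i in range(max_iter):
        xx, yy = x * x, y * y          # the two squarings
        if xx + yy > 4.0:              # |z|^2 > 2^2: point has escaped
            return i
        x, y = xx - yy + cx, 2.0 * (x * y) + cy   # x*y is the one product
    return max_iter
```

Reusing \$x^2\$ and \$y^2\$ for the escape test means the test costs no extra multiplications.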

ejeffrey:

--- Quote from: NiHaoMike on September 21, 2022, 03:54:07 am ---I'm under the impression that GPUs don't work very well with 64 bit numbers since that's overkill for most graphics work which is mostly 32 bit.
Roughly what sort of performance vs. bit width tradeoff should I expect for fixed point on a FPGA?

--- End quote ---

Yes, most consumer GPUs target single-precision float (8-bit exponent, 24-bit significand).  GPUs targeting HPC/scientific computing have lots of double-precision MAC units, because that is often needed in those applications.

FPGAs have fixed-size multipliers whose width depends on the platform.  Older FPGAs usually had ~18-bit multipliers, which are good for basic fixed-point work but too small for traditional floating-point formats by themselves.  Some newer FPGAs have bigger multipliers that can directly handle single-precision floating point, but still nothing like 53 or 64 bits.  For that you need to combine several multipliers to compute partial products and sum them.  This eats up your DSP resources fast, so generally you want to limit precision if at all possible.  Unfortunately, fractals are just about the definitive example of a problem with no easy approximations: if you want a high-resolution render of a fractal, you need high-precision arithmetic.
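To make the partial-product decomposition concrete, here is a hedged software sketch (Python standing in for the datapath; the 17-bit chunk width is an assumption modelled on the ~18×18 signed multipliers mentioned above):

```python
# Decompose a wide multiply onto small fixed-width multipliers, the way
# an FPGA tool maps it onto DSP slices. CHUNK = 17 is an assumed width.
CHUNK = 17
MASK = (1 << CHUNK) - 1

def split(v: int, n_chunks: int) -> list:
    """Break v into n_chunks little-endian CHUNK-bit pieces."""
    return [(v >> (CHUNK * i)) & MASK for i in range(n_chunks)]

def wide_mul(a: int, b: int, n_chunks: int = 4) -> int:
    """Multiply two (n_chunks*CHUNK)-bit numbers with small multipliers."""
    aw, bw = split(a, n_chunks), split(b, n_chunks)
    acc = 0
    for i, ai in enumerate(aw):
        for j, bj in enumerate(bw):
            # each partial product ai*bj is one small-multiplier job
            acc += (ai * bj) << (CHUNK * (i + j))
    return acc
```

Note the cost: `n_chunks`² small multiplies (16 here for a ~64-bit product), which is exactly why DSP usage grows so fast with precision.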

NiHaoMike:
I tried the mandelbrot_gpu script and my friend noticed that the details weren't as clear! Most likely because the GPU render mode is 32 bit due to most GPUs being poor at 64 bit.

How well would fixed point do in comparison?

Nominal Animal:

--- Quote from: NiHaoMike on September 24, 2022, 05:23:35 am ---I tried the mandelbrot_gpu script and my friend noticed that the details weren't as clear! Most likely because the GPU render mode is 32 bit due to most GPUs being poor at 64 bit.

How well would fixed point do in comparison?

--- End quote ---
How detailed an answer do you want?

Each iteration incurs rounding error, so unless you compare floating-point and fixed-point implementations with the exact same rounding mode (and precision), the iteration counts outside the fractal set will vary.  The variation is smooth, not random.

The deeper you zoom into the set boundary, the more iterations are needed to determine whether a point is within the set or not, and thus the higher the precision needed (to keep the cumulative rounding error from becoming visible).

Precision, i.e. the size in bits, seems to be much more important than having a floating point.  The iteration formula is such that there is very little benefit in having better precision near zero than elsewhere; it gets swamped by cancellation in the additions anyway.

As an example, consider the case where you want to look at 0.3971+0.199i . With 'float', your resolution around there is about 0.0000001 per pixel, so you can generate a 1600×1200 image where the diagonal is about 0.0001 in the complex plane.  Any deeper than that, and neighboring points are rounded to the same complex number.  32-bit fixed point (Q3.28, actually) lets you zoom in 16x deeper than float, and 64-bit fixed point (Q3.60) lets you zoom in 60x deeper than double.

So, in simple terms, Q3.N-4 fixed point is better for at least Mandelbrot and Julia set calculations than N-bit floating point.  (I've actually tested this.)
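A small Python sketch (illustrative only, not the test referred to above) makes the step-size comparison near the example point concrete: it measures the spacing of adjacent 32-bit floats around 0.3971 against the uniform Q3.28 step.

```python
import struct

def ulp32(v: float) -> float:
    """Spacing between adjacent 32-bit floats at v."""
    b = struct.unpack('I', struct.pack('f', v))[0]   # v rounded to float32 bits
    lo = struct.unpack('f', struct.pack('I', b))[0]
    hi = struct.unpack('f', struct.pack('I', b + 1))[0]
    return hi - lo

FRAC = 28                        # Q3.28: sign, 3 integer bits, 28 fraction bits
def to_q(v: float) -> int:   return round(v * (1 << FRAC))
def from_q(q: int) -> float: return q / (1 << FRAC)

x = 0.3971
print(ulp32(x))                                 # float32 step near 0.4: 2**-25
print(from_q(to_q(x) + 1) - from_q(to_q(x)))    # Q3.28 step, everywhere: 2**-28
```

The fixed-point step stays 2⁻²⁸ across the whole [-8, 8) range, whereas the float step grows with magnitude, which is the "little benefit near zero" point above.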

ejeffrey:
Fraqtive also says it does some anti-aliasing by oversampling, which could be part of the visual difference you see.  Did you try turning that off in Fraqtive to see how it performs?  But fundamentally, calculating fractals accurately requires high precision.  Nvidia desktop GPUs don't do this particularly well; you need the Tesla line of data-center GPUs to get good double-precision performance.

How much faster is the GPU version than the CPU one?