Author Topic: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies? (Read 1843 times)

legacy · « **on:** October 11, 2018, 01:02:55 pm »

Quote

Let us now split each of X and Y into two subwords of k-bit each: X = 2^k*X1 + X0 and Y = 2^k*Y1 + Y0

X1 is the integer formed by the k most significant bits of X, and X0 is made of the k least significant bits of X. The product X × Y may be written X × Y = (2^k*X1 + X0) × (2^k*Y1 + Y0)

(1) X*Y = 2^k* X1 * Y1 + 2^k* (X1*Y0 + X0*Y1) + X0*Y0

The classical step of Karatsuba-Ofman algorithm is the following.

First compute DX = (X1 − X0) and DY = (Y1 − Y0). The results are signed numbers that fit on k + 1 bits in two’s complement1, then compute the product DX × DY using a DSP block. Now the middle term of equation (1), X1*Y0 + X0Y1, may be computed as:

(2) X1*Y0 + X0*Y1 = X1*Y1 + X0*Y0 − DX*DY

Then, the computation of X*Y using (1) only requires three DSP blocks: one to compute X1*Y1, one for X0*Y0, and one for DX*DY.

There is an overhead in terms of additions. In principle, this overhead consists of two k-bits subtractions for computing DX and DY, plus one 2k-bit addition and one 2k-bit subtraction to compute equation (2). There are still more additions in equation (1), but they also have to be computed by the classical multiplication decomposition and are therefore not counted in the overhead.

yesterday I asked this and got the above reply. Has anyone ever implemented it?

NorthGuy · « **Reply #1 on:** October 11, 2018, 02:42:47 pm »

I used it when I implemented SSL a long time ago, not on FPGA through. I had a regular version, an MMX version, and the super modern SSE2 version (although SSE2 didn't add much). For every one of these cases, I compared Karatsuba to simpler algorithms (diagonal or school-boy). It was slower for shorter multiplications. It only got better for longer multiplications, starting from 512x512. Its performance was even worse for squaring - the smallest size where it outperformed others was 1536x1536. The results were consistently the same for each of the 3 hardware versions.

I think it's the same in FPGA. If you use DSP blocks, an extra multiplication is less of a problem than multiple additions.

SiliconWizard · « **Reply #2 on:** October 11, 2018, 03:17:02 pm »

Yes I have, but only in software.

If you're willing to rely on DSP blocks anyway, is there a reason you don't want to use an IP generator? Xilinx's core generator can make multipliers for you up to 64x64.
I'm pretty sure most other FPGA vendors have a similar tool.
Granted it gives you no control over what's generated but it will get you there much faster.

legacy · « **Reply #3 on:** October 11, 2018, 04:18:25 pm »

Quote from: SiliconWizard on October 11, 2018, 03:17:02 pm

is there a reason you don't want to use an IP generator?

of course, I have reasons. Too long story. Basically, I don't want to depend on the IP generator, and I prefer to work the simulator (edit: not the one provided by Xilinx) for which I have to model every logic involved in the projects.

SiliconWizard · « **Reply #4 on:** October 11, 2018, 04:29:44 pm »

Makes sense but you're still depending on the DSP tiles for the block multipliers if I got it right... so... how are you modeling those?

legacy · « **Reply #5 on:** October 11, 2018, 06:32:01 pm »

Quote from: SiliconWizard on October 11, 2018, 04:29:44 pm

how are you modeling those?

By a library of modules written in VHDL with constants to expresse features and I can express a module for a certain technology, e.g. Spartan6, or Spartan7.

SiliconWizard · « **Reply #6 on:** October 11, 2018, 07:10:02 pm »

I see.
You may find some ideas here: https://www.researchgate.net/publication/266678906_FPGA_Based_Modified_Karatsuba_Multiplier

legacy · « **Reply #7 on:** October 11, 2018, 08:12:48 pm »

thanks, so kind


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

EEVblog Electronics Community Forum

Author Topic: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies? (Read 1843 times)

legacy

(FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

NorthGuy

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

SiliconWizard

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

legacy

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

SiliconWizard

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

legacy

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

SiliconWizard

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

legacy

Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?

Share me