Author Topic: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?  (Read 1843 times)

0 Members and 1 Guest are viewing this topic.

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Quote
Let us now split each of X and Y into two subwords of k-bit each: X = 2^k*X1 + X0 and Y = 2^k*Y1 + Y0

X1 is the integer formed by the k most significant bits of X, and X0 is made of the k least significant bits of X. The product X × Y may be written X × Y = (2^k*X1 + X0) × (2^k*Y1 + Y0)

(1) X*Y = 2^k* X1 * Y1 + 2^k* (X1*Y0 + X0*Y1) + X0*Y0

The classical step of Karatsuba-Ofman algorithm is the following.

First compute DX = (X1 − X0) and DY = (Y1 − Y0). The results are signed numbers that fit on k + 1 bits in two’s complement1, then compute the product DX × DY using a DSP block. Now the middle term of equation (1), X1*Y0 + X0Y1, may be computed as:

(2) X1*Y0 + X0*Y1 = X1*Y1 + X0*Y0 − DX*DY

Then, the computation of X*Y using (1) only requires three DSP blocks: one to compute X1*Y1, one for X0*Y0, and one for DX*DY.

There is an overhead in terms of additions. In principle, this overhead consists of two k-bits subtractions for computing DX and DY, plus one 2k-bit addition and one 2k-bit subtraction to compute equation (2). There are still more additions in equation (1), but they also have to be computed by the classical multiplication decomposition and are therefore not counted in the overhead.

yesterday I asked this and got the above reply. Has anyone ever implemented it?
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?
« Reply #1 on: October 11, 2018, 02:42:47 pm »
I used it when I implemented SSL a long time ago, not on FPGA through. I had a regular version, an MMX version, and the super modern SSE2 version (although SSE2 didn't add much). For every one of these cases, I compared Karatsuba to simpler algorithms (diagonal or school-boy). It was slower for shorter multiplications. It only got better for longer multiplications, starting from 512x512. Its performance was even worse for squaring - the smallest size where it outperformed others was 1536x1536. The results were consistently the same for each of the 3 hardware versions.

I think it's the same in FPGA. If you use DSP blocks, an extra multiplication is less of a problem than multiple additions.

 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?
« Reply #2 on: October 11, 2018, 03:17:02 pm »
Yes I have, but only in software.

If you're willing to rely on DSP blocks anyway, is there a reason you don't want to use an IP generator? Xilinx's core generator can make multipliers for you up to 64x64.
I'm pretty sure most other FPGA vendors have a similar tool.
Granted it gives you no control over what's generated but it will get you there much faster.
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?
« Reply #3 on: October 11, 2018, 04:18:25 pm »
is there a reason you don't want to use an IP generator?

of course, I have reasons. Too long story. Basically, I don't want to depend on the IP generator, and I prefer to work the simulator (edit: not the one provided by Xilinx) for which I have to model every logic involved in the projects.
« Last Edit: October 11, 2018, 04:28:13 pm by legacy »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?
« Reply #4 on: October 11, 2018, 04:29:44 pm »
Makes sense but you're still depending on the DSP tiles for the block multipliers if I got it right... so... how are you modeling those?
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?
« Reply #5 on: October 11, 2018, 06:32:01 pm »
how are you modeling those?

By a library of modules written in VHDL with constants to expresse features and I can express a module for a certain technology, e.g. Spartan6, or Spartan7.

 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14476
  • Country: fr
 

Offline legacyTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: (FPGA) can a 32x32 multiply be done with three 16x16 multiplies?
« Reply #7 on: October 11, 2018, 08:12:48 pm »
thanks, so kind  :D
 


Share me

Digg  Facebook  SlashDot  Delicious  Technorati  Twitter  Google  Yahoo
Smf