Topic: vhdl, fast divider and fast multiplier, without instantiating DSP slices (Read 11459 times)


rstofer

  • Super Contributor
  • Posts: 9890
  • Country: us
Re: vhdl, fast divider and fast multiplier, without instantiating DSP slices
« Reply #25 on: August 27, 2016, 11:12:06 pm »
Back to division: for restoring division, skipping over zeros saves a bit of time if shifting is essentially free. I haven't seen an article on skipping zeros for non-restoring division, because there we are delaying the correction for having subtracted erroneously.
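A bit-level model is handy for checking a VHDL divider against known-good results. Below is a minimal Python sketch (not VHDL, and not taken from any article mentioned in the thread) of both schemes for unsigned integers. "Skipping zeros" is interpreted narrowly here as skipping the dividend's leading zeros, which is exact because the partial remainder stays 0 while they shift in. The function names are hypothetical.

```python
def restoring_divide(dividend: int, divisor: int) -> tuple[int, int, int]:
    """Bit-serial restoring division of unsigned integers.

    Returns (quotient, remainder, iterations). Leading zero bits of the
    dividend are skipped: while they shift in, the partial remainder stays 0,
    so the quotient bits they would produce are all 0 anyway.
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("need dividend >= 0 and divisor > 0")
    quotient, remainder, iterations = 0, 0, 0
    # Start at the most significant set bit instead of the top bit of the word.
    for i in reversed(range(dividend.bit_length())):
        remainder = 2 * remainder + ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial            # subtraction succeeded: quotient bit 1
            quotient |= 1 << i
        # else: "restore" by keeping the old remainder (quotient bit 0)
        iterations += 1
    return quotient, remainder, iterations


def nonrestoring_divide(dividend: int, divisor: int, width: int = 32) -> tuple[int, int]:
    """Non-restoring division of an unsigned `width`-bit dividend.

    A negative partial remainder is not restored; the next step adds the
    divisor instead of subtracting it. Only the final remainder may need
    one deferred correction.
    """
    if divisor <= 0 or not 0 <= dividend < (1 << width):
        raise ValueError("need divisor > 0 and 0 <= dividend < 2**width")
    quotient, remainder = 0, 0
    for i in reversed(range(width)):
        bit = (dividend >> i) & 1
        if remainder >= 0:
            remainder = 2 * remainder + bit - divisor
        else:
            remainder = 2 * remainder + bit + divisor
        quotient = 2 * quotient + (1 if remainder >= 0 else 0)
    if remainder < 0:                    # the deferred correction
        remainder += divisor
    return quotient, remainder
```

Both produce the same quotient bits: in the non-restoring version a negative partial remainder is always the restoring remainder minus the divisor, which is the delayed correction mentioned above.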
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
« Reply #26 on: August 28, 2016, 02:08:10 pm »
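Xilinx DSP48 multiplier inputs are narrow: 18×18 on the DSP48A/DSP48A1 (Spartan-3A DSP, Spartan-6) and 25×18 on the DSP48E1 (7-series).
Spartan-3 and Spartan-6 are the interesting families.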
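When an operand is wider than a hard multiplier's inputs (or than the LUT multiplier you want to build in fabric), the product can be assembled from smaller partial products that are shifted and summed. A minimal Python model, assuming 17-bit unsigned limbs so each partial product would fit the signed inputs of an 18×18 multiplier; the names are hypothetical.

```python
LIMB_BITS = 17  # assumption: 17-bit unsigned limbs fit an 18x18 signed multiplier


def split_limbs(x: int, limb_bits: int, n_limbs: int) -> list[int]:
    """Cut an unsigned value into n_limbs chunks, least significant first."""
    mask = (1 << limb_bits) - 1
    return [(x >> (k * limb_bits)) & mask for k in range(n_limbs)]


def wide_multiply(a: int, b: int, width: int = 32, limb_bits: int = LIMB_BITS) -> int:
    """Unsigned width x width multiply built only from limb_bits x limb_bits products."""
    if a < 0 or b < 0 or a >> width or b >> width:
        raise ValueError("operands must be unsigned and fit in width bits")
    n = -(-width // limb_bits)                  # ceil(width / limb_bits)
    a_limbs = split_limbs(a, limb_bits, n)
    b_limbs = split_limbs(b, limb_bits, n)
    result = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            partial = ai * bj                   # one small multiplier per pair
            # The shift is just wiring; the sum becomes an adder tree.
            result += partial << ((i + j) * limb_bits)
    return result
```

A 32×32 product needs four 17×17 partial products this way; whether those map to DSP slices or fabric is a synthesis choice.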
 

rstofer

  • Super Contributor
  • Posts: 9890
  • Country: us
« Reply #27 on: August 28, 2016, 04:22:46 pm »
 

Rasz

  • Super Contributor
  • Posts: 2616
  • Country: 00
    • My random blog.
« Reply #28 on: August 28, 2016, 07:00:54 pm »
Look into SRT algorithms: they produce several quotient bits per step, using a lookup table to select each quotient digit.
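A minimal sketch of the SRT idea in Python, using the simplest radix-2 form rather than the table-driven radix-4 designs: each quotient digit comes from the redundant set {-1, 0, 1} and is chosen by comparing twice the partial remainder with ±1/2, so hardware only needs to inspect a few leading bits. It assumes a normalized divisor 1/2 ≤ d < 1 and 0 ≤ x < d, uses exact `Fraction` arithmetic in place of a carry-save remainder, and the function name is hypothetical.

```python
from fractions import Fraction


def srt_radix2_divide(x: Fraction, d: Fraction, n: int) -> tuple[int, Fraction, list[int]]:
    """Radix-2 SRT division.

    Preconditions: 1/2 <= d < 1 (normalized divisor) and 0 <= x < d.
    Returns (q, r, digits) with x * 2**n == q * d + r and 0 <= r < d,
    so q / 2**n approximates x / d. digits uses the redundant set {-1, 0, 1}.
    """
    if not (Fraction(1, 2) <= d < 1 and 0 <= x < d):
        raise ValueError("need 1/2 <= d < 1 and 0 <= x < d")
    half = Fraction(1, 2)
    r, q, digits = x, 0, []
    for _ in range(n):
        two_r = 2 * r
        # Digit selection compares 2r only against the constants +-1/2,
        # which in hardware needs just a few leading remainder bits.
        if two_r >= half:
            digit = 1
        elif two_r < -half:
            digit = -1
        else:
            digit = 0                     # no add or subtract on this step
        r = two_r - digit * d             # keeps -d <= r < d
        q = 2 * q + digit
        digits.append(digit)
    if r < 0:                             # convert the redundant result to plain binary
        q -= 1
        r += d
    return q, r, digits
```

Each zero digit is a step with no add or subtract, which is the radix-2 form of skipping over zeros.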

This reminds me of 3dfx. Texture mapping requires a division per pixel (for perspective correction).
Gary Tarolli:

https://youtu.be/3MghYhf-GhU?t=45m
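Why perspective correction needs a divide: after projection, u/w and 1/w interpolate linearly across the screen but u itself does not, so each pixel divides one by the other. A minimal Python sketch for a single span, assuming u is a texture coordinate and w the per-vertex homogeneous depth; the names are hypothetical.

```python
def perspective_correct_u(u0: float, w0: float, u1: float, w1: float, t: float) -> float:
    """Texture coordinate at screen-space fraction t (0..1) along a span."""
    u_over_w = (1 - t) * (u0 / w0) + t * (u1 / w1)   # linear in screen space
    one_over_w = (1 - t) * (1 / w0) + t * (1 / w1)   # linear in screen space
    return u_over_w / one_over_w                      # the per-pixel division


def textured_span(u0: float, w0: float, u1: float, w1: float, n_pixels: int) -> list[float]:
    """u for n_pixels (>= 2) evenly spaced pixels: one divide per pixel."""
    return [perspective_correct_u(u0, w0, u1, w1, i / (n_pixels - 1))
            for i in range(n_pixels)]
```

With u going from 0 to 1 and w from 1 to 3, the screen-space midpoint lands at u = 0.25, not 0.5; plain linear interpolation of u is what produces the familiar texture warping.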
Who logs in to gdm? Not I, said the duck.
My fireplace is on fire, but in all the wrong places.
 

rstofer

  • Super Contributor
  • Posts: 9890
  • Country: us
« Reply #29 on: August 28, 2016, 09:55:40 pm »
A very interesting video. I bookmarked it for next week, when I'll have more time.

One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer.  Approximations can be good enough.  This is not going to be the case for numeric processes.
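One way to trade accuracy for speed is to replace the divide with a reciprocal estimate refined by Newton-Raphson iterations; each iteration roughly squares the relative error, so you pay only for the precision you need. A minimal Python sketch, assuming the divisor has already been normalized to [0.5, 1) (as a floating-point mantissa would be) and using the common linear starting estimate 48/17 - 32/17·d; the names are hypothetical.

```python
def approx_reciprocal(d: float, iterations: int) -> float:
    """Approximate 1/d for a normalized divisor 0.5 <= d < 1.

    The linear initial guess has relative error at most 1/17; each
    Newton-Raphson step x <- x * (2 - d * x) squares that error.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d must be normalized to [0.5, 1)")
    x = 48 / 17 - 32 / 17 * d
    for _ in range(iterations):
        x = x * (2 - d * x)               # two multiplies and a subtract, no divide
    return x


def approx_divide(numerator: float, d: float, iterations: int) -> float:
    """numerator / d via the reciprocal estimate (d normalized as above)."""
    return numerator * approx_reciprocal(d, iterations)
```

Zero or one iteration may be plenty for pixels; exact integer results need more care than this.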
 

Rasz

  • Super Contributor
  • Posts: 2616
  • Country: 00
    • My random blog.
« Reply #30 on: August 28, 2016, 10:02:58 pm »
Quote from: rstofer
One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer.  Approximations can be good enough.  This is not going to be the case for numeric processes.

It depends; it's perfectly fine for neural networks.
There is a whole R&D movement for probabilistic computing:
http://www.nature.com/news/modelling-build-imprecise-supercomputers-1.18437
 

hamster_nz

  • Super Contributor
  • Posts: 2803
  • Country: nz
« Reply #31 on: August 28, 2016, 10:07:23 pm »
Quote from: rstofer
One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer.  Approximations can be good enough.  This is not going to be the case for numeric processes.
But isn't that the great thing about FPGAs? Unlike designing a CPU's division unit, with an FPGA you don't have to do things perfectly, only perfect enough to meet your needs. For example:

http://ieeexplore.ieee.org/document/7393152/


Accurate forecasts of future climate with numerical models of atmosphere and ocean are of vital importance. However, forecast quality is often limited by the available computational power. This paper investigates the acceleration of a C-grid shallow water model through the use of reduced precision targeting FPGA technology. Using a double-gyre scenario, we show that the mantissa length of variables can be reduced to 14 bits without affecting the accuracy beyond the error inherent in the model. Our reduced precision FPGA implementation runs 5.4 times faster than a double precision FPGA implementation, and 12 times faster than a multi-threaded CPU implementation. Moreover, our reduced precision FPGA implementation uses 39 times less energy than the CPU implementation and can compute a 100×100 grid for the same energy that the CPU implementation would take for a 29×29 grid.
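To try a reduced-mantissa idea in software before committing to hardware widths, one can round doubles to a chosen significand length and rerun the model. A minimal Python sketch; the function name is hypothetical, `bits` is assumed to count every significand bit including the leading 1 (the paper's exact convention is not stated here), and exponent range and subnormals are not modelled.

```python
import math


def round_to_mantissa(x: float, bits: int) -> float:
    """Round x to the nearest value with a `bits`-bit significand.

    Assumption: `bits` includes the leading 1. Exponent range is not limited.
    """
    if x == 0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)                  # x == m * 2**e with 0.5 <= |m| < 1
    scaled = round(m * (1 << bits))       # keep `bits` significant bits
    return math.ldexp(scaled, e - bits)
```

The relative rounding error is at most 2^-bits, so a 14-bit setting gives roughly 4 decimal digits.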
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 

rstofer

  • Super Contributor
  • Posts: 9890
  • Country: us
« Reply #32 on: August 28, 2016, 10:25:01 pm »
Quote from: hamster_nz
But isn't that the great thing about FPGAs? Unlike designing a CPU's division unit, with an FPGA you don't have to do things perfectly, only perfect enough to meet your needs. For example:

http://ieeexplore.ieee.org/document/7393152/




Real world problems are often like that. You can't measure to as many decimal places as you can calculate; it's pretty silly to carry around 15 digits after the decimal point. Maybe I can measure with a micrometer to 4 fractional digits, and I'll still be off unless I am in an environment that matches the calibration lab's temperature.

OTOH, 4 / 2 still has to come out as 2.  We don't tolerate approximations for integer arithmetic.
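Exactness does not rule out multiply-based division; it just needs a correction step. A minimal Python sketch, assuming unsigned integers and a truncated 16-bit fixed-point reciprocal (parameters chosen for illustration; the name is hypothetical): the multiply-and-shift estimate can come out one too low, and a remainder check fixes it.

```python
def div_by_reciprocal(n: int, d: int, frac_bits: int = 16) -> tuple[int, int]:
    """Exact unsigned n // d using a truncated fixed-point reciprocal.

    Returns (raw_estimate, corrected_quotient). Because the reciprocal is
    truncated, the estimate is never too high; when n < 2**frac_bits it is
    at most one too low.
    """
    if d <= 0 or n < 0:
        raise ValueError("need n >= 0 and d > 0")
    recip = (1 << frac_bits) // d         # floor(2**frac_bits / d), precomputable
    estimate = (n * recip) >> frac_bits   # multiply + shift, no divider
    q, r = estimate, n - estimate * d
    while r >= d:                         # correction step(s)
        q += 1
        r -= d
    return estimate, q
```

With this reciprocal, 9 / 3 first comes out as 2 (21845 × 9 is just below 3 × 2^16) and the correction brings it to 3, while 4 / 2 comes out as 2 directly.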
 

nctnico

  • Super Contributor
  • Posts: 26906
  • Country: nl
    • NCT Developments
« Reply #33 on: August 28, 2016, 11:02:16 pm »
Quote from: hamster_nz
But isn't that the great thing about FPGAs? Unlike designing a CPU's division unit, with an FPGA you don't have to do things perfectly, only perfect enough to meet your needs. For example:
http://ieeexplore.ieee.org/document/7393152/



When reading this I can only think of one thing they missed: CUDA!
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

legacy (topic starter)

  • Super Contributor
  • Posts: 4415
  • Country: ch
« Reply #34 on: August 29, 2016, 07:39:21 am »
Quote from: Rasz
Texture mapping requires division per pixel (perspective correction).

I wonder what Silicon Graphics/SGI (1) implemented in hardware :-//

I know how fast it is: I own an IP30, a machine they made in 2002.


(1) Today they are HP(E) :palm: :palm: :palm: :palm:
Hewlett Packard Enterprise bought SGI for ~$275 million.
 

