Author Topic: vhdl, fast divider and fast multiplier, without instantiating DSP slices (Read 11459 times)

rstofer · « **Reply #25 on:** August 27, 2016, 11:12:06 pm »

Back to division: for restoring division, skipping over zeros saves a bit of time if shifting is essentially free. I haven't seen an article on skipping zeros for non-restoring division, presumably because there we are delaying the correction for having subtracted erroneously.
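
As a rough illustration of the restoring case, here is a behavioural sketch in Python (not VHDL, and not anyone's posted design). It assumes "skipping zeros" means jumping past the leading zero bits of the dividend, which is only one possible reading. The name `restoring_divide` and its `width` and `skip_zeros` parameters are made up for this example.

```python
def restoring_divide(dividend, divisor, width=16, skip_zeros=True):
    """Unsigned restoring division. Returns (quotient, remainder, steps)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    limit = 1 << width
    if not (0 <= dividend < limit and 0 < divisor < limit):
        raise ValueError("operands must fit in `width` unsigned bits")

    # Leading zero bits of the dividend can only yield zero quotient bits,
    # so if the shift is free we can start at the first set bit instead.
    start = dividend.bit_length() if skip_zeros else width

    remainder = 0
    quotient = 0
    steps = 0  # number of trial subtractions actually performed
    for i in range(start - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        steps += 1
        if trial >= 0:
            remainder = trial        # subtraction succeeded: quotient bit is 1
            quotient |= 1 << i
        # Otherwise "restore" by simply not committing the trial result.
    return quotient, remainder, steps
```

With these assumptions, `restoring_divide(100, 7)` takes 7 trial subtractions instead of the 16 a fixed 16-bit loop would need, and returns quotient 14, remainder 2 either way.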

legacy · « **Reply #26 on:** August 28, 2016, 02:08:10 pm »

Xilinx DSP48 has a 24-bit multiplier input.
Spartan-3 and Spartan-6 are interesting.

rstofer · « **Reply #27 on:** August 28, 2016, 04:22:46 pm »

Quote from: legacy on August 28, 2016, 02:08:10 pm

Xilinx DSP48 has a 24-bit multiplier input.
Spartan-3 and Spartan-6 are interesting.

http://www.xilinx.com/support/documentation/white_papers/wp277.pdf

Rasz · « **Reply #28 on:** August 28, 2016, 07:00:54 pm »

Quote from: rstofer on August 25, 2016, 04:53:54 pm

Look for SRT algorithms because they try to do multi-bit division using a lookup table.

This reminds me of 3DFX. Texture mapping requires division per pixel (perspective correction).
Gary Tarolli:

https://youtu.be/3MghYhf-GhU?t=45m

rstofer · « **Reply #29 on:** August 28, 2016, 09:55:40 pm »

Quote from: Rasz on August 28, 2016, 07:00:54 pm

Quote from: rstofer on August 25, 2016, 04:53:54 pm
Look for SRT algorithms because they try to do multi-bit division using a lookup table.

This reminds me of 3DFX. Texture mapping requires division per pixel (perspective correction).
Gary Tarolli:

https://youtu.be/3MghYhf-GhU?t=45m

A very interesting video. I bookmarked it to watch next week when I have more time.

One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer. Approximations can be good enough. This is not going to be the case for numeric processes.
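
To make "good enough" concrete, here is a minimal Python sketch of one common approximation: look up a reciprocal in a small table and multiply. It is not taken from the video. The table size (256 entries), the 16 fractional bits, and the names `build_reciprocal_table` and `approx_divide` are all assumptions chosen for illustration.

```python
def build_reciprocal_table(index_bits=8, frac_bits=16):
    """Fixed-point reciprocals of 1.0 <= m < 2.0, sampled at 2**index_bits points."""
    base = 1 << index_bits
    return [round((1 << frac_bits) * base / (base + i)) for i in range(base)]


def approx_divide(n, d, table, index_bits=8, frac_bits=16):
    """Approximate n / d (d > 0) with one table lookup and one multiply."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    shift = d.bit_length() - 1            # position of the leading one bit
    mask = (1 << index_bits) - 1
    # Use the index_bits bits just below the leading one as the table index;
    # any lower bits of d are ignored, which is where the error comes from.
    if shift >= index_bits:
        idx = (d >> (shift - index_bits)) & mask
    else:
        idx = (d << (index_bits - shift)) & mask
    product = n * table[idx]              # the only multiply
    return product / float(1 << (frac_bits + shift))
```

With a 256-entry table the divisor is effectively truncated to 9 significant bits, so the relative error stays under roughly 0.4%. Power-of-two divisors come out exact, for example `approx_divide(1000, 8, build_reciprocal_table())` gives 125.0.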

Rasz · « **Reply #30 on:** August 28, 2016, 10:02:58 pm »

Quote from: rstofer on August 28, 2016, 09:55:40 pm

One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer. Approximations can be good enough. This is not going to be the case for numeric processes.

Depends; it's perfectly fine for neural networks.
There is a whole R&D movement for probabilistic computing:
http://www.nature.com/news/modelling-build-imprecise-supercomputers-1.18437

hamster_nz · « **Reply #31 on:** August 28, 2016, 10:07:23 pm »

Quote from: rstofer on August 28, 2016, 09:55:40 pm

One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer. Approximations can be good enough. This is not going to be the case for numeric processes.

But isn't that the great thing about FPGAs? Unlike designing a CPU's division unit, with an FPGA you don't have to do things perfectly, only perfect enough to meet your needs. For example:

http://ieeexplore.ieee.org/document/7393152/

Accurate forecasts of future climate with numerical models of atmosphere and ocean are of vital importance. However, forecast quality is often limited by the available computational power. This paper investigates the acceleration of a C-grid shallow water model through the use of reduced precision targeting FPGA technology. Using a double-gyre scenario, we show that the mantissa length of variables can be reduced to 14 bits without affecting the accuracy beyond the error inherent in the model. Our reduced precision FPGA implementation runs 5.4 times faster than a double precision FPGA implementation, and 12 times faster than a multi-threaded CPU implementation. Moreover, our reduced precision FPGA implementation uses 39 times less energy than the CPU implementation and can compute a 100×100 grid for the same energy that the CPU implementation would take for a 29×29 grid.
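
To get a feel for what a 14-bit mantissa means numerically, here is a small Python sketch that zeroes the low mantissa bits of an IEEE-754 double. It is only an approximation of the idea: the paper's actual number format, and whether it rounds or truncates, are not stated in the abstract, so truncation is an assumption here, and the name `truncate_mantissa` is made up for this example.

```python
import struct


def truncate_mantissa(x, keep_bits=14):
    """Keep only the top keep_bits of the 52 stored mantissa bits of a double."""
    if not 0 <= keep_bits <= 52:
        raise ValueError("keep_bits must be between 0 and 52")
    # Reinterpret the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    drop = 52 - keep_bits
    # Clear the lowest `drop` mantissa bits (truncation, not rounding).
    mask = 0xFFFFFFFFFFFFFFFF & ~((1 << drop) - 1)
    (y,) = struct.unpack("<d", struct.pack("<Q", bits & mask))
    return y
```

For normal numbers the relative error of the truncated value is below 2**-14, about 0.006%, which gives an idea of how small the error is compared with typical model or measurement uncertainty.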

rstofer · « **Reply #32 on:** August 28, 2016, 10:25:01 pm »

Quote from: hamster_nz on August 28, 2016, 10:07:23 pm

Quote from: rstofer on August 28, 2016, 09:55:40 pm
One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer. Approximations can be good enough. This is not going to be the case for numeric processes.
But isn't that the great thing about FPGAs? Unlike designing a CPU's division unit, with an FPGA you don't have to do things perfectly, only perfect enough to meet your needs. For example:

http://ieeexplore.ieee.org/document/7393152/

Accurate forecasts of future climate with numerical models of atmosphere and ocean are of vital importance. However, forecast quality is often limited by the available computational power. This paper investigates the acceleration of a C-grid shallow water model through the use of reduced precision targeting FPGA technology. Using a double-gyre scenario, we show that the mantissa length of variables can be reduced to 14 bits without affecting the accuracy beyond the error inherent in the model. Our reduced precision FPGA implementation runs 5.4 times faster than a double precision FPGA implementation, and 12 times faster than a multi-threaded CPU implementation. Moreover, our reduced precision FPGA implementation uses 39 times less energy than the CPU implementation and can compute a 100×100 grid for the same energy that the CPU implementation would take for a 29×29 grid.

Real world problems are often like that. You can't measure to as many places as you can calculate. It's pretty silly to carry around 15 digits after the decimal. Maybe I can measure with a micrometer to 4 fractional digits. And I'll still be off unless I am in an environment that mimics the calibration lab in terms of temperature.

OTOH, 4 / 2 still has to come out as 2. We don't tolerate approximations for integer arithmetic.
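
One way to reconcile the two views is to treat an approximate quotient as a starting guess and then correct it against the remainder, so integer results are always exact. The following Python sketch shows the idea under the assumption that n >= 0 and d > 0; `exact_divide_from_estimate` is a hypothetical helper, and real hardware would bound the number of correction steps rather than loop.

```python
def exact_divide_from_estimate(n, d, estimate):
    """Fix up an approximate quotient so that 0 <= n - q*d < d holds."""
    if n < 0 or d <= 0:
        raise ValueError("expects n >= 0 and d > 0")
    q = max(estimate, 0)
    r = n - q * d
    while r < 0:        # estimate was too high: step the quotient down
        q -= 1
        r += d
    while r >= d:       # estimate was too low: step the quotient up
        q += 1
        r -= d
    return q, r
```

So even if an approximate divider returned 1 for 4 / 2, `exact_divide_from_estimate(4, 2, 1)` corrects it to quotient 2, remainder 0.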

nctnico · « **Reply #33 on:** August 28, 2016, 11:02:16 pm »

Quote from: hamster_nz on August 28, 2016, 10:07:23 pm

Quote from: rstofer on August 28, 2016, 09:55:40 pm
One thing they brought up about division is that for 3D graphics, you don't have to have a perfect answer. Approximations can be good enough. This is not going to be the case for numeric processes.
But isn't that the great thing about FPGAs? Unlike designing a CPU's division unit, with an FPGA you don't have to do things perfectly, only perfect enough to meet your needs. For example:
http://ieeexplore.ieee.org/document/7393152/

Accurate forecasts of future climate with numerical models of atmosphere and ocean are of vital importance. However, forecast quality is often limited by the available computational power. This paper investigates the acceleration of a C-grid shallow water model through the use of reduced precision targeting FPGA technology. Using a double-gyre scenario, we show that the mantissa length of variables can be reduced to 14 bits without affecting the accuracy beyond the error inherent in the model. Our reduced precision FPGA implementation runs 5.4 times faster than a double precision FPGA implementation, and 12 times faster than a multi-threaded CPU implementation. Moreover, our reduced precision FPGA implementation uses 39 times less energy than the CPU implementation and can compute a 100×100 grid for the same energy that the CPU implementation would take for a 29×29 grid.

When reading this I can only think of one thing they missed: CUDA!

legacy · « **Reply #34 on:** August 29, 2016, 07:39:21 am »

Quote from: Rasz on August 28, 2016, 07:00:54 pm

Texture mapping requires division per pixel (perspective correction).

I wonder what Silicon Graphics/SGI (1) implemented in hardware.

I know how fast it is: I own an IP30, a machine they made in 2002.

(1) Today they are part of HPE: Hewlett Packard Enterprise bought SGI for ~275 million dollars.

