Author Topic: Which FGPA/tool for this project?  (Read 15167 times)

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13746
  • Country: gb
    • Mike's Electric Stuff
Re: Which FGPA/tool for this project?
« Reply #25 on: September 22, 2017, 06:56:28 pm »
As a quick and easy reality check, it would probably be worth getting your code running on a RasPi to see where you're at speed-wise.
YouTube channel: Taking weird stuff apart. Very apart.
Mike's Electric Stuff: High voltage, vintage electronics etc.
Day Job: Mostly LEDs
 

Offline PartialDischarge (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1611
  • Country: 00
Re: Which FGPA/tool for this project?
« Reply #26 on: September 22, 2017, 07:03:00 pm »
A fast MCU would be the best solution, I agree. I need around 5-10 readings/sec, enough to feel interactive to a human. A laptop is too big and heavy, any laptop.
 

Online Marco

  • Super Contributor
  • ***
  • Posts: 6720
  • Country: nl
Re: Which FGPA/tool for this project?
« Reply #27 on: September 22, 2017, 07:56:22 pm »
Between 3x the clock speed, dual issue and the instruction count reduction from going floating point, I'd be surprised if an STM32F7 didn't get you your desired speedup.
 

Offline PartialDischarge (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1611
  • Country: 00
Re: Which FGPA/tool for this project?
« Reply #28 on: September 22, 2017, 08:03:49 pm »
Quote
Between 3x the clock speed, dual issue and the instruction count reduction from going floating point, I'd be surprised if an STM32F7 didn't get you your desired speedup.

I'll take a look. Also, people seem happy with Atollic.
 

Offline technix

  • Super Contributor
  • ***
  • Posts: 3507
  • Country: cn
  • From Shanghai With Love
    • My Untitled Blog
Re: Which FGPA/tool for this project?
« Reply #29 on: September 23, 2017, 03:05:05 am »
Quote
If you can tolerate some latency you can use an ESP8266/ESP32/Raspberry Pi Zero W to stream the data off the system to a Wi-Fi network, to be processed using a laptop, a desktop PC, a gaming PC with one or two high-end GPUs, a server with a lot of high power x86 cores, or some cloud service like Amazon EC2. I don't think your calculations can hog up an 8-core Ryzen 7 and two GeForce GTX 1080 Tis.

Quote
Latency I can tolerate. However, the system must be light, portable, self-contained and battery operated. Sorry, I can't disclose the application I've thought of ;)

Just stream it off to a server in the cloud for the number crunching then.
 

Offline Sal Ammoniac

  • Super Contributor
  • ***
  • Posts: 1670
  • Country: us
Re: Which FGPA/tool for this project?
« Reply #30 on: September 23, 2017, 05:32:38 am »
Quote
Also, people seem happy with Atollic.

If you don't mind that it's as slow as molasses in the winter.
Complexity is the number-one enemy of high-quality code.
 

Offline forrestc

  • Supporter
  • ****
  • Posts: 653
  • Country: us
Re: Which FGPA/tool for this project?
« Reply #31 on: September 23, 2017, 07:09:35 am »
Quote
A long time ago I programmed a somewhat limited algorithm to locate the position of a magnet in space. Now I'd like to go further and port it to an FPGA.
I'm currently using a PSoC 5LP running at 67MHz (Cortex-M3).
The algorithm makes heavy use of 16-bit fixed-point multiplications and sums for speed, because I have multiple 4x4 matrix multiplications, Jacobians, determinants, cosines, sines... For example, trigonometric functions are calculated by Taylor series, to avoid slow floating-point functions.
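
As a rough illustration of the fixed-point approach described in the quote, here is a minimal C sketch of a Q15 multiply and a Taylor-series sine. The names q15_t, q15_mul and q15_sin are made up for this example and are not taken from the OP's code. It assumes the argument is in radians with |x| < 1, and that right-shifting a negative value is an arithmetic shift (true on the usual ARM compilers, but implementation-defined in C).

```c
#include <stdint.h>

typedef int16_t q15_t;   /* Q15: real value = raw / 32768, range [-1, 1) */

/* Multiply two Q15 values: 16x16 -> 32-bit product, round, rescale.
 * Assumes >> on a negative int32_t is an arithmetic shift, and that
 * the operands are not both -1.0 (that product does not fit in Q15). */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;        /* Q30 product */
    return (q15_t)((p + (1 << 14)) >> 15);      /* back to Q15 */
}

/* sin(x) for x in radians, |x| < 1, using the Taylor series up to x^7
 * in Horner form: x * (1 - x^2/6 * (1 - x^2/20 * (1 - x^2/42))). */
static q15_t q15_sin(q15_t x)
{
    const q15_t one = 32767;                    /* closest Q15 value to 1.0 */
    q15_t x2 = q15_mul(x, x);
    q15_t t  = (q15_t)(one - q15_mul(x2, 780));          /* 780  ~ 1/42 */
    t = (q15_t)(one - q15_mul(q15_mul(x2, 1638), t));    /* 1638 ~ 1/20 */
    t = (q15_t)(one - q15_mul(q15_mul(x2, 5461), t));    /* 5461 ~ 1/6  */
    return q15_mul(x, t);
}
```

Calling q15_sin(16384), i.e. 0.5 rad, should land within a few LSBs of 15710, which corresponds to sin(0.5) ≈ 0.4794. Each q15_mul is one integer multiply plus a shift, which is why this style is attractive on a core without an FPU.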

I agree with others that a fast processor core might be the best option. However, here are a few things about FPGAs that others haven't mentioned specifically:

Note: I haven't done much real work with an FPGA, so these are just things I've picked up along the way:

1) There are C-to-HDL (i.e. FPGA) compilers. These take something C-like and convert it to something like Verilog or VHDL. I'm not sure how well they work, but I do know that for some you have to write in a 'special' way. See https://www.xilinx.com/video/hardware/getting-started-vivado-high-level-synthesis.html as an example - just the first one I found.

2) For at least some of the functions you're talking about, there are cores out there (i.e. chunks of HDL code) which implement a lot of the math. In particular, Xilinx has CORDIC IP which does a lot of the trig functions. Other vendors probably have similar, more, or different offerings.

3) You may want to look at the math projects at opencores.org
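
For a feel of what a CORDIC core computes, here is a minimal software sketch in C. It is the textbook rotation-mode algorithm, not Xilinx's implementation, and it assumes Q16.16 fixed point, 16 iterations and an input angle within ±pi/2. The names cordic_sincos, cordic_atan and CORDIC_K are hypothetical. It also assumes that >> on a negative value is an arithmetic shift.

```c
#include <stdint.h>

#define CORDIC_ITERS 16
#define CORDIC_K     39797   /* 0.607253 in Q16.16: inverse of the CORDIC gain */

/* atan(2^-i) in Q16.16 radians, i = 0..15 */
static const int32_t cordic_atan[CORDIC_ITERS] = {
    51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
    256, 128, 64, 32, 16, 8, 4, 2
};

/* Rotation-mode CORDIC: angle in Q16.16 radians, |angle| <= pi/2.
 * Writes sin and cos of the angle in Q16.16. Only shifts and adds are
 * used inside the loop, which is why it maps well onto FPGA fabric. */
static void cordic_sincos(int32_t angle, int32_t *s, int32_t *c)
{
    int32_t x = CORDIC_K;    /* start on the x axis, pre-scaled by 1/gain */
    int32_t y = 0;
    int32_t z = angle;       /* residual angle still to rotate through */

    for (int i = 0; i < CORDIC_ITERS; i++) {
        int32_t dx = x >> i;
        int32_t dy = y >> i;
        if (z >= 0) {        /* rotate counter-clockwise by atan(2^-i) */
            x -= dy;  y += dx;  z -= cordic_atan[i];
        } else {             /* rotate clockwise by atan(2^-i) */
            x += dy;  y -= dx;  z += cordic_atan[i];
        }
    }
    *c = x;
    *s = y;
}
```

With angle = 32768 (0.5 rad) the outputs should be close to 31420 and 57513, i.e. sin(0.5) and cos(0.5) scaled by 65536. In hardware each loop iteration typically becomes one pipeline stage.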

 
The following users thanked this post: PartialDischarge

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Which FGPA/tool for this project?
« Reply #32 on: September 23, 2017, 10:27:32 am »
Quote
3) You may want to look at the math projects at opencores.org

yeah, the best advice ever (sarcasm)
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Which FGPA/tool for this project?
« Reply #33 on: September 23, 2017, 10:09:41 pm »
On Ultibo's GitHub, there is an example where a CPU is dedicated to a single task and the remaining 3 CPUs do whatever (RPi 3 in my case).

At the culmination of the example there is a simple assembly language loop to increment a value in memory while a task running elsewhere displays the iterations per second.  How about 603 MILLION loops per second? Sure, there is only an increment register and a memory write per loop but that's a LOT of loops!

Other Pascal loops are down around 7 MILLION per second which still might be sufficient.

And the RPi3 has floating point support.

https://github.com/ultibohub/Examples/tree/master/Advanced/DedicatedCPU/RPi2

I don't pretend to understand what is going on.  There's a lot to learn...

 

Offline PartialDischarge (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1611
  • Country: 00
Re: Which FGPA/tool for this project?
« Reply #34 on: September 24, 2017, 05:38:50 am »
Assembly code is somewhere I don't want to go. It is a nightmare, and the algorithm I'm handling sometimes fries my head.
Back in 1998 I programmed a receiver-transmitter for Direct Sequence Spread Spectrum modulation on a TMS320C54x. I happen to have found the code for it. It contains the Taylor series for cos(x) and sin(x), from which cos(4x) and sin(4x) are derived; I used these since there is not enough memory for a look-up table, and if you use 8 bits for that, the noise is too high for the demod operations.
The receiver used a complex I-Q demodulating loop.



Code:

.title "Receptor DS-SS"
         .mmregs ;memory mapped registers
.setsect ".text",   0x200,0 ;start of the executable code

.setsect "vectors", 0x180,0 ;start of the vector space

.setsect ".data", 0x2000,0

;-----constant definitions

flag_rcv .set 0x0001 ; flag signalling that a sample has been received
kv .set 0x0002 ; VCO constant
phi .set 0x4F35 ; centre discrete frequency of the VCO
; phi=2*pi*fc*Ts (1.4 kHz)
n_taps .set 15 ; number of taps in the loop low-pass filter

buff_length .set 15*9 ; buffer length for the PN code
; used in the acq routines.
umbral_iq .set 17100 ; threshold for the acq routines.

.copy "const.dat" ; copies in the constants section.
.sect "vectors"
.copy "vectors1.asm"

;-----start of the main program

.text ; program section
start:
intm=1 ; interrupts disabled.
call AC01INIT ; configures the A/D converter.
pmst = #01a0h ; processor mode status register
sp = #27fah ; init stack pointer
imr = #240h ; unmask TDM RINT and HPIINT(host port interface)
intm = 0        ; globally enable all interrupts
sxm = 1 ; sign extension enabled.
*(flags)=#0
AR0 = #1 ; important for the MAC instructions.

inicio TC=bitf(*(flags),flag_rcv) ; barrier -> wait until
  nop
if (NTC) goto inicio ; another sample can be processed
*(flags) ^= #flag_rcv ; clears the received flag

T = *(samplerx) ; T = received sample
nop
A = T * *(vcocout) ; multiplies the input by the output
; of the I oscillator.
*(f_data1) = hi(A) ; feeds the multiplier output
; into the I filter
T = *(samplerx)
nop
A = T * *(vcosout) ; multiplies the input by the output
; of the Q oscillator.
*(f_data2) = hi(A) ; feeds the multiplier output
; into the Q filter
call filtrar ; filtering function (I/Q)

A = *(lp1) ; takes the I filter output
*(b_2) = A ; and stores it in buff2.
A = *(lp2) ; takes the Q filter output
*(b_3) = A ; and stores it in buff3.
call acq1 ; routines that filter with the local PN code.
call acq2

*(update) = #0xffff ; deasserts the update signal.
A = *(filt_out1) ; sums the magnitudes of the filter outputs
A = |A| ; that give the correlation.
B = A
A = *(filt_out2)
A = |A|
A = A + B
*(i2q2) = A
A = A - #umbral_iq ; checks the threshold.
nop
if (ALT) goto no_update
nop
*(update) = #0x5000 ; signals an update of the output of the
; acq1 and acq2 blocks.

no_update T = *(ultimo1) ; T = output of the acq1 block.
  A = T * #0x7fff
B = *(ultimo2) * hi(A)
call vco

T = *(ultimo2)
*(sampletx) = T
goto  inicio


filtrar:
push(AR0) ; saves AR0 on the stack

; filter 1
AR0 = #f_data1_end
A = #0
repeat (#(n_taps-1))
macd(*AR0-,h0,A)
*(lp1) = hi(A)

; filter 2
AR0 = #f_data2_end
A = #0
repeat (#(n_taps-1))
macd(*AR0-,h0,A)
*(lp2) = hi(A)

AR0 = pop() ; restores AR0 from the stack
return_enable

vco: ;implements the VCO
push(AR3)


AR3 = #vcomem
A = B << -6 ; input in B
T = #kv
A = T * hi(A)
A = A + #phi ; A = phi
A = A + dbl(*AR3) ;A = phi + VCO input + previous vcomem value
call modpi ;reduces 'A' modulo pi
dbl(*AR3) = A ;stores 'A' in 'vcomem'
A = A<<-2 ;shift right
;to divide the argument by four
*(cosarg) = A
*(sinarg) = A
call coseno
call seno

AR3 = pop()
return_enable

modpi: ;reduces the variable 'vcomem' modulo pi
push(AR1)
push(AR2)

AR1 = #dospi
AR2 = #pi

*(camsig) = #0
if (AGEQ) goto loop1
*(camsig) = #1
A = |A|
loop1 B = A
B = B - dbl(*AR2)
nop
nop
if (BLEQ) goto fin ;goto fin if -pi <= B <= pi
A = A - dbl(*AR1) ;if B is out of range, subtract 2*pi from 'A'.
goto loop1
fin TC= (*(camsig)==#1)
nop
nop
if (TC) execute(1) ;if the sign was flipped at the start,
A=-A ;flip it back.

AR2=pop()
AR1=pop()
return_enable

coseno: ;computes cos(4x) from cos(x)
;Needed because the argument of the cos() function is vcomem
;divided by 4.
call cos
A = *(cresult) * *(cresult)
A = A -#0x7fff <<16
A = T * hi(A)
nop
nop
A = T * hi(A)
A = A << -13
A = A + #0x7fff
*(vcocout) = A
return_enable

seno: ;computes sin(4x) from sin(x).
call sin
A = #0
A = *(cresult) * *(cresult)
nop
A = A << 1
A = A - #0x7fff << 16
T = *(cresult)
A = T * hi(A)
nop
nop
T = *(sresult)
A = T * hi(A)
A = A << -14
*(vcosout) = A
return_enable

cos: ;computes the cosine with the Taylor series
;argument cosarg between -1 rad and 1 rad.

push(AR2)
push(AR3)
push(AR4)

AR2 = #cosarg
AR3 = #c_coffs
AR4 = #C_1
A = *AR2+ * *AR2+
*AR2 = hi(A)
|| B = *AR4<<16 ;
A = B - *AR2+ * *AR3+
A = T * hi(A)
*AR2 = hi(A)
A = B - *AR2- * *AR3+
B = *AR2+ * hi(A)
*AR2 = hi(B)
  || B = *AR4<<16
A = B - *AR2- * *AR3+
A = A <<-1
A = -A
B = *AR2+ * hi(A)
B = B + *AR4 <<16
*AR2=hi(B)


AR4 = pop()
AR3 = pop()
AR2 = pop()
return_enable

sin: ; computes the sine with the Taylor series
; argument sinarg between -1 rad and 1 rad.
push(AR2)
push(AR3)
push(AR4)

AR2 = #sinarg
AR3 = #s_coffs
AR4 = #C_1
A = *AR2+ * *AR2+
*AR2 = hi(A)
|| B = *AR4<<16 ;
A = B - *AR2+ * *AR3+
A = T * hi(A)
*AR2 = hi(A)
A = B - *AR2- * *AR3+
B = *AR2+ * hi(A)
*AR2 = hi(B)
  || B = *AR4<<16
A = B - *AR2- * *AR3+
B = *AR2+ * hi(A)
*AR2 = hi(B)
  || B = *AR4<<16 ;
A = B - *AR2- * *AR3+
B = *(sinarg) * hi(A)
*(sresult) = hi(B)

AR4 = pop()
AR3 = pop()
AR2 = pop()
return_enable

acq1: ;routine that correlates with the local PN sequence.

push(AR1) ;saves AR1 on the stack

AR1 = #b_2end ;delays the samples in buff2
repeat(#(buff_length-1))
delay(*AR1-)

A = #0
repeat(#(buff_length-1)) ;computes the product of the input and the
;static local code
macp(*AR1+, #b_1, A)
*(filt_out1) = hi(A)
A = *(update)
A = A - #0x9
nop
nop
if (ALT) goto end_acq1
nop
T = *(filt_out1)
*(ultimo1) = T
end_acq1
AR1 = pop()
return_enable

acq2: ;routine that correlates with the local PN sequence.

push(AR1) ;saves AR1 on the stack

AR1 = #b_3end ;delays the samples in buff3
repeat(#(buff_length-1))
delay(*AR1-)

A = #0
repeat(#(buff_length-1)) ;computes the product of the input and the
;static local code.
macp(*AR1+, #b_1, A)
*(filt_out2) = hi(A)
A = *(update)
nop
nop
if (ALT) goto end_acq2
T = *(filt_out2)
*(ultimo2) = T

end_acq2
AR1 = pop()
return_enable

transmit:
    B=trcv
*(samplerx)=B
*(sampletx) &= #0xfffc ;strips the control bits.
A=*(sampletx)
tdxr=A
*(flags) |= #flag_rcv ;signals that a sample has been received
return_enable

.copy "ac01ini1.asm" ;A/D converter configuration.


.end
« Last Edit: September 24, 2017, 05:41:27 am by MasterTech »
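
For anyone who doesn't read C54x assembly, the trick in the coseno/seno routines can be sketched in C roughly as below: evaluate a short Taylor series on x/4, where it converges quickly, then apply the double-angle identities twice to get back to the full angle. This is a floating-point sketch for readability only; the DSP code above does it in 16-bit fixed point, and the function names here (taylor_cos, taylor_sin, sincos_via_quarter) are made up for the example.

```c
/* Taylor series up to x^6, in Horner form. Accurate for small |x|. */
static double taylor_cos(double x)
{
    double x2 = x * x;
    return 1.0 - x2 / 2.0 * (1.0 - x2 / 12.0 * (1.0 - x2 / 30.0));
}

/* Taylor series up to x^7, in Horner form. Accurate for small |x|. */
static double taylor_sin(double x)
{
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)));
}

/* sin(x) and cos(x) for |x| <= pi: evaluate the series at x/4, then
 * double the angle twice using
 *   cos(2a) = 2*cos(a)^2 - 1,   sin(2a) = 2*sin(a)*cos(a). */
static void sincos_via_quarter(double x, double *s, double *c)
{
    double q  = x / 4.0;
    double c1 = taylor_cos(q);
    double s1 = taylor_sin(q);
    double c2 = 2.0 * c1 * c1 - 1.0;   /* cos(x/2) */
    double s2 = 2.0 * s1 * c1;         /* sin(x/2) */
    *c = 2.0 * c2 * c2 - 1.0;          /* cos(x)   */
    *s = 2.0 * s2 * c2;                /* sin(x)   */
}
```

The design trade-off is a handful of extra multiplies in exchange for fewer series terms and no look-up table, which matters when memory is tight.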
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19494
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Which FGPA/tool for this project?
« Reply #35 on: September 24, 2017, 07:10:39 am »
Consider using one of these https://www.digikey.com/en/product-highlight/x/xmos/startkit which can be regarded as being halfway to being an FPGA

For a miserly £12 you get:
  • 8*100MIPs 32-bit cores, with some instructions specialised for DSP
  • boards transparently daisy-chainable if you need more cores
  • continue to program in C/C++
  • free development environment, Eclipse/LLVM/gdb
  • excellent flexible, fast, low latency FPGA-like I/O
  • USB comms

Other processors in the family go up to 4000MIPs, but not on that dev board.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 
The following users thanked this post: PartialDischarge

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13746
  • Country: gb
    • Mike's Electric Stuff
Re: Which FGPA/tool for this project?
« Reply #36 on: September 24, 2017, 08:51:49 am »
Quote
Consider using one of these https://www.digikey.com/en/product-highlight/x/xmos/startkit which can be regarded as being halfway to being an FPGA

For a miserly £12 you get:
  • 8*100MIPs 32-bit cores, with some instructions specialised for DSP
  • boards transparently daisy-chainable if you need more cores
  • continue to program in C/C++
  • free development environment, Eclipse/LLVM/gdb
  • excellent flexible, fast, low latency FPGA-like I/O
  • USB comms

Other processors in the family go up to 4000MIPs, but not on that dev board.

Might be a solution, but the algorithm would need to be pipelined to make good use of all the cores.
The OP is only looking for 5-10x the performance of a 67MHz Cortex M3.
It would make the most sense to try conventional MCU or DSP options before going to anything more exotic. It shouldn't take more than a day to get the code and I2C sensors running on a RasPi (or similar), even if you'd never used Linux before. That would immediately give some definite numbers for speed and power consumption to inform decisions on which way to go.
Until that gets done there is little point spending time thinking about anything more exotic.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19494
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Which FGPA/tool for this project?
« Reply #37 on: September 24, 2017, 11:13:49 am »
Quote
Consider using one of these https://www.digikey.com/en/product-highlight/x/xmos/startkit which can be regarded as being halfway to being an FPGA

For a miserly £12 you get:
  • 8*100MIPs 32-bit cores, with some instructions specialised for DSP
  • boards transparently daisy-chainable if you need more cores
  • continue to program in C/C++
  • free development environment, Eclipse/LLVM/gdb
  • excellent flexible, fast, low latency FPGA-like I/O
  • USB comms

Other processors in the family go up to 4000MIPs, but not on that dev board.

Quote
Might be a solution, but the algorithm would need to be pipelined to make good use of all the cores.

I suspect that many navigation applications have some low-level noise-filtering grunt work, plus some "higher level" integration algorithms. I noted that there were several sensors, and guessed that the one-core-per-peripheral doing a lot of grunt work might be useful.

In addition, pure C doesn't have very useful DSP arithmetic modes, e.g. saturating arithmetic. Since the xCORE devices are aimed at DSP, their facilities might avoid losing performance when running C DSP. But I haven't investigated that, so it would be up to the OP to check my presumptions.
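
To illustrate the point about saturating arithmetic, this is roughly what it costs in portable C: widen, add, then clamp by hand. A DSP-oriented instruction set may do the same in a single instruction, but whether a given compiler maps this pattern onto it is something to check rather than assume. The name sat_add16 is made up for this sketch.

```c
#include <stdint.h>

/* Saturating 16-bit add: clamps to INT16_MAX / INT16_MIN instead of
 * wrapping around, which is usually what a DSP signal path wants. */
static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

For example, sat_add16(30000, 10000) gives 32767 rather than a wrapped negative value, so an overloaded signal clips instead of flipping sign.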

Without knowing the algorithm, it is impossible to say more.

Quote
The OP is only looking for 5-10x the performance of a 67MHz Cortex M3.
It would make the most sense to try conventional MCU or DSP options before going to anything more exotic. It shouldn't take more than a day to get the code and I2C sensors running on a RasPi ( or similar) even if you'd never used Linux before. That would immediately give some definite numbers for speed and power consumption  to inform decisions on which way to go. 
Until that gets done there is little point spending time thinking about anything more exotic.

Agreed. The xCORE devices are very nice w.r.t. precise guaranteed timing and w.r.t. bit-banged IO.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Which FGPA/tool for this project?
« Reply #38 on: September 24, 2017, 01:35:14 pm »
Quote
Assembly code is somewhere I don't want to go. It is a nightmare, and the algorithm I'm handling sometimes fries my head.

Of course not!  But several million Pascal loops per second seems pretty impressive when you consider the slow rate coming from the sensors.

The idea of a dedicated 1.2 GHz processor with floating point just seems impressive.  The other 3 processors can deal with grabbing data and doing whatever with the output while the dedicated processor does nothing but crunch numbers.
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Which FGPA/tool for this project?
« Reply #39 on: September 24, 2017, 06:12:41 pm »
I decided to write a little Pascal program to see how the RPi3 handled floating point. To my surprise, with the Free Pascal compiler, 64 bit is the default for type Real. That's nice because the Cortex-A53 provides 32 double-precision (64-bit) registers to hold the values (d0..d31 are the register names). I haven't been able to find out how many clocks it takes to perform a multiply, but it can't be many because the processor, overall, is rated in the 2+ GFlop range.

It seems ARM doesn't produce timing specs for instructions on some processors because it gets swamped by outside factors like memory access, cache hits, and so on.
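
In the absence of published cycle counts, the practical alternative is to measure. Below is a minimal C sketch of such a measurement using the standard clock() function; mac_benchmark is a hypothetical name, and the volatile qualifiers are there only to stop the compiler from folding the loop away. The result includes loop overhead and memory traffic, so treat it as a lower bound on throughput rather than an instruction timing.

```c
#include <time.h>

/* Runs n double-precision multiply-accumulates and reports the CPU time
 * taken. Returns the accumulated value so the work cannot be discarded. */
static double mac_benchmark(long n, double *elapsed_s)
{
    volatile double a = 1.5, b = 2.0, acc = 0.0;

    clock_t t0 = clock();
    for (long i = 0; i < n; i++)
        acc += a * b;                 /* one multiply and one add per pass */
    clock_t t1 = clock();

    *elapsed_s = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return acc;
}
```

Dividing n by the elapsed time gives multiply-accumulates per second. Use a large enough n that the elapsed time is well above the clock() resolution, and guard against an elapsed time of zero before dividing.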

The processor also does short vector processing.  I didn't research this but it could be useful.

 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19494
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Which FGPA/tool for this project?
« Reply #40 on: September 24, 2017, 06:29:38 pm »
Quote
It seems ARM doesn't produce timing specs for instructions on some processors because it gets swamped by outside factors like memory access, cache hits, and so on.

Do you know of any current processors that have decent performance and do have such timing specs?

The only ones I'm aware of are the xCORE processors, which avoid needing interrupts and don't have caches.
 

Offline chickenHeadKnob

  • Super Contributor
  • ***
  • Posts: 1055
  • Country: ca
Re: Which FGPA/tool for this project?
« Reply #41 on: September 24, 2017, 09:58:54 pm »
Quote
It seems ARM doesn't produce timing specs for instructions on some processors because it gets swamped by outside factors like memory access, cache hits, and so on.

Quote
Do you know of any current processors that have decent performance and do have such timing specs?

The only ones I'm aware of are the xCORE processors, which avoid needing interrupts and don't have caches.

The Texas Instruments AM3358 or AM3359 in the BeagleBone series have two PRUs which run at 200 MHz and are deterministic, if I recall correctly. Not surprising, as they target the same type of problems that xCORE or Propeller CPUs are intended for.
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19494
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Which FGPA/tool for this project?
« Reply #42 on: September 24, 2017, 11:10:39 pm »
Quote
It seems ARM doesn't produce timing specs for instructions on some processors because it gets swamped by outside factors like memory access, cache hits, and so on.

Quote
Do you know of any current processors that have decent performance and do have such timing specs?

The only ones I'm aware of are the xCORE processors, which avoid needing interrupts and don't have caches.

Quote
The Texas Instruments AM3358 or AM3359 in the BeagleBone series have two PRUs which run at 200 MHz and are deterministic, if I recall correctly. Not surprising, as they target the same type of problems that xCORE or Propeller CPUs are intended for.

A quick scan of the TI AM335x shows the ARM-A8 has cache (of course), the PRU-ICSS have interrupts and 120(!) registers and 12k shared RAM and "limited" peripherals. I haven't assessed the effects those features have in practical systems, but they are orange flags I would want to investigate. The tools do code profiling, which is an orange flag.

The PRU-ICSS appear, in some subtle ways, to be regarded as bolt-ons to the ARM-A8. I'd prefer it to be the other way around!

In the past there have been many many asymmetric multicore processors, and the programming environment has always been an afterthought. Given that, I'm not entirely surprised there's little prominence given to how you program the hard real time parts, and the communications with the other cores. That's a shame, because the best hardware is useless without decent programming environments. I haven't spotted any instruction timings, nor part of the IDE that predicts worst-case timing - pointers would be welcome.

OTOH, the xCORE processors are symmetrical and have an excellent programming environment: xC which is based on Communicating Sequential Processes (also included in Go and Rust, apparently).
 

Offline rstofer

  • Super Contributor
  • ***
  • Posts: 9890
  • Country: us
Re: Which FGPA/tool for this project?
« Reply #43 on: September 24, 2017, 11:57:44 pm »
Quote
Do you know of any current processors that have decent performance and do have such timing specs?

No, but given that these are register-to-register operations, it should be possible to determine how many cycles it takes to add or multiply. Add is complicated due to alignment and would probably be omitted or bounded. I can't see any reason ARM couldn't describe the number of clocks required to multiply two reals.
 

Online NorthGuy

  • Super Contributor
  • ***
  • Posts: 3146
  • Country: ca
Re: Which FGPA/tool for this project?
« Reply #44 on: September 25, 2017, 12:30:41 am »
Quote
Other Pascal loops are down around 7 MILLION per second which still might be sufficient.

My old Sandy Bridge does about 2 billion Pascal loops per second with Delphi (fetching a value from a long array).

Code:
for i := 0 to N-1 do begin
  Buf[255] := char(WorkBuf[i]);
end;
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19494
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Which FGPA/tool for this project?
« Reply #45 on: September 25, 2017, 07:24:03 am »
Quote
Do you know of any current processors that have decent performance and do have such timing specs?

Quote
No, but given that these are register-to-register operations, it should be possible to determine how many cycles it takes to add or multiply. Add is complicated due to alignment and would probably be omitted or bounded. I can't see any reason ARM couldn't describe the number of clocks required to multiply two reals.

What guarantees the variables are kept in registers? They probably are, but compiler optimisation algorithms are notoriously fickle and, um, "heuristic".

If the system performance is solely dependent on such an inner loop, then fine. However, in most systems detailed timing is dependent on other factors, e.g. interrupts, memory accesses in other parts of the codebase, etc.

Floating point arithmetic performance is more or less impossible to guarantee, especially if IEEE754 is involved. Not only can operations be short-circuited, but operations involving denormal numbers are often notoriously slow - they often require fixups in software. What happens is very implementation dependent, and therefore highly non-portable.
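
As a sketch of how one might guard against the denormal case in software, the C below detects subnormal doubles using DBL_MIN (the smallest normal positive double, from float.h) and flushes them to zero. The helper names is_subnormal and flush_subnormal are made up for this example. Many FPUs also offer a hardware flush-to-zero mode, but enabling it is platform-specific and not shown here.

```c
#include <float.h>

/* True if x is non-zero but smaller in magnitude than the smallest
 * normal double, i.e. a subnormal (denormal) value. */
static int is_subnormal(double x)
{
    double mag = (x < 0.0) ? -x : x;
    return mag > 0.0 && mag < DBL_MIN;
}

/* Replaces subnormal values with zero so that later arithmetic never
 * takes the slow path. Costs a little precision very close to zero. */
static double flush_subnormal(double x)
{
    return is_subnormal(x) ? 0.0 : x;
}
```

A typical place to apply flush_subnormal is the state of a recursive filter, where a decaying signal can otherwise sit in the subnormal range for a long time.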
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Online mikeselectricstuff

  • Super Contributor
  • ***
  • Posts: 13746
  • Country: gb
    • Mike's Electric Stuff
Re: Which FGPA/tool for this project?
« Reply #46 on: September 25, 2017, 08:21:36 am »

Quote
What guarantees the variables are kept in registers? They probably are, but compiler optimisation algorithms are notoriously fickle and, um, "heuristic".

In principle you can use the register qualifier to tell the compiler which variables to optimise most, but how successful this is (or whether you can even tell what it has done) will depend on the compiler.
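
For reference, this is what the register qualifier looks like in an inner loop. It is a minimal sketch with a made-up function name (dot_q15). The C standard treats register as a hint only; the one thing it does guarantee is that you cannot take the address of such a variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two 16-bit vectors with a 32-bit accumulator.
 * 'register' asks the compiler to keep these in CPU registers; the
 * compiler is free to ignore the request. The caller must keep n small
 * enough that the 32-bit accumulator cannot overflow. */
static int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    register int32_t acc = 0;
    register const int16_t *pa = a;
    register const int16_t *pb = b;

    while (n--)
        acc += (int32_t)(*pa++) * (int32_t)(*pb++);

    return acc;
}
```

The only reliable way to tell what the compiler actually did is to look at the generated assembly, e.g. with gcc -S, and to repeat that check whenever the optimisation level or compiler version changes.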
 

Offline tggzzz

  • Super Contributor
  • ***
  • Posts: 19494
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
Re: Which FGPA/tool for this project?
« Reply #47 on: September 25, 2017, 08:42:11 am »

Quote
What guarantees the variables are kept in registers? They probably are, but compiler optimisation algorithms are notoriously fickle and, um, "heuristic".

Quote
In principle you can use the register qualifier to tell the compiler which variables to optimise most, but how successful this is (or whether you can even tell what it has done) will depend on the compiler.

My understanding is that all non-trivial compilers have ignored the "register" hint for at least 30 years!
 

Offline Someone

  • Super Contributor
  • ***
  • Posts: 4530
  • Country: au
    • send complaints here
Re: Which FGPA/tool for this project?
« Reply #48 on: September 25, 2017, 09:47:36 am »

Quote
What guarantees the variables are kept in registers? They probably are, but compiler optimisation algorithms are notoriously fickle and, um, "heuristic".

Quote
In principle you can use the register qualifier to tell the compiler which variables to optimise most, but how successful this is (or whether you can even tell what it has done) will depend on the compiler.

Quote
My understanding is that all non-trivial compilers have ignored the "register" hint for at least 30 years!

There are many compilers for embedded targets that strictly follow the C register keyword, especially when you want to mix C and assembly knowing the critical instructions and data.
 

Offline legacy

  • Super Contributor
  • ***
  • !
  • Posts: 4415
  • Country: ch
Re: Which FGPA/tool for this project?
« Reply #49 on: September 25, 2017, 09:53:25 am »
Quote
In the past there have been many many asymmetric multicore processors, and the programming environment has always been an afterthought

Which ones?
 

