Products > Test Equipment
A High-Performance Open Source Oscilloscope: development log & future ideas
<< < (4/71) > >>
tom66:

--- Quote from: nctnico on November 16, 2020, 02:46:27 pm ---
--- Quote from: tom66 on November 16, 2020, 01:18:52 pm ---IMO there is little benefit in using a separate FPGA + SoC because you lose that close coupling that the Zynq has.  The ARM on the Zynq is directly writing registers on the FPGA side to influence acquisition, DMA behaviour etc.  That would have to fit over a SPI or small digital link, which would constrain the system considerably.

--- End quote ---
That is where PCIexpress comes in. This gives you direct memory access both ways; in fact the FPGA could push the acquired data directly into the GPU memory area using PCIexpress.

--- End quote ---

True, but the FPGA would still need to have some kind of management firmware on it for some parts,  for instance setting up DMA transfer sizes and trigger settings.  You could write that all in Verilog, but it becomes a real pain to debug.  The balance of CPU for easy software tasks and HDL for easy hardware tasks makes the most sense, and some of this stuff is low-latency so you ideally want to keep it away from a non-realtime system like Linux.  (The UltraScale SOC has a separate 600MHz dual ARM Cortex-R5 complex for realtime work - which is an interesting architecture.)  But, having the ability for the Pi to write and read directly from memory space on the Zynq side would be really compelling.  I may need to get the PCI-e reference manual and see what the interface and requirements look like there.
2N3055:
Very impressive work! I really hope you will succeed in your "quest"!
nctnico:

--- Quote from: tom66 on November 16, 2020, 02:48:35 pm ---
--- Quote from: nctnico on November 16, 2020, 02:46:27 pm ---
--- Quote from: tom66 on November 16, 2020, 01:18:52 pm ---IMO there is little benefit in using a separate FPGA + SoC because you lose that close coupling that the Zynq has.  The ARM on the Zynq is directly writing registers on the FPGA side to influence acquisition, DMA behaviour etc.  That would have to fit over a SPI or small digital link, which would constrain the system considerably.

--- End quote ---
That is where PCIexpress comes in. This gives you direct memory access both ways; in fact the FPGA could push the acquired data directly into the GPU memory area using PCIexpress.

--- End quote ---

True, but the FPGA would still need to have some kind of management firmware on it for some parts,  for instance setting up DMA transfer sizes and trigger settings.  You could write that all in Verilog, but it becomes a real pain to debug.  The balance of CPU for easy software tasks and HDL for easy hardware tasks makes the most sense, and some of this stuff is low-latency so you ideally want to keep it away from a non-realtime system like Linux.  (The UltraScale SOC has a separate 600MHz dual ARM Cortex-R5 complex for realtime work - which is an interesting architecture.)  But, having the ability for the Pi to write and read directly from memory space on the Zynq side would be really compelling.  I may need to get the PCI-e reference manual and see what the interface and requirements look like there.

--- End quote ---
The beauty of a PCI interface is that it basically does DMA transfers so Linux doesn't need to get in the way at all. The only thing the host CPU needs to do is setup the acquisition parameters and the FPGA can start pushing data into the GPU. Likely the GPU can signal the FPGA directly to steer the rate of the acquisitions. In the end a GPU has a massive amount of processing power compared to an ARM core for as long as you can do parallel tasks. I have made various realtime video processing projects with Linux and since all the data transfer is DMA based the host CPU is loaded by only a few percent. System memory bandwidth is something to be aware of though.
tom66:

--- Quote from: nctnico on November 16, 2020, 03:14:02 pm ---The beauty of a PCI interface is that it basically does DMA transfers so Linux doesn't need to get in the way at all. The only thing the host CPU needs to do is setup the acquisition parameters and the FPGA can start pushing data into the GPU. Likely the GPU can signal the FPGA directly to steer the rate of the acquisitions. In the end a GPU has a massive amount of processing power compared to an ARM core for as long as you can do parallel tasks. I have made various realtime video processing projects with Linux and since all the data transfer is DMA based the host CPU is loaded by only a few percent. System memory bandwidth is something to be aware of though.

--- End quote ---

It's a fair point. There's still some acquisition control that the FPGA needs to be involved in, for instance sorting out the pre- and post-trigger stuff.

The current architecture roughly works as such:
- Pi configures acquisition mode (ex. 600 pts pre trigger, 600 pts post trigger, 1:1 input divide, 1 channel mode, 8 bits, 600 waves/slice, trigger is this type, delay by X clocks, etc.)
- Zynq acquires these waves into a rolling buffer - the buffer moves through memory space so there is history for any given acquisition (~25 seconds with current memory)
- Pi interrupts before next VSYNC to get packet of waves (which may be less than the 600 waves request)
- Transfer is made by the Zynq over CSI - Zynq corrects trigger positions and prepares DMA scatter-gather list then my CSI peripheral transfers ~2MB+ of data with no CPU intervention

There is close fusion between the Zynq ARM, FPGA fabric, and the Pi - and since the Pi is not hard real time (Zynq ARM is running baremetal) you'd need to be careful there with what latency you introduce into the system.

It would be nice if we could say to the Pi, e.g. find waveforms at this address, and when the Pi snoops in to the PCIe bus, the FPGA fabric intercepts the request and translates each waveform dynamically so we don't have to do the pre-trigger rotation on the Zynq ARM.  Right now, the pre-trigger rotation is done by reading from the middle of the pre-trigger buffer, and then the start, then the post-trigger buffer (though I believe this could be simplified to two reads with some thought.)  Perhaps it's possible by using the SCU on the Zynq - it's got a fairly sophisticated address translation engine.  I'd like to avoid doing a read-rotate-writeback operation, as that triples the memory bandwidth requirements on the Zynq, and already 1GB/s of the memory bandwidth (~60%) is used just writing data from the ADC.  The Zynq ARM has to execute code and read/write data from this same RAM, and although the 512KB L2 cache on the Zynq is generous,  it's not perfect.
james_s:
I loathe touchscreens, I tolerate one on my phone because of obvious constraints with the form factor but while I've owned several tablets I've yet to find a really good use case for one other than looking at datasheets. Can't stand them on most stuff and it annoys me whenever someone points to something and makes a finger smudge on my monitor. I could potentially make an exception in the case of a portable scope to have in addition to my bench scope although I think in the case of this project my interest is mostly academic, it's a fascinating project and an incredible achievement but not something I'm likely to spend money on. Roughly the same price will get me a 4 channel Siglent in a nice molded housing with real buttons and knobs and support, or a used TDS3000 that can be upgraded to 500MHz. That said, I've heard that Digikey pricing on FPGAs is hugely inflated so you may be able to drop the cost down substantially.
Navigation
Message Index
Next page
Previous page
There was an error while thanking
Thanking...

Go to full version
Powered by SMFPacks Advanced Attachments Uploader Mod