A High-Performance Open Source Oscilloscope: development log & future ideas
nctnico:
I still think the best solution is to use PCI express and do all processing on a compute module (not necessarily the CM4, but let's start there). LeCroy oscilloscopes are built this way as well; all waveform processing is done on the host platform. Doing processing inside the FPGA sounds nice for short-term goals, but long term you'll be shooting yourself in the foot because you can't extend the resources at all. At this point you already identify the Zynq as a bottleneck!

Going the PCI express route also allows using multiple FPGAs and extending the system to multiple channels (say 8) by just copying and pasting the same circuit. Heck, these could even be modules which plug into a special backplane with slots to distribute trigger, time synchronisation and PCI express signals. A standard PCI express switch chip aggregates all the PCI express lanes from the acquisition units towards the compute module.

Either way, the next step should include moving all processing to a single point to achieve maximum integration. Having multiple processors at work and needing to communicate between them makes the system more difficult to develop and maintain in the end, although it may seem like an easy way out right now (been there, done that). Which also circles back to defining the architecture first and then starting development.

BTW, the external 10MHz reference will prove more complicated than you think. This will need to feed the clock synthesizer on the board directly, as any FPGA-based 'PLL' will add way too much jitter.
tom66:
--- Quote from: nctnico on January 01, 2021, 09:38:39 pm ---
I still think the best solution is to use PCI express and do all processing on a compute module (not necessarily the CM4, but let's start there). LeCroy oscilloscopes are built this way as well; all waveform processing is done on the host platform. Doing processing inside the FPGA sounds nice for short-term goals, but long term you'll be shooting yourself in the foot because you can't extend the resources at all. At this point you already identify the Zynq as a bottleneck! Going the PCI express route also allows using multiple FPGAs and extending the system to multiple channels (say 8) by just copying and pasting the same circuit. Heck, these could even be modules which plug into a special backplane with slots to distribute trigger, time synchronisation and PCI express signals. A standard PCI express switch chip aggregates all the PCI express lanes from the acquisition units towards the compute module. Either way, the next step should include moving all processing to a single point to achieve maximum integration. Having multiple processors at work and needing to communicate between them makes the system more difficult to develop and maintain (been there, done that).
--- End quote ---

While I do agree that the CM4 (or whatever module is used) should do much of the processing, I disagree that it should do all of it. Certain aspects benefit from FPGA-based logic; for instance, digital filters using the multiplier blocks would be some 10-50x faster in the FPGA fabric, so it is a "no brainer" to do them there. I think the same applies to waveform rendering.

If I go the PCI-e route, then the Pi's software will be able to access any waveform. Address translation for trigger correction may even be performed on the Zynq side, if I am smart enough to make that work - but if not, doing it on the Pi would not be terribly difficult or resource intensive. There is no 'foot shooting' here because the CM4 would have access to whatever memory is needed, so it can do as little or as much processing as it wants. It would just set up a task queue for which waveform buffers need to be rendered and pull the image out over PCI-e when it gets an interrupt, for example. There's also the bidirectional aspect, so the Pi could load arbitrary code into the Zynq (or even a new bitstream!) via the fast PCI-e link, or send waveforms for DDS functions or for other DSP processing. IIRC the PCI-e on the Pi CM4 is 5Gb/s x1, the Zynq supports up to x4, and UltraScale goes up to x16.

--- Quote from: nctnico on January 01, 2021, 09:38:39 pm ---
BTW, the external 10MHz reference will prove more complicated than you think. This will need to feed the clock synthesizer on the board directly, as any FPGA-based 'PLL' will add way too much jitter.
--- End quote ---

Only if we used the FPGA PLLs to do anything with the clock signal - I was merely suggesting routing the logic signal through combinatorial blocks and IO, which at 10MHz should be OK. The biggest risk is that it will introduce noise which worsens the jitter and phase noise, but this might not be that significant. However, external switches to route the signals in the analog domain (or a low-jitter digital domain) are another option; they just add more logic and cost.
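[A minimal sketch of the task-queue idea described above, assuming a shared ring buffer exposed over the Zynq's PCIe BAR. All structure, field and function names are invented for this illustration and are not taken from the actual design.]

--- Code: ---
/* Hypothetical render-task ring the CM4 (producer) could share with the
 * Zynq (consumer) over PCIe. Field layout and sizes are illustrative only. */
#include <stdint.h>

#define RENDER_RING_DEPTH 32           /* assumed power-of-two ring size */

struct render_task {
    uint64_t wave_addr;    /* DDR address of the raw waveform buffer (Zynq side) */
    uint32_t wave_len;     /* number of samples to render                        */
    uint32_t first_sample; /* trigger-corrected start offset, if pre-translated  */
    uint16_t target_w;     /* output raster width in pixels                      */
    uint16_t target_h;     /* output raster height in pixels                     */
    uint32_t flags;        /* e.g. intensity grading, decimation mode            */
    uint64_t image_addr;   /* where the rendered intensity map should be written */
};

struct render_ring {
    volatile uint32_t head;            /* written by CM4 (producer)  */
    volatile uint32_t tail;            /* written by Zynq (consumer) */
    struct render_task slot[RENDER_RING_DEPTH];
};

/* CM4 side: queue one waveform for rendering. The Zynq would raise an
 * interrupt once the image at image_addr is ready to be pulled back. */
static int queue_render(struct render_ring *ring, const struct render_task *t)
{
    uint32_t head = ring->head;
    uint32_t next = (head + 1) & (RENDER_RING_DEPTH - 1);
    if (next == ring->tail)
        return -1;                     /* ring full, try again later          */
    ring->slot[head] = *t;
    __sync_synchronize();              /* publish the descriptor before head  */
    ring->head = next;
    return 0;
}
--- End code ---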
nctnico:
--- Quote from: tom66 on January 01, 2021, 10:30:11 pm ---
--- Quote from: nctnico on January 01, 2021, 09:38:39 pm ---
I still think the best solution is to use PCI express and do all processing on a compute module (not necessarily the CM4, but let's start there). LeCroy oscilloscopes are built this way as well; all waveform processing is done on the host platform. Doing processing inside the FPGA sounds nice for short-term goals, but long term you'll be shooting yourself in the foot because you can't extend the resources at all. At this point you already identify the Zynq as a bottleneck! Going the PCI express route also allows using multiple FPGAs and extending the system to multiple channels (say 8) by just copying and pasting the same circuit. Heck, these could even be modules which plug into a special backplane with slots to distribute trigger, time synchronisation and PCI express signals. A standard PCI express switch chip aggregates all the PCI express lanes from the acquisition units towards the compute module. Either way, the next step should include moving all processing to a single point to achieve maximum integration. Having multiple processors at work and needing to communicate between them makes the system more difficult to develop and maintain (been there, done that).
--- End quote ---

While I do agree that the CM4 (or whatever module is used) should do much of the processing, I disagree that it should do all of it. Certain aspects benefit from FPGA-based logic; for instance, digital filters using the multiplier blocks would be some 10-50x faster in the FPGA fabric, so it is a "no brainer" to do them there. I think the same applies to waveform rendering.
--- End quote ---

There is no need to do filtering and/or rendering inside the FPGA. First of all, you don't need to filter and render the entire record, only enough to fill the screen. Being able to adjust the filter parameters after an acquisition is a big plus, and I don't see that being possible if the FPGA filters the data from the ADC.

Secondly, I understand when you say that the processor also has access to the data, but that implies that at some point some operations will happen in the FPGA and some in software, and this means you can't tie them together in an easy way. Say the processor wants the filtered data plus the original to do protocol decoding: you'll need to tell the FPGA to deliver the filtered data AND the original data will need to be fetched from memory. And what if someone wants to extend the filters but the FPGA doesn't support that? In software it is easy to implement a 9th-order filter; an FPGA implementation is much more rigid, so that person likely ends up giving up or re-implementing the filtering in software anyway, leaving the FPGA implementation abandoned. Or there is a need to do something with the data in software before filtering, which requires feeding the data into the FPGA for filtering and then retrieving it. You have to choose where the processing takes place because, whether it is rendering or processing, it has to be possible to insert/delete blocks (which perform an operation) into the processing chain in an easy way. It is either FPGA or software. There is no AND, because otherwise there will be two different places where a 'product' is being made - like having half of a car assembled in the UK and the other half in France. It doesn't make sense from an architectural point of view.

Thirdly, a reasonable GPU offers a boatload of processing power - probably even more than the FPGA can deliver, and with a lot less effort to get it going. Remember that none of the respondents to your survey listed FPGA development as their profession; this means that the FPGA's role needs to be as minimal as possible to allow as many people as possible to participate. IMHO the FPGA should only do these functions:
- format & buffer (FIFO) the data from the ADC so it can be stored in the acquisition memory
- run the trigger engine

The trigger engine is already complicated enough if it needs to support protocol triggering (and I like the idea of logic-analyser-like state machine triggering). I'm not ruling out that the FPGA can play a role in data processing, but that will be a carefully considered optimisation which fits with the architecture of the rest of the system. Implementing data processing inside the FPGA now is optimising before knowing the actual bottlenecks.

PS: I didn't fill in the survey
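[As an illustration of the "filter only what is on screen" argument above, here is a minimal software sketch: a plain FIR run over just the visible slice of the stored record, so the taps can be changed and the function simply re-run after the acquisition. Function and parameter names are made up for this example.]

--- Code: ---
#include <stddef.h>

/* Apply an FIR filter only to the visible window of the acquisition record.
 * Re-running this with different taps after the acquisition is what makes
 * post-hoc filter adjustment possible. */
void filter_visible(const float *record, size_t record_len,
                    size_t first_visible, size_t visible_len,
                    const float *taps, size_t n_taps,
                    float *out /* visible_len entries */)
{
    for (size_t i = 0; i < visible_len; i++) {
        size_t centre = first_visible + i;
        float acc = 0.0f;
        for (size_t k = 0; k < n_taps; k++) {
            /* clamp at the record edges instead of reading out of bounds */
            size_t idx = centre + k >= n_taps / 2 ? centre + k - n_taps / 2 : 0;
            if (idx >= record_len)
                idx = record_len - 1;
            acc += taps[k] * record[idx];
        }
        out[i] = acc;
    }
}
--- End code ---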
gf:
Where are down-sampling/decimation/ERES/peak-detect supposed to happen when data are to be stored at a lower sampling rate (say only 100 kSa/s) than the maximum ADC rate, in order that a longer time interval fits into the memory? [Assume I want to acquire a single buffer of 100M samples at 100 kSa/s (i.e. 1000 seconds). In this case it is no longer feasible to dump 1000 s of data @ 1 GSa/s to memory first and decimate in post-processing.] Any trigger filters (in front of the comparators) also need to be applied in the FPGA.
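[For reference, peak-detect decimation keeps only the minimum and maximum of each block of input samples, so narrow glitches survive the rate reduction; going from 1 GSa/s to 100 kSa/s the block size would be 10,000. A rough C model of the operation follows - names are illustrative, and a real implementation would stream this on the fly rather than over a stored buffer.]

--- Code: ---
#include <stdint.h>
#include <stddef.h>

/* Peak-detect decimation model: for every block of `factor` input samples,
 * keep only the minimum and maximum. out_min/out_max each need n_in/factor
 * entries. */
void peak_detect_decimate(const int16_t *in, size_t n_in, size_t factor,
                          int16_t *out_min, int16_t *out_max)
{
    size_t n_out = n_in / factor;
    for (size_t o = 0; o < n_out; o++) {
        int16_t lo = in[o * factor];
        int16_t hi = lo;
        for (size_t k = 1; k < factor; k++) {
            int16_t s = in[o * factor + k];
            if (s < lo) lo = s;
            if (s > hi) hi = s;
        }
        out_min[o] = lo;
        out_max[o] = hi;
    }
}
--- End code ---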
nctnico:
--- Quote from: gf on January 02, 2021, 12:21:21 am ---
Where are down-sampling/decimation/ERES/peak-detect supposed to happen when data are to be stored at a lower sampling rate (say only 100 kSa/s) than the maximum ADC rate, in order that a longer time interval fits into the memory? [Assume I want to acquire a single buffer of 100M samples at 100 kSa/s (i.e. 1000 seconds). In this case it is no longer feasible to dump 1000 s of data @ 1 GSa/s to memory first and decimate in post-processing.]
--- End quote ---

Peak-detect and decimation will need to be done inside the FPGA, but those are a consequence of memory space being limited relative to the duration of the acquisition (IOW: when the sampling rate can no longer be the maximum). ERES, OTOH, can be done in software, as it is a post-processing step; doing it in software has the advantage that you can change the setting after the acquisition. Anything trigger-related has to be done inside the FPGA though, due to realtime requirements.
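[A minimal sketch of ERES as a software post-processing step, along the lines suggested above: a boxcar average over 2^n neighbouring samples trades bandwidth for roughly n/2 extra bits of vertical resolution, and because it runs on the stored record the window can be changed and the pass re-run after the acquisition. Names are illustrative.]

--- Code: ---
#include <stdint.h>
#include <stddef.h>

/* ERES-style boxcar average over a window of 2^log2_window samples.
 * `out` needs at least n - window + 1 entries. */
void eres_filter(const int16_t *in, size_t n, unsigned log2_window, float *out)
{
    size_t w = (size_t)1 << log2_window;    /* averaging window, e.g. 16 samples */
    for (size_t i = 0; i + w <= n; i++) {
        int64_t acc = 0;
        for (size_t k = 0; k < w; k++)
            acc += in[i + k];
        out[i] = (float)acc / (float)w;      /* fractional result keeps the gain */
    }
}
--- End code ---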