Author Topic: A High-Performance Open Source Oscilloscope: development log & future ideas (Read 70518 times)

rhb · « **Reply #275 on:** December 30, 2020, 12:59:19 am »

Quote from: 2N3055 on December 18, 2020, 09:20:32 am

Digital scope with a screen (distinction from digitizer that samples data inside data acquisition system) need to serve two functions:
- emulate behaviour of a CRT oscilloscope on a screen
- function as a digitizer in a background, so that all data that it captured is sampled properly and doesn't contain any nonsense in mathematical way.

First point is well served with decimating to screen with peak detect.

[snip]

You clearly have never compared a good analog scope trace at different settings with a "peak detect" DSO using a *very* short (<10% of sample interval) pulse.

If you had, you would realize that peak detect is a *very* poor imitation of an analog scope. The Fourier transform does not conform to the displayed trace. Of course, you do need to understand what you are looking at in both time and frequency by inspection.

"Peak detect" is a crude bodge to make up for improper downsampling by decimation without appropriate low pass filtering.

If you want to understand why things are this way, I'll be happy to pose the problems. But I had to do them 35 years ago in Linear Systems and have no interest in repeating my school exercises. I learned the lessons.

After a lot of consideration, I have concluded that a proper DSO should offer the user the option of either Bessel-Thomson or Buterworth LPF to suit the use case. In any* case, it *must* suppress aliases by -6 dB per bit at Nyquist. Should you be so foolish as to not do the exercises in order to actually learn how it really works, you're on your own.

Have Fun!
Reg

2N3055 · « **Reply #276 on:** December 30, 2020, 01:09:42 am »

Quote from: rhb on December 30, 2020, 12:59:19 am

Quote from: 2N3055 on December 18, 2020, 09:20:32 am
Digital scope with a screen (distinction from digitizer that samples data inside data acquisition system) need to serve two functions:
- emulate behaviour of a CRT oscilloscope on a screen
- function as a digitizer in a background, so that all data that it captured is sampled properly and doesn't contain any nonsense in mathematical way.

First point is well served with decimating to screen with peak detect.

[snip]

You clearly have never compared a good analog scope trace at different settings with a "peak detect" DSO using a *very* short (<10% of sample interval) pulse.

If you had, you would realize that peak detect is a *very* poor imitation of an analog scope. The Fourier transform does not conform to the displayed trace. Of course, you do need to understand what you are looking at in both time and frequency by inspection.

"Peak detect" is a crude bodge to make up for improper downsampling by decimation without appropriate low pass filtering.

If you want to understand why things are this way, I'll be happy to pose the problems. But I had to do them 35 years ago in Linear Systems and have no interest in repeating my school exercises. I learned the lessons.

After a lot of consideration, I have concluded that a proper DSO should offer the user the option of either Bessel-Thomson or Buterworth LPF to suit the use case. In any* case, it *must* suppress aliases by -6 dB per bit at Nyquist. Should you be so foolish as to not do the exercises in order to actually learn how it really works, you're on your own.

Have Fun!
Reg

Sometimes I wonder did you ever used a scope in your life...
Or learn how to read..
Read again what I wrote. I know that peak detect is mathematically incorrect for further analysis. But it is correct for the screen.

We have been here before and I did tell you to this simple experiment that is easy to reproduce:

Take a 100MHz carrier and AM modulate it with 100 Hz, and put scope on 2 ms/div you'll see this:

Scope is sampling at 100 MS/s. At half Nyquist.
And it is showing same as analog scope would do...

rhb · « **Reply #277 on:** December 30, 2020, 01:45:37 am »

I can easily contrive cases where it doesn't mater. I can also create cases where it does. Your assertion is only correct if you ignore the Fourier transform of the screen image.

I don't wish to be rude, but this is basic DSP 101.

Buy one of Leo bodnar's 100 ps pulse generators, feed it to an analog scope and a DSO with peak detect and we can discuss further. Until then.

Have Fun!
Reg

tautech · « **Reply #278 on:** December 30, 2020, 03:34:58 am »

Reg, maybe you have forgotten the lessen rf-loop gave you here:
https://www.eevblog.com/forum/testgear/scope-wars/msg3121780/#msg3121780

2N3055 · « **Reply #279 on:** December 30, 2020, 08:57:07 am »

Quote from: rhb on December 30, 2020, 01:45:37 am

I can easily contrive cases where it doesn't mater. I can also create cases where it does. Your assertion is only correct if you ignore the Fourier transform of the screen image.

I don't wish to be rude, but this is basic DSP 101.

Buy one of Leo bodnar's 100 ps pulse generators, feed it to an analog scope and a DSO with peak detect and we can discuss further. Until then.

Have Fun!
Reg

You are rude because you don't read but are still being condescending and patronizing. I wrote:

Digital scope with a screen (distinction from digitizer that samples data inside data acquisition system) need to serve two functions:
- emulate behaviour of a CRT oscilloscope on a screen
- function as a digitizer in a background, so that all data that it captured is sampled properly and doesn't contain any nonsense in mathematical way.

First point is well served with decimating to screen with peak detect.

So to repeat, again, because we spoke of this before and you learned nothing the first time:

In order for scope to show, visually, on the screen, what people expect to see from time domain instrument, and make it similar to CRT scope it has to deal with the data in different manner, than what you would do if you sample data for spectral analysis, or for mathematically correct DSP analysis of any sort.

In order to deal with these contradictory requirements, high end mixed domain scopes either have modes in which they reconfigure data engines to work in Scope/RF SA mode, or they use powerful FPGA/ASIC and continuously sample at full speed, and then create 3 data streams through 3 separate datapump/decimation/DSP blocks to have screen/SA/propper measurement and further raw data analysis.

For lesser platforms, priority for scopes was to function as scopes so they have screen/propper data buffer architecture, with FFT for spectral analysis on top of normal scope data as afterthought. One inexpensive scope that has MDO approach is GW Instek that has SA mode, where they reconfigure data engine to work in more realtime SA mode.

So to simplify so you can understand this time, you're not supposed to analyse Peak detect data. Nobody ever said that, on several occasion I said explicitly it is incorrect data for any kind of further analysis (not completely though, you can extract P-P envelope from it for instance). Peak detect data is perfect data for screen though.
To be completely correct, absolutely correct data to plot, like David Hess said before, would be histogram of data to decimate, encoded in intensity by distribution density at value bins.
That would be perfect CRT emulation.
In real life Peak detect does decent representation, because it will still show very fast and rare peaks, instead for them to be hidden by being too dim to see.. If you would use histogram to calculate pixel brightness, you would still need to not make it completely linear, but bump up black point to be able to see rare events. Some compression would need to be there.

You need to separate screen stream and data buffer stream in the very beginning of data processing, and treat them separately, for them to both be correct.

gf · « **Reply #280 on:** December 30, 2020, 03:49:10 pm »

Quote from: tautech on December 30, 2020, 03:34:58 am

https://www.eevblog.com/forum/testgear/scope-wars/msg3121780/#msg3121780

Does it actually mean that this acquisition mode deliberately changes the sampling clock phase for each acquired trace, so that displaying the stitched traces with (persistent) points leads to an oversampled, ETS-like appearance?

EDIT:

Or am I fooled by an illusion? Does the ADC just happen to act as direct down converter here, due to the particular signal frequency to sampling rate ratio chosen for the example? [ which would mean that the same would not work for arbitrary signal frequencies ]

LeCroy's RIS is documented, but where can I actually find a documentation of Siglent's SARI mode?

tautech · « **Reply #281 on:** December 30, 2020, 08:13:44 pm »

Quote from: gf on December 30, 2020, 03:49:10 pm

LeCroy's RIS is documented, but where can I actually find a documentation of Siglent's SARI mode?

Use the LeCroy documentation as much of Siglent's design implementation mirrors theirs as they have both worked together on several products.
That many Siglent products are also rebranded as LeCroy might indicate how the two think alike.

2N3055 · « **Reply #282 on:** December 30, 2020, 08:40:53 pm »

Quote from: gf on December 30, 2020, 03:49:10 pm

Quote from: tautech on December 30, 2020, 03:34:58 am
https://www.eevblog.com/forum/testgear/scope-wars/msg3121780/#msg3121780

Does it actually mean that this acquisition mode deliberately changes the sampling clock phase for each acquired trace, so that displaying the stitched traces with (persistent) points leads to an oversampled, ETS-like appearance?

EDIT:

Or am I fooled by an illusion? Does the ADC just happen to act as direct down converter here, due to the particular signal frequency to sampling rate ratio chosen for the example? [ which would mean that the same would not work for arbitrary signal frequencies ]

LeCroy's RIS is documented, but where can I actually find a documentation of Siglent's SARI mode?

https://siglent.fi/oskilloscope-info-interpolation.html

google translate from Finish makes decent job...

tom66 · « **Reply #283 on:** December 30, 2020, 10:57:57 pm »

Wow, thread blew up over Christmas. I will publish the results from the survey in the new year when I have some more spare time to dust out my statistics textbook. Some quite surprising results.

And, regarding the decimate/peak-detect/filter discussions. This is why I like optionality ... there's nothing stopping the instrument supporting the mathematically correct DSP transform, and the engineering transform that is wrong, but produces results more in line with the users' expectations. As I see it the biggest problem with peak detect is it doubles memory consumption, so it should not necessarily be a default setting for longer timebases as it will reduce the available sample rate.

tom66 · « **Reply #284 on:** January 01, 2021, 08:17:59 pm »

Survey Results

Thanks to everyone for filling out the survey. 43 valid responses were received over the course of a month which is not so bad for a small engineering forum, and enough to sample some reasonable data with a fair bit of confidence. For those interested, the results are publicly published here, besides the comments field for user privacy: https://docs.google.com/spreadsheets/d/1yqCfIa8lzXFmDxayfT2XsBuWI-NDyL6ssCdF4YwLubI/edit?usp=sharing

Some surprising conclusions from the survey. It seems that users on here are roughly evenly divided between software engineering and hardware engineering, and about 20% report other fields. I had actually expected to see more FPGA/RTL engineers (none reported that as their profession) but perhaps this is a limitation of the single selection, as at least in the case of my career and present employment I am both a hardware engineer and a FPGA systems engineer.

It seems the majority of users on here are experienced in their field with more than 12 years' of professional experience. I had originally intended the distribution of options to be somewhat logarithmic but perhaps I should have included a 20+ years option to further divide this category. Nonetheless I think it is fair to say that the majority of people interested in this project are professional and experienced engineers. That means the product must be professional too, of course! Over 85% of individuals said that they would consider the purchase of a FOSHW instrument, which is good news.

On pricing, the data was a pleasant surprise as I was worried about the need to cost-optimise even further. Some were willing to pay over $1000 USD for such an instrument. Weighting the prices by assuming that any option is taken as half-way in the range (with the upper and lower bound options set at their respective minimums and maximums) this gives a price target of USD $713 for the instrument. That should be achievable, depending upon the final specification and what modules, if any, need to be included in that configuration (I expect the price to include a 4 channel AFE.)

It seems most people would be accepting of a touchscreen based user interface, though there were some who said that it would not be acceptable. I would be in favour of a limited degree of physical controls, but a complex control assembly would be expensive to tool up and could increase the bill of materials considerably; it also limits the flexibility (e.g. channel knobs that don't make sense when you have an LA module installed, for instance.) Some consideration of overall device form factor needs to be made here.

The "please rank the importance" data were also very useful and helps steer this project somewhat:-

Modularity ranked highly (average score 4.0/5), with the majority split between 4 and 5 points suggesting this to be essential to most users. I intend for the instrument to be modular but it was good to have confirmation of this.

Portability was mostly unimportant to those surveyed (average score 2.1/5), with the majority suggesting 'not at all' useful.

Configurable digital filters scored a rather midfield 3.3/5, with most people selecting option 3, 'somewhat important'. There however were a substantial number both regarding this as essential, and another group similarly as not essential. More research and discussion is needed on this point I suspect, to determine the performance level that is required by those who desire such a function.

There was more majority support for an MSO function, but the average score was similar to digital filters at 3.6/5. However, it seems this score is weighted more by those that regard it to be essential. One user commentated that state analysis would be essential for such a function, and I agree. This means the memory capture and trigger path needs to support modes synchronous to analog channels and asynchronous too. That is a lot more difficult than supporting only one route, but it should be at least practical to support state-only analysis (with analog channels off), it gets a bit more difficult to try to synchronise this with the analog data.

Wi-Fi connectivity was not seen as that important, with an average score of 2.0/5. That is fine - it can be supplied by an external USB stick if required. It does not appear there is sufficient interest at this point to justify internal integration, with all the complications from an RF and EMC design perspective. The correlation between this and portability, however, was not that great, at 0.23, indicating a loose agreement between the answers, though the size of the survey may make drawing a better result more difficult here.

Stronger interest was heard for the DDS function, with an average score of about 3.0/5, although much like the configurable digital filters option this seems like an option that has mostly 'average' support, with few people regarding it as essential. That was somewhat surprising to me and pushes the DDS function more towards an external module card, if it is implemented. It should be relatively trivial to implement it using the FPGA's spare resources, requiring only a few SERDES blocks, but will require a board with external filters, offset, gain control and output amplifiers. A suggestion of an isolated DDS was made by one respondent. This would likely have to be a separate module (I personally don't think it is worth including as a 'standard' option due to the cost), but with high speed digital isolators available from Analog Devices and others, plus a DC-DC module to bridge the isolation gap, it's eminently practical to do so. But this would likely add a fair bit to the cost of any such module.

The 10MHz reference function was well-supported, with the majority of interest around 4/5, for an average score of 3.5/5. This seems like a no-brainer as it is pretty inexpensive to add and comes at relatively little cost. The reference signal could be routed via the FPGA fabric as a combinational logic block, although this might add jitter, so external multiplexers may be preferred. In any case, it's reasonably trivial to feed in a 10MHz reference to the PLL, or to export the PLL's reference signal, with some multiplexer ICs or the FPGA fabric.

And the 50 ohm input termination was also well-supported, with the majority of responses also around 4/5. Very few people regarded it as not important at all. The average score was 3.8/5. I intend to investigate adding a fast relay turn-off circuit to any 50 ohm input, using a simple latch, to allow the terminator to be protected if the input is grossly overloaded. Of course, it will not save your ass if you connect it to 240V mains, but it might stop you damaging the terminator if it is hooked to 24V instead of 5Vpk max. (For simplicity, it would turn off all terminators if input voltage is exceeded, if the 50 ohm mode is enabled for that channel.)

The next steps

The existing hardware architecture was a great proof of concept but I believe to continue this project it will be essential to develop a second-generation PCB. I see two possible routes forward for this project. The lack of portability as a serious requirement unlocks options that would be more power-hungry than a battery powered solution might otherwise support (for 8+ hour runtimes.)

Option A. Use PCI Express with a Pi Compute Module 4, replacing the reverse engineered CSI bus, connected to a similar Zynq 7000 (PCI-Express variant so likely Zynq 7015.) The UltraScale is a very nice platform but the biggest disadvantage here is that it restricts you to a Xilinx Linux distribution (and Xilinx are not too good with keeping this up to date, nor do I want to go down the rabbit hole of having to build a custom kernel.) For all of its faults, the Pi has a open and supported kernel, with limited proprietary software with at least good regular support. The CM4 is also a 'beast' in terms of processing offering in a modest configuration 4GB of application memory, gigabit Ethernet and a USB2.0 controller. (There is some pain in supporting USB3.0 and PCI-e at the same time; a bridge IC may be necessary, and I'm not sure of the kernel or driver complexities there. However, the CM4 does have internal gigabit Ethernet.) However, the limitation of using the 32-bit Zynq is addressable memory will be limited to 1GB, so sample memory would be limited to max 900Mpts or so, fixed to the board in a dual DDR3 arrangement. Sample rates would also top out around 2-2.5GSa/s on 1 channel or 1.2GSa/s on a pair of channels. Yes, a MIG configuration is also an option to add say an additional memory bus, but for prior stated reasons I am not a supporter of that route forward. However I am relatively confident a price of US$750 can be achieved here with a touchscreen UI and a few controls. My past experience with Zynq suggests that a passive heatsink will be sufficient as a cooling solution; the Pi 4 may end up being the ultimate thermal bottleneck and require some careful thermal engineering to allow the Pi 4 to run at near-100% CPU load for sustained periods of time.

Option B. Zynq UltraScale+ architecture with the ZU2EG or ZU3EG variant in use. No Compute Module. Linux application runs on the Zynq UltraScale with direct memory access to waveform data, and Mali-400 GPU helps with any 2D GPU tasks that can be offloaded, though waveform rendering still likely to be software or FPGA based. Large memory space available: should be able to support at least 8GB and probably 16GB in a user-accessible DDR4 SODIMM, granting long acquisition buffers. The biggest disadvantage here is that the cost is considerable - the UltraScale is not a cheap device. The Linux headaches as above are also noteworthy and may slow development if the available supported kernel is stuck in the dark ages. I estimate that this solution would push the product more past the US$900 price point which may limit the market for it. However, it is likely that with the fast UltraScale fabric, the higher memory bandwidth (more internal AXI fabric, faster FPGA fabric, etc.) that the system could exceed 5GSa/s in a single channel configuration with the right engineering efforts, although probably not from day one. Power consumption of the UltraScale may also require a heatsink and, while I would intend to avoid it as I have a genuine dislike of the things in test equipment, a fan may also become necessary.

Other options have been discussed on here, including going to a plain FPGA to CM4 interface over PCI Express. However, there are really serious advantages to having the Zynq ARM (in either configuration) closely coupled to the FPGA fabric, that would be difficult and resource-expensive to implement on a soft-core CPU. Something like a Kintex with a MicroBlaze has been considered, but the disadvantage is the performance from a soft processor is limited without having to dedicate a large amount of fabric to it. The Zynq SoCs aren't much more than their comparable FPGA-only brethren, but deal with a lot of the headaches for you already. Improving how 'accessible' this platform is to hardware and software hackers is critical to me.

I do appreciate the continued discussion here so please let me know your thoughts, positive or negative.

Tom

Edited to fix omission with Wi-Fi results.

nctnico · « **Reply #285 on:** January 01, 2021, 09:38:39 pm »

I still think the best solution is to use PCI express and do all processing on a compute module (not necessarily CM4 but let's start there). Lecroy oscilloscopes are built this way as well; all waveform processing is done on the host platform. Doing processing inside the FPGA sounds nice for short term goals but long term you'll be shooting yourself in the foot because you can't extend the resources at all. At this point you already identify using the Zync as a bottleneck! Going the PCI express route also allows to use multiple FPGAs and extend the system to multiple channels (say 8 ) by just copying and pasting the same circuit. Heck, these could even be modules which plug into a special backplane with slots to distribute trigger, time synchronisation and PCI express signals. A standard PCI express switch chip aggregates all PCI express lanes from the acquisition units towards the compute module. Either way the next step should include moving all processing to a single point to achieve maximum integration. Having multiple processors at work and needing to communicate between them makes the system more difficult to develop and maintain in the end although it may seem like an easy way out right now (been there, done that). Which also circles back to defining the architecture first and then start to develop.

BTW the external 10MHz reference will prove more complicated then you think. This will need to feed the clock synthesizer on the board directly as any FPGA based 'PLL' will add way too much jitter.

tom66 · « **Reply #286 on:** January 01, 2021, 10:30:11 pm »

Quote from: nctnico on January 01, 2021, 09:38:39 pm

I still think the best solution is to use PCI express and do all processing on a compute module (not necessarily CM4 but let's start there). Lecroy oscilloscopes are built this way as well; all waveform processing is done on the host platform. Doing processing inside the FPGA sounds nice for short term goals but long term you'll be shooting yourself in the foot because you can't extend the resources at all. At this point you already identify using the Zync as a bottleneck! Going the PCI express route also allows to use multiple FPGAs and extend the system to multiple channels (say 8 ) by just copying and pasting the same circuit. Heck, these could even be modules which plug into a special backplane with slots to distribute trigger, time synchronisation and PCI express signals. A standard PCI express switch chip aggregates all PCI express lanes from the acquisition units towards the compute module. Either way the next step should include moving all processing to a single point to achieve maximum integration. Having multiple processors at work and needing to communicate between them makes the system more difficult to develop and maintain (been there, done that).

While I do agree that the CM4 (or whatever module is used) should do much of the processing, I disagree that it should do all of it. Certain aspects are beneficial for FPGA based logic, for instance digital filters using multiplier blocks would be some 10-50x faster if using the FPGA fabric, so it is a "no brainer" to do that. I think the same applies for waveform rendering.

If I go the PCI-e route, then the Pi's software will be able to access any waveform. Address translation for trigger correction may even be performed on the Zynq side, if I am smart enough to be able to make that work - but if not doing it on the Pi would not be terribly difficult or resource intensive. There is no 'foot shooting' here because the CM4 would have access to whatever memory is needed so it can do as little or as much processing as it wants. It would just set up a task queue for what waveform buffers need to be rendered and pull the image out over PCI-e when it gets an interrupt, for example. There's also the bidirectional aspect so the Pi could load arbitrary code into the Zynq (or even a new bitstream!) via the fast PCI-e link, or send waveforms for DDS functions or for other DSP processing. IIRC the PCI-e on the Pi CM4 is 5Gb/s x1, and the Zynq supports up to x4, Ultrascale goes up to x16.

Quote from: nctnico on January 01, 2021, 09:38:39 pm

BTW the external 10MHz reference will prove more complicated then you think. This will need to feed the clock synthesizer on the board directly as any FPGA based 'PLL' will add way too much jitter.

Only if we used the FPGA PLLs to do anything with the clock signal - I was merely suggesting routing the logic signal through combinatorial blocks and IO which at 10MHz should be OK. The biggest risk is it will introduce noise which will worsen the jitter and phase noise, but this might not be that significant. However, external switches to route the signals in the analog domain (or low jitter digital domain) are another option, they just add more logic and cost.

nctnico · « **Reply #287 on:** January 01, 2021, 11:19:09 pm »

Quote from: tom66 on January 01, 2021, 10:30:11 pm

Quote from: nctnico on January 01, 2021, 09:38:39 pm
I still think the best solution is to use PCI express and do all processing on a compute module (not necessarily CM4 but let's start there). Lecroy oscilloscopes are built this way as well; all waveform processing is done on the host platform. Doing processing inside the FPGA sounds nice for short term goals but long term you'll be shooting yourself in the foot because you can't extend the resources at all. At this point you already identify using the Zync as a bottleneck! Going the PCI express route also allows to use multiple FPGAs and extend the system to multiple channels (say 8 ) by just copying and pasting the same circuit. Heck, these could even be modules which plug into a special backplane with slots to distribute trigger, time synchronisation and PCI express signals. A standard PCI express switch chip aggregates all PCI express lanes from the acquisition units towards the compute module. Either way the next step should include moving all processing to a single point to achieve maximum integration. Having multiple processors at work and needing to communicate between them makes the system more difficult to develop and maintain (been there, done that).

While I do agree that the CM4 (or whatever module is used) should do much of the processing, I disagree that it should do all of it. Certain aspects are beneficial for FPGA based logic, for instance digital filters using multiplier blocks would be some 10-50x faster if using the FPGA fabric, so it is a "no brainer" to do that. I think the same applies for waveform rendering.

There is no need to do filtering and/or rendering inside the FPGA.

First of all you don't need to filter and render the entire record but only enough to fill the screen. Being able to adjust the filter parameters after an acquisition is a big plus and I don't see that being possible if the FPGA filters the data from the ADC.

Secondly, I understand when you say that the processor also has access to the data but that implies that at some point some operations will happen in the FPGA and some in software. But this means you can't tie them together in an easy way. Say the processor wants the filtered data + the original to do protocol decoding. You'll need to tell the FPGA to deliver the filtered data AND the original data will need to be fetched from the memory. And what if someone wants to extend the filters but the FPGA doesn't support that? In software it is easy to implement a 9th order filter; an FPGA implementation is much more rigid so the person likely ends up giving up or re-implementing filtering in software anyway leaving the FPGA implementation abandoned. Or there is a need to do something with the data in software before filtering which requires to feed the data into the FPGA for filtering and then retrieving it. You have to choose where the processing takes place because whether it is rendering or processing it has to be possible to insert/delete blocks (which perform an operation) into the processing chain in an easy way. It is either FPGA or software. There is no AND because otherwise there will be two different places where a 'product' is being made. Like having half of a car assembled in the UK and the other half in France. It doesn't make sense from an architectural point of view.

Thirdly a reasonable GPU offers a boatload of processing power; probably even more than the FPGA can do and with a lot less effort to get it going. Remember that none of the respondents of your survey listed FPGA development as their profession; this means that the FPGA's role needs to be as minimal as possible to allow as many people as possible to participate.

IMHO the FPGA should only do these functions:
- format & buffer (FIFO) the data from the ADC so it can be stored in the acquisition memory
- run the trigger engine

The trigger engine is already complicated enough if it needs to support protocol triggering (and I like the idea of logic analyser like state machine triggering).

I'm not ruling out that the FPGA can play a role in data processing but that will be a carefully considered optimisation which will fit with the architecture of the rest of the system. Implementing data processing inside the FPGA now is optimising before knowing the actual bottlenecks.

PS: I didn't fill in the survey

gf · « **Reply #288 on:** January 02, 2021, 12:21:21 am »

Where are down-sampling/decimation/ERES/peak-detect supposed to be done when data are supposed to be stored at a lower sampling rate (say only 100kSa/s) than the maximum ADC rate (in order that a longer time interval fits into the memory)?

[ Assume, I want to acquire a single buffer with 100M samples at 100kSa/s (i.e. 1000 seconds). In this case it is no longer feasible to dump 1000s of data @1GSa/s to memory first, and decimate in the post-processing. ]

Any trigger filters (in front of the comparators) also need to be applied in the FPGA.

nctnico · « **Reply #289 on:** January 02, 2021, 12:43:02 am »

Quote from: gf on January 02, 2021, 12:21:21 am

Where are down-sampling/decimation/ERES/peak-detect supposed to be done when data are supposed to be stored at a lower sampling rate (say only 100kSa/s) than the maximum ADC rate (in order that a longer time interval fits into the memory)?

[ Assume, I want to acquire a single buffer with 100M samples at 100kSa/s (i.e. 1000 seconds). In this case it is no longer feasible to dump 1000s of data @1GSa/s to memory first, and decimate in the post-processing. ]

Peak-detect and decimation will need to be done inside the FPGA. But these are due to limitations of memory space versus the duration of the acquisition (IOW: when the sampling rate can no longer be the maximum). Eres OTOH can be done in software as this is a post-processing step; doing this in software has the advantage that you can change the setting after the acquisition.

Anything trigger related has to be done inside the FPGA though due to realtime requirements.

gf · « **Reply #290 on:** January 02, 2021, 01:25:24 am »

Sure, the reason for storing data at a lower sampling rate is of course the limited amount of memory.

So far I have considered ERES/HIRES rather a down-sampling acquisition mode, storing data at lower sampling rate, but with higher precision.

If memory suffices to store the data at full-speed, any kind filter can be applied in the post-processing, of course.

nctnico · « **Reply #291 on:** January 02, 2021, 01:31:02 am »

Quote from: gf on January 02, 2021, 01:25:24 am

Sure, the reason for storing data at a lower sampling rate is of course the limited amount of memory.

So far I have considered ERES/HIRES rather a down-sampling acquisition mode, storing data at lower sampling rate, but with higher precision.

The actual implementation of Eres/hires is very brand specific. Some DSOs will store higher precision values in the acquisition memory (Tektronix for example) where others implement it as a math trace in software (Lecroy for example). Implementing eres/hires in software is the most simple & flexible approach.

tom66 · « **Reply #292 on:** January 02, 2021, 10:24:52 am »

Implementing ERES in software requires a lot more memory than the Zynq can support (1GB memory space) and would require the software process a large number of samples for every waveform rendered. You can implement an ERES filter later in software if you wanted to, but the FPGA should still support ERES recording into say 16-bit accumulators which saves memory and improves available sample rate.

However I will concede that nctnico, you have convinced me to investigate the Nvidia Jetson Nano module some more, as it has an x4 PCI-e interface, which would make it an ideal candidate for high speed interfacing with the FPGA (also supporting x4 PCI-e in 7015 configuration.) Depending on the level of processing that board can do on the Jetson processor, it may make sense to have a smaller FPGA involved in limited processing, and have more of the processing done in software, where there is more user flexibility.

It really depends on how much DSP you want to do and there is a case for doing some DSP on the FPGA but a GPU is also an option. I'm still not thoroughly convinced the GPU would be good for rendering the waveform itself - while initially it seems like a good target for a GPU, it involves random pixel hits, which a GPU is not generally designed to support. Most GPUs, including Nvidia's Maxwell, are tile based (technically Maxwell is tile-cache based, but there are minor differences), with the expectation that pixel hits will more likely occur within their tile range. That said, it's certainly worth investigating.

I've ordered a Jetson Nano and will see what I can do with it with internally generated waveform data. Porting ArmWave across should be an interesting January project.

nctnico · « **Reply #293 on:** January 02, 2021, 04:38:07 pm »

About the Jetson Nano... the thermal solution is horrible and also getting the software going to support the PCI express lanes is not very straightforward. This needs changing the DTB files and NVidia turned these into a huge convoluted mess without any documentation. I can lend a hand with dealing with the Jetson module (I have integrated the Jetson TX2 module in a product) but perhaps the RPi CM4 is a better choice (as a first step) where it comes to simplicity of integration and community support.

tom66 · « **Reply #294 on:** January 02, 2021, 10:41:53 pm »

I think it depends on the level of demand for DSP, but if you want to make the scope mostly software, you need a really powerful GPU and compute core. Also the x1 PCI-e on the Pi 4 is useful but on the Zynq that could only be used up to 2Gbit/s (after 8b10b) which is barely faster than the 2-lane CSI implementation. Sure it's memory mapped but there would be ways to do that with CSI-2 as well. Jetson Nano is x4, so theoretically, 1GB/s transfer rate. Whole RAM in one second copy. Not bad.

The other thing that attracts me to the Jetson Nano is that it's pin compatible with the Jetson Xavier NX, which is some 3x more compute available, so it's a route forward for serious power users. On the datasheet specifications, it seems plausible that it could do several thousand tap long filters at very high memory depths, if indeed the MAC engines can be chained in a suitable manner. It is also likely to be able to do very long FFTs.

Not too concerned about the thermal solution for development, and for production the module comes without the heatsink so any thermal solution would necessarily be custom. I'd prefer no fan but am aware that 5W+ in a small case with a passive heatsink will be a challenge. Heatsinking to a larger aluminum extrusion was my preferred method when CM3 was in play.

Although it proved a bit harder to get the Jetson from the UK as the distributor went out of stock when I ordered it (or maybe they never had stock) so I'll put it on my next Digi-Key order next week.

I've never played with PCI-e before so it will be an interesting learning experience, but so was reverse engineering CSI-2. Any tips would be appreciated.

nctnico · « **Reply #295 on:** January 02, 2021, 11:02:35 pm »

I've ordered Jetson stuff from Arrow and they forward the order to the local branch. Works quite good. Passive heatsinking is doable. IIRC the Xavier sits around 25W tot 30W. A 20-ish by 10-ish centimeter heatsink with wide spaced fins to allow convection cooling does the job OK. 'My' Jetson TX2 project uses such a passive heatsink with a thermal design target of 60 degrees ambient for a 20W power dissipation (and some thermal headroom to spare).

Where it comes to PCI-express routing it is a matter of getting the differential pairs right with phase shift corrections to account for bends being shorter / longer. How difficult that is to achieve depends on the PCB package you are using. If it has differential phase matching it is not difficult. At the FPGA side it should be a matter of setting up the core and dropping it into the design. From there it should pop-up in the PCI tree of the Linux kernel. From the software side use memmap to map the PCI express memory areas into user space and you can talk to the FPGA. Unless there is a realtime requirement to handle something from software, a driver may not be necessary. Where it gets hairy is to enable / disable caching and to have the FPGA push data into the processor's memory space but when the acquisition memory is attached to the FPGA that may not be necessary.

tom66 · « **Reply #296 on:** January 02, 2021, 11:28:35 pm »

Unfortunately, I veto ordering from Arrow due to a prior total screwup with them that cost me weeks of my time and chasing them for refunds as they totally misunderstood DDP incoterms, actually for this very project. Digi-Key do have stock and they've never done me wrong.

Memory mapping should be fine then. I would assume the process would need to have permission to access PCI devices, though? Does it need to be in a PCI group or run as root/thru sudo?

PS. Not worried about differential routing. The DDR3 memory on the prototype was the hardest part. CSI-2 bus was comparably easy. CAD tools, it's all CircuitMaker/Altium. Am considering whether I should move to KiCad though to keep tools open.

Hydron · « **Reply #297 on:** January 02, 2021, 11:41:14 pm »

Arrow seems to now have dropped the DDP option for UK shipments altogether, probably due to the clusterfuck that is Brexit making it harder to do.
Means that it's uneconomic to buy from them now anyway as they tend to ship multiple packages per order, each of which FedEx will bill you £12 for handling the VAT payment on.
Having similar issues buying from some other suppliers too, shame all the pain can't be reserved for the people who bought into the lies back in 2016 😡.

As for the Jetson, you may be interested in the open source antmicro Jetson carrier board - they have their Altium design files up on GitHub. Might save some time, even if it's just nabbing footprints etc. from it.

nctnico · « **Reply #298 on:** January 03, 2021, 01:22:55 am »

Quote from: tom66 on January 02, 2021, 11:28:35 pm

Memory mapping should be fine then. I would assume the process would need to have permission to access PCI devices, though? Does it need to be in a PCI group or run as root/thru sudo?

Root rights are enough. From the OS point of view you are mapping a piece of physical memory into a user space process. Something to look out for is to tell memmap to mark the memory as uncacheable.

Quote

PS. Not worried about differential routing. The DDR3 memory on the prototype was the hardest part. CSI-2 bus was comparably easy. CAD tools, it's all CircuitMaker/Altium. Am considering whether I should move to KiCad though to keep tools open.

I'd stick to Altium for now. The costs for producing a prototype are so high that it is unlikely many people will be changing the layout and if they do they might want a completely different form factor and start from scratch.

free_electron · « **Reply #299 on:** January 05, 2021, 04:05:17 am »

If you need layout help ... or a second pair of eyes. ping me.


EEVblog Main Site	EEVblog on Youtube	EEVblog on Twitter	EEVblog on Facebook	EEVblog on Odysee

Author Topic: A High-Performance Open Source Oscilloscope: development log & future ideas (Read 70518 times)

Share me