Author Topic: A High-Performance Open Source Oscilloscope: development log & future ideas  (Read 69948 times)


Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Questionnaire for those interested in the project.  

I'd appreciate any responses, to help me understand which features are a priority and what I should focus on.
https://docs.google.com/forms/d/e/1FAIpQLSdm2SbFhX6OJlB834qb0O49cqowHnKiu7BEsXmT3peX4otOIw/formResponse

All responses will be anonymised, and a summary of the results will be posted here when sufficient data exists.

Introduction

You may prefer to watch the video I have made: 


Over the past year and a half I have been working on a little hobby project to develop a decent high-performance oscilloscope, with the intention for this to be an open source project.  By 'decent' I mean something that could compete with the likes of the lower-end digital phosphor/intensity-graded scopes, e.g. the Rigol DS1000Z, Siglent SDS1104X-E, Keysight DSOX1000, and so on.  In other words: an 8-bit ADC, a 1GSa/s sampling rate on at least one channel, 200Mpt of waveform memory, and rendering capable of at least 25,000 waveforms/second.

The project began for a number of reasons.  The first was that I wanted to learn and understand more about FPGAs; having previously done little more than blink an LED on an FPGA dev kit, implementing an oscilloscope seemed like a suitably challenging goal.  Secondly, I wasn't aware of any high-performance open source oscilloscopes, ones that could be used every day by an engineer on their desk.  I've since become aware of ScopeFun, but this project differs from ScopeFun in that they do the data processing on a PC, whereas I intended to create a self-contained instrument with data capture and display in one device.  For the display/user interface I use a Raspberry Pi Compute Module 3.  This is a decent little device, but crucially it has a camera interface port capable of receiving 1080p30 video, which works out to about 2Gbit/s of raw bandwidth.  That isn't enough to stream raw samples from an oscilloscope, but it's sufficient once you have a trigger criterion and an FPGA in the loop to capture the raw data.

At the heart of the oscilloscope is a Xilinx Zynq 7014S system-on-chip on a custom PCB, connected to 256MB of DDR3 memory clocked at 533MHz.  With the 16-bit memory interface this gives a usable memory bandwidth of ~1.8GB/s.  The Zynq is essentially an ARM Cortex-A9 paired with Artix-7 FPGA fabric on the same die, with a number of high-performance memory interfaces between the two.  Crucially, it has a hard on-silicon memory controller, unlike the regular Artix-7, which means you don't use up 20% of the logic area implementing that controller.  The FPGA acquires data using an HMCAD1511 ADC, the same ADC used in the Rigol and Siglent budget offerings.  This ADC is inexpensive for its performance grade (~$70) and available from Digi-Key.  A variant, the HMCAD1520, offers 12-bit and 14-bit capability, with 12-bit at 500MSa/s.  The ADC needs a stable 1GHz clock, which is provided in this case by an ADF4351 PLL.

Data is captured from the ADC front end and packed into RAM by a custom acquisition engine on the FPGA.  The acquisition engine works with a trigger block, which inspects the raw ADC stream to decide when to generate a trigger event and therefore when to start recording post-trigger data.  The implemented oscilloscope supports both pre- and post-trigger capture, with configurable sizes for each ranging from just a few pre-trigger samples to the full buffer of memory.  The data is streamed over an AXI-DMA peripheral into blocks defined by software running on the Zynq.  The blocks are then streamed out of memory into a custom CSI-2 peripheral, also using a DMA block (with a large scatter-gather list created by the ARM).  The CSI-2 data bus interface was reverse-engineered from documentation publicly available on the internet, and by analysing a slowed-down data bus from an official Pi camera (with a modified PLL) captured on my trusty Rigol DS1000Z.  I have a working HDL and hardware implementation that reliably runs at >1.6Gbit/s, and application software on the Pi then renders the data transmitted over this interface.  Most of the application software on the Pi is written in Python, with a small amount of C to interface with MMAL and to render the waveforms.  The Zynq software is raw embedded C, running on the baremetal/standalone platform.  All Zynq software and HDL was developed with the Vivado and Vitis tools from Xilinx.
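
To give a flavour of how waveform blocks get chained into one CSI-2 frame, here is a minimal sketch of building a scatter-gather list.  This is illustrative only: the descriptor layout and field names are mine, not the exact Xilinx AXI-DMA format, and the real firmware does considerably more housekeeping.

Code:
/* Sketch: chain n_waves waveform buffers into one scatter-gather list
 * for a single CSI-2 frame transfer. Field names are illustrative. */
#include <stdint.h>
#include <stddef.h>

typedef struct sg_desc {
    uint32_t next_desc;   /* physical address of the next descriptor */
    uint32_t buf_addr;    /* physical address of the data buffer */
    uint32_t control;     /* transfer length + start/end-of-frame flags */
    uint32_t status;      /* completion/error bits, written by the DMA */
} sg_desc_t;

#define SG_CTRL_SOF (1u << 27)
#define SG_CTRL_EOF (1u << 26)

static void build_sg_chain(sg_desc_t *descs, uint32_t desc_phys,
                           const uint32_t *wave_phys, uint32_t wave_len,
                           size_t n_waves)
{
    for (size_t i = 0; i < n_waves; i++) {
        descs[i].next_desc = desc_phys + (uint32_t)((i + 1) * sizeof(sg_desc_t));
        descs[i].buf_addr  = wave_phys[i];
        descs[i].control   = wave_len & 0x03FFFFFFu;
        if (i == 0)           descs[i].control |= SG_CTRL_SOF;
        if (i == n_waves - 1) descs[i].control |= SG_CTRL_EOF;
        descs[i].status    = 0;
    }
    descs[n_waves - 1].next_desc = desc_phys;  /* close the ring (or terminate) */
}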

Now, the caveats: only edge triggers (rising/falling/either) are currently supported, and only a 1ch acquisition mode is implemented; 2ch and 4ch modes are mostly a data-decimation problem, but they have not been implemented for the prototype.  All rendering is presently done in software on the Pi, as there were some difficulties keeping a prototype GPU renderer stable.  This rendering task uses 100% of one ARM core on the Pi (there is almost certainly a threading benefit available, but that is unimplemented at present due to Python GIL nonsense); the ideal goal would be to do the rendering on the Pi's GPU or on the FPGA.  A fair bit of the ARM on the Zynq is busy just managing system tasks, like setting up AXI DMA transactions for every waveform, which could probably be sped up if this were all done on the FPGA.

For now, the analog front end is AC-coupled only.  I have a prototype AFE designed in LTspice, but I haven't put any proper hardware together yet.

The first custom PCB (the "Minimum Viable Product") was funded by myself and a generous American friend who was interested in the concept.  It cost about £1,500 (~$2,000 USD or 1,700 EUR, approx) to develop in total, including two prototypes (one with a 7014S and one with a 7020; the 7020 prototype has never been used).  This was helped in part by a manufacturer in Sweden, SVENSK Elektronikproduktion, who provided their services at a great price due to their interest in the project (particular thanks to Fredrik H. for arranging this).  It is a 6-layer board, which presented some difficulty in implementing the DDR3 memory interface (8-10 layers would be ideal), but the overall results were very positive and the interface runs at 533MHz just fine.

The first revision of the board worked with only minor alterations required.  I've nothing but good words to say about SVENSK Elektronikproduktion, who helped bring this prototype to fruition very quickly, even with a last-minute change and a minor error on my part that they were able to resolve.  The board was mostly assembled by pick-and-place, including the Zynq's BGA package and the DDR3 memory, with some parts hand-placed later.  I had the first prototypes delivered in late November 2019 and had a prototype up and running by early March 2020; the pandemic meant I had a lot more time at home, so development continued at a rapid pace from then onwards.  The plan was to demonstrate the prototype in person at EMFCamp 2020, but for obvious reasons that event was cancelled.


(Prototype above is the unused 7020 variant.)

Results

I have a working 1GSa/s oscilloscope that can acquire and display >22,000 wfm/s.  There is more work to be done, but at this stage the prototype demonstrates that the hardware is capable of meeting most of the demands placed on the acquisition system of a modern digital oscilloscope.

The attached waveform images show:
1. 5MHz sine wave modulated AM with 10kHz sine wave
2. 5MHz sine wave modulated FM with 10kHz sine wave + 4MHz bias
3. 5MHz positive ramp wave
4. Pseudorandom noise
5. Chirp waveform (~1.83MHz)
6. Sinc pulse

The video also shows a live preview of the instrument in action.

Where next?

Now I'm at a turning point with this project.  I had to change jobs and relocate for personal reasons, so I took a two-month break from the project while starting at my new employer and moving house.  But I'm back to looking at this project, still in my spare time.  And, having reflected a bit ...

A couple of weeks ago the Raspberry Pi CM4 was released.  It's not pin-compatible with the CM3, which is of course expected, as the Pi 4 has a PCI Express interface and an additional HDMI port.  It would make sense to migrate this project to the CM4; the faster processor and GPU are an advantage here.  (I have already tested the CSI-2 implementation with a Pi 4 and no major compatibility issues were noted.)

There are also a lot of other things I want to experiment with.  For instance, I want to move to a dual-channel DDR3 memory interface on the Zynq, with 1GB of total memory space available.  This would quadruple the sample memory and more than double the memory bandwidth (>3.8GB/s usable), which is beneficial when it comes to doing some level of rendering on the FPGA.  It's worth looking at the PCI-e interface on the CM4 for data transfer, but CSI-2 still offers some advantages, namely that it wouldn't be competing for bandwidth with the USB 3.0 or Ethernet peripherals if those are used in a scope product.  PCI-e would also require a higher grade of Zynq with a hard PCI-e core, or a slower HDL implementation of PCI-e, which might present other difficulties.
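
As a rough sanity check on those bandwidth numbers (a sketch only; the usable fraction depends heavily on access patterns):

Code:
/* DDR3 peak bandwidth: two transfers per clock (DDR) times bus width.
 * Usable bandwidth is some 85-90% of this, depending on access patterns. */
#include <stdio.h>

int main(void)
{
    const double mts = 533e6 * 2;                           /* 1066 MT/s at a 533 MHz clock */
    printf("16-bit bus: %.2f GB/s peak\n", mts * 2 / 1e9);  /* ~2.13 GB/s */
    printf("32-bit bus: %.2f GB/s peak\n", mts * 4 / 1e9);  /* ~4.26 GB/s */
    return 0;
}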

I'm also considering completely ripping up the Pi CM4 concept and going for a powerful SoC+FPGA like a Zynq UltraScale+, but that would be a considerably more expensive part to use, and would perhaps change the goals of this project from developing an inexpensive open source oscilloscope to developing a higher-performance oscilloscope platform for enthusiasts.  The cheapest UltraScale+ processor is around $250 USD, but features an on-device dual ARM Cortex-A53 complex (a considerable upgrade over the ARM Cortex-A9 in the Zynq 7014S), a Mali-400 GPU and DDR4 memory controllers.  This would allow for e.g. an oscilloscope capture engine with gigabytes of sample memory (up to 32GB in the largest parts!), and we'd no longer be restricted to running over a limited-bandwidth camera interface, which would improve performance considerably there.

I think there's a great deal of capability here when it comes to supporting modularity.  What I'd like to offer is something along the lines of the original Tek mainframes, where you can swap an acquisition module in and out to change the function of the whole device.  A small EEPROM would identify the right software package and bitstream to load, so you could convert your oscilloscope into e.g. a small VNA, a spectrum analyser, a CAN/OBDII module with analog channels for automotive work, etc., on the fly.
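
As a sketch of what that EEPROM record might contain (the layout and field names here are purely illustrative; nothing is finalised):

Code:
/* Hypothetical module-identification record stored in a module's EEPROM,
 * letting the mainboard pick the right bitstream and software at boot. */
#include <stdint.h>

typedef struct {
    uint32_t magic;             /* marks a valid descriptor */
    uint16_t vendor_id;
    uint16_t module_type;       /* e.g. 4ch AFE, AWG, VNA front end */
    uint8_t  slot_width;        /* 1 = single-wide, 2 = double-wide */
    uint8_t  hw_revision;
    char     bitstream_id[32];  /* which FPGA image to load */
    char     sw_package[32];    /* which software plugin to load */
    uint32_t crc32;             /* integrity check over the record */
} module_id_t;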

The end goal is a handheld, mains- and/or battery-powered oscilloscope with a capacitive 1280x800 touchscreen (plus optional HDMI output), 4 channels at 100MHz bandwidth and 1GSa/s multiplexed, a minimum of 500MSa of acquisition memory, and at least a 30,000 waveforms/second display rate (with a goal of 100k waves/sec rendered and 200k waves/sec captured in segmented-memory modes).  I also intend to offer a two-channel arbitrary signal generator output on the product, utilising the same FPGA as for acquisition.  The product is intended to be open source in its entirety, including the FPGA design and schematics, the firmware on the Zynq, and the application software on the main processor.  I'll publish details on these in short order, provided there's sufficient interest.

Full disclosure - I have some commercial interest in the project.  It started as just a hobby project, but I've done everything through my personal contracting company, and have been in discussions with a few individuals and companies regarding possible commercialisation.  No decisions have been made yet, and I intend for the project to be FOSHW regardless of the commercial aspects.

The questions for everyone here are:
- Does a project like this interest you?   If so, why?   If not, why not?

- What would you like to see from a Mk2 development, if anything: a more expensive oscilloscope to compete with e.g. the 2000-series from many manufacturers, aimed more at the professional engineer, or a cheaper open source oscilloscope that would perhaps sell more to students, junior engineers, etc.?  (We are talking about a US$500 difference in pricing.  An UltraScale+ part makes this a >$800 USD product, which almost certainly changes the marketability.)

- Would you consider contributing to the development of an oscilloscope?  It is a big project for just one guy to complete.  There is DSP, trigger engines, an AFE, modules, casing design and many more areas to be completed.  Hardware design is just a small part of the product.  Bugs also need to be found and squashed, and there is documentation to be written.  I'm envisioning the capability to add modules to the software, and the hardware interfaces will be documented so third-party modules could be developed and used.

- I'm terrible at naming products.  "BluePulse" is very unlikely to be a longer term name.  I'll welcome any suggestions.

Offline YetAnotherTechie

  • Regular Contributor
  • *
  • Posts: 221
  • Country: pt
I vote for this as the most interesting post of the year. Great work!!  :-+
 
The following users thanked this post: egonotto, Trader

Offline artag

  • Super Contributor
  • ***
  • Posts: 1064
  • Country: gb
I like the idea, mostly because I REALLY like the idea of an open-source scope that's got acceptable performance. Something I could add a feature to when I want it.

I think you've made fabulous progress, and I think you very much need to watch out for the upcoming problems:

It's very easy to get lost in a maze of processor directions - stretch too far and your completion date disappears over the horizon; set your targets too low and you end up with something that's obsolete before it's finished.

The same goes for expansion and software plans - there's a temptation to do everything, resulting in plans that never get finalised, or an infrastructure that's too big for the job.

I don't say this negatively, to put you off - I put these points forward as problems that need a solution.

I'm interested in helping if I can.
 
The following users thanked this post: tom66, james_s

Offline radiolistener

  • Super Contributor
  • ***
  • Posts: 3343
  • Country: ua
We need the help of Chinese manufacturers to produce and sell cheap hardware for a project like this :)

It would also be nice to see an Altera Cyclone version.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
It's an incredibly impressive project; with that kind of output from just one person in their personal time, it would not surprise me if you got a few job offers from T&M companies. It's very interesting from the standpoint of seeing in detail how a modern DSO works, although I think you will be really hard-pressed to compete with companies like Rigol and Siglent. I personally would be very interested if the alternative was spending $10k on a Tektronix, Keysight or other A-list brand, but the better-known Chinese companies deliver an incredible amount of bang for the buck. Building something this complex in small quantities is expensive, and it's probably too complex for all but the most hardcore DIY types to assemble themselves. On top of that, the enclosure is a very difficult part, at least in my own experience: making a nice enclosure and front panel approaching the quality of even a low-end commercial product is very hard. Not trying to rain on your parade though; it looks very cool and I'll be watching with interest to see how this pans out.
 

Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Thanks for the comments.

The hardware isn't all that expensive - the current BOM works out at just under US$200 in 500-off quantity.  That means it would be feasible to sell this at US$500-600, which, although a little more expensive than the cheapest Rigol/Siglent offerings, may be more attractive given the open source aspect.  Adding the UltraScale+ part and removing the Pi adds about US$150 to the BOM, which starts pushing the device into the US$800-1000 range.  Perhaps it would be worth discussing with Xilinx - I know they give discounted FPGAs to dev-kit manufacturers - if they are interested in this project they may consider a discounted price.  The Zynq is the most expensive single part on the board.  So far, though, all pricing is based on Digi-Key strike prices with no discounts assumed.

The idea would be to sell an instrument that has only a touchscreen and 4 probe inputs.  The mechanical design of a keypad, knobs, buttons etc. and an injection-moulded case would be a significant effort, and the tooling is not cheap, so an extruded aluminium case would be used instead.  Of course a touchscreen interface won't appeal to everyone, so a later development might include an external keypad/knob assembly, or you could use a small keyboard.  Optionally, the unit could contain a Li-ion battery and charger, which would allow it to be used away from mains power for up to 5-6 hours.  (The present power consumption is a little too high for practical battery use, but the Zynq and AFE components currently run continuously with no power-saving considerations.)

There isn't much chance someone could hand-assemble a prototype like this.  The BGA and DDR memory make it all but impossible for even the most enthusiastic members of this forum.  There was a reason that, despite having (in my own words) reasonably decent hand-soldering skills, I went to a manufacturer to build the board: I did not want gremlins from a BGA ball randomly going open-circuit, for instance.  I was very careful in the stencil specification and design to ensure the BGA was not over-pasted.  The 7014S board has been perfectly reliable, all things considered, even while the Zynq was running at 75°C+ before it had a heatsink.

While I've not had any offers from T&M companies (although I've not asked or offered), I did get my present job as an FPGA/senior engineer with this project as part of the interview process (as Dave says - bring prototypes - they love them!).  There are a couple of T&M companies in the Cambridge area, but I'm not really interested in selling out to anyone.  I wanted to develop this project because there is no great open source scope out there yet, and it was a great way to get used to high-speed FPGAs and memory interfaces.  I'd never laid out a DDR memory interface before, so it felt incredibly validating that it worked first time.

Regarding Altera parts, there would not be much point in using them - the cheapest comparable Altera SoC is double the price of the Zynq and has a slower, older ARM architecture.  The Zynq is a really nice processor!
« Last Edit: November 16, 2020, 08:46:41 am by tom66 »
 
The following users thanked this post: egonotto

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16642
  • Country: 00
The idea would be to sell an instrument that has only a touchscreen and 4 probe inputs.  The mechanical design of a keypad, knobs, buttons etc. and an injection-moulded case would be a significant effort, and the tooling is not cheap, so an extruded aluminium case would be used instead.  Of course a touchscreen interface won't appeal to everyone, so a later development might include an external keypad/knob assembly, or you could use a small keyboard.

Have a look at Micsigs. Their UI is really good, much faster/easier than traditional "twisty knob" DSOs.

Note that they now make a model with knobs at the side; I'd bet that's because a lot of people were put off by the idea of a touchscreen-only device.

(Although having owned one for a couple of weeks, I can say that any fears are unfounded. It works perfectly.)

Optionally, the unit could contain a Li-ion battery and charger, which would allow it to be used away from mains power for up to 5-6 hours.

Micsig again...  >:D
« Last Edit: November 16, 2020, 11:01:59 am by Fungus »
 

Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
I'm aware of the Micsig devices, and I do quite like them.  So this would be comparable to a Micsig device, but with open source hardware and firmware, plus modular capability - the ability to remove the AFE and replace it with a different module for a different task, for example.  Plus considerably better system and acquisition performance.

I'm a fairly avid user of touchscreen devices in general, and while I think there is a case for knobs and dials on a scope, they can be replicated with competent UI design and a capacitive multitouch screen.  The problem with adding knobs and dials to a portable device is that once you drop it, you risk damage to the encoders and plastics.  A fully touchscreen device, with the BNCs being the only exposed elements, would be more rugged.  Of course, you shouldn't drop any test equipment, but once it is in a portable form factor, it WILL get dropped by someone.
 

Offline artag

  • Super Contributor
  • ***
  • Posts: 1064
  • Country: gb
I've always tended to prefer real knobs and dials, especially when comparing PC instruments against traditional ones. But we're all getting more used to touchscreens; what they usually lack is a really good, natural usage paradigm. I haven't tried the Micsig devices but have noticed people commenting positively on them.

The WIMP interface is very deeply embedded in us now, and tablets don't quite match it. Some gestures (swipe, pinch) have become familiar, but not enough to replace a whole panel.  I think we'll slowly get more used to it, and learn how to make it more natural.

I like the modularity idea, but it's hard to know where to place the interfaces. The obvious modules are display, acquisition memory and AFE. Linking the memory and display tightly gives fast response to control changes; linking the memory and AFE gives faster acquisition. There's also some value in using an existing interface for one of those. Maybe USB 3 is fast enough, though I think using the camera interface is really cunning. Another processor option - which also has a camera interface and a GPU - is the NVIDIA Jetson.

My feeling is that the AFE should be tightly coupled to the memory, so that as bandwidths rise they can both improve together. As long as the memory-to-display interface is fast enough for human use, it should be 'fast enough'. The limitation of that argument is when a vast amount of data is acquired and needs to be processed before display: process in the instrument and you can't take advantage of the latest computing options for display processing; process in the display/PC and you have to transfer through the bottleneck.


 
 

Offline tv84

  • Super Contributor
  • ***
  • Posts: 3217
  • Country: pt
with open source hardware and firmware, plus modular capability

Love your modular concept and the implementation. You are one of the first to do such a real one-man implementation.

Usually many talk about this but stop short of beginning such a daunting task: they end up not deciding on the processors or the modularity frontiers, some only do SW, others only do HW, etc., etc...

Many other choices could be made but you definitely deserve a congratulation!  :clap: :clap: :clap:

Whatever you decide to do, just keep it open source and you will always be a winner!

RESPECT.
 
The following users thanked this post: tom66, cdev, 2N3055

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26892
  • Country: nl
    • NCT Developments
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #10 on: November 16, 2020, 12:48:10 pm »
I like that post-processing is done inside the GPU. Having a PCI Express interface on the newer RPis would be a benefit. Another option is to use a SoC chip directly on the board plus a lower-cost FPGA (a Spartan-6 LX45T, for example) that reads data from the ADC, does some rudimentary buffering and streams it onto the PCI Express bus.
« Last Edit: November 16, 2020, 12:53:46 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline fcb

  • Super Contributor
  • ***
  • Posts: 2117
  • Country: gb
  • Test instrument designer/G1YWC
    • Electron Plus
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #11 on: November 16, 2020, 01:12:59 pm »
Great work so far. Although the cost/performance benefit you've outlined is not sufficient to make it a compelling commercial project, perhaps it could find a niche?

I'd probably have turned the project on its head -> what's the best 'scope I can build with a Pi Compute Module for £XXX.  Also, I wouldn't be afraid of a touchscreen/WIMP interface; if implemented well it can be pretty good - although I still haven't seen one YET that beats the usability of an old HP/Tek.
https://electron.plus Power Analysers, VI Signature Testers, Voltage References, Picoammeters, Curve Tracers.
 

Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #12 on: November 16, 2020, 01:18:52 pm »
My concept for modularity is to keep it very simple.  The AFE board will essentially be the HMCAD15xx ADC plus the necessary analog front end hardware and the 1GHz clock.

The ADC then interfaces over n LVDS pairs going into the Zynq. If I put the 484-ball Zynq on the next board, I have the capacity for a large number of LVDS pairs.

The modules could be double-wide (i.e. a 4-channel AFE) or single-wide (i.e. a 2-channel AFE), in which case you could use some arbitrary other module in the second slot.  The bitstream and software would be written to be as flexible as possible, although it is possible that not all module configurations will be allowed.  (For instance, it might not be possible to have two output modules at once; the limits would need to be defined.)

For instance, you could have a spectrum analyser front end that contains the RF mixers, filters and ADC, where the software on the Zynq just drives the LO/PLL over SPI to sweep, and performs an FFT on the resulting data.  The module is different, but gathering the data over a high-speed digital link is the common factor.

The modules would also be able to share clocks or run on independent clock sources.  The main board could provide a 10MHz reference (which could also be externally provided or output), and the PLLs on the respective boards would then generate the necessary sampling clocks.

The bandwidth of this interface is less critical than it sounds: for an 8Gbit/s ADC (1GSa/s at 8 bits), just 10 LVDS pairs are needed.  A modern FPGA has 20+ on a single bank, and on the Xilinx 7-series parts each pin has an independent ISERDESE2/OSERDESE2, which means you can deserialise and serialise as needed on the fly on each pin.  There are routing and timing considerations, but I've not had an issue with the current block running at 125MHz; I think I might run into issues trying to get it above 200MHz with a standard -3 grade part.
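
The arithmetic, for anyone checking (this assumes 800Mb/s per pair; the achievable per-pair rate depends on the SERDES ratio and clocking):

Code:
/* Rough lane-count check for the ADC link. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double sample_rate_gsps = 1.0;  /* 1 GSa/s */
    const int    bits             = 8;
    const double pair_rate_gbps   = 0.8;  /* assumed 800 Mb/s per LVDS pair */

    double total_gbps = sample_rate_gsps * bits;  /* 8 Gbit/s to move */
    printf("LVDS pairs needed: %.0f\n",
           ceil(total_gbps / pair_rate_gbps));    /* -> 10 */
    return 0;
}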

My unfinished modularity proposal is here:
https://docs.google.com/spreadsheets/d/1hpS83vqnude4Z6Bsa2l4NRGaMY8nclvE8eZ_bKncBDo/edit?usp=sharing

So the idea is that most of the modules are dumb, but there is an SPI interface if needed for smarter module interfacing, which allows e.g. an MCU on the module to control attenuation settings.

The MCU could communicate, via a defined standard, what its capabilities are. If the instrument doesn't have the software it needs, it can pick that up over the internet via Wi-Fi or Ethernet, or from a USB stick.

One other route is to use a 4-lane CSI module, as the Pi does support that on the CM3/CM4.  This doubles the available transfer bandwidth.  I do need to give PCI-e proper thought, though, because it allows bidirectional transfer; the current solution is purely unidirectional.

IMO there is little benefit in using a separate FPGA + SoC, because you lose the close coupling that the Zynq has.  The ARM on the Zynq directly writes registers on the FPGA side to influence acquisition, DMA behaviour, etc.  That would have to fit over a SPI or small digital link, which would constrain the system considerably.  In fact, currently the Pi controls the Zynq over SPI, and that is slow enough to cause issues, so I will be moving away from that in a future version.
 
The following users thanked this post: Simon_RL

Offline jxjbsd

  • Regular Contributor
  • *
  • Posts: 123
  • Country: cn
  • A network engineer who loves electronics
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #13 on: November 16, 2020, 02:25:31 pm »
 :-+
Very good work. I very much agree with keeping it simple, implementing only the main functions for now. It would be great if most of the functions of a Tek 465 were implemented; others, such as advanced triggering and FFT, can be added later. Make only one core board, with the various control knobs or touchscreens implemented through external boards, which could increase the volume of core boards. Simplicity and flexibility may be the advantages of open source hardware; the programming may be the difficulty of this project.
« Last Edit: November 16, 2020, 02:32:31 pm by jxjbsd »
Analog instruments can tell us what they know, digital instruments can tell us what they guess.
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26892
  • Country: nl
    • NCT Developments
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #14 on: November 16, 2020, 02:46:27 pm »
IMO there is little benefit in using a separate FPGA + SoC, because you lose the close coupling that the Zynq has.  The ARM on the Zynq directly writes registers on the FPGA side to influence acquisition, DMA behaviour, etc.  That would have to fit over a SPI or small digital link, which would constrain the system considerably.
That is where PCI Express comes in. It gives you direct memory access both ways; in fact the FPGA could push the acquired data directly into the GPU memory area using PCI Express.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #15 on: November 16, 2020, 02:48:35 pm »
IMO there is little benefit in using a separate FPGA + SoC, because you lose the close coupling that the Zynq has.  The ARM on the Zynq directly writes registers on the FPGA side to influence acquisition, DMA behaviour, etc.  That would have to fit over a SPI or small digital link, which would constrain the system considerably.
That is where PCI Express comes in. It gives you direct memory access both ways; in fact the FPGA could push the acquired data directly into the GPU memory area using PCI Express.

True, but the FPGA would still need some kind of management firmware for some parts, for instance setting up DMA transfer sizes and trigger settings.  You could write all of that in Verilog, but it becomes a real pain to debug.  The balance of a CPU for easy software tasks and HDL for easy hardware tasks makes the most sense, and some of this work is latency-sensitive, so you ideally want to keep it away from a non-realtime system like Linux.  (The UltraScale+ SoC has a separate 600MHz dual ARM Cortex-R5 complex for realtime work - which is an interesting architecture.)  But having the ability for the Pi to read and write directly from memory space on the Zynq side would be really compelling.  I may need to get the PCI-e reference manual and see what the interface and requirements look like there.
 

Online 2N3055

  • Super Contributor
  • ***
  • Posts: 6599
  • Country: hr
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #16 on: November 16, 2020, 02:59:34 pm »
Very impressive work! I really hope you will succeed in your "quest"!
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26892
  • Country: nl
    • NCT Developments
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #17 on: November 16, 2020, 03:14:02 pm »
IMO there is little benefit in using a separate FPGA + SoC, because you lose the close coupling that the Zynq has.  The ARM on the Zynq directly writes registers on the FPGA side to influence acquisition, DMA behaviour, etc.  That would have to fit over a SPI or small digital link, which would constrain the system considerably.
That is where PCI Express comes in. It gives you direct memory access both ways; in fact the FPGA could push the acquired data directly into the GPU memory area using PCI Express.

True, but the FPGA would still need some kind of management firmware for some parts, for instance setting up DMA transfer sizes and trigger settings.  You could write all of that in Verilog, but it becomes a real pain to debug.  The balance of a CPU for easy software tasks and HDL for easy hardware tasks makes the most sense, and some of this work is latency-sensitive, so you ideally want to keep it away from a non-realtime system like Linux.  (The UltraScale+ SoC has a separate 600MHz dual ARM Cortex-R5 complex for realtime work - which is an interesting architecture.)  But having the ability for the Pi to read and write directly from memory space on the Zynq side would be really compelling.  I may need to get the PCI-e reference manual and see what the interface and requirements look like there.
The beauty of a PCI interface is that it basically does DMA transfers, so Linux doesn't need to get in the way at all. The only thing the host CPU needs to do is set up the acquisition parameters, and the FPGA can start pushing data into the GPU. Likely the GPU can signal the FPGA directly to steer the rate of the acquisitions. In the end a GPU has a massive amount of processing power compared to an ARM core, for as long as you can do parallel tasks. I have made various realtime video processing projects with Linux, and since all the data transfer is DMA-based, the host CPU is loaded by only a few percent. System memory bandwidth is something to be aware of, though.
« Last Edit: November 16, 2020, 03:17:08 pm by nctnico »
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: tom66

Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #18 on: November 16, 2020, 06:26:49 pm »
The beauty of a PCI interface is that it basically does DMA transfers, so Linux doesn't need to get in the way at all. The only thing the host CPU needs to do is set up the acquisition parameters, and the FPGA can start pushing data into the GPU. Likely the GPU can signal the FPGA directly to steer the rate of the acquisitions. In the end a GPU has a massive amount of processing power compared to an ARM core, for as long as you can do parallel tasks. I have made various realtime video processing projects with Linux, and since all the data transfer is DMA-based, the host CPU is loaded by only a few percent. System memory bandwidth is something to be aware of, though.

It's a fair point. There's still some acquisition control that the FPGA needs to be involved in, for instance sorting out the pre- and post-trigger behaviour.

The current architecture roughly works like this:
- Pi configures the acquisition mode (e.g. 600 pts pre-trigger, 600 pts post-trigger, 1:1 input divide, 1-channel mode, 8 bits, 600 waves/slice, trigger of this type, delay by X clocks, etc.)
- Zynq acquires these waves into a rolling buffer; the buffer moves through memory space so there is history for any given acquisition (~25 seconds with current memory)
- Pi interrupts before the next VSYNC to get a packet of waves (which may be fewer than the 600 waves requested)
- The transfer is made by the Zynq over CSI: the Zynq corrects trigger positions and prepares a DMA scatter-gather list, then my CSI peripheral transfers ~2MB+ of data with no CPU intervention
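
Concretely, the setup message implied by that first step might look something like this (the field names are illustrative; the real register map differs):

Code:
/* Sketch of the acquisition-setup parameters the Pi hands to the Zynq. */
#include <stdint.h>

typedef struct {
    uint32_t pre_trigger_pts;    /* e.g. 600 */
    uint32_t post_trigger_pts;   /* e.g. 600 */
    uint8_t  input_divider;      /* 1 = 1:1 decimation */
    uint8_t  n_channels;         /* 1, 2 or 4 */
    uint8_t  sample_bits;        /* 8 (12/14 with an HMCAD1520) */
    uint8_t  trigger_type;       /* rising / falling / either edge */
    uint32_t waves_per_slice;    /* e.g. 600 waves per CSI-2 frame */
    uint32_t trigger_delay_clks; /* delay/holdoff in ADC clocks */
} acq_config_t;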

There is close coupling between the Zynq ARM, the FPGA fabric and the Pi, and since the Pi is not hard real time (the Zynq ARM is running baremetal) you'd need to be careful about what latency you introduce into the system.

It would be nice if we could say to the Pi, e.g., "find waveforms at this address", and when the Pi snoops the PCIe bus, the FPGA fabric intercepts the request and translates each waveform dynamically, so we don't have to do the pre-trigger rotation on the Zynq ARM.  Right now, the pre-trigger rotation is done by reading from the middle of the pre-trigger buffer, then the start, then the post-trigger buffer (though I believe this could be simplified to two reads with some thought).  Perhaps it's possible using the SCU on the Zynq - it has a fairly sophisticated address translation engine.  I'd like to avoid doing a read-rotate-writeback operation, as that triples the memory bandwidth requirement on the Zynq, and already 1GB/s (~60%) of the memory bandwidth is used just writing data from the ADC.  The Zynq ARM has to execute code and read/write data from this same RAM, and although the 512KB L2 cache on the Zynq is generous, it's not perfect.
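
For illustration, the rotation-free reassembly amounts to something like this (a simplified sketch; in the real system these would be DMA descriptor entries rather than memcpys):

Code:
/* Sketch: present a triggered waveform from a circular pre-trigger buffer
 * without a read-rotate-writeback pass. 'head' is where the write pointer
 * stopped when the trigger fired, so the oldest samples start at 'head'. */
#include <stdint.h>
#include <string.h>

static void emit(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);  /* stand-in for queueing a DMA descriptor */
}

void assemble_wave(uint8_t *out,
                   const uint8_t *pre, size_t pre_len, size_t head,
                   const uint8_t *post, size_t post_len)
{
    emit(out,                    pre + head, pre_len - head); /* oldest pre-trigger */
    emit(out + (pre_len - head), pre,        head);           /* newest pre-trigger */
    emit(out + pre_len,          post,       post_len);       /* post-trigger */
}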
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #19 on: November 16, 2020, 06:38:53 pm »
I loathe touchscreens. I tolerate one on my phone because of the obvious constraints of the form factor, but while I've owned several tablets, I've yet to find a really good use case for one other than looking at datasheets. I can't stand them on most stuff, and it annoys me whenever someone points at something and leaves a finger smudge on my monitor. I could potentially make an exception for a portable scope to have in addition to my bench scope, although in the case of this project my interest is mostly academic; it's a fascinating project and an incredible achievement, but not something I'm likely to spend money on. Roughly the same price will get me a 4-channel Siglent in a nice molded housing with real buttons, knobs and support, or a used TDS3000 that can be upgraded to 500MHz. That said, I've heard that Digi-Key pricing on FPGAs is hugely inflated, so you may be able to drop the cost substantially.
 
The following users thanked this post: nuno

Online tom66Topic starter

  • Super Contributor
  • ***
  • Posts: 6694
  • Country: gb
  • Electronics Hobbyist & FPGA/Embedded Systems EE
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #20 on: November 16, 2020, 06:41:11 pm »
Another challenge I am working on is how to do the rendering entirely on the FPGA.

This would free up the Pi's CPU cores, and the GPU could then be used for e.g. FFTs and 2D acceleration tasks.

The real challenge is that waveforms are stored linearly, but each X pixel on the display needs a different Y coordinate for a given wave value, so the access pattern is not conducive to bulk write operations (e.g. AXI bursts) at all.  The 'trivial' improvement is to rotate the buffer 90 degrees (which is what my SW renderer does) so that accesses tend to hit the same row and are more likely to be sitting in the cache, but this is still a non-ideal solution.  So the problem has to be broken down into tiles or slices.  The Zynq should read, say, 128 waveform values (which fits nicely into a burst), repeat for every waveform (with appropriate translations applied), write all the pixel values for that slice into BRAM (12 bits x 128 x 1024, for a 1024-pixel-high canvas with 12-bit intensity grading = ~1.5Mbit, or about half of all available BlockRAM), and then write that back into DDR so that burst operations are used as much as possible.
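
A behavioural model of that tile approach, in C (this is a sketch of the idea only, not HDL; on the FPGA the tile would live in BRAM and the loops become pipelined hardware):

Code:
/* Software model of the proposed tile renderer: for one 128-pixel-wide
 * column slice, accumulate intensity counts for every waveform, then
 * flush the tile to the framebuffer in one linear pass. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TILE_W   128
#define CANVAS_H 1024

static uint16_t tile[TILE_W][CANVAS_H];  /* 12-bit counts fit in 16 bits */

void render_tile(const uint8_t *const *waves, int n_waves,
                 int x0, uint16_t *fb, int fb_stride)
{
    memset(tile, 0, sizeof tile);
    for (int w = 0; w < n_waves; w++) {
        const uint8_t *s = waves[w] + x0;        /* one 128-sample burst */
        for (int x = 0; x < TILE_W; x++) {
            int y = (s[x] * CANVAS_H) / 256;     /* map 8-bit sample to a row */
            if (tile[x][y] < 0x0FFF)             /* saturate at 12 bits */
                tile[x][y]++;
        }
    }
    /* Write back; on DDR this becomes a sequence of burst writes. */
    for (int x = 0; x < TILE_W; x++)
        for (int y = 0; y < CANVAS_H; y++)
            fb[(size_t)y * fb_stride + (x0 + x)] = tile[x][y];
}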

It implies a fairly complex core, and that's without considering multiple channels (which introduce even more complexity: do you handle each as a separate buffer, or accumulate each with a 'key value', or ...?)  The complication is that the ADC multiplexes samples, so in 1ch mode the samples arrive as A0 .. A7, but in 2ch mode they arrive as A0 B0 A1 B1 .. A3 B3, which means you need to think carefully about how you read and write data.  You can try to unpack the data with small FIFOs on the acquisition side, but then you need to reassemble the data when you stream it out.
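
The 2ch de-interleave itself is trivial in isolation (sketch below); the difficulty is doing it at line rate while keeping the burst structure intact:

Code:
/* Sketch: undo the HMCAD1511-style sample interleave in 2ch mode. The
 * raw stream arrives as A0 B0 A1 B1 ... and is split per channel. */
#include <stdint.h>
#include <stddef.h>

void demux_2ch(const uint8_t *raw, size_t n_pairs,
               uint8_t *cha, uint8_t *chb)
{
    for (size_t i = 0; i < n_pairs; i++) {
        cha[i] = raw[2 * i];      /* even slots: channel A */
        chb[i] = raw[2 * i + 1];  /* odd slots:  channel B */
    }
}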

This is essentially the rotated-polygon problem that GPU manufacturers solved 20 years ago, but solved in a way that fits in a relatively inexpensive FPGA, and done at 100,000 waves/sec (60 Mpoints/sec plotted).  And then done with vectors or dots between points; ArmWave is dots-only for now, though there is a prototype of a slower vector plotter I have written somewhere.

If you look at the Rigol DS1000Z, you can see a fairly hefty SRAM chip attached to the FPGA, in addition to a regular DDR2/3 memory device.  It is almost certain that the DDR memory is used just for waveform acquisition, and that the waveform is rendered into the SRAM buffer and then streamed to the i.MX processor (possibly over the camera port, as I am doing).  Whether the FPGA colourises the camera data or whether Rigol uses the i.MX's ISP block to do that is unknown to me.  Rigol likely chose an expensive SRAM because it allows true random access with minimal penalty for jumping to random addresses.

For anyone curious, here is the current source code for ArmWave, the rendering engine presently in use:
https://github.com/tom66/armwave/blob/master/armwave.c

This is about as fast as an ARM rendering engine will get using just one core; it has been profiled to death and back again.  Four cores would make it faster, although some of the limitation comes from memory bus performance.  It's at about 20 cycles per pixel plotted right now.
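
The hot loop boils down to something like this (heavily simplified; see the linked armwave.c for the real, profiled version). Working through one column of the rotated buffer at a time keeps the writes close together in memory:

Code:
/* Sketch of a dots-only intensity-graded plot loop over a rotated
 * (90-degree) accumulation buffer: one X column is contiguous in RAM. */
#include <stdint.h>
#include <stddef.h>

void plot_dots(const uint8_t *samples, size_t n_waves, size_t wave_len,
               uint16_t *accum /* wave_len x 256 counts, rotated */)
{
    for (size_t w = 0; w < n_waves; w++) {
        const uint8_t *s = samples + w * wave_len;
        for (size_t x = 0; x < wave_len; x++) {
            uint16_t *col = accum + x * 256;  /* one display column */
            col[s[x]]++;                      /* bump intensity at (x, y) */
        }
    }
}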
« Last Edit: November 16, 2020, 06:45:47 pm by tom66 »
 

Offline Fungus

  • Super Contributor
  • ***
  • Posts: 16642
  • Country: 00
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #21 on: November 16, 2020, 06:56:14 pm »
I loathe touchscreens. I tolerate one on my phone because of the obvious constraints of the form factor, but ... roughly the same price will get me a 4-channel Siglent in a nice molded housing with real buttons, knobs and support

Trust me: The knobs are OK for things like adjusting the timebase but a twisty, pushable, multifunction knob is not better for navigating menus, choosing options, etc.

e.g. Look at the process of enabling a bunch of on-screen measurements on a Siglent. Does that seem like the best way?

https://youtu.be/gUz3KYp_5Tc?t=2925
« Last Edit: November 16, 2020, 07:11:38 pm by Fungus »
 

Offline tautech

  • Super Contributor
  • ***
  • Posts: 28328
  • Country: nz
  • Taupaki Technologies Ltd. Siglent Distributor NZ.
    • Taupaki Technologies Ltd.
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #22 on: November 16, 2020, 07:15:49 pm »

Look at the process of enabling a bunch of on-screen measurements on a Siglent. Does that seem like the best way?
Best is accurate:
https://www.eevblog.com/forum/testgear/testing-dso-auto-measurements-accuracy-across-timebases/
Avid Rabid Hobbyist
Siglent Youtube channel: https://www.youtube.com/@SiglentVideo/videos
 

Offline sb42

  • Contributor
  • Posts: 42
  • Country: 00
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #23 on: November 16, 2020, 07:17:34 pm »
I loathe touchscreens. I tolerate one on my phone because of the obvious constraints of the form factor, but ... roughly the same price will get me a 4-channel Siglent in a nice molded housing with real buttons, knobs and support

Trust me: The knobs are OK for things like adjusting the timebase but a twisty, pushable, multifunction knob is not better for navigating menus, choosing options, etc.

e.g. Look at the process of enabling a bunch of on-screen measurements on a Siglent. Does that seem like the best way?

https://youtu.be/gUz3KYp_5Tc?t=2925

Also, with a USB port it might be possible to design something around a generic USB input interface like this one:
http://www.leobodnar.com/shop/index.php?main_page=product_info&cPath=94&products_id=300
 
The following users thanked this post: tom66

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26892
  • Country: nl
    • NCT Developments
Re: A High-Performance Open Source Oscilloscope: development log & future ideas
« Reply #24 on: November 16, 2020, 07:35:36 pm »
Another challenge I am working on is how to do the rendering all on the FPGA.

This would free up the CPUs of the Pi and the GPU could be used for e.g. FFTs and 2D acceleration tasks. 
I'm not saying it can't be done, but you also need to address (literally) shifting the dots so they match the trigger point.

IMHO you are at a crossroads where you either choose a high update rate but poor analysis features, with few people able to work on it (coding HDL), or a lower update rate with lots of analysis features and many people able to work on it (using OpenCL or even Python extensions). Another advantage of a software/GPU architecture is that you can move to higher-performance hardware by simply taking the software to a different platform. Think about the NVIDIA Jetson/Xavier modules, for example: a Jetson TX2 module with 128Gflops of GPU performance starts at $400. More GPU power automatically translates to a higher update rate. This is also how LeCroy does it: look at how LeCroy's WavePro oscilloscopes work, and how a better CPU and GPU drastically improve the performance.
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: nuno

