I was saying that SBCs (such as the Raspberry Pi, Orange Pi, or Odroid-M1) cannot do it over gigabit Ethernet or a USB3 port.
Yet, even my old Odroid HC1 can do it. Other than the RPi, the SBCs I've mentioned above obviously can.
But you said that an SBC can also do it, and even do it better than a powerful, expensive PC.
When that SBC is not running wasteful stuff like a GUI and unneeded crap like various service daemons, and has a suitable SoC with the requisite hardware peripherals, sure.
For example, the OV5647 is a MIPI CSI camera module that can produce 1920×1080×30×10 = 622,080,000 bits per second, or just under 78 Mbytes/sec. Yet these SBCs can record and display the camera output at that resolution (1080p30) just fine, and you can even do other stuff at the same time.
How is that possible? These SBCs have many peripherals that can transfer data to anywhere in memory without the intervention of the actual processor cores. They use pretty smart DMA controllers with priorities et cetera, so the only real impact is on memory bandwidth; and on the better SBCs, a couple of gigabits per second of DMA transfers doesn't yet starve anything. (The processor cores also have caches, of course.)
It is better to have bursty transfers at higher-than-necessary clocks than to require well-timed byte/word transfers, so that the DMA controllers and caches can use the available bandwidth to the fullest without competing with each other too much.
This is how even slooow boards like the MikroTik RBM33G (an 800 MHz 32-bit MIPS processor with just 256 MB of RAM) can stuff a gigabit Ethernet pipe full of data, as long as there are no resends or collisions and the data is already in memory buffers, because essentially the board itself is "an accelerated Ethernet card".
The aforementioned EZ-USB FX2LP, which has a puny 48 MHz 8051 core, is well known to reach quite close to the theoretical maximum transfer rates when using its GPIF feature. The GPIF, too, is a hardware peripheral: it waits for a specific clock edge, latches 8 or 16 bits, and buffers the data, letting the USB buffer side know how much data it has. Whenever there is enough data and the USB buffer side is ready, the USB buffer side initiates the transfer using USB bulk or isochronous transfers; all this happens without the 8051 processor core doing anything. When ACKed by the host, the USB buffer side discards the data (letting future GPIF transfers overwrite it), and round it goes. The actual 8051 core just waits for event flags (at certain memory addresses, updated by the GPIF peripheral and the USB buffer manager) and updates the configuration when necessary; it does not participate in moving the data around.
In SBCs with MIPI CSI and OV5647 support at maximum frame rates (the driver is built into the vanilla Linux kernel, by the way), when you display or compress the camera video, it is not the processor core computing the details. Most of these SoCs have a video compressor subsystem, which is simply told whenever a new frame has been received, and it does most of the work by itself; there is typically a bit of housekeeping per frame, buffer maintenance and so on. When displaying on screen, the OpenGL or OpenGL ES display accelerator can show the camera frames directly, with scaling, as long as it supports the color format. In some cases, like the Raspberry Pi's (which does support the OV5647), the display accelerator or media processor manages the camera frame transfers almost autonomously, with one of the processor cores just doing some bookkeeping work (keeping track of memory) once per frame.
An oscilloscope uses an FPGA for ADC data processing, and doesn't use gigabit Ethernet, USB3, or SPI to transfer data between the FPGA and the CPU.
Yet the underlying transport, typically 2, 4, 8, or 16 LVDS lanes (sometimes at double data rate, on both edges of the clock), is the same.
The only difference is that in SoCs, when an appropriate bus (USB, MIPI, PCIe, etc.) is used, the transceivers are connected to a DMA engine.
In FPGAs, I believe you can interconnect the signals in much more complex ways, including both storing the data to a RAM buffer and processing it using some optimized ALU or trigger detector.
So the oscilloscope's CPU gets an already-processed data stream at a reduced rate.
Or rather, the CPU never touches most of the data at all, just like the EZ-USB FX2 and its 48 MHz 8051 core.
Yes, you can do fast hardware transfers directly to memory or an SSD by using their hardware buses directly, which doesn't involve the CPU. But we're talking about attempting to use an SBC through its standard ports, like gigabit Ethernet, USB3, or SPI on its GPIO connector.
But you see, USB3, MIPI CSI, PCIe, and all those other buses I've suggested do their fast hardware transfers directly to and from memory, without involving the CPU! That's the entire idea.
If we tried to use plain GPIO, for example dedicating one core to just waiting for the specific clock edge and then latching the data pin states, I do not believe we could achieve the necessary bandwidth.
This is also why number crunching and things like intensive JavaScript on web pages (I like to use the browser-based EasyEDA editor for my board designs) feel so much slower on an SBC than on a desktop or laptop: the processor cores have to manipulate all that data and interpret the JavaScript themselves, and none of those hardware peripherals can help much. (Well, the media processors, capable of encoding and decoding H.264 at 1080p60 or better while only spending a watt or two, are definitely useful for video playback. And OpenGL ES graphics do work well when the system is designed to exploit them; just look at your cellphone. Both Android and iPhone/iOS rely heavily on OpenGL ES for their display graphics!)