I was saying that SBCs (such as the Raspberry Pi, Orange Pi, or Odroid-M1) cannot do it over gigabit Ethernet or a USB3 port.
Yet, even my old Odroid HC1 can do it. Other than the RPi, the SBCs I've mentioned above obviously can.
But you said that an SBC can also do it, and even do it better than a powerful, expensive PC.
When that SBC is not running wasteful stuff like a GUI and unneeded crap like various service daemons, and has a suitable SoC with the requisite hardware peripherals, sure.
For example, the OV5647 is a MIPI CSI camera module that can produce 1920×1080×30×10 = 622,080,000 bits per second, or just under 78 Mbytes/sec. Yet these SBCs can record and display the camera output at that resolution (1080p30) just fine, and you can even do other stuff at the same time.
How is that possible? These SBCs have many peripherals that can transfer data to anywhere in memory without the intervention of the actual processor cores. They use pretty smart DMA controllers with priorities et cetera, so the only real impact is on memory bandwidth; and on the better SBCs, a couple of gigabits per second of DMA transfers doesn't yet starve anything. (The processor cores also have caches, of course.)
It is better to have bursty transfers at higher-than-necessary clocks than to require well-timed byte/word transfers, so that the DMA controllers and caches can use the available bandwidth to the fullest without competing with each other too much.
This is how even slooow boards like the MikroTik RBM33G (an 800 MHz 32-bit MIPS processor with just 256 MB of RAM) can stuff a gigabit Ethernet pipe full of data, as long as there are no resends or collisions and the data is already in memory buffers, because essentially the board itself is "an accelerated Ethernet card".
The aforementioned EZ-USB FX2LP, which has a puny 48 MHz 8051 core, is well known to reach quite close to the theoretical maximum transfer rates when using its GPIF feature. The GPIF, too, is a hardware peripheral: it waits for a specific clock edge, latches 8 or 16 bits, and buffers the data, letting the USB buffer side know how much data it has. Whenever there is enough data and the USB buffer side is ready, the USB buffer side initiates the transfer using USB bulk or isochronous transfers; all this happens without the 8051 processor core doing anything. When ACKed by the host, the USB buffer side discards the data (letting future GPIF transfers overwrite it), and round it goes. The actual 8051 core just waits for event flags (at certain memory addresses, updated by the GPIF peripheral and the USB buffer manager) and updates the configuration when necessary; it does not participate in moving the data around.
In SBCs with MIPI CSI and OV5647 support at maximum frame rates (the driver is built into the vanilla Linux kernel, by the way), when you display or compress the camera video, it is not the processor core computing the details. Most of these SoCs have a video compressor subsystem, which is simply told whenever a new frame has been received, and it does most of the work by itself; there is typically a bit of housekeeping per frame, buffer maintenance and so on. When displaying on screen, the OpenGL or OpenGL ES display accelerator can show the camera frames directly, with scaling, as long as it supports the color format. In some cases, like the Raspberry Pi's (which does support the OV5647), the display accelerator or media processor manages the camera frame transfers almost autonomously, with one of the processor cores just doing some bookkeeping work (keeping track of memory) once per frame.
An oscilloscope uses an FPGA for ADC data processing, and doesn't use gigabit Ethernet, USB3, or SPI to transfer data between the FPGA and the CPU.
Yet the underlying transport, typically 2, 4, 8, or 16 LVDS lanes (sometimes at double data rate, on both edges of the clock), is the same.
The only difference is that in SoCs, when an appropriate bus (USB, MIPI, PCIe, etc.) is used, the transceivers are connected to a DMA engine.
In FPGAs, I believe you can interconnect the signals in much more complex ways, including both storing the data to a RAM buffer and processing it using some optimized ALU or trigger detector.
So the oscilloscope's CPU gets an already-processed data stream at a reduced rate.
Or rather, the CPU never touches most of the data at all, just like the EZ-USB FX2 and its 48 MHz 8051 core.
Yes, you can do fast hardware transfers directly to memory or an SSD by using their hardware buses directly, which doesn't involve the CPU. But we're talking about attempting to use an SBC through its standard ports, like gigabit Ethernet, USB3, or SPI on its GPIO connector.
But you see, USB3, MIPI CSI, PCIe, and all those other buses I've suggested do their fast hardware transfers directly to and from memory, without involving the CPU! That's the entire idea.
If we tried to use plain GPIO, for example dedicating one core to just waiting for the specific clock edge and then latching the data pin states, I do not believe we could achieve the necessary bandwidth.
This is also why number crunching and things like intensive JavaScript on web pages (I like to use the browser-based EasyEDA editor for my board designs) feel so much slower on an SBC than on a desktop or laptop: the processor cores have to manipulate all that data and interpret the JavaScript themselves, and none of those hardware peripherals can help much. (Well, the media processors, capable of encoding and decoding H.264 at 1080p60 or better while only spending a watt or two, are definitely useful for video playback. And OpenGL ES graphics do work well when the system is designed to exploit them; just look at your cellphone. Both Android and iPhone/iOS rely heavily on OpenGL ES for their display graphics!)