Author Topic: LeCroy Wavepro HD 8GHz Bw - / 20GS/s 12bit ADC (Read 13118 times)

David Hess · « **Reply #50 on:** July 06, 2018, 12:08:15 am »

Quote from: bson on July 05, 2018, 11:42:54 pm

Quote from: David Hess on July 05, 2018, 01:11:36 pm
That LeCroy document agrees with what I wrote and the various posts from Wurstunhund but I wonder if they did it differently in the past. I did not get into that much detail but breaking the acquisition record up into cache sized chunks is an obvious thing to do to maximize processor throughput which is very important if the processor is doing the heavy lifting which is LeCroy's thing.

It seems they make Intel CPUs use a portion of the L1 or possibly L2 cache as fast SRAM and break the acquisition record into chunks intended to fit a portion of the cache. They then apply the entire processing chain to each chunk as it sits in the cache before flushing it back out to memory. I don't know if there are any special cache controls on Intel CPUs that make this particularly easy, but it can always be accomplished with very careful data layout (assuming code has a separate L1 cache) based on the same underlying principles as cache coloring to avoid cache contention.

During startup, CPUs use their cache as memory until the external memory is configured and available. I do not know if Intel supports it but some CPUs allow cache lines to be locked in place which can be used to ensure real time performance.

As a practical matter, I do not think it is required in this case. The cache has enough associativity to avoid evictions if the programming is done correctly. Processing in cache sized blocks is standard practice for best performance for the reason LeCroy outlined; minimizing excessive access to main memory is important for maximum performance.
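To make the loop order concrete, here is a minimal sketch of processing in cache sized blocks. It is not LeCroy's code: `process_in_chunks`, `remove_offset` and `apply_gain` are hypothetical names, and Python gives no control over the cache, so this only shows the ordering (every stage runs on one chunk before the next chunk is touched). It also assumes element-wise stages; a filter with memory would need overlapping chunks.

```python
from typing import Callable, List, Sequence

# A stage takes one chunk of samples and returns the processed chunk.
Stage = Callable[[List[float]], List[float]]


def process_in_chunks(record: Sequence[float], chunk_len: int,
                      stages: Sequence[Stage]) -> List[float]:
    """Apply the whole processing chain to each chunk before moving on."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    out: List[float] = []
    for start in range(0, len(record), chunk_len):
        # In a real implementation this chunk would be sized to stay in cache.
        chunk = list(record[start:start + chunk_len])
        for stage in stages:
            chunk = stage(chunk)
        out.extend(chunk)
    return out


# Hypothetical element-wise stages, used only for illustration.
def remove_offset(chunk: List[float]) -> List[float]:
    return [x - 0.5 for x in chunk]


def apply_gain(chunk: List[float]) -> List[float]:
    return [2.0 * x for x in chunk]


record = [i / 10 for i in range(10)]
chunked = process_in_chunks(record, 4, [remove_offset, apply_gain])
# Reference result: each stage sweeps the entire record in turn.
whole = apply_gain(remove_offset(record))
```

Both orderings give the same samples; the chunked version just avoids streaming the full record through main memory once per stage.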

bson · « **Reply #51 on:** July 06, 2018, 12:16:55 am »

X-Stream definitely relies on a GPU to render the actual trace, so the CPU itself is mainly used for record processing. This can be disabled, forcing a software trace render, with a rather dramatic drop in the display update rate. So it's a bit of a myth that "everything is done in software."

Any x86/x64 desktop processor is going to have an L1 cache bandwidth that runs circles around FPGA internal memory blocks. I mean, as in the cheapest desktop or even mobile x64 processor is going to outperform the ballsiest FPGA out there on internal memory bandwidth. This is not surprising really since the desktop processor isn't burdened with a general-purpose signal routing fabric.

So, going only on the observation that LeCroy's scheme does not generally outperform FPGAs other than at price point, it has to be about more than memory bandwidth. The fact that you can build really high tap count filters and other large discrete processing elements in an FPGA has to be an advantage. You can make most if not all measurements at full bandwidth in parallel, and I'd venture to suggest the high degree of computational concurrency to a large extent makes up for the deficiency in memory bandwidth.

It also wouldn't surprise me if a run-of-the-mill mobile or desktop processor effectively has a faster DDR bus as well, but I guess that depends heavily on what IP you license for the FPGA. (Again moving the price point.)

Berni · « **Reply #52 on:** July 06, 2018, 05:24:12 am »

Well FPGA memory blocks on large chips can be very fast. Each individual memory block is usually not very fast at all, and if it was fast the FPGA fabric usually doesn't handle signals above 300MHz all that well unless you have a fancy chip. Where they get the speed is that there is a lot of them. A midrange FPGA might have 100 of these blocks and each can usually do a simultaneous write and read of 36bit at around 300MHz so that is about 1200MB/s per port or 2400MB/s for both ports combined. So when you put together 100 of those you get 240 GB/s of memory throughput. All of this throughput is random access with 1 clock cycle latency.
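The arithmetic above can be checked with a few lines. This is only a sketch of the numbers quoted in the post; the split of a 36 bit port into 32 data bits plus 4 parity bits is my assumption to explain the rounded 1200MB/s figure, and `port_bandwidth_mb_s` is a made-up helper name.

```python
def port_bandwidth_mb_s(width_bits: int, clock_hz: float) -> float:
    """Bandwidth of one memory port in MB/s (1 MB = 1e6 bytes)."""
    return width_bits / 8 * clock_hz / 1e6


CLOCK_HZ = 300e6        # fabric clock quoted in the post
BLOCKS = 100            # block RAMs in the "midrange FPGA" example
PORTS_PER_BLOCK = 2     # simultaneous write and read

# All 36 bits counted: a bit more than the post's rounded figure.
raw_36 = port_bandwidth_mb_s(36, CLOCK_HZ)

# Assumption: 32 data bits + 4 parity bits, which gives the 1200 MB/s quoted.
data_32 = port_bandwidth_mb_s(32, CLOCK_HZ)

per_block_mb_s = data_32 * PORTS_PER_BLOCK
aggregate_gb_s = per_block_mb_s * BLOCKS / 1000

print(f"per port: {data_32:.0f} MB/s (36 bit raw: {raw_36:.0f} MB/s)")
print(f"per block: {per_block_mb_s:.0f} MB/s, aggregate: {aggregate_gb_s:.0f} GB/s")
```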

But the kicker is that this only works when the memory is used in a distributed way where each block is connected to its own processing logic that hammers it with traffic. If you combined them into a single large block of memory where you can write and read anywhere you want then it would slow down to pretty much the slow performance of a single memory block.

DDR3 memory also generally does not run very fast on FPGAs. The circuitry in the IO pins has limits to how fast it can go. Chips with hardcore IP memory controllers can go faster, but they still don't run at the ridiculous speeds that DDR3 sticks in PCs can go.

Oh and CPU cache memory is often configured to be just RAM on startup even in a lot of ARMs or DSPs. Some can even be then reconfigured to only use half of it as actual cache and keep using the other half as RAM (useful for time critical interrupt routines). But a lot of speed optimization on x86 is avoiding cache misses: the code gets designed in a way that keeps the working set in cache as small as possible, and data structures might be designed to fit into cache lines neatly. You don't need to keep all your work in L1 cache for it to be fast, you just need to flush the cache rarely enough that the slow memory downstream of it can finish the operation before you need the next flush. Hyperthreading can also help by keeping the CPU busy with the other thread while this one waits for the cache miss data to get there.

Mr Nutts · « **Reply #53 on:** July 06, 2018, 12:12:58 pm »

Wow, this is very interesting stuff and I'm learning a lot! This is so cool!

Quote from: David Hess on July 05, 2018, 10:49:07 pm

The problem is what if I do not know what I am looking for? It is not enough to trigger on everything which is not the expected signal when I do not necessarily even know what the expected signal is in enough detail. And what if I am looking for more than one thing?

As long as I know what my signal is supposed to be and it's repetitive then that's easy. I set up the triggers for glitches and runts and sequence mode and wait to see if the scope triggers on it. I tend to leave it alone for a while and do something else, and when I come back I can look at the history to see when each anomaly occurred. It's like a slide show.

I could also use parameter masks but I haven't tried this yet.

If the signal isn't repetitive then I can still trigger for anomalies but it's more complicated (and I haven't completely figured out how to do that correctly yet).

If I don't know the signal then I just poke it with the scope probe to see what it is, but that would be the same with any scope. Of course I first have to find out what the signal is supposed to be before I can look for anomalies.

I'm still learning how to use all those functions in these old LeCroy scopes. And these scopes are old, I would bet that newer LeCroy scopes have even more functions to capture stuff. But I think my Agilent 8064 could do a lot of this as well if I can get the lost options to work.

Quote

Tektronix actually agrees with LeCroy's view; use DPO mode until you know what to trigger on and then set up the trigger to capture exactly what you need.

That works for anomalies that are frequent, but if it's very rare then it would take a very long time to capture it in persistence mode. And if I eventually see it on persistence I still have to set up the trigger and capture it again to get any time resolution, i.e. what the interval of occurrence is.

But there are many ways to skin the dog I guess.

Quote

That makes sense because the LeCroy has much greater blind time if it cannot trigger on the signal; at least I assume that is the case based on the LeCroy documentation that I checked. These two styles of DSO have different strengths and weaknesses so they must be used in different ways to make the most of their capabilities.

The update rate of my LT574M is in the region of a few hundred to a thousand updates per second. But if I go to RIS mode it jumps up to 20'000 updates per second which is a bit weird. Maybe I should try to capture a rare anomaly in RIS mode?

I have no idea how that compares with contemporary scopes from the same era.

Quote from: bson on July 06, 2018, 12:16:55 am

X-Stream definitely relies on a GPU to render the actual trace, so the CPU itself is mainly used for record processing. This can be disabled, forcing a software trace render, with a rather dramatic drop in the display update rate. So it's a bit of a myth that "everything is done in software."

Is X-Stream what they use on Windows scopes?

Both my LT264 and my LT574M do have a GPU (C&T 65545 I think, very old) and from what I read it's used for display processing as well.

Mr Nutts · « **Reply #54 on:** July 06, 2018, 12:24:24 pm »

Has anyone seen this?

https://youtu.be/Mv2QNxv3C2k

Holy cow, 1.5 million waveforms per second!

This LeCroy scope appears to be one of the newer ones with Windows; the description says 2005. That is still 13 years old.

But text says CPU and memory have been upgraded so this is cheating

He's showing the results on the same scope I have (Infiniium 8064), but his appears to be in much better condition (mine looks rather beaten down).

darkstar49 · « **Reply #55 on:** July 06, 2018, 01:57:02 pm »

Quote from: Mr Nutts on July 06, 2018, 12:24:24 pm

But text says CPU and memory have been upgraded so this is cheating

The Xi can't be upgraded much... LeCroy had opted for a motherboard with a PC/104-PLUS connector, tailor-made for them by BCM. These boards only ever existed with the 855 and 945GME chipsets, and were equipped with BGA CPUs for the 855 version and PGA CPUs for the 945.
The Xi's were all delivered with Celeron M 320 CPUs, the Xi-A's with Celeron M 440s... (but the 945 board supported up to Core 2 Duo CPUs), anyway, light-years away from modern CPUs...

Mr Nutts · « **Reply #56 on:** July 06, 2018, 02:02:41 pm »

Just imagine how fast such a scope could be with a modern CPU

darkstar49 · « **Reply #57 on:** July 06, 2018, 04:52:51 pm »

There's still the PCI bus, which is a serious limitation at typical sampling rates... newer designs, entirely based on PCIe, are orders of magnitude faster (from the HDO series onwards...)

David Hess · « **Reply #58 on:** July 06, 2018, 07:58:18 pm »

Quote from: bson on July 06, 2018, 12:16:55 am

Any x86/x64 desktop processor is going to have an L1 cache bandwidth that runs circles around FPGA internal memory blocks. I mean, as in the cheapest desktop or even mobile x64 processor is going to outperform the ballsiest FPGA out there on internal memory bandwidth. This is not surprising really since the desktop processor isn't burdened with a general-purpose signal routing fabric.

The cache in a CPU and memory block in an FPGA essentially have the same latency; there is nothing magical about a SRAM cell on a CPU which makes it faster than a SRAM cell on an FPGA. The difference is that a CPU has an instruction pipeline load-to-use latency greater than 1 allowing pipelining of the cache accesses to increase bandwidth.

So in a practical design it makes no difference; the FPGA can replicate the memory and logic blocks for increased parallelism to equal the thread and instruction level parallelism that an out of order CPU design with massive pipelined cache bandwidth has.

Quote from: Mr Nutts on July 06, 2018, 12:12:58 pm

Quote from: David Hess on July 05, 2018, 10:49:07 pm
That makes sense because the LeCroy has much greater blind time if it cannot trigger on the signal; at least I assume that is the case based on the LeCroy documentation that I checked. These two styles of DSO have different strengths and weaknesses so they must be used in different ways to make the most of their capabilities.

The update rate of my LT574M is in the region of a few hundred to a thousand updates per second. But if I go to RIS mode it jumps up to 20'000 updates per second which is a bit weird. Maybe I should try to capture a rare anomaly in RIS mode?

I have no idea how that compares with contemporary scopes from the same era.

Those kinds of update rates are typical now for low cost designs, including LeCroy's lower end models and the Rigol DS1000Z series, and they all work the same way. Basic decimation is supported in hardware for things like peak detection and high resolution, and then the processor is fast enough to process and display a few thousand acquisition records per second under optimal conditions.

The much faster "DPO" style DSOs instead produce a histogram of what should be displayed and that is processed to produce a display.
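As a rough illustration of the histogram idea, here is a sketch that accumulates hit counts per display cell from many acquisitions. It is not any vendor's implementation (real DPO style scopes do this in dedicated hardware); `accumulate_hits` and the tiny 4 row "display" are invented for the example.

```python
from typing import List, Sequence


def accumulate_hits(waveforms: Sequence[Sequence[float]], n_rows: int,
                    v_min: float, v_max: float) -> List[List[int]]:
    """Count how often each (voltage row, time column) cell is hit."""
    n_cols = len(waveforms[0])
    hits = [[0] * n_cols for _ in range(n_rows)]
    span = v_max - v_min
    for waveform in waveforms:
        for col, volts in enumerate(waveform):
            # Map the voltage to a display row and clamp to the screen edges.
            row = int((volts - v_min) / span * n_rows)
            row = min(max(row, 0), n_rows - 1)
            hits[row][col] += 1
    return hits


# Three identical acquisitions plus one with an anomaly in column 1.
waveforms = [
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.4, 0.0, 1.0],
]
hits = accumulate_hits(waveforms, n_rows=4, v_min=0.0, v_max=1.0)
# Brightness on screen would be derived from the counts, so the rare
# anomaly at hits[1][1] shows up dimmer than the normal trace at hits[3][1].
```

The display only has to be redrawn from the count array at the screen refresh rate, no matter how many acquisitions were folded into it.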

RIS (random interleaved sampling) is just another name for random ETS (random equivalent time sampling) and there is no particular reason it should have a faster waveform acquisition rate. My guess is that your LeCroy LT574M is capturing multiple RIS acquisitions into the acquisition record before transferring it for processing which is a nice way to improve performance. Most other DSOs transfer the random ETS acquisitions as they are captured and combine them using the processor which is slower unless they are operating like a DPO style DSO.
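To show what combining random ETS acquisitions involves, here is a minimal sketch that slots each acquisition into a finer time grid according to its measured trigger-to-sample offset. Whether the LT574M does this in acquisition memory is only my guess above, and this is not LeCroy's algorithm; `interleave_ris`, the interleave factor of 4 and the ramp test signal are all made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# One acquisition: (trigger-to-first-sample offset, real-time samples).
Acquisition = Tuple[float, Sequence[float]]


def interleave_ris(acquisitions: Sequence[Acquisition], sample_period: float,
                   factor: int) -> List[Optional[float]]:
    """Merge acquisitions into one record with `factor` times finer spacing."""
    n_samples = len(acquisitions[0][1])
    record: List[Optional[float]] = [None] * (n_samples * factor)
    bin_width = sample_period / factor
    for offset, samples in acquisitions:
        # The offset (0 <= offset < sample_period) selects the interleave slot.
        slot = min(int(offset / bin_width), factor - 1)
        for i, sample in enumerate(samples):
            index = i * factor + slot
            if record[index] is None:   # keep the first hit, drop repeats
                record[index] = sample
    return record


# Test signal v(t) = t, sampled every 1.0 time units at four random offsets.
offsets = [0.6, 0.1, 0.9, 0.3]
acquisitions = [(o, [o + i * 1.0 for i in range(3)]) for o in offsets]
record = interleave_ris(acquisitions, sample_period=1.0, factor=4)
# Slots that never receive an acquisition stay None until one lands there.
```

Doing this merge before the record is transferred means the processor sees one finished record instead of many partial ones.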

Quote from: Mr Nutts on July 06, 2018, 12:24:24 pm

Has anyone seen this?

Holy cow, 1.5 million waveforms per second!

It looks like 1.2 million to me.

I think that is with a 50 kSample record length at 5 nanoseconds/division so 100 divisions or 500 nanoseconds (only 50 nanoseconds or 10 divisions are displayed) which could support up to 2 million acquisitions per second so it is blind 40% of the time which is consistent with the trigger out duty cycle displayed on the Agilent. That is excellent performance.
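Spelled out, the estimate goes like this. The sketch just reuses the numbers read off the video (5 ns/div, 100 divisions, 1.2 million waveforms per second), and `blind_fraction` is a hypothetical helper name.

```python
def blind_fraction(record_duration_s: float, waveforms_per_s: float) -> float:
    """Fraction of time the scope is not acquiring."""
    max_rate = 1.0 / record_duration_s
    if waveforms_per_s > max_rate:
        raise ValueError("rate exceeds what the record length allows")
    # Each acquisition occupies record_duration_s of every second.
    return 1.0 - waveforms_per_s * record_duration_s


TIME_PER_DIV_S = 5e-9       # 5 ns/division
RECORD_DIVISIONS = 100      # record length in divisions (10 are displayed)
MEASURED_RATE = 1.2e6       # waveforms per second read from the counter

record_duration_s = TIME_PER_DIV_S * RECORD_DIVISIONS   # about 500 ns
max_rate = 1.0 / record_duration_s                       # about 2 million/s
blind = blind_fraction(record_duration_s, MEASURED_RATE)

print(f"record: {record_duration_s * 1e9:.0f} ns, "
      f"max rate: {max_rate / 1e6:.1f} M/s, blind: {blind:.0%}")
```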

I would like to know more about the operating mode though. Is that with segmented memory or something else?

Mr Nutts · « **Reply #59 on:** July 06, 2018, 08:49:06 pm »

The average is 1.19 million but the max is 1.5 million (and everyone seems to care for max values only).

As to what mode this is I think I'll send him a message on the other forum

I also think I'll see what update rate I can squeeze out of my LT574M

Mr Nutts · « **Reply #60 on:** July 09, 2018, 08:46:46 pm »

I got a reply from Wurstunhund:

Regarding the mode of the scope in my YT video, the WR64Xi was in 'WaveStream' mode. WaveStream is a mode where the scope shows an analog-style persistence screen, a bit like Tek's DPO mode but without its drawbacks. In WaveStream mode, the scope gives priority to the update rate at the cost of processing, although it runs at full sample rate and (selected) memory depth, and you can use measurements and math as on a normal acquisition.

WaveStream is pretty much LeCroy's nod to Agilent who by then started to make the waveform rate a marketing argument. It also made life a bit easier for those that came from analog scopes I guess.

WaveStream is an optional mode and doesn't replace the standard analog persistence, color persistence and 3D persistence modes that were already in previous LeCroy X-Stream scopes. While the latter modes are highly configurable, WaveStream is pretty simple in that it has its own single knob (pushable encoder) which controls on/off (by pushing) and brightness (by rotating) and that's it, like on an analog scope.

My WR64Xi was 'doped' with a faster CPU with more and faster L2 cache and more RAM as stated in the video's description. The frequency counter on the Agilent scope shows a peak rate of 1.51 million waveforms per second which is incredible, especially for a scope that was made in 2005 (I guess that puts the >1M waveforms per second claims from Agilent/Keysight and R&S for their RTO in a new perspective). It's also a lot more than even the fastest analog scopes could do (which top out at around 750k wfms/s).

I never checked the update rate with the standard piss-poor Celeron the scope came with. The update rate specified for WaveStream mode was much lower (I think the original spec was 8000 wfms/s if I remember right) but that wasn't a realistic number. It shows that LeCroy wasn't really interested in update rates.

I sold the scope long ago (and at least partially regret it, although the WRXi wasn't the most reliable scope and of pretty poor build quality) so I can't shoot the video in better quality unfortunately, which means this is all there is. I only shot it to show it to a friend, but back then there was Someone in the forum who claimed that a normal x86 architecture can't possibly reach high update rates so I put it on YT. Appears he was wrong, like on so many other things. Ah, the old times!

Hope that answers your question.

Good luck with the EEV forum. Just be careful, you might get banned if you mention LeCroy too often :DD

Berni · « **Reply #61 on:** July 10, 2018, 05:03:32 am »

Interesting stuff.

I wonder how this high update rate mode works under the hood as that is quite the boost to update rates from the standard mode.

