Author Topic: Reverse engineering FNIRSI-5012H  (Read 108743 times)


Offline EEVblog

  • Administrator
  • *****
  • Posts: 37661
  • Country: au
    • EEVblog
Re: Reverse engineering FNIRSI-5012H
« Reply #50 on: November 09, 2019, 08:30:40 am »
It appears the X1/X10 setting is just a software option and not a hardware attenuator that affects the bandwidth.
I got about 104MHz 3dB bandwidth, but the probe compensation changes with voltage range, and on the lowest range you can't even compensate the probe, it's out of range.
Sample rate measured at 120MHz or so differential, but it drops to a single ended clock at lower timebase settings.
There is a big non-linearity in the response at 42MHz, and the signal is wonky above about 20MHz, so I put the proper usable bandwidth around 20MHz.
And the displayed sinusoidal signal goes nuts at precisely 100MHz, fine at 99MHz and 101MHz, so there is some really funky interleaved sampling software artifacts happening.
The deeper I look the more issues I find.
« Last Edit: November 09, 2019, 08:36:08 am by EEVblog »
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #51 on: November 09, 2019, 08:32:03 am »
Yes, there is nothing in the hardware to do with 1X/10X. It is a purely software multiplier.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #52 on: November 09, 2019, 08:36:37 am »
Sample rate measured at 120MHz or so differential, but it drops to a single ended clock at lower timebase settings.
It is 125 MHz and it is not differential. It is using either one or two ADC channels.

And its default firmware is crap, there is no question about it.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #53 on: November 09, 2019, 09:22:09 am »
More testing on the GPIO performance. I just tried to read the input values as fast as I can from the code:
Code:
  uint32_t a0 = GPIOD->ISTAT;
  uint32_t a1 = GPIOD->ISTAT;
  uint32_t a2 = GPIOD->ISTAT;
  uint32_t a3 = GPIOD->ISTAT;
  uint32_t a4 = GPIOD->ISTAT;
  uint32_t a5 = GPIOD->ISTAT;
  uint32_t a6 = GPIOD->ISTAT;
  uint32_t a7 = GPIOD->ISTAT;

All those things went into registers:
Code:
  800025a: f8d3 9010 ldr.w r9, [r3, #16]
  800025e: f8d3 8010 ldr.w r8, [r3, #16]
  8000262: f8d3 e010 ldr.w lr, [r3, #16]
  8000266: f8d3 c010 ldr.w ip, [r3, #16]
  800026a: 691f      ldr r7, [r3, #16]
  800026c: 691d      ldr r5, [r3, #16]
  800026e: 691c      ldr r4, [r3, #16]
  8000270: 6918      ldr r0, [r3, #16]

When the clock is 4 MHz (system clock divided by 2), I get the following pattern: "03 04 04 05 06 06 07 07".

So GPIO itself can definitely sample every value.

EDIT: Another test - run the DMA in an untriggered M2M mode, so it just reads the GPIO register as fast as it can. It already misses bytes:
Code:
17 19 1a 1b 1d 1e 1f 20 22 23 25 26 27 28 29 2b 2c 2d 2e 2f 31 32 34 35 36 37 38 3a 3b 3c
So the bottleneck is DMA itself.
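A quick way to put a number on the loss is to walk the capture and count the gaps. This is only a sketch: it assumes the GPIO port is fed by an 8-bit counter that increments by one per sample clock (the patterns above suggest that, but it is not stated), and `count_missing` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Count how many counter values are absent from a capture.
 * Assumption: the source is an 8-bit counter that increments by one
 * per sample clock, and no gap is 256 values or longer. */
static size_t count_missing(const uint8_t *buf, size_t len)
{
    size_t missing = 0;
    for (size_t i = 1; i < len; i++) {
        /* Difference wraps modulo 256, so counter rollover is handled. */
        uint8_t step = (uint8_t)(buf[i] - buf[i - 1]);
        if (step > 1)
            missing += (size_t)(step - 1);
        /* step == 0 means the same value was read twice: nothing lost. */
    }
    return missing;
}

/* The capture from the untriggered M2M DMA test quoted above. */
static const uint8_t dma_capture[] = {
    0x17, 0x19, 0x1a, 0x1b, 0x1d, 0x1e, 0x1f, 0x20, 0x22, 0x23,
    0x25, 0x26, 0x27, 0x28, 0x29, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
    0x31, 0x32, 0x34, 0x35, 0x36, 0x37, 0x38, 0x3a, 0x3b, 0x3c,
};
```

Running `count_missing(dma_capture, sizeof dma_capture)` on the 30 bytes above reports 8 skipped values, while the CPU-read pattern "03 04 04 05 06 06 07 07" reports none, since it only contains repeats.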
« Last Edit: November 09, 2019, 09:41:43 am by ataradov »
Alex
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3233
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #54 on: November 09, 2019, 12:08:15 pm »
Therefore GD32Fxxx is so fast.
We'll see how fast it really is. Even running from SRAM, there is still a possibility to screw up buses.

It has 64K of TCM that may end up being the fastest option.

I don't think the DMA can access the TCM, this is the case on the STM32 at least.
 

Offline splin

  • Frequent Contributor
  • **
  • Posts: 999
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #55 on: November 09, 2019, 01:44:07 pm »
So the bottleneck is DMA itself.

Have you enabled the DMA FIFO to allow it to pack the 16 bit source into 32 bit writes?  See section 1.1.9 in:

https://www.st.com/resource/en/application_note/dm00046011.pdf

[EDIT] Otherwise, if the DMA performs 16 bit writes to (32 bit) memory it will cost an additional cycle as it will need to perform a read-modify-write operation
« Last Edit: November 09, 2019, 02:00:13 pm by splin »
 

Offline dave j

  • Regular Contributor
  • *
  • Posts: 127
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #56 on: November 09, 2019, 01:58:38 pm »
I've spotted a performance improvement to that code I posted.

The code of the form:
Code:
usub8 temp, samples, triggers
sel temp, temp, zeros
cbnz temp, Found

Can be replaced with:
Code:
uqsub8 temp, samples, triggers
cbnz temp, Found

This saves 8 instructions each time round the loop, or 4096 while processing a 16K buffer.
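For anyone who wants to check the equivalence off-target, here is a portable C model of the two sequences. It is a sketch based on the documented behaviour of the ARM SIMD instructions (USUB8 sets a per-lane GE flag when there is no borrow, SEL picks lanes by GE, UQSUB8 saturates at zero); the function names are made up for illustration and the real code uses the instructions directly.

```c
#include <stdint.h>

/* Model of UQSUB8: four independent unsigned byte subtractions,
 * each clamped at zero instead of wrapping. */
static uint32_t uqsub8_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xffu;
        uint32_t y = (b >> (8 * lane)) & 0xffu;
        uint32_t d = (x > y) ? (x - y) : 0u;
        result |= d << (8 * lane);
    }
    return result;
}

/* Model of USUB8 followed by SEL against a zero register: the wrapped
 * difference is kept only in lanes where a >= b (GE flag set). */
static uint32_t usub8_sel_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xffu;
        uint32_t y = (b >> (8 * lane)) & 0xffu;
        uint32_t diff = (x - y) & 0xffu;  /* modulo-256 difference */
        uint32_t ge = (x >= y);           /* GE flag for this lane */
        result |= (ge ? diff : 0u) << (8 * lane);
    }
    return result;
}
```

Both models return a non-zero word exactly when at least one sample byte is greater than its trigger byte, which is all the `cbnz` needs.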


Separately, how often are we likely to have a pulse that is only one sample long? I'm wondering if, in interleaved 250MSPS mode, it's practical to just check the unreversed samples (i.e. every other one) during the loop and when we find a potential match in a word to look at the reversed samples when we check to see which sample in the word is the real trigger. Obviously, there's a possibility of a very short pulse being missed but, given the sample rate and effective bandwidth of the scope, is it going to be a real concern?
I'm not David L Jones. Apparently I actually do have to point this out.
 
The following users thanked this post: GeorgeOfTheJungle

Offline dave j

  • Regular Contributor
  • *
  • Posts: 127
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #57 on: November 09, 2019, 02:32:54 pm »
OK. I've tried it out now.

I can process a 16K buffer in 14,891 CPU clock ticks - leaving about 1400 to check for key presses, etc. This means if people are willing to accept a "may not detect pulses shorter than 8ns" restriction, there is the potential for supporting 250MSPS.


Edit: After some optimisation this now executes in 13,613 CPU clock cycles with nearly 17% of CPU cycles left for other tasks.
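The arithmetic behind those numbers can be sketched as follows. It assumes the budget is one CPU clock per sample, i.e. 16384 cycles for a 16K buffer; that is an inference from the figures quoted (it matches both "about 1400 left" and "nearly 17%"), not something stated outright. The helper names are hypothetical.

```c
#include <stdint.h>

#define BUFFER_SAMPLES 16384u  /* 16K samples per buffer */

/* Cycles left over per buffer, assuming a budget of one CPU clock
 * per sample (assumption inferred from the quoted figures). */
static uint32_t spare_cycles(uint32_t used_cycles)
{
    return BUFFER_SAMPLES - used_cycles;
}

/* The same headroom expressed as a percentage of the budget. */
static double spare_percent(uint32_t used_cycles)
{
    return 100.0 * (double)spare_cycles(used_cycles) / (double)BUFFER_SAMPLES;
}
```

With 14,891 cycles used this leaves 1493 cycles, and with 13,613 it leaves 2771, or roughly 16.9% of the budget.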
« Last Edit: November 09, 2019, 07:02:17 pm by dave j »
I'm not David L Jones. Apparently I actually do have to point this out.
 
The following users thanked this post: GeorgeOfTheJungle, mikerj

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #58 on: November 09, 2019, 07:11:15 pm »
I don't think the DMA can access the TCM, this is the case on the STM32 at least.
The idea was to execute from TCM and unload the main AHB bus. But in this device TCM is data-only. And as tests show, the AHB bus is really multi-layer and capable of supporting instruction fetch and data transfer at the same time, so that's not a problem anymore.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #59 on: November 09, 2019, 07:15:55 pm »
Have you enabled the DMA FIFO to allow it to pack the 16 bit source into 32 bit writes?  See section 1.1.9 in:
I tried FIFO. It makes things worse. There are a few values that get read with the same pattern, but then there are 4-5 values that are missing entirely. I assume that's where it turns around and drains the FIFO in a big burst transfer. One thing I don't understand is why it is stopping the receive part. In theory DMA has two masters and they are on different buses. I don't understand why they can't work at the same time.

[EDIT] Otherwise, if the DMA performs 16 bit writes to (32 bit) memory it will cost an additional cycle as it will need to perform a read-modify-write operation
Why? The buses support 8-, 16- and 32-bit transfers. Why would it do RMW? It is also not the case in practice. Changing both transfers to 32-bit does not improve the situation.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #60 on: November 09, 2019, 07:26:17 pm »
Separately, how often are we likely to have a pulse that is only one sample long? I'm wondering if, in interleaved 250MSPS mode, it's practical to just check the unreversed samples (i.e. every other one) during the loop and when we find a potential match in a word to look at the reversed samples when we check to see which sample in the word is the real trigger. Obviously, there's a possibility of a very short pulse being missed but, given the sample rate and effective bandwidth of the scope, is it going to be a real concern?
Yes, I was also thinking about doing sparser checks. I would not consider missing a one-sample pulse to be a real issue.

I want to spend some time trying to get F_CPU/2 sample rate on a single channel working reliably before moving on to timing tests. At the moment I'm thinking that hardware is not actually capable of capturing 250 MSPS stream and they either just ignore that, or do something in the software. There is no reliable triggering at this sample rate in their software, so it may show whatever it wants and nobody would notice. I will experiment with the working device to see if I can make out something.
Alex
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #61 on: November 10, 2019, 02:37:27 am »
Ok, I'm giving up for now on trying to make it work with 125 MSPS/channel. The best my device can do reliably at this time is SYS_CLK/4 or 62.5 MSPS.

I also did try code by dave j. I tried the modified version. I only ran it on arrays of zeros to get the longest run time. I have not tested that the code actually works, just measured its run time. And all my experiments were done on the system running at 8 MHz. But again, all things measured in clock cycles should scale proportionally and it is much easier to debug with the slow code.

There is a noticeable performance impact from the DMA. The code running in a simple loop on the array of 32K samples is executed in 6.79 ms. The same code running in the DMA IRQ handler while DMA is receiving another buffer is taking 7.69 ms to run.

There is no contention if the buffers are split between the two SRAMs. But in this case each buffer can only be 16 K, since the second SRAM is only 16 K.

But even without splitting the buffers, there is still some room for the main code to execute. This was not the case with the  original code that had 'sel' instructions. That one was too slow.

This means that we can capture two channels at 62.5 MSPS and process each sample for triggering for a total of 125 MSPS scope.
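To see why this fits, the measured run times can be converted to cycles per sample and compared with the budget. This is a back-of-the-envelope sketch: it takes "32K samples" as 32768 and the full-speed system clock as 250 MHz (inferred from SYS_CLK/4 = 62.5 MSPS). The function names are hypothetical.

```c
#include <stdint.h>

/* CPU cycles spent per sample, from a measured run time. */
static double cycles_per_sample(double run_time_s, double cpu_hz,
                                uint32_t samples)
{
    return run_time_s * cpu_hz / (double)samples;
}

/* CPU cycles available per sample at a given aggregate sample rate. */
static double cycle_budget(double cpu_hz, double total_sample_rate)
{
    return cpu_hz / total_sample_rate;
}
```

At 8 MHz, 6.79 ms works out to about 1.66 cycles per sample and 7.69 ms to about 1.88. At 250 MHz and 125 MSPS total, `cycle_budget` gives 2 cycles per sample, so even the slower case fits with a little room to spare.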

If someone finds a way to actually make the DMA run with a 125 MHz clock, it would be nice.
Alex
 

Offline splin

  • Frequent Contributor
  • **
  • Posts: 999
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #62 on: November 10, 2019, 05:31:47 am »
Have you enabled the DMA FIFO to allow it to pack the 16 bit source into 32 bit writes?  See section 1.1.9 in:
I tried FIFO. It makes things worse. There are a few values that get read with the same pattern, but then there are 4-5 values that are missing entirely. I assume that's where it turns around and drains the FIFO in a big burst transfer. One thing I don't understand is why it is stopping the receive part. In theory DMA has two masters and they are on different buses. I don't understand why they can't work at the same time.

That's not encouraging. You may well be right that they don't actually achieve even 250MSPS sampling reliably, but it is possible they have better access to the silicon developers, which allowed them to configure the device appropriately, perhaps even using undocumented features. I guess only a lot of testing of the scope will reveal the truth. Have you tried stopping the CPU using WFI to guarantee there is no contention for the memory/buses/AHB matrix?

Quote
[EDIT] Otherwise, if the DMA performs 16 bit writes to (32 bit) memory it will cost an additional cycle as it will need to perform a read-modify-write operation
Why? The buses support 8-, 16- and 32-bit transfers. Why would it do RMW? It is also not the case in practice. Changing both transfers to 32-bit does not improve the situation.

Sorry, brainfart. I realised not long after posting that it was rubbish but had to go out before I could correct it. I put it down to some badly remembered notes on store instruction timings which can incur extra cycles, but on checking it was referring to unaligned half word accesses etc.
 
The following users thanked this post: GeorgeOfTheJungle

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #63 on: November 10, 2019, 05:49:25 am »
You may well be right that they don't actually achieve even 250MSPS sampling reliably
I'm starting to think this is the case.

but it is possible they have better access to the silicon developers which allowed them to configure the device appropriately, perhaps even using undocumented features.
It is possible, but not likely.

Have you tried stopping the CPU using WFI to guarantee there is no contention for the memory/buses/AHB matrix?
I tried, but the CPU does not introduce any overhead. In my tests it is waiting for the interrupt flag in a loop without accessing SRAM. And the code is in a totally separate section.

I did more tests on the FIFO stuff, and I don't think it actually loses any more data than without FIFO, but it also does not make it any better.

It looks like things get lost on the way from the GPIO register to the DMA controller, before they get a chance to be put into FIFO or sent over the other master.
Alex
 

Offline EEVblog

  • Administrator
  • *****
  • Posts: 37661
  • Country: au
    • EEVblog
Re: Reverse engineering FNIRSI-5012H
« Reply #64 on: November 11, 2019, 03:14:01 am »
 
The following users thanked this post: GeorgeOfTheJungle, dave528

Offline dave j

  • Regular Contributor
  • *
  • Posts: 127
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #65 on: November 11, 2019, 11:49:18 am »
I did more tests on the FIFO stuff, and I don't think it actually loses any more data than without FIFO, but it also does not make it any better.

It looks like things get lost on the way from the GPIO register to the DMA controller, before they get a chance to be put into FIFO or sent over the other master.

When testing with the FIFO, did you just test doing transfers when the FIFO was full? I'm wondering if doing the transfers when the FIFO is half full might allow it to continue receiving samples whilst the FIFO to memory transfer is going on. It would also be worth looking at whether 4 x 16 bit transfers in burst mode are faster than 2 x 32 single transactions - there'd be more memory writes but less DMA transaction latency.

I've also discovered a bug in my trigger detect code. I got which bytes are reversed wrong. I've attached a fixed version.
I'm not David L Jones. Apparently I actually do have to point this out.
 
The following users thanked this post: GeorgeOfTheJungle

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #66 on: November 11, 2019, 05:26:12 pm »
When testing with the FIFO, did you just test doing transfers when the FIFO was full? I'm wondering if doing the transfers when the FIFO is half full might allow it to continue receiving samples whilst the FIFO to memory transfer is going on. It would also be worth looking at whether 4 x 16 bit transfers in burst mode are faster than 2 x 32 single transactions - there'd be more memory writes but less DMA transaction latency.
I tried multiple configurations based on the same thinking. But they all yield the same result. I really don't think it has anything to do with the bandwidth limits. It looks like DMA just can't capture the data with triggers going this fast.

I also restructured the program to move all the regular variables to the TCM. So now I have full 128 KB for the sample memory. The plan is to divide that into 16 KB chunks and implement a circular buffer. I tested parts of this and it seems to work just fine. This even allows for capture long after the trigger event, all you need to do is just count the number of 16 KB buffers. And for the last buffer I can stop on the exact sample in the middle of the buffer (+/- a few samples). For the last buffer I just block in the DMA ISR and manually poll the counter.
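The chunk bookkeeping described above could look roughly like this. It is a minimal sketch and not the actual firmware: the struct, the function name and the idea of calling it once per DMA-complete interrupt are all assumptions, and the exact-sample stop inside the last chunk (polling the DMA counter) is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_SIZE  (16u * 1024u)  /* bytes per DMA buffer */
#define CHUNK_COUNT 8u             /* 8 x 16 KB = 128 KB of sample memory */

typedef struct {
    uint32_t write_chunk;  /* index of the chunk the DMA fills next */
    bool     triggered;    /* a trigger has been seen in an earlier chunk */
    uint32_t post_chunks;  /* whole chunks still to record after the trigger */
} capture_t;

/* Hypothetical hook, called once for every completed chunk.
 * trigger_found: the trigger search hit in the chunk just completed.
 * post_trigger_chunks: how many further chunks to keep recording.
 * Returns true when the capture should stop. */
static bool capture_on_chunk_done(capture_t *c, bool trigger_found,
                                  uint32_t post_trigger_chunks)
{
    /* Advance around the ring; the oldest chunk gets overwritten. */
    c->write_chunk = (c->write_chunk + 1u) % CHUNK_COUNT;

    if (!c->triggered) {
        if (!trigger_found)
            return false;              /* still waiting for a trigger */
        c->triggered = true;
        c->post_chunks = post_trigger_chunks;
    } else if (c->post_chunks > 0u) {
        c->post_chunks--;              /* one more post-trigger chunk done */
    }
    return c->post_chunks == 0u;
}
```

Because only a chunk count is kept, the post-trigger delay can be much longer than the 128 KB of memory; the ring simply keeps wrapping until the count runs out.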

The next thing to figure out is how to periodically let the MCU access the buffer for the auto mode. In this mode minor loss of trigger is not a big deal, of course, so it should not be a big deal to stop the whole thing and update the display. Switching to rolling mode at high time/div settings would be interesting too.

I will do more experiments later on. For now I switched gears and started working on the UI and navigating the waveform with artificially generated data. It is interesting how you don't think about a lot of stuff when using scopes and the UI is intuitive. But when you have to make an actual implementation, so many questions come up. And you start to see the differences between different vendors.
Alex
 
The following users thanked this post: sotos, mikerj, paulc, jealcuna

Offline dave j

  • Regular Contributor
  • *
  • Posts: 127
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #67 on: November 11, 2019, 08:13:18 pm »
I tried multiple configurations based on the same thinking. But they all yield the same result. I really don't think it has anything to do with the bandwidth limits. It looks like DMA just can't capture the data with triggers going this fast.

I was thinking of capturing to separate buffers to avoid the memory contention. The "only check the unreversed samples" version of the trigger detection code would be able to process samples quickly enough, but not if it's fighting for memory bandwidth with the DMA. It would limit total sample size to 32K though.

Since 125MSPS with 128K samples is likely useful to more people than 250MSPS with 32K samples I think focusing on that is the way to go. Writing the whole scope firmware from scratch is a big enough task in itself.
I'm not David L Jones. Apparently I actually do have to point this out.
 

Offline tmf

  • Contributor
  • Posts: 45
  • Country: ca
Re: Reverse engineering FNIRSI-5012H
« Reply #68 on: November 11, 2019, 08:47:15 pm »
I really like this $70 scope... seems to me it could be sold even cheaper? The BOM cost was very low!

Man of many hats :-)

 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #69 on: November 11, 2019, 09:55:46 pm »
There is no memory contention that affects the ability to capture at 250 MSPS. My code at the moment uses TCM for variables, so the only thing that writes into SRAM is DMA. I tried disabling the trigger searching code. DMA still fails to capture all the samples. And for 125 MSPS trigger search is still plenty fast even with DMA going in the background.

Since 125MSPS with 128K samples is likely useful to more people than 250MSPS with 32K samples I think focusing on that is the way to go. Writing the whole scope firmware from scratch is a big enough task in itself.
I was thinking of skipping the reversal and checking every other sample for 250 MSPS. The code still would have to look at all the samples, but reversing code takes a significant amount of time, so just removing this would be a huge boost. There would still be a need to mask out every other sample for SIMD instructions. I have not thought about it a lot.

But yeah, I'm 100% focused on making 125 MSPS version first. Especially given potentially poor analog front-end anyway.
Alex
 
The following users thanked this post: paulc

Offline dave j

  • Regular Contributor
  • *
  • Posts: 127
  • Country: gb
Re: Reverse engineering FNIRSI-5012H
« Reply #70 on: November 12, 2019, 10:34:44 am »
I was thinking of skipping the reversal and checking every other sample for 250 MSPS. The code still would have to look at all the samples, but reversing code takes a significant amount of time, so just removing this would be a huge boost. There would still be a need to mask out every other sample for SIMD instructions. I have not thought about it a lot.

It's trivial. You take a copy of the triggers to use inside the loop and set the values for the samples you don't want to 0xff (or zero for a falling trigger). One thing to be aware of is that the earliest sample in a word is one of those tested and the sample before it is the last sample in the previous word. Additionally, if you detect a trigger on the first sample in a buffer, you have to check the last sample in the previous buffer. Easy enough to code and, because it's not in the loop, it doesn't have any performance implications.
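A portable C sketch of that masking, for illustration only. Which byte lanes hold the reversed samples is an assumption here (lanes 1 and 3), as is the operand order used for the falling edge; `uqsub8_model` stands in for the UQSUB8 instruction and all names are hypothetical.

```c
#include <stdint.h>

/* Lanes to ignore inside the loop. Assumption: lanes 1 and 3 hold the
 * reversed samples; swap the mask if it is the other way round. */
#define SKIP_LANES 0xff00ff00u

/* Stand-in for UQSUB8: per-byte unsigned subtract, clamped at zero. */
static uint32_t uqsub8_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (a >> (8 * lane)) & 0xffu;
        uint32_t y = (b >> (8 * lane)) & 0xffu;
        result |= ((x > y) ? (x - y) : 0u) << (8 * lane);
    }
    return result;
}

/* Rising trigger: a lane fires when sample > level. Skipped lanes get
 * 0xff, so sample - 0xff always saturates to zero. */
static uint32_t rising_hit(uint32_t samples, uint8_t level)
{
    uint32_t triggers = (0x01010101u * level) | SKIP_LANES;
    return uqsub8_model(samples, triggers);
}

/* Falling trigger: a lane fires when sample < level. Skipped lanes get
 * zero, so 0 - sample always saturates to zero. */
static uint32_t falling_hit(uint32_t samples, uint8_t level)
{
    uint32_t triggers = (0x01010101u * level) & ~SKIP_LANES;
    return uqsub8_model(triggers, samples);
}
```

A non-zero return only says that some tested lane in the word crossed the level; finding the exact sample, including the neighbouring-word and previous-buffer checks mentioned above, still happens outside the loop.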
I'm not David L Jones. Apparently I actually do have to point this out.
 
The following users thanked this post: GeorgeOfTheJungle, paulc

Offline mg3100

  • Newbie
  • Posts: 9
  • Country: nl
Re: Reverse engineering FNIRSI-5012H
« Reply #71 on: November 12, 2019, 09:32:37 pm »
Hi,

I bought the 5012H for more or less the same reasons that Alex did, and unfortunately I am also still unable to unlock the device.
I tried using the BOOT0/BOOT1 pins to get into the bootloader but that doesn't work. BOOT0 had to be lifted from the board because it was hardwired to GND. BOOT1 was NC so that was no problem, and neither was USART0_RX. USART0_TX is assigned to TIMER0_CH1, but since that is an input on the AD9288 I assume it doesn't matter.

So Alex (and others) please keep up the good work
 

Offline cliffyk

  • Frequent Contributor
  • **
  • Posts: 358
  • Country: us
    • PaladinMicro
Re: Reverse engineering FNIRSI-5012H
« Reply #72 on: November 13, 2019, 12:49:40 am »
There is no memory contention that affects the ability to capture at 250 MSPS. My code at the moment uses TCM for variables, so the only thing that writes into SRAM is DMA. I tried disabling the trigger searching code. DMA still fails to capture all the samples. And for 125 MSPS trigger search is still plenty fast even with DMA going in the background.

Since 125MSPS with 128K samples is likely useful to more people than 250MSPS with 32K samples I think focusing on that is the way to go. Writing the whole scope firmware from scratch is a big enough task in itself.
I was thinking of skipping the reversal and checking every other sample for 250 MSPS. The code still would have to look at all the samples, but reversing code takes a significant amount of time, so just removing this would be a huge boost. There would still be a need to mask out every other sample for SIMD instructions. I have not thought about it a lot.

But yeah, I'm 100% focused on making 125 MSPS version first. Especially given potentially poor analog front-end anyway.

Sir,

I admire both your knowledge of these things, and also your tenacity--you remind me of...me 30 years ago--BRAVO!
-cliff knight-

paladinmicro.com
 

Online ataradovTopic starter

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: Reverse engineering FNIRSI-5012H
« Reply #73 on: November 18, 2019, 01:01:32 am »
Here is an approximate view of the UI I have at this time. The waveform is simulated. Basically what I'm doing is simulating a single capture and debugging the navigation of the captured waveform. After that works I will hook it up to the actual ADC.

The battery meter took quite some time to implement.

AC/DC and trigger leading/falling edge symbols will be replaced by proper images later.
Alex
 
The following users thanked this post: all_repair, perdrix, edavid, Kean, neil555, gnavigator1007, jhpadjustable

Offline mg3100

  • Newbie
  • Posts: 9
  • Country: nl
Re: Reverse engineering FNIRSI-5012H
« Reply #74 on: November 19, 2019, 04:27:11 pm »
Nice job Alex!

I gave up on unlocking the device. I also ordered a few of them from LCSC.
Now I just have to wait for them to arrive with DHL.

For the time being I'm still using my modified DSO138, so once we have a working 5012H  it is a nice upgrade for me.
Even if we only reach like 25 MSPS, that's more than enough for me.
I am sure we can use one of the unused pins to create some external trigger option, maybe even add a second channel if we modify the AD9288 circuit.
The best improvement compared to the DSO138 is the number of buttons the 5012H has. Can't wait to customize the button functions the way I want them.
 

