Topic: Comparing performance of genuine vs cloned programmers for Intel/Altera FPGAs


mixt (Topic starter)

  • Contributor
  • Posts: 11
  • Country: ca
[Updated Feb 8, 2024]

Hi guys. I'm sure people have talked about this stuff before, but I thought I'd give my 2 cents, aimed at helping more advanced hobbyists make decisions. Nothing too crazy but you get some teardowns and speed tests between 3 programmers and 3 FPGAs.

Let's look at what we're comparing:

1. Genuine Intel/Altera USB-Blaster II, manufactured mid-2020.
Mouser price: $225 USD

Components:
EPM570M100C4N CPLD, 440 Macrocells, 304MHz max clock, BGA-100 package, $42 USD in qty
CY7C68013A USB2 Controller @480Mbps (High-Speed)
LV028A Dual Differential LVDS Receiver
LV27A x2 Dual Differential LVDS Drivers
(plus reciprocal RX/TX chips on third heatshrunk PCB with JTAG connector)
3 PCBs

2. Cloned USB Blaster from WaveShare.
Online price: $25 USD

Components:
FTDI FT245RL USB2 to Parallel FIFO IC @12Mbps (Full-Speed)
EPM3064ATC44-10N CPLD, 64 Macrocells, $1-5 USD in qty
74xx244 buffer for outputs
Single PCB

3. Cloned generic USB Blaster.
Online price: $6-12 USD, sometimes for two.

Components:
Generic SOP-16 MCU (CH552G and similar), up to 32MHz clock, with built-in USB2 Controller @12Mbps (Full-Speed), $0.35 USD in qty
No buffer or other logic
Single PCB

Programming operation testing was done with 3 FPGAs:
EP4CE10 (Cyclone IV), 10k LEs, with a design that takes up 100% of FPGA resources
5CEFA5 (Cyclone V), 77k LEs / 29k ALMs, with a design that takes up 95% of FPGA resources
[Feb 8, 2024 Update]
10M08 (MAX10), 8k LEs, with a design that takes up 99% of FPGA resources

Testing conditions:
Platform for testing is Linux. Kernel is 6.5.0-14-generic. Quartus is 23.1std.0 Build 991, Lite Edition. USB controller is Fresco FL1100. All USB cables are short and high quality. Programming was done using quartus_pgm binary via command line. Elapsed time granularity is 1 sec. This is a fast desktop machine with a lot of RAM and test is not CPU bottlenecked. Tests were performed 5 times each, with lowest value chosen for result.
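For anyone who wants to repeat the "5 runs, keep the lowest" procedure, here is a minimal sketch in Python. It measures wall-clock time around the whole command, so it includes process start-up and will not match the elapsed time that quartus_pgm prints itself. The helper name `best_of` is made up for illustration, and the quartus_pgm command in the comment uses a placeholder .sof path; the demo times a trivial stand-in command so it runs without Quartus installed.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=5):
    """Run cmd `runs` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts on a failed run instead of timing it
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in command so the sketch runs anywhere. For a real test, swap in
# something like (placeholder file path, adjust to your project):
#   ["quartus_pgm", "-m", "jtag", "-o", "p;output_files/top.sof"]
demo_cmd = [sys.executable, "-c", "pass"]
print(f"best of 5: {best_of(demo_cmd):.2f} s")
```

Taking the minimum rather than the average filters out runs that were slowed down by something unrelated, such as a USB re-enumeration.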

Note:
The WaveShare programmer has problems in Linux, but not in Windows. In Linux, its operation is intermittent, with random programming failures. This seems to be a clocking related problem, and it has been reported many times elsewhere for various other cloned programmers. For most tests I retried until programming succeeded, but several tests had to be done in Windows because programming in Linux failed repeatedly.

Test 1 - Program SOF file to Cyclone IV:
1. Genuine USB-Blaster II: 0 sec elapsed time (< 1 sec)
2. WaveShare USB-Blaster: 1 sec elapsed time
3. MCU based USB-Blaster: 3 sec elapsed time

Test 2 - Program SOF file to Cyclone V:
1. Genuine USB-Blaster II: 2 sec elapsed time
2. WaveShare USB-Blaster: 7 sec elapsed time
3. MCU based USB-Blaster: 32 sec elapsed time

Test 3 - Program JIC file to flash (EPCS16) via Cyclone IV:
1. Genuine USB-Blaster II: 5 sec elapsed time
2. WaveShare USB-Blaster: 7 sec elapsed time
3. MCU based USB-Blaster: 12 sec elapsed time

Test 4 - Program JIC file to flash (N25Q128A) via Cyclone V:
1. Genuine USB-Blaster II: 1m 17s elapsed time
2. WaveShare USB-Blaster: 1m 30s (*Windows result)
3. MCU based USB-Blaster: 2m 42s elapsed time

[Feb 8, 2024 Update]
Test 5 - Program SOF file to MAX10:
1. Genuine USB-Blaster II: 1 sec
2. WaveShare USB-Blaster: 1 sec
3. MCU based USB-Blaster: 3 sec

[Feb 8, 2024 Update]
Test 6 - Program POF file to MAX10:
1. Genuine USB-Blaster II: 9 sec
2. WaveShare USB-Blaster: 14 sec (*Windows result)
3. MCU based USB-Blaster: 16 sec
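To make the six result sets easier to compare, the numbers above can be turned into slowdown factors relative to the genuine programmer. This is only a sketch over the figures already listed; the names `RESULTS`, `slowdown` and `fmt` are made up for illustration. Test 1 is reported as "n/a" because the genuine programmer finished below the 1 sec timer granularity, so no meaningful ratio exists.

```python
# Elapsed seconds per test as (genuine, WaveShare, MCU based), copied from the
# results above. Tests 4 and 6 use the WaveShare result measured in Windows.
RESULTS = {
    "Test 1 (SOF, Cyclone IV)": (0, 1, 3),  # genuine reported as < 1 sec
    "Test 2 (SOF, Cyclone V)": (2, 7, 32),
    "Test 3 (JIC, EPCS16 via Cyclone IV)": (5, 7, 12),
    "Test 4 (JIC, N25Q128A via Cyclone V)": (77, 90, 162),
    "Test 5 (SOF, MAX10)": (1, 1, 3),
    "Test 6 (POF, MAX10)": (9, 14, 16),
}


def slowdown(clone_s, genuine_s):
    """Return clone/genuine time ratio, or None if genuine is below timer resolution."""
    if genuine_s == 0:
        return None
    return clone_s / genuine_s


def fmt(ratio):
    """Format a ratio for printing; None means no usable measurement."""
    return "n/a" if ratio is None else f"{ratio:.1f}x"


for name, (genuine, waveshare, mcu) in RESULTS.items():
    print(f"{name}: WaveShare {fmt(slowdown(waveshare, genuine))}, "
          f"MCU based {fmt(slowdown(mcu, genuine))}")
```

For Test 2, for example, this gives 3.5x for the WaveShare and 16.0x for the MCU based programmer. With 1 sec granularity, the ratios for the short tests are very coarse.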

[Feb 8, 2024 Update]
Windows notes:
I had a chance to re-test a few things under Windows and got different results for some of the devices; in some cases the Genuine USB-Blaster II was much slower than the WaveShare USB-Blaster. In many cases, for some inexplicable reason, programming these chips in Windows was slower than in Linux. USB device speeds are easy to confirm in Linux but quite troublesome in Windows, so I didn't investigate further.

For instance, for Test 4, programming the JIC file to Cyclone V took 5m 6s on the Genuine USB-Blaster II, 1m 30s on the WaveShare USB-Blaster, and 5m 33s on the MCU based USB-Blaster. I repeated this operation several times to confirm. Both #1 and #3 blasters were substantially slower in Windows than in Linux (1m 17s and 2m 42s respectively), which could be caused by some unknown issue. I wanted to test some larger FPGAs, but I don't think there is a point since we already get the picture.

As a final note, the virtual memory usage metric in quartus_pgm was 15x higher in Windows than in Linux (4500MB vs 300MB). No idea why or if it matters, but just mentioning for completeness.

Conclusions:
If you're working with smaller chips, cloned/fake programmers can do the job fine and are hard to beat for the price. If you're on Linux, genuine programmers are the way to go; they'll save you a lot of headaches. If you're working with larger chips where you need to re-synthesize and test things frequently, use the SignalTap logic analyzer, etc., you're far better off with genuine programmers as well. The time savings would add up.

The genuine programmer is very well built internally, ensures far better signal integrity to the programming header, and lets you reduce JTAG clock frequency if needed (default is 24MHz). In my view it's not particularly overpriced. The original USB-Blaster I, however, certainly seems to be overpriced at this point. Not sure why people still buy it over the newer one.
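As a rough feel for what the JTAG clock setting means, the sketch below computes a lower bound on the time needed just to shift an image through JTAG. It assumes one bit per TCK cycle and ignores all USB, protocol and flash overhead, so real times will be longer. The image size is that of a full 128 Mbit flash (the capacity of the N25Q128A in Test 4); the 16MHz and 6MHz rates are arbitrary reduced settings picked for illustration, and `min_shift_seconds` is a made-up helper name.

```python
def min_shift_seconds(n_bytes, tck_hz):
    """Lower bound: every bit needs at least one TCK cycle."""
    return n_bytes * 8 / tck_hz


# Full image for a 128 Mbit flash: 16,777,216 bytes (about 16.8 MB)
IMAGE_BYTES = 128 * 1024 * 1024 // 8

for tck_hz in (24e6, 16e6, 6e6):
    seconds = min_shift_seconds(IMAGE_BYTES, tck_hz)
    print(f"TCK {tck_hz / 1e6:.0f} MHz: at least {seconds:.1f} s")
```

At the 24MHz default the bound is under 6 seconds, a small part of the 1m 17s measured in Test 4, so for flash programming a moderately reduced clock should cost relatively little.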

Please note that in the photos, the WaveShare programmer has an extra capacitor (my effort to ensure that the programming failures weren't caused by power issues). The LEDs are not original either. Other programmers are as they arrived. The MCU based programmer has its chip label erased; however, I have other similar programmers with identical exterior casing and a slightly different PCB layout that have the chip labeled as CH552G. Your guess is as good as mine whether they all use the same chip or not.

Hope this is all helpful to some of you! Back to work for me.
« Last Edit: February 08, 2024, 11:59:55 pm by mixt »
 
The following users thanked this post: oPossum, bingo600

berke

  • Frequent Contributor
  • **
  • Posts: 258
  • Country: fr
  • F4WCO
The Blaster II doesn't seem to be compatible with OpenOCD (couldn't get it to work, and it requires a firmware blob), but the Blaster I seems to be (as it is mentioned in the sources).  That could be one reason why people still buy it.

Are these timings with the SFL loader or using a Nios II bridge?

When programming flash chips I've found that the bottleneck is the chip itself unless you're running at very low JTAG clock rates. Typical *25Q128 full chip erase time is ~200 s plus 2 ms write delay per page.

I can confirm that Quartus and USB Blaster II work well under Linux (Debian Bullseye, kernel 6.1).  However you never know if you'll be able to download a Quartus version in five or ten years that will work on your machine.

I haven't done any debugging or tracing of signals, and I think that's where the adapter speed will be most important.

But for configuring Cyclones and flashing attached memories in JTAG mode any adapter can be used.  If needed reduce speed until it works.  I have it working reliably with a disgusting mess of Dupont wires, multiple adapters and logic analyzer probes at 8 MHz.

You just need to go to the Quartus programmer window and from there you can load a JIC and generate SVF or JAM files that will talk to the Intel SFL logic.  Many open source tools can process those files.  An FT2232 chip can be turned into a cheap JTAG probe that way in MPSSE mode, which should be supported right out of the box by OpenOCD.

But yes as with all things, if your time isn't free it's best to stick to vendor tools.
« Last Edit: February 07, 2024, 07:42:17 pm by berke »
 
The following users thanked this post: bingo600, mixt

mixt (Topic starter)

  • Contributor
  • Posts: 11
  • Country: ca
Thanks for your response. I haven't heard of OpenOCD; very cool project! Thanks for mentioning it.

The timings are indeed with SFL loader. It's what the average user will experience with various dev kits out there.

>*25Q128 full chip erase time is ~200 s plus 2 ms write delay per page.
As you can see, my results are shorter. Full erase + program was 1m 17s with the Blaster II on Linux for each of the 5 test cycles that I did. The JIC file for this was 16.8 megabytes since the design used 95% of 77k LEs on that chip. The flash chip is 128 megabit, so it's bound to be fully or almost fully erased and reprogrammed in my testing. The "enhanced specifications" for that chip from Micron list bulk erase time at 46 sec typical, with the same value being 170 sec typical with "standard specifications", which is more consistent with the numbers you've provided.
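The two sets of numbers can be compared with some quick arithmetic. This is a sketch, not a datasheet-accurate model: it assumes a 256-byte program page (typical for 25Q-series parts, check the datasheet for your chip), assumes the whole 128 Mbit array is erased and rewritten, and ignores JTAG transfer and verify time. The names `estimate_seconds` and `implied_page_ms` are made up for illustration.

```python
FLASH_BYTES = 128 * 1024 * 1024 // 8   # 128 Mbit = 16,777,216 bytes (about 16.8 MB)
PAGE_BYTES = 256                       # assumption: typical 25Q-series page size
PAGES = FLASH_BYTES // PAGE_BYTES      # 65,536 pages


def estimate_seconds(erase_s, page_ms):
    """Bulk erase time plus one program delay per page."""
    return erase_s + PAGES * page_ms / 1000


# Figures quoted above: ~200 s erase, 2 ms write delay per page
quoted = estimate_seconds(200, 2.0)

# Measured 1m 17s total; subtract the 46 s typical bulk erase from the
# "enhanced specifications" to see what is left per page.
measured_s = 77
implied_page_ms = (measured_s - 46) / PAGES * 1000

print(f"pages: {PAGES}")
print(f"estimate from quoted figures: {quoted:.0f} s")
print(f"implied upper bound per page: {implied_page_ms:.2f} ms")
```

With the quoted figures the estimate comes out at about 331 s, while the measured 77 s minus a 46 s erase leaves a bit under 0.5 ms per page. Note also that 128 Mbit works out to 16.8 million bytes, which matches the JIC file size.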

Thank you for providing all of this additional information. I updated my post as well to include a MAX10 FPGA, plus some Windows notes.
 

