Topic: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« on: June 24, 2019, 07:31:32 am »
With 1, 2 or 4GB RAM, for 35, 45 and 55 £ respectively, and

- 1 * Gigabit Ethernet
- 2 * HDMI ports (4k)
- 2 * USB3 ports
- 2 * USB2 ports
- WiFi + Bluetooth + etc as before.
- 1 * USB C for power in (OTG?) Draws ~ 13 Watts: 2.5 Amps @ 5V.

Who knows if all those USBs + the ethernet port are still hanging off a hub from a single USB port like in the old ones? The "schematics" are here: https://www.raspberrypi.org/documentation/hardware/raspberrypi/schematics/README.md but don't help much because they're incomplete "(REDUCED)".

Edit: Sorry Dave, I read this too late: "Use the Embedded Computer section for single board computers like Raspberry Pi etc"
« Last Edit: November 14, 2019, 05:50:15 pm by GeorgeOfTheJungle »
The further a society drifts from truth, the more it will hate those who speak it.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: Raspberry Pi 4
« Reply #1 on: June 24, 2019, 07:57:00 am »
That is very nice!

From the benchmarks it looks like Ethernet now works at a full gigabit as advertised:
https://core-electronics.com.au/tutorials/raspberry-pi-4-vs-3-model-b-performance-benchmark.html

Something that most people gloss over, though, is the RAM upgrade. The older Pi 2 and 3 used LPDDR2 memory, and the poor memory bandwidth made for pretty bad 2D performance from the GPU. The GPU inside a Pi is actually pretty fast, but asking it to simply plonk down a few large textures makes it gasp for data through the narrow bus to the RAM. Hopefully this will be less of a problem with the much more modern LPDDR4 memory.

I'm really not a fan of those micro HDMI connectors, and I don't think many will need two display outputs, so I would rather have seen them keep the single full-sized HDMI.
 

Offline dr.diesel

  • Super Contributor
  • ***
  • Posts: 2214
  • Country: us
  • Cramming the magic smoke back in...
Re: Raspberry Pi 4
« Reply #2 on: June 24, 2019, 10:03:08 am »
Although I understand their decisions, I was really hoping for an M.2 version of the Pi 4!

But WooHoo on the 4GB of LPDDR4!
 
The following users thanked this post: Richard Crowley

Online Monkeh

  • Super Contributor
  • ***
  • Posts: 7992
  • Country: gb
Re: Raspberry Pi 4
« Reply #3 on: June 24, 2019, 10:20:01 am »
Who knows if all those USBs + the ethernet port are still hanging off a hub from a single USB port like in the old ones? The "schematics" are here: https://www.raspberrypi.org/documentation/hardware/raspberrypi/schematics/README.md but don't help much because they're incomplete "(REDUCED)".

Their blog post made it quite clear. The USB ports are all (except I think the USB-C as it's OTG) handled by a VLI controller hanging off PCIe, so no more crappy Broadcom USB, and the ethernet is a dedicated PHY for an internal MAC. Much improved, it's beginning to look like a real and usable computer. It's just missing a decent storage option now - a good EMMC would be fine..
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Raspberry Pi 4
« Reply #4 on: June 24, 2019, 10:26:31 am »
I'm really not a fan of those micro HDMI connectors and i don't think many will need two display outputs so i would have rather seen them keep the single full sized HDMI.
I was skeptical of those micro HDMI connectors when I first saw them, but they seem to work pretty well on the numerous cameras which use them. You need to keep them clean, but they seem more robust than they look at first sight.
 

Offline mark03

  • Frequent Contributor
  • **
  • Posts: 711
  • Country: us
Re: Raspberry Pi 4
« Reply #5 on: June 24, 2019, 02:45:51 pm »
I wonder if they will finally transition the supported/approved OS to 64-bit?
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #6 on: June 24, 2019, 04:50:51 pm »
Their blog post made it quite clear. The USB ports are all (except I think the USB-C as it's OTG) handled by a VLI controller hanging off PCIe, so no more crappy Broadcom USB, and the ethernet is a dedicated PHY for an internal MAC. Much improved, it's beginning to look like a real and usable computer. It's just missing a decent storage option now - a good EMMC would be fine..

What blog post, please?

I'd want eMMC and an RTC. And a power manager with a LiPo charger... Why did they swap the Ethernet/USB positions on the PCB? And 13 watts is too much. All my RPis run headless, and they've put in TWO HDMIs, cool, thank you very much. Meh. D'oh. Grr. LOL.
 

Offline ehughes

  • Frequent Contributor
  • **
  • Posts: 409
  • Country: us
Re: Raspberry Pi 4
« Reply #7 on: June 24, 2019, 04:58:22 pm »
Quote
Who knows if all those USBs + the ethernet port are still hanging off a hub from a single USB port like in the old ones? The "schematics" are here: https://www.raspberrypi.org/documentation/hardware/raspberrypi/schematics/README.md but don't help much because they're incomplete "(REDUCED)".

RPi is an interesting data point for the OSHW zealots. There have never been full schematics and you can't get a CPU manual (easily). It has always been the darling of the maker community.

Cheap always wins. Then comes usability (which it seems people will trade for cheap in the hobby market).
In the end, no one really cares about "open".





 

Online Monkeh

  • Super Contributor
  • ***
  • Posts: 7992
  • Country: gb
Re: Raspberry Pi 4
« Reply #8 on: June 24, 2019, 05:18:22 pm »
Their blog post made it quite clear. The USB ports are all (except I think the USB-C as it's OTG) handled by a VLI controller hanging off PCIe, so no more crappy Broadcom USB, and the ethernet is a dedicated PHY for an internal MAC. Much improved, it's beginning to look like a real and usable computer. It's just missing a decent storage option now - a good EMMC would be fine..

What blog post, please?

Shockingly, https://www.raspberrypi.org/blog/

Quote
I'd want eMMC and a RTC. And a power manager with lipo charger... Why did they swap the ethernet/usb positions on the PCB? And 13 watts are too many. All my RPis run headless, and they've put TWO HDMIs, cool, thank you very much. Meh. Do'h. Grr. LOL.

RTC would be good. They moved the ports for practicality.
 

Offline mark03

  • Frequent Contributor
  • **
  • Posts: 711
  • Country: us
Re: Raspberry Pi 4
« Reply #9 on: June 24, 2019, 05:29:01 pm »
RPi is an interesting datapoint for the OSHW zealots.     There has never been full schematics and you can't get a CPU manual (easily).   It has always been the darling of the maker community.

Indeed.  But the sad truth is that there aren't the open alternatives there once were.  After TI got out of the high-end applications-processor market the choice has come down to nasty US semi mfgs (Broadcom, Nvidia, ...) and cheap Chinese stuff (Rockchip, Allwinner).  Oddly, I do get the sense sometimes that the Chinese chips are a little more open, but usable documentation is generally lacking.

Maybe RISC-V will save us :-\
 

Offline oPossum

  • Super Contributor
  • ***
  • Posts: 1417
  • Country: us
  • Very dangerous - may attack at any time
Re: Raspberry Pi 4
« Reply #10 on: June 24, 2019, 07:26:54 pm »
The USB ports are all handled by a VLI controller hanging off PCIe, so no more crappy Broadcom USB, and the ethernet is a dedicated PHY for an internal MAC. Much improved, it's beginning to look like a real and usable computer. It's just missing a decent storage option now - a good EMMC would be fine..

Crappy Broadcom USB replaced by crappy VLI USB. Still crappy.

Maybe Pi 5 will have Renesas USB and m.2
 

Online mac.6

  • Regular Contributor
  • *
  • Posts: 225
  • Country: fr
Re: Raspberry Pi 4
« Reply #11 on: June 24, 2019, 07:34:46 pm »
RPi is an interesting datapoint for the OSHW zealots.     There has never been full schematics and you can't get a CPU manual (easily).   It has always been the darling of the maker community.

Indeed.  But the sad truth is that there aren't the open alternatives there once were.  After TI got out of the high-end applications-processor market the choice has come down to nasty US semi mfgs (Broadcom, Nvidia, ...) and cheap Chinese stuff (Rockchip, Allwinner).  Oddly, I do get the sense sometimes that the Chinese chips are a little more open, but usable documentation is generally lacking.

Maybe RISC-V will save us :-\
You forgot the i.MX family: no problem getting the RM and git trees/mainline integration, although the GPU/VPU are still binary blobs.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: Raspberry Pi 4
« Reply #12 on: June 24, 2019, 07:58:21 pm »
Quote
Who knows if all those USBs + the ethernet port are still hanging off a hub from a single USB port like in the old ones? The "schematics" are here: https://www.raspberrypi.org/documentation/hardware/raspberrypi/schematics/README.md but don't help much because they're incomplete "(REDUCED)".

RPi is an interesting datapoint for the OSHW zealots.     There has never been full schematics and you can't get a CPU manual (easily).   It has always been the darling of the maker community.

Cheap always wins.  Then comes usability (which it seems people will trade for cheap in the hobby market).     
In the end, no one really cares about "open".

Open is nice, but in the end I don't really care if it's completely open so long as it's open enough that I can make it do what I need. The RPi is perfect for all sorts of projects, I don't really care that it isn't fully open source because it's open enough and documented enough that I can tweak nearly anything I'd ever need to tweak. I'm not going to build my own variant anyway and frankly I think the fact that the community isn't fragmented into 500 different semi-compatible variants is a good thing. It's also already so cheap that I doubt anyone else would be able to under-cut them without serious cutting of corners. I'm happy to not have the market flooded with Pi knockoffs that may or may not work well. If it cost 2-3 times as much as it does then I'd be advocating for a cheaper open source alternative. Open hardware prevents price gouging by allowing competition and so far I haven't seen evidence of gouging with the Pi.
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9018
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: Raspberry Pi 4
« Reply #13 on: June 24, 2019, 08:51:11 pm »
Their blog post made it quite clear. The USB ports are all (except I think the USB-C as it's OTG) handled by a VLI controller hanging off PCIe, so no more crappy Broadcom USB...
If it can do USB device and host at the same time (any confirmation if that's the case?), it looks like a great alternative to the Beaglebone Black for USB analysis. And either the USB 3.0 or Ethernet can send the traffic (at least up to USB 2.0) to an external storage device without any real problems.
Cryptocurrency has taught me to love math and at the same time be baffled by it.

Cryptocurrency lesson 0: Altcoins and Bitcoin are not the same thing.
 

Online PCB.Wiz

  • Super Contributor
  • ***
  • Posts: 1544
  • Country: au
Re: Raspberry Pi 4
« Reply #14 on: June 25, 2019, 04:16:57 am »
Has anyone seen speed specs for the IO connector ports?
It mentions up to 6 I2C, 6 UARTs and 5 SPI, but is sparse on whether those can run faster than on the Pi 3.
 

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8172
  • Country: fi
Re: Raspberry Pi 4
« Reply #15 on: June 25, 2019, 06:29:09 am »
RPi is an interesting datapoint for the OSHW zealots.

Indeed, it's a very interesting case, it would be worth some historical/cultural research.

Raspberry Pi has absolutely nothing to do with open-source hardware, and never did. The Raspi foundation is not even claiming that; it would be ridiculous. Quite the opposite, they use things like DRM to lock their product down. If anything, Raspi is the polar opposite of OSHW.

I do have an impression (but don't have the proof) that they originally did the guerrilla marketing for the first model by seeding rumors about a new OSHW computer. This tactic was great, of course; the phrase still lives on after a decade, and you don't need to actively lie, at least not under your own name.

The Raspberry Pi Foundation has "blood" on its hands; they are partially responsible for destroying the concept of OSHW. Back in maybe 2005 or so, the term did have some meaning. Now "open source hardware" is nothing but a marketing buzzword, empty of any meaning, that can be applied to any totally closed hardware product.

Basic IO pin mapping is not a schematic; it has to be provided for any embedded device to be useful, of course. So by using terminology like this, they are still actively lying.

Raspi4 is great news: we offer the possibility of using either a Raspi3 or an Odroid XU4 in an embedded product that requires quite some oomph to run its software. The idea was both to prevent vendor lock-in and to offer two performance options at different prices. Now the gap between them has almost closed, so both options can offer almost the same performance. The only issue left is the SD card (the Odroid has eMMC).

Why is this topic in the microcontroller section? We now have a section specifically for the Raspberry Pi.
« Last Edit: June 25, 2019, 06:49:59 am by Siwastaja »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #16 on: June 25, 2019, 06:30:26 am »
RPi is an interesting datapoint for the OSHW zealots.     There has never been full schematics and you can't get a CPU manual (easily).   It has always been the darling of the maker community.

Indeed.  But the sad truth is that there aren't the open alternatives there once were.  After TI got out of the high-end applications-processor market the choice has come down to nasty US semi mfgs (Broadcom, Nvidia, ...) and cheap Chinese stuff (Rockchip, Allwinner).  Oddly, I do get the sense sometimes that the Chinese chips are a little more open, but usable documentation is generally lacking.

Maybe RISC-V will save us :-\

Maybe, but any cheapish RISC-V board you see in the next 12-24 months will be around Pi 3+ or Odroid C2 A53 performance. No one has yet even announced anything at A72 levels.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #17 on: June 25, 2019, 07:11:30 am »
Maybe, but any cheapish RISC-V board you see in the next 12-24 months will be around Pi 3+ or Odroid C2 A53 performance. No one has yet even announced anything at A72 levels.

Since you are in the trade, do you happen to know if some huge players like Huawei or Samsung will roll some wicked powerful Risc V chips in near future?

If they were, they wouldn't tell us :-)

If Huawei needs to replace their current Kirin application processors, no one has RISC-V cores in that class yet. And Kirin is (so far) using purely off the shelf ARM cores, unlike Exynos or SnapDragon or Apple An where companies have developed their own cores in at least some recent cases.

Huawei has developed their own ARMv8.2 core for the new Kunpeng 920 64 core chip.

I've seen some Chinese news sites speculate that as Huawei bought a perpetual ARM Architecture License they may be able to continue developing their own ARMv8.2 (if not later) chips as long as they want, even with bans.

If not, their fastest route to a similar performance RISC-V would be to just swap out the ARM instruction decoder in the Kunpeng for a RISC-V instruction decoder -- a task simple enough that amateurs have been doing this with things such as the LEON SPARC core to make ReonV. Most of the back end register and pipeline stuff could remain unchanged at first -- the biggest essential change would be adding support for RISC-V's "compare and branch" instructions, though a temporary hack could be to expand those to two uops in the decoder.

They could then incrementally strip out all the stuff that RISC-V doesn't use, to save power and area. The condition codes would be the first to go :-)

One difficulty is they'll need some equivalent for NEON, which RISC-V doesn't yet have standardised. They could come up with some custom RISC-V opcodes that hook up to the existing NEON hardware, but long term they'll want to implement the RISC-V V extension. Well, with Huawei's resources "long term" should mean "12 months" or less. I don't know if they had any plans to implement SVE, but if so then making that do RVV instead would not be difficult.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6460
  • Country: nl
Re: Raspberry Pi 4
« Reply #18 on: June 25, 2019, 08:41:51 am »
Quote from: mark03
Indeed.  But the sad truth is that there aren't the open alternatives there once were.  After TI got out of the high-end applications-processor market the choice has come down to nasty US semi mfgs (Broadcom, Nvidia, ...) and cheap Chinese stuff (Rockchip, Allwinner). 
:o I missed that one. So TI is out; does that mean no more new BeagleBone developments?
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #19 on: June 25, 2019, 11:46:22 am »
I've seen some Chinese news sites speculate that as Huawei bought a perpetual ARM Architecture License they may be able to continue developing their own ARMv8.2 (if not later) chips as long as they want, even with bans.

What bans? You mean USA bans? I think a rich Japanese dude bought ARM, no? In any case, ARM was British, wasn't it? Sorry if I'm being dense :-)
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6460
  • Country: nl
Re: Raspberry Pi 4
« Reply #20 on: June 25, 2019, 03:33:11 pm »
Broadcom and Chinese chips manufacturers are willing to sell chips at silicon cost plus ARM core cost and a diminishing margin, with all peripherals designed in-house to reduce the last fraction of cent on IP licensing.
The one and only time I dealt with Broadcom was a couple of years back.
They were developing a new ARM-based chip for a network switch, and we discussed adding some options so it would be usable for our products as well.
The price they named in the first meetings was droolingly low: at a quantity of a few 100k it was about a fourth of what an off-the-shelf chip from ST would cost in quantities of millions, and the Broadcom part was faster, had more memory, etc. However, there were many issues. Even with an NDA you don't get all the info, and your code is locked to that chip (peripherals API), so you are dependent on them for the next product. And after a few additions from our side their first quote had already doubled, and we are not talking about big add-ons, just a few extra peripherals.
It would not surprise me if they have a strategy of hooking new customers with low prices and then raising the prices once the customer has built their product, but I cannot be sure, since that deal foundered on the ever-increasing costs and difficult communication (one meeting it was yes, the next it was no, then yes, etc.).
 

Offline HoracioDos

  • Frequent Contributor
  • **
  • Posts: 344
  • Country: ar
  • Just an IT monkey with a DSO
Re: Raspberry Pi 4
« Reply #21 on: June 25, 2019, 04:06:45 pm »
I wonder if they will finally transition the supported/approved OS to 64-bit?
Still 32 bits. Raspbian Buster needs to be backwards compatible with older Pi versions. Raspbian (32 bits) can address up to 4GB. There is no need to change to 64 bits.
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9018
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: Raspberry Pi 4
« Reply #22 on: June 25, 2019, 04:40:44 pm »
Still 32 bits. Raspbian Buster needs to be backwards compatible with older pi versions. Raspbian (32 bits) can address up to 4Gb. There is no need to change to 64 bits
Could they do autoselection of a 32 or 64 bit kernel while the userspace remains 32 bit? My understanding of that arrangement is that individual processes are still limited to 4GB, but combined they can use as much as the system has. (And tmpfs, being kernel level, would also not be subject to that limit.)
https://hackaday.com/2019/06/25/is-4gb-the-limit-for-the-raspberry-pi-4/
I wonder if there were plans to release an 8GB version, but delayed due to software not being ready to use it.

The Pi Zero and Zero W are really what's holding back a full 64 bit distribution. Perhaps it's time to make a branch for Pi 3 and newer and a legacy branch for everything else?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #23 on: June 25, 2019, 10:12:39 pm »
I've seen some Chinese news sites speculate that as Huawei bought a perpetual ARM Architecture License they may be able to continue developing their own ARMv8.2 (if not later) chips as long as they want, even with bans.

Yes and no. They have perpetual ARMv8 architecture license, but if the ban continues, they won't be able to get new official cores like A77. So A76 would be their last core licensed directly from ARM reference design.

I know that. I was talking about developing their own cores, as they have done with Kunpeng.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #24 on: June 25, 2019, 10:15:14 pm »
I've seen some Chinese news sites speculate that as Huawei bought a perpetual ARM Architecture License they may be able to continue developing their own ARMv8.2 (if not later) chips as long as they want, even with bans.

What bans? You mean USA bans? I think a rich japanese dude bought ARM, no? In any case, ARM was british, wasn't it? Sorry if I'm being dense :-)

Whatever you may personally think about ARM's ownership and who has influence over them, they seem to think they need to stop dealing with Huawei:

https://www.anandtech.com/show/14373/report-arm-suspends-business-with-huawei
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #25 on: June 25, 2019, 10:19:26 pm »
I wonder if they will finally transition the supported/approved OS to 64-bit?
Still 32 bits. Raspbian Buster needs to be backwards compatible with older pi versions. Raspbian (32 bits) can address up to 4Gb. There is no need to change to 64 bits

One of my own benchmarks I use to compare CPUs shows the same C code running on an Odroid C2 being 22.7% faster when compiled for 64 bit than when compiled for 32 bit. This is a benchmark btw that uses less than 16 KB RAM, so 4 GB is not the issue.

19.500 sec Odroid C2 A53 @ 1536 MHz A64  276 bytes
23.940 sec Odroid C2 A53 @ 1536 MHz T32  204 bytes
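
For anyone who wants to check the percentages, they can be re-derived from the two result lines above. A minimal Python sketch; the variable and function names are made up for illustration, only the four numbers come from the benchmark output:

```python
# Numbers copied from the two benchmark lines above.
T_A64_SEC, SIZE_A64 = 19.500, 276   # 64-bit (A64) build: run time, code bytes
T_T32_SEC, SIZE_T32 = 23.940, 204   # 32-bit (T32) build: run time, code bytes


def percent_more(larger: float, smaller: float) -> float:
    """Return how much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1.0) * 100.0


# Speed is inversely proportional to run time, so "X% faster" compares
# the slower time against the faster time.
speedup_pct = percent_more(T_T32_SEC, T_A64_SEC)

# Code size compares directly.
growth_pct = percent_more(SIZE_A64, SIZE_T32)

print(f"A64 build is about {speedup_pct:.1f}% faster")
print(f"A64 build is about {growth_pct:.1f}% larger")
```

So on this one benchmark the 64-bit build buys a bit over a fifth more speed for roughly a third more code bytes. It is a single small program, so it says nothing about a whole distribution.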
 

Offline HoracioDos

  • Frequent Contributor
  • **
  • Posts: 344
  • Country: ar
  • Just an IT monkey with a DSO
Re: Raspberry Pi 4
« Reply #26 on: June 25, 2019, 11:30:32 pm »
I wonder if they will finally transition the supported/approved OS to 64-bit?
Still 32 bits. Raspbian Buster needs to be backwards compatible with older pi versions. Raspbian (32 bits) can address up to 4Gb. There is no need to change to 64 bits

One of my own benchmarks I use to compare CPUs shows the same C code running on an Odroid C2 being 22.7% faster when compiled for 64 bit than when compiled for 32 bit. This is a benchmark btw that uses less than 16 KB RAM, so 4 GB is not the issue.

19.500 sec Odroid C2 A53 @ 1536 MHz A64  276 bytes
23.940 sec Odroid C2 A53 @ 1536 MHz T32  204 bytes

Simon Long from Raspberrypi blog
We cannot avoid focussing on backwards compatibility; it may not matter to you, but it is massively important to us. There are 27 million Pis in the wild; I don’t have exact numbers to hand for how many of those are Pis 1, 2 and Zero, but it’s well over 10 million of them. As soon as we move to a 64-bit OS, those devices are orphaned, because we do not have the resource to maintain two separate forks of Raspbian. (Not to mention to handle the support requests we will get from the thousands of users who download the wrong version and find it doesn’t boot.)

No-one has yet managed to provide us with a convincing use-case for where a 64-bit OS actually provides a real, quantifiable benefit to end-users. 32-bit accesses the entire RAM of the 4GB Pi 4. 64-bit code is invariably larger than 32-bit code – compare the sizes of the 32-bit and 64-bit versions of Windows 7; the 64-bit version is 30-40% larger. That’s a lot of extra download bandwidth for us, and for our users. A lot of 64-bit code actually runs slower than the 32-bit equivalent – because it’s larger, it takes longer to pull in from backing store. There are numerous costs attached to 64-bit – and we have yet to find a proven use-case where it actually offers any benefit whatsoever to the vast majority of our user base.

So no, this is not the ideal platform for the transition, or the time to make it. When (and if) we have a board that has more than 4GB of RAM – and that is likely to be a good few years off from today’s launch – we will look at 64-bit. But until then, the advantages of 32-bit – backwards compatibility, size, speed; of which backwards compatibility is easily the largest – vastly outweigh the putative advantages of a move to 64-bit at this point in time.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #27 on: June 26, 2019, 03:14:01 am »
64-bit code is invariably larger than 32-bit code – compare the sizes of the 32-bit and 64-bit versions of Windows 7; the 64-bit version is 30-40% larger. That’s a lot of extra download bandwidth for us, and for our users. A lot of 64-bit code actually runs slower than the 32-bit equivalent – because it’s larger, it takes longer to pull in from backing store.

This is undeniably true for ARM 64 bit code.

There are a lot of ways that Aarch64 is well designed, but it's been my opinion ever since it was announced that ARM has been foolish to forget that it was the 16/32 bit opcode Thumb2 that really gave them dominance. ARM was doing well in mobile in 2003 with the ARM7TDMI but it was Thumb2 that launched them into hyperspace.

With 64 bit they seem to have had an attitude of "Well, if we're close to x86_64 and beating MIPS and PowerPC on code size then we'll be fine".

This is going to be, personal opinion, the one purely technical thing that will help drive RISC-V through the ARM world in the coming years. RISC-V has Thumb2-like instruction encoding and code size for 64 bit code (and 128 bit) as well as for 32 bit code. RV64 programs are essentially the same size as RV32 programs.
 

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9018
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: Raspberry Pi 4
« Reply #28 on: June 26, 2019, 05:05:33 am »
So the infamous "3.5GB" issue does not apply to ARM? And that some 64 bit advantages on x86 (e.g. more registers, guaranteed SSE2 compatibility) do not apply to ARM?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #29 on: June 26, 2019, 05:24:07 am »
So the infamous "3.5GB" issue does not apply to ARM? And that some 64 bit advantages on x86 (e.g. more registers, guaranteed SSE2 compatibility) do not apply to ARM?

I think 3.5 GB was always a PC architecture memory map thing, nothing to do with x86 as such.

ARM 64 bit certainly does have some advantages similar to those you give. You get 32 registers instead of 16 (whereas 64 bit x86 is 16 registers instead of 8), a better function call ABI, and I believe with the -A profile you are also guaranteed to have both FPU and NEON, so there is no need to check for them.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: Raspberry Pi 4
« Reply #30 on: June 26, 2019, 05:56:48 am »
It is possible to have more than 4GB of RAM on a 32-bit x86 CPU, but Windows generally doesn't implement it (some server versions do have it, though).

It's called PAE (Physical Address Extension) and works much like the memory paging that was used on old-school 8-bit CPUs. ARM has its own implementation of it on the bigger chips, so they are indeed able to address more than 4GB of memory. As for the ~3.5GB instead of 4GB limit, that's mostly because the remaining ~0.5GB of address space is used for peripheral hardware registers, video memory and other specialized tasks.

That being said, just because the CPU can address more than 4GB doesn't mean that the applications running on it can easily make use of it all, since the 32-bit instruction set is not made for addressing so much memory. But the MMU can be used to map user applications into different parts of the >4GB memory, so while a single one can't use it all, you can still run multiple apps, each using its own chunk of memory, to fill out all of the RAM.

Since Linux is capable of using PAE, I'd expect there to be no big problems making an 8GB Raspberry Pi in the future on a 32-bit OS. Just don't expect to be able to successfully malloc a 5GB chunk of memory in your program.
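
To put rough numbers on that, here is a sketch in Python of the two separate limits: the per-process virtual limit set by 32-bit pointers, and the physical limit set by the page tables. The 36-bit figure is the classic x86 PAE physical address width; the 8GB board is hypothetical, and the helper names are made up for illustration. Real 32-bit kernels also reserve part of each process's 4GB for themselves, so the usable space per process is smaller than shown.

```python
GIB = 1 << 30  # bytes in one GiB


def addressable(bits: int) -> int:
    """Bytes reachable with an address of the given width."""
    return 1 << bits


VIRTUAL_PER_PROCESS = addressable(32)   # what a 32-bit pointer can reach
PHYSICAL_WITH_PAE = addressable(36)     # x86 PAE physical address width


def fits_in_one_process(request_bytes: int) -> bool:
    """Can a single 32-bit process even express an allocation this large?"""
    return request_bytes < VIRTUAL_PER_PROCESS


def processes_needed(ram_bytes: int) -> int:
    """Lower bound on the number of 32-bit processes needed to use all RAM."""
    return -(-ram_bytes // VIRTUAL_PER_PROCESS)  # ceiling division


board_ram = 8 * GIB  # hypothetical 8GB board

print(VIRTUAL_PER_PROCESS // GIB)      # GiB of virtual space per process
print(PHYSICAL_WITH_PAE // GIB)        # GiB of physical RAM reachable
print(fits_in_one_process(5 * GIB))    # the 5GB malloc
print(processes_needed(board_ram))     # processes needed to fill the board
```

The point of keeping the two constants apart is that they are independent: widening the physical side lets the system hold more RAM, but does nothing for what one 32-bit process can see.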
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #31 on: June 26, 2019, 08:33:40 am »
For example my Mac has 16GB and it's running OSX 10.6 in 32 bits mode and everything is fine and dandy.
 

Offline GromBeestje

  • Frequent Contributor
  • **
  • Posts: 280
  • Country: nl
Re: Raspberry Pi 4
« Reply #32 on: June 26, 2019, 01:06:29 pm »
It is possible to have more than 4GB of RAM on a 32bit x86 CPU, but windows generally doesn't implement it (Some server version do have it tho).

All Windows kernels implement it; it's just not normally enabled on non-server SKUs. By patching the Windows kernel (magic byte replacement using a hex editor), PAE can be enabled.

http://wj32.org/wp/2013/10/25/pae-patch-updated-for-windows-8-1/

This has been enforced since, if memory serves, Windows XP Service Pack 2. Windows XP gold and Windows 2000 give you all the memory. Back in the day this was a reason not to install the Service Pack.
 

Offline eugenenine

  • Frequent Contributor
  • **
  • Posts: 865
  • Country: us
Re: Raspberry Pi 4
« Reply #33 on: June 27, 2019, 12:48:32 am »
PAE was just a Windows issue, mostly for backwards compatibility.

I've always run Slackware ARM, so I have been running 64-bit on my 3s already.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #34 on: June 27, 2019, 03:26:33 am »
PAE was just a windows issue, mostly for backwards compatibility.

The PAE *name* is an x86-specific thing (not Windows), but the concept is common across I'd think all modern CPUs.

For example in RISC-V land we have the following virtual memory modes:

sv32: supports a 4 GB virtual address space in a 16 GB (34 bit) physical address space, using a two-level page table. This is similar to PAE, except PAE used a 3-level page table to support a 36 bit (64 GB) physical address space.

sv39: supports a 39 bit (512 GB) virtual address space in a 56 bit physical address space, using a three-level page table.

sv48: supports a 48 bit (256 TB) virtual address space in a 56 bit physical address space, using a four-level page table.

As a comparison, early x86_64 supported 48 bit virtual and 40 bit physical address space, with the current page table format supporting a future limit of 64 bit virtual addresses in a 52 bit physical address space.

ARM also has a similar Large Physical Address Extension (LPAE) supporting up to a 44 bit physical address space from a 48 bit virtual address space (on ARMv8 obviously).
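
As a rough illustration of how those virtual address widths fall out of the page-table shape, here is a minimal C++ sketch. It assumes 4 KiB pages (a 12-bit offset), 10 index bits per level for sv32 and 9 per level for sv39/sv48. The struct and function names are made up for this example and do not come from any real header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a paging mode, for illustration only.
struct PagingMode {
    const char* name;
    int levels;         // page-table levels walked per translation
    int index_bits;     // virtual-address bits consumed at each level
    int physical_bits;  // physical address width quoted in the post
};

constexpr int kPageOffsetBits = 12;  // assumes 4 KiB pages

// Virtual address width = levels * index bits + page offset bits.
constexpr int virtual_bits(const PagingMode& m) {
    return m.levels * m.index_bits + kPageOffsetBits;
}

// Size in bytes of an address space that is `bits` wide.
inline std::uint64_t bytes_for_bits(int bits) {
    assert(bits >= 0 && bits < 64);
    return std::uint64_t{1} << bits;
}

constexpr PagingMode kSv32{"sv32", 2, 10, 34};
constexpr PagingMode kSv39{"sv39", 3, 9, 56};
constexpr PagingMode kSv48{"sv48", 4, 9, 56};

// Compile-time checks against the widths given in the post.
static_assert(virtual_bits(kSv32) == 32, "sv32 should be 32-bit virtual");
static_assert(virtual_bits(kSv39) == 39, "sv39 should be 39-bit virtual");
static_assert(virtual_bits(kSv48) == 48, "sv48 should be 48-bit virtual");
```

With this, `bytes_for_bits(kSv32.physical_bits)` is the 16 GB physical limit of sv32 and `bytes_for_bits(virtual_bits(kSv39))` is the 512 GB virtual space of sv39. PAE does not fit this uniform model, because its top level indexes with only 2 bits (2 + 9 + 9 + 12 = 32).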
 
The following users thanked this post: newbrain

Offline HoracioDos

  • Frequent Contributor
  • **
  • Posts: 344
  • Country: ar
  • Just an IT monkey with a DSO
Re: Raspberry Pi 4
« Reply #35 on: July 10, 2019, 12:22:38 pm »
 
The following users thanked this post: edavid

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #36 on: July 10, 2019, 01:26:40 pm »
Pi4 not working with some chargers (or why you need two cc resistors)
https://www.scorpia.co.uk/2019/06/28/pi4-not-working-with-some-chargers-or-why-you-need-two-cc-resistors/

Ouch. That's a nice design fuck-up. ;D
 

Online Monkeh

  • Super Contributor
  • ***
  • Posts: 7992
  • Country: gb
Re: Raspberry Pi 4
« Reply #37 on: July 10, 2019, 01:27:43 pm »
Pi4 not working with some chargers (or why you need two cc resistors)
https://www.scorpia.co.uk/2019/06/28/pi4-not-working-with-some-chargers-or-why-you-need-two-cc-resistors/

Ouch. That's a nice design fuck-up. ;D

Ah, but their testing was very extensive so it's quite surprising. Just like the PoE fail. :-DD
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #38 on: July 10, 2019, 02:57:49 pm »
Probably extensive with just the power adapter they are suggesting (which is likely just a dumb adapter routing the +5V to the VBUS pins)...

Now USB-C is a bit of a bitch, so if you're going to use it, at least use it right! Otherwise they should just have added a basic power connector, much simpler and less expensive...
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6460
  • Country: nl
Re: Raspberry Pi 4
« Reply #39 on: July 10, 2019, 03:04:05 pm »
I always hated the USB connector for power.
Why not a simple small Phoenix Contact connector with screws, at least you can connect a decent power supply instead of a ccc.
 

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: Raspberry Pi 4
« Reply #40 on: July 10, 2019, 03:29:12 pm »
I always hated the USB connector for power.
Why not a simple small Phoenix Contact connector with screws, at least you can connect a decent power supply instead of a ccc.

Because "everyone" has a microUSB PSU (I know, not everyone has one suitable now that the current draw is higher, but I ran my Pi 1 from an HTC phone charger for years). They all have the same polarity and are all 5V. Fit a connector that gives the user scope to reverse the polarity or apply 240VAC, and users will do exactly that.

 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: Raspberry Pi 4
« Reply #41 on: July 10, 2019, 03:51:52 pm »
I've also always hated the USB connector for power. The vast majority of my micro USB power supplies are inadequate and result in unreliable performance. Most of the cables are also insufficient and have too much resistance. I've resorted to soldering on a barrel jack on a pigtail and that has always been rock solid. I have loads of 5V wall warts with standard 2.1mm barrel jacks and I've never had one not work. Wish they'd at least provide pads for something more sensible.
 

Offline eugenenine

  • Frequent Contributor
  • **
  • Posts: 865
  • Country: us
Re: Raspberry Pi 4
« Reply #42 on: July 10, 2019, 05:01:33 pm »
The option to power from pins on the GPIO has always been available.
 
The following users thanked this post: Siwastaja

Offline NiHaoMike

  • Super Contributor
  • ***
  • Posts: 9018
  • Country: us
  • "Don't turn it on - Take it apart!"
    • Facebook Page
Re: Raspberry Pi 4
« Reply #43 on: July 11, 2019, 02:01:14 am »
I've also always hated the USB connector for power. The vast majority of my micro USB power supplies are inadequate and result in unreliable performance. Most of the cables are also insufficient and have too much resistance. I've resorted to soldering on a barrel jack on a pigtail and that has always been rock solid. I have loads of 5V wall warts with standard 2.1mm barrel jacks and I've never had one not work. Wish they'd at least provide pads for something more sensible.
My experience is that any cable and adapter that works well for charging a modern smartphone will work well for powering a Pi.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Raspberry Pi 4
« Reply #44 on: July 11, 2019, 07:46:35 pm »
I've also always hated the USB connector for power. The vast majority of my micro USB power supplies are inadequate and result in unreliable performance. Most of the cables are also insufficient and have too much resistance. I've resorted to soldering on a barrel jack on a pigtail and that has always been rock solid. I have loads of 5V wall warts with standard 2.1mm barrel jacks and I've never had one not work. Wish they'd at least provide pads for something more sensible.
My experience is that any cable and adapter that works well for charging a modern smartphone will work well for powering a Pi.
I agree. I've had trouble with USB cables powering a RasPi, but those were cables that wouldn't charge a phone properly. I haven't had any problems when using a USB cable that worked well with a phone. However, if the OP has lots of 5V wall warts around, there's no reason not to use them.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: Raspberry Pi 4
« Reply #45 on: July 11, 2019, 08:16:00 pm »
Yeah i have seen WAY too many shitty USB cables that barely have any copper inside of them.

In fact, at my day job I had a software guy come over about a product I designed, saying that no matter what he did it just wouldn't charge the battery, or charged it with next to no current. So I hooked it up at my desk and it charged just fine. I went over to see his setup and found a suspiciously flexible USB cable. It turned out that the cable was dropping a whole volt as soon as you tried to draw any proper current, so the charging IC inside my device was backing off the current, thinking it was overloading the charger because of the sagging input voltage.

These are the sort of cables that don't work on a Raspberry Pi and should be burned, with their remains buried deep underground so that nobody else has to suffer their shitty performance anymore.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Raspberry Pi 4
« Reply #46 on: July 11, 2019, 08:56:52 pm »
Yeah i have seen WAY too many shitty USB cables that barely have any copper inside of them.
A lot of good quality USB cables have serious problems supplying high current because of dirt build up on the contacts. A good spray with switch cleaner frequently improves things greatly.
 

Offline Kjelt

  • Super Contributor
  • ***
  • Posts: 6460
  • Country: nl
Re: Raspberry Pi 4
« Reply #47 on: July 11, 2019, 10:31:23 pm »
It is just NOT a connector I feel comfortable running 2Amps continuously through.
 

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: Raspberry Pi 4
« Reply #48 on: July 11, 2019, 10:36:24 pm »
I've never had a smartphone that charges with a regular USB cable and I keep my phones for ~5 years so I don't exactly have a pile of quality micro USB cables laying around. You never really know what you're getting when buying them either until you actually try them.

Whatever the case the USB cable has been my biggest source of aggravation by far with these things. I've always had to either specifically hunt down something reported to work well and buy it, or hack the thing and solder in a standard barrel jack which has worked perfectly every time. It should just have a buck regulator onboard for the 5V as well as 3.3V so it could use any standard 9-12V wall wart which are common as muck and generally quite robust with reasonably heavy cable.

Starting with 5V and expecting to have 5V at the other end of a thin cable with a teeny tiny connector is just asking for trouble.
 

Offline Mr. Scram

  • Super Contributor
  • ***
  • Posts: 9810
  • Country: 00
  • Display aficionado
Re: Raspberry Pi 4
« Reply #49 on: July 11, 2019, 10:40:28 pm »
A lot of chargers don't start with 5V though. They take the drop into account and actually start higher.
 

Online thm_w

  • Super Contributor
  • ***
  • Posts: 6380
  • Country: ca
  • Non-expert
Re: Raspberry Pi 4
« Reply #50 on: July 11, 2019, 10:46:01 pm »
The problem with that is someone plugs in a reverse polarized plug, or a 24V+ pack, and boom, warranty claim. It adds cost, complexity, etc. The USB plug was simple and you can't screw it up to the point of killing the Pi.

But agree it didn't work well for me either: I would always see the lightning undervoltage symbol during boot with every USB cable/adapter combo I tried (2.4A adapters, short cables, etc.). Maybe that's normal?
 

Offline chickenHeadKnob

  • Super Contributor
  • ***
  • Posts: 1055
  • Country: ca
Re: Raspberry Pi 4
« Reply #51 on: July 12, 2019, 02:42:19 am »
--snip---
  with reasonably heavy cable.

Starting with 5V and expecting to have 5V at the other end of a thin cable with a teeny tiny connector is just asking for trouble.

Adafruit sells these: https://www.adafruit.com/product/1995 a 5V, 2.5A rated wall wart.

They have thicker cables (claimed 20 AWG) and boosted voltage. I seem to remember it used to be 5.15V; now I see they claim 5.25V. Anyway, I have no trouble powering RPis with them and I have been a happy camper. Who knows what lurks inside the plastic shell, probably the typical horror show.  :scared:

I have never tried with an RPi hosting a heavy-consuming USB device. I would imagine going through micro USB to power an additional 500 mA USB slave on top of the Pi's own consumption is dodgy as hell.
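
To put rough numbers on why wire gauge and a boosted supply voltage matter, here is a small C++ sketch of the resistive drop along a USB power cable. It uses the standard AWG diameter formula and an approximate copper resistivity. The 1.5 m cable length and the 28 AWG "thin cable" in the usage note are assumptions for illustration, and connector contact resistance is ignored, so real cables will do somewhat worse.

```cpp
#include <cassert>
#include <cmath>

// Diameter of a solid AWG conductor in millimetres (standard AWG formula).
inline double awg_diameter_mm(int awg) {
    assert(awg >= 0 && awg <= 40);
    return 0.127 * std::pow(92.0, (36.0 - awg) / 39.0);
}

// Resistance of a copper conductor per metre, in ohms, near room temperature.
inline double awg_ohms_per_metre(int awg) {
    const double kPi = 3.14159265358979323846;
    const double kCopperResistivity = 1.68e-8;  // ohm-metres, approximate
    const double d_m = awg_diameter_mm(awg) * 1e-3;
    const double area_m2 = kPi * d_m * d_m / 4.0;
    return kCopperResistivity / area_m2;
}

// Voltage left at the load end of the cable.
// Current flows out on VBUS and back on GND, so the conductor length is doubled.
inline double voltage_at_load(double supply_v, int awg, double cable_m, double amps) {
    assert(cable_m >= 0.0 && amps >= 0.0);
    const double loop_ohms = awg_ohms_per_metre(awg) * 2.0 * cable_m;
    return supply_v - amps * loop_ohms;
}
```

Under these assumptions, `voltage_at_load(5.25, 20, 1.5, 2.5)` comes out at roughly 5.0 V, while `voltage_at_load(5.0, 28, 1.5, 2.5)` loses about a volt and a half, which is in line with the bad-cable stories in this thread.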
 

Offline floobydust

  • Super Contributor
  • ***
  • Posts: 6995
  • Country: ca
Re: Raspberry Pi 4
« Reply #52 on: July 12, 2019, 08:25:39 pm »
In the RPi 4 "reduced" schematic they got rid of the input polyfuse, which always tripped with heavy USB draw on the RPi 3 and earlier. It was annoying always getting <4.8V due to the polyfuse drop with just a mouse and keyboard, so I just jumper them.
Now there are only some 1005 10R resistors as fuses for the Vcore LDO.

But no RPi 4 USB schematic given, so who knows if they have anything for the 4 ports or if the USB IC looks after it. I've never had a USB peripheral short-circuit.
 
The following users thanked this post: thm_w

Offline james_s

  • Super Contributor
  • ***
  • Posts: 21611
  • Country: us
Re: Raspberry Pi 4
« Reply #53 on: July 12, 2019, 11:00:02 pm »
The problem with that is someone plugs in a reverse polarized plug, or a 24V+ pack, and boom, warranty claim. It adds cost, complexity, etc. The USB plug was simple and you can't screw it up to the point of killing the Pi.

But agree it didn't work well for me either, I would always see the lightning undervoltage symbol during boot with every usb cable/adapter combo I tried (2.4A adapters, short cables, etc.), maybe thats normal?

The same thing happens if someone feeds reverse polarity to the GPIO header or wires the IO pins to 5V peripherals, or any number of other dumb things users are bound to do. You can't make it completely idiot proof, personally I'd rather have simple and reliable power for mine even if it meant a few more noobs blew theirs up, maybe they'd learn to be sensible about polarity and voltage rather than expecting someone else to prevent their mistakes.
 
The following users thanked this post: thm_w, Kjelt

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #54 on: July 13, 2019, 01:14:30 pm »
Keeping the USB connector (makes it easy for many) but adding a dedicated header or maybe even just pads for the power supply, just parallel to the USB VBUS, would have been a good idea. I find powering it through the GPIO headers clunky, error prone (if you put the +5V on a nearby pin, that's probably not going to be pretty) and as I recall, it bypasses any protection (though am I right in having understood that there is not much protection left on the RPi 4 anyway?) Adding a protection against reverse polarity would not have been a big deal either.

 

Offline CJay

  • Super Contributor
  • ***
  • Posts: 4136
  • Country: gb
Re: Raspberry Pi 4
« Reply #55 on: July 15, 2019, 05:59:02 am »
Keeping the USB connector (makes it easy for many) but adding a dedicated header or maybe even just pads for the power supply, just parallel to the USB VBUS, would have been a good idea. I find powering it through the GPIO headers clunky, error prone (if you put the +5V on a nearby pin, that's probably not going to be pretty) and as I recall, it bypasses any protection (though am I right in having understood that there is not much protection left on the RPi 4 anyway?) Adding a protection against reverse polarity would not have been a big deal either.

It's a known failure mode

https://hackaday.com/2019/06/12/shorting-pins-on-a-raspberry-pi-is-a-bad-idea-pmic-failures-under-investigation/
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: Raspberry Pi 4
« Reply #56 on: July 15, 2019, 08:01:03 am »
Keeping the USB connector (makes it easy for many) but adding a dedicated header or maybe even just pads for the power supply, just parallel to the USB VBUS, would have been a good idea. I find powering it through the GPIO headers clunky, error prone (if you put the +5V on a nearby pin, that's probably not going to be pretty) and as I recall, it bypasses any protection (though am I right in having understood that there is not much protection left on the RPi 4 anyway?) Adding a protection against reverse polarity would not have been a big deal either.

It's a known failure mode

https://hackaday.com/2019/06/12/shorting-pins-on-a-raspberry-pi-is-a-bad-idea-pmic-failures-under-investigation/

This is not only a RaspberryPi thing. A lot of products out there with a 5V and 3V3 rail inside of them can die if the two are shorted together.

The problem is that regulator ICs can only source current and not sink it. So in the case of shorting a 5V and 3V3 rail the 5V regulator just outputs however much current is needed to keep its output at 5V while the 3V3 regulator will start outputting zero current in hopes that that will make its rail fall back down to 3.3V. Because of this the 3V3 rail is dragged up to be 5V too. And as you might guess putting 5V into 3.3V chips is not a good idea.

Tho in practice a lot of 3V3 chips appear to be able to survive being powered with 5V for a short time, but i guess that Raspberry Pi PMIC is not one of them.

For this very reason the high end lab power supplies are made capable of sinking current. It prevents the same thing from happening on a lab PSU by allowing the supply to pull down on its output if it rises too high, this hopefully forces the higher voltage supply into current limiting mode and saves the DUT from being blown up.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #57 on: November 01, 2019, 10:42:04 am »
RPi is an interesting datapoint for the OSHW zealots.     There has never been full schematics and you can't get a CPU manual (easily).   It has always been the darling of the maker community.

Indeed.  But the sad truth is that there aren't the open alternatives there once were.  After TI got out of the high-end applications-processor market the choice has come down to nasty US semi mfgs (Broadcom, Nvidia, ...) and cheap Chinese stuff (Rockchip, Allwinner).  Oddly, I do get the sense sometimes that the Chinese chips are a little more open, but usable documentation is generally lacking.

Maybe RISC-V will save us :-\

Maybe, but any cheapish RISC-V board you see in the next 12-24 months will be around Pi 3+ or Odroid C2 A53 performance. No one has yet even announced anything at A72 levels.

Well, we've announced our A72-level U84 RISC-V core. Of course that means first customer chips are many months away.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #58 on: November 01, 2019, 10:52:36 am »
I wonder if they will finally transition the supported/approved OS to 64-bit?
Still 32 bits. Raspbian Buster needs to be backwards compatible with older pi versions. Raspbian (32 bits) can address up to 4Gb. There is no need to change to 64 bits

One of my own benchmarks I use to compare CPUs shows the same C code running on an Odroid C2 being 22.7% faster when compiled for 64 bit than when compiled for 32 bit. This is a benchmark btw that uses less than 16 KB RAM, so 4 GB is not the issue.

19.500 sec Odroid C2 A53 @ 1536 MHz A64  276 bytes
23.940 sec Odroid C2 A53 @ 1536 MHz T32  204 bytes

Well, I just got a 4 GB Pi4 and tried my benchmark on it:

11.190 sec Pi4 Cortex A72 @ 1.5 GHz T32 Raspbian
11.445 sec Odroid XU4 A15 @ 2 GHz T32
12.190 sec Pi4 Cortex A72 @ 1.5 GHz A64 Ubuntu 64 bit
12.605 sec Pi4 Cortex A72 @ 1.5 GHz A32 Raspbian
30.420 sec Pi3 Cortex A53 @ 1.2 GHz T32
47.910 sec Pi2 Cortex A7 @ 900 MHz T32

So it whomps the Odroid C2 and of course the Pi3 and Pi2, and is very comparable to the A15 running 33% higher clock rate in the Odroid XU4.
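
For anyone wanting to reproduce the percentages from the timings listed above, a minimal C++ sketch follows. The function name is made up for this example; the only inputs are the run times quoted in the post.

```cpp
#include <cassert>

// How much faster, in percent, is the run that took `fast_s` seconds
// than the run that took `slow_s` seconds on the same workload?
inline double percent_faster(double slow_s, double fast_s) {
    assert(slow_s > 0.0 && fast_s > 0.0);
    return (slow_s / fast_s - 1.0) * 100.0;
}
```

`percent_faster(23.940, 19.500)` gives the roughly 22.7% advantage of A64 over T32 code on the Odroid C2, and `percent_faster(30.420, 11.190)` shows the Pi4 finishing the T32 build a bit over 170% faster than the Pi3.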
 
The following users thanked this post: Mr. Scram

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #59 on: November 01, 2019, 10:53:57 am »
I wonder if they will finally transition the supported/approved OS to 64-bit?
Still 32 bits. Raspbian Buster needs to be backwards compatible with older pi versions. Raspbian (32 bits) can address up to 4Gb. There is no need to change to 64 bits

I installed a 64 bit Ubuntu Server on my Pi4 from here:

https://jamesachambers.com/raspberry-pi-ubuntu-server-18-04-2-installation-guide/
 
The following users thanked this post: HoracioDos

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #60 on: November 01, 2019, 03:48:41 pm »
I don't know enough about this benchmark, but is it assembly-based or pure C? In the latter case, the end result could significantly depend on the C compiler options and versions?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #61 on: November 01, 2019, 08:00:56 pm »
I don't know enough about this benchmark, but is it assembly-based or pure C? In the latter case, the end result could significantly depend on the C compiler options and versions?

It's a simple C benchmark with quite a lot of memory access (L1 cache on bigger systems) and branches that aren't trivially predictable, designed to take long enough to measure on a fast PC, but also fit into the available RAM on a relatively small microcontroller.

http://hoult.org/primes.txt

You're certainly correct that the result is dependent on the compiler quality. I've always used gcc with -O1 as the only option, but since I've been using this benchmark for a few years now, the generated code does vary a little on the same ISA with different gcc versions.

If something looks 10% faster or slower than something else, it might not be in reality or on other benchmarks. If something looks twice as fast or twice as slow as something else, there is probably something to it.
 
The following users thanked this post: Mr. Scram

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #62 on: November 02, 2019, 11:03:18 am »
FYI - BluePill @72MHz, -O1, gcc, found 3713160 primes

// 927.547 sec BluePill Cortex M3 @ 72 MHz

 :)
 
The following users thanked this post: GeorgeOfTheJungle, Mr. Scram

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #63 on: November 02, 2019, 12:45:05 pm »
FYI - esp32 @ 240 MHz

3713160 primes found in 261068 ms (with -O1)
« Last Edit: November 14, 2019, 09:34:09 am by GeorgeOfTheJungle »
 
The following users thanked this post: Mr. Scram

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #64 on: November 02, 2019, 04:35:47 pm »
FYI - BlackPill F407 @168MHz, -O1, gcc, found 3713160 primes

// 309.251 sec BlackPill Cortex M4F @ 168 MHz

:)
 
The following users thanked this post: GeorgeOfTheJungle, Mr. Scram

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #65 on: November 02, 2019, 06:31:21 pm »
FYI - esp8266 @ 160 MHz

3713160 primes found in 306988 ms

:)
« Last Edit: November 14, 2019, 09:34:28 am by GeorgeOfTheJungle »
 
The following users thanked this post: Mr. Scram

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #66 on: November 02, 2019, 07:04:44 pm »
When comparing flash-less esp8266 @ 160MHz and the stm32F407 at the 168MHz - it looks like the ART accelerator in the stm32f407 works fine :)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #67 on: November 02, 2019, 10:05:37 pm »
Hey thanks so much for the results guys! I've added them to my file. It's great to get results for some slower embedded machines as well as for desktop etc.

I can guess that the code size for the Cortex M3 and M4F should be the same as any other Thumb2 (T32) machine, give or take different compiler versions, but I have no idea about the ESP ones. That's Xtensa, right?
 

Online Monkeh

  • Super Contributor
  • ***
  • Posts: 7992
  • Country: gb
Re: Raspberry Pi 4
« Reply #68 on: November 02, 2019, 10:08:28 pm »
Hey thanks so much for the results guys! I've added them to my file. It's great to get results for some slower embedded machines as well as for desktop etc.

I can guess that the code size for the Cortex M3 and M4F should be the same as any other Thumb2 (T32) machine, give or take different compiler versions, but I have no idea about the ESP ones. That's Xtensa, right?

Yep. Apparently the Diamond Standard 106Micro for the 8266 and the LX6 for the 32. And that's about as much as I know about Xtensa. :)
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #69 on: November 03, 2019, 11:29:35 am »
FYI - "ChipKIT Pro MZ" board @200MHz (Processor: pic32MZ2048EFG064), pic32 compiler, MIPS32, fast, found 3713160 primes

// 294.749 sec chipKIT Pro MZ pic32MZ @ 200 MHz

 ;)

// 261.068 sec esp32/Arduino @ 240 MHz
// 294.749 sec chipKIT Pro MZ pic32MZ @ 200 MHz
// 306.994 sec esp8266/Arduino @ 160 MHz
// 309.251 sec BlackPill Cortex M4F @ 168 MHz
// 927.547 sec BluePill Cortex M3 @ 72 MHz

« Last Edit: November 03, 2019, 11:55:54 am by imo »
 
The following users thanked this post: GeorgeOfTheJungle

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #70 on: November 03, 2019, 06:51:21 pm »
FYI - "ChipKIT Pro MZ" board @200MHz (Processor: pic32MZ2048EFG064), pic32 compiler, MIPS32, fast, found 3713160 primes

// 294.749 sec chipKIT Pro MZ pic32MZ @ 200 MHz

Added, thanks! My first MIPS ISA result.

Curious that I have only one data point between 48 seconds and 260 seconds, which is a pretty large speed range. I guess it's fast for embedded, but too slow for desktop use. The original Raspberry  Pi (or the Zero) should be in there somewhere but I don't have one.
 

Offline HoracioDos

  • Frequent Contributor
  • **
  • Posts: 344
  • Country: ar
  • Just an IT monkey with a DSO
Re: Raspberry Pi 4
« Reply #71 on: November 04, 2019, 07:56:06 pm »
I wonder if they will finally transition the supported/approved OS to 64-bit?
Still 32 bits. Raspbian Buster needs to be backwards compatible with older pi versions. Raspbian (32 bits) can address up to 4Gb. There is no need to change to 64 bits

I installed a 64 bit Ubuntu Server on my Pi4 from here:

https://jamesachambers.com/raspberry-pi-ubuntu-server-18-04-2-installation-guide/
Good to know!
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #72 on: November 05, 2019, 10:14:25 am »
I've added the following:

//   3.107 sec Threadripper 2990WX @ 4.2 GHz  242 bytes  13.0 billion clocks
//  26.550 sec HiFive Unl RISCV U54 @ 1.5 GHz 208 bytes  39.8 billion clocks
//  39.840 sec HiFive Unl RISCV U54 @ 1.0 GHz 208 bytes  39.8 billion clocks

The RISC-V U54 uses about 9% more clock cycles than the Cortex A53 in a Raspberry Pi3. At 1.3 GHz (not shown, but I tried it) it is almost exactly the same speed as the Pi3 at 1.2 GHz. Note that the U54 is a single-issue CPU while the A53 is dual-issue superscalar (as is the U74).
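
The "billion clocks" column and the 9% figure are just run time multiplied by clock rate. A short C++ sketch of that arithmetic, with helper names invented for this example:

```cpp
#include <cassert>

// Total clock cycles in billions: seconds * GHz.
inline double billions_of_clocks(double seconds, double ghz) {
    assert(seconds >= 0.0 && ghz > 0.0);
    return seconds * ghz;
}

// Predicted run time in seconds for a fixed cycle count at another clock rate.
// Assumes the cycle count does not change with frequency.
inline double seconds_at(double clocks_billion, double ghz) {
    assert(clocks_billion >= 0.0 && ghz > 0.0);
    return clocks_billion / ghz;
}
```

`billions_of_clocks(26.550, 1.5)` is about 39.8 for the U54, against about 36.5 for the Pi3 (`billions_of_clocks(30.420, 1.2)`), a ratio of about 1.09. `seconds_at(39.825, 1.3)` lands near 30.6 s, close to the Pi3's 30.42 s, which matches the 1.3 GHz observation.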

I'll see if I can run it on a U84 sometime soon (4-decode, 3-issue Out-of-Order, similar to A72).
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #73 on: November 11, 2019, 03:01:51 pm »
The teensy 4.0:

@912MHz: 3713160 primes found in 24592 ms
@816MHz: 3713160 primes found in 27485 ms
@720MHz: 3713160 primes found in 31150 ms
@600MHz: 3713160 primes found in 37381 ms
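
A quick sanity check on these four results is whether run time scales inversely with clock, i.e. whether the cycle count stays constant. The C++ sketch below computes cycles as milliseconds times MHz times 1000 and reports the relative spread. The type and function names are hypothetical; the numbers are the four lines above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Run {
    std::uint64_t mhz;  // core clock in MHz
    std::uint64_t ms;   // measured run time in milliseconds
};

// Clock cycles consumed: (ms / 1000) s * (mhz * 1e6) Hz = ms * mhz * 1000.
inline std::uint64_t cycles(const Run& r) {
    return r.ms * r.mhz * 1000u;
}

// Relative spread (max - min) / min of the cycle counts across all runs.
inline double relative_spread(const std::vector<Run>& runs) {
    assert(!runs.empty());
    std::uint64_t lo = cycles(runs.front());
    std::uint64_t hi = lo;
    for (const Run& r : runs) {
        const std::uint64_t c = cycles(r);
        if (c < lo) lo = c;
        if (c > hi) hi = c;
    }
    return static_cast<double>(hi - lo) / static_cast<double>(lo);
}

// The four Teensy 4.0 results quoted above.
inline std::vector<Run> teensy40_runs() {
    return {{912, 24592}, {816, 27485}, {720, 31150}, {600, 37381}};
}
```

All four runs work out to about 22.4 billion cycles with a spread well under 0.01%, so for these figures the benchmark appears to scale linearly with core clock.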
« Last Edit: November 14, 2019, 09:35:01 am by GeorgeOfTheJungle »
 
The following users thanked this post: iMo

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #74 on: November 11, 2019, 07:17:22 pm »
Dual issue in-order M7 runs the program in the same number of clock cycles as the Out-of-Order A15, and trounces dual issue A53?

Shirley shumthing is wrong?

Is that using -O1?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #75 on: November 12, 2019, 12:18:24 am »
Thanks. That seems more like it. That puts it at exactly the same efficiency per MHz as the dual-issue U54 RISC-V. But it's still beating the A53 in the Pi3 and Odroid C2 by 40% and the Cortex A7 (surely similar to M7?) in the Raspberry Pi 2 by 61%.

I don't doubt that Paul does better work than the Raspberry Pi foundation. And maybe the DTCM in the M7 is just *that* much better than the L1 cache in the A7 or A53. Wow.

I want to see this for myself. I've just ordered a Teensy 4 on Amazon (Paul's store is offline) and will have it tomorrow. (I already have some older Teensy's)
« Last Edit: November 12, 2019, 03:28:32 am by brucehoult »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #76 on: November 12, 2019, 03:21:43 am »
Can the MCU on the Teensy 4.0 board really run reliably @1GHz? That's pretty impressive. Is it not getting too hot?

 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: Raspberry Pi 4
« Reply #77 on: November 12, 2019, 11:02:40 am »
On an XMOS XS1 8 core unit(500MHz total clock, 125MHz max per "thread") I got...

333333.148 milliseconds all running one thread and...
111828.668 milliseconds (corrected from 46682.174) running 4 threads. Nothing tricky: I just literally split the load equally across 4 cores, where each did one chunk of 928290.

EDIT: Fixed time, small issue with arrays caused large time discrepancy.
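
A simple way to sanity-check multi-threaded timings like these is to compute speedup and parallel efficiency. The C++ sketch below does that; it takes 111828.668 ms as the corrected four-thread time, which is a reading of the edit note rather than something stated outright, and the function names are invented for this example.

```cpp
#include <cassert>

// Speedup of the parallel run over the single-threaded run.
inline double speedup(double single_ms, double parallel_ms) {
    assert(single_ms > 0.0 && parallel_ms > 0.0);
    return single_ms / parallel_ms;
}

// Parallel efficiency: speedup divided by the number of threads.
// Values above 1.0 mean better-than-ideal scaling and usually point to a bug.
inline double efficiency(double single_ms, double parallel_ms, int threads) {
    assert(threads > 0);
    return speedup(single_ms, parallel_ms) / threads;
}
```

`speedup(333333.148, 111828.668)` is just under 3x on 4 threads, an efficiency of roughly 75%. That is plausible when the range is split into equal chunks, since the chunks with larger numbers take longer. The earlier 46682.174 ms figure would give an efficiency well above 1.0, which is a hint that something was off.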
« Last Edit: November 12, 2019, 05:19:42 pm by maginnovision »
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: Raspberry Pi 4
« Reply #78 on: November 12, 2019, 05:19:06 pm »
On an XMOS XS1 8 core unit(500MHz total clock, 125MHz max per "thread") I got...

333333.148 milliseconds all running one thread and...
46682.174 milliseconds running 4 threads. Nothing tricky I just literally split the load equally across 4 cores where each did one chunk of 928290.

Could you do a gist on github of that? I'd like to run it on the esp32 with two threads/cores.

Yea, I'll update this post later with the link.

https://gist.github.com/Maginnovision/2f7bd99afeeed351d421573950fbfdee

Here is the gist. For 2 threads you can use just countPrime0() and countPrime2() and change REQ_PRIMES to 1856580. Run each on its own thread. You can use the counter pointer or not(do it local), I was using it to verify they were all done.

I use this small sketch for a teensy 3.6 to time it:

Code: [Select]
int led = 13;  // pin held high by the board under test while the benchmark runs

void setup() {
  Serial.begin(9600);
  // configure the pin as an input with pull-down so it reads low when idle
  pinMode(led, INPUT_PULLDOWN);
  delay(2000);
}

unsigned long s_time = 0;  // start timestamp in microseconds
unsigned long e_time = 0;  // end timestamp in microseconds

// the loop routine runs over and over again forever:
void loop() {
  Serial.printf("\n\nWaiting for signal on LED pin...\n");
  while (digitalReadFast(led) == 0) {}  // wait for the pin to go high
  s_time = micros();
  Serial.printf("Starting timer...\n");

  while (digitalReadFast(led) != 0) {}  // wait for the pin to go low again
  e_time = micros();

  // unsigned subtraction stays correct across a single micros() rollover
  unsigned long elapsed = e_time - s_time;
  double milli_seconds = elapsed / 1000.0;
  double seconds = milli_seconds / 1000.0;
  double minutes = seconds / 60.0;
  // %lu matches unsigned long (the original %u expects unsigned int)
  Serial.printf("Benchmark took: %lu microseconds\n\t\t%.3f milliseconds\n\t\t%.3f seconds\n\t\t%.3f minutes\n", elapsed, milli_seconds, seconds, minutes);
}
« Last Edit: November 13, 2019, 01:37:22 am by maginnovision »
 
The following users thanked this post: GeorgeOfTheJungle

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #79 on: November 12, 2019, 05:57:23 pm »
Can the MCU on the Teensy 4.0 board really run reliably @1GHz? That's pretty impressive. Is it not getting too hot?
I don't think it can run for long, gets quite toasty at 1GHz.

Code: [Select]
@1008MHz: 250mA@5V => 1.25W
@960MHz:  230mA@5V => 1.15W
@912MHz:  210mA@5V => 1.05W
@816MHz:  165mA@5V => 0.825W
@720MHz:  140mA@5V => 0.7W
@600MHz:  100mA@5V => 0.5W
Isn't the mcu 3.3V??
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #80 on: November 12, 2019, 06:32:58 pm »
In case there is a 5V/3.3V switcher on the Teensy board, you will get pretty different results for the MCU's power dissipation.
PS: there is no switcher, so subtract, say, 20mA (the other circuitry on the PCB) from that current and multiply by 3.3V instead of 5V.
https://www.pjrc.com/teensy/schematic.html
« Last Edit: November 12, 2019, 06:40:53 pm by imo »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #81 on: November 12, 2019, 08:24:46 pm »
Isn't the mcu 3.3V??

Yes, but I used one of these:



Oh! I need one of those. I have a mains voltage "Kill A Watt" which, at the moment, is showing my HiFive Unleashed drawing 6.15 W at idle at 1.45 GHz (and near 8 W when fully loaded).

[The FU-540 in the HiFive Unleashed is a test chip that taped out 2 years ago. Unlike our current cores, it has no clock gating or automatic frequency adjustment etc, and was fabricated in a high-leakage corner of the 28nm process. I just knocked it back to 10 MHz and it's still using 5.35 W at idle, 5.40 fully loaded. And takes 24 seconds to start emacs and open a small text file, instead of 8.6 at 100 MHz or 1.0 at 1.5 GHz.]

Anyway .. USB testers / meters ... there are a ton on Amazon. That one is $9.99. Some are a little less, some a little more .. not enough to matter. So is that a good one? https://www.amazon.com/Soondar-Charging-Concurrent-Real-time-Smartphone/dp/B00ORNOWZK/
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #82 on: November 12, 2019, 08:31:16 pm »
It looks like this one, for $1 more, might support higher current and also total Joules or mAh or something. https://www.amazon.com/X-DRAGON-Multimeter-Chargers-Capacity-Accuracy/dp/B019RHJRM8
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #83 on: November 12, 2019, 08:53:05 pm »
PS: there is no switcher, so subtract, say, 20mA (the other circuitry on the PCB) from that current and multiply by 3.3V instead of 5V.
https://www.pjrc.com/teensy/schematic.html

Well, it uses a TLV75733 LDO for the 3.3V, which has a typical quiescent current of 25µA, so the current drawn at the input would be basically the current the MCU draws + 25µA (+ what the other chips on the 3.3V draw: the W25Q16 draws less than 1µA in power-down mode, which I assume it is in once the MCU has booted?, and the MKL02Z32, which is a small MCU that I also assume would be in low-power mode most of the time). So the excess current would be much less than 20mA. I think you can basically neglect it compared to the current the main MCU draws.

As to the power figure GeorgeOfTheJungle gave, you're right, it's wrong. It's the total power drawn from USB, not the power dissipated by the MCU!
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #84 on: November 12, 2019, 08:58:50 pm »
With the USB meter showing 5V and a current of 100mA through the board, the power dissipation of the MCU will be 3.3V*(100mA-20mA)=0.264W. The 20mA is my estimate of the other components' current on the Teensy 4.0 board with its linear regulator. The power loss in the 3.3V regulator will be 100mA*1.7V = 0.17W.
If you have a board with a switcher powering the MCU, the 100mA on the USB meter display will correspond to something like 136mA on the 3.3V rail. Thus the MCU power dissipation will be 3.3V*(136mA-20mA)=0.38W.
Example only.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #85 on: November 12, 2019, 09:31:27 pm »
Hey guys, can't you multiply current by 3.3? LOL. The current is the same with an LDO!!!

I personally mainly "rectified" imo's estimation (which I think is largely overestimated) of 20mA of current draw on the 3.3V rail outside of the MCU, so I just said that it could probably be neglected here. Though if in fact the Flash chip is not put in power-down mode, and the small MCU is fully active, he may be close to the real figure, so, that would have to be checked.

So yes, it's basically just a matter of multiplying your current figure by 3.3V. The figures you gave are not wrong per se (as you didn't claim they were the MCU power dissipation), but I think, if you're gonna give power figures, you might as well give the MCU power dissipation directly. Would be more evocative. Just a detail. I think imo assumed (and in turn made me first assume) the power figures were for the MCU itself, as that's what is interesting here.

 
The following users thanked this post: iMo

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #86 on: November 12, 2019, 09:40:43 pm »
Have you got the teensy 4 already?

"Out for delivery"
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #87 on: November 13, 2019, 12:27:49 am »
Got my Teensy 4.0 board and set it up.

I get the same results as GeorgeOfTheJungle, at 600 MHz, to the ms, adding my code to an Arduino sketch and adapting main() slightly and calling it from setup():
Code: [Select]
int main(){
  long beg = millis();
  int res = countPrimes();
  long m = millis() - beg;
  Serial.print(res);
  Serial.print(" primes found in ");
  Serial.print(m);
  Serial.println(" ms");
  return 0;
}

3713160 primes found in 37381 ms "faster" (the default)
3713160 primes found in 43516 ms "fast"

Verified that "fast" is -O1 and "faster" is -O2.  Compile line for "fast":

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-gcc -O1 -Wl,--gc-sections,--relax -T/home/bruce/software/arduino-1.8.10/hardware/teensy/avr/cores/teensy4/imxrt1062.ld -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 -o /tmp/arduino_build_829669/Blink.ino.elf /tmp/arduino_build_829669/sketch/Blink.ino.cpp.o /tmp/arduino_build_829669/core/core.a -L/tmp/arduino_build_829669 -larm_cortexM7lfsp_math -lm -lstdc++

The code is 228 bytes long, which is in line with some other Thumb2 results I've had but bigger than some. The gcc is pretty old .. 5.4.1 20160919.

Something that confuses me is that the objdump supplied in arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-objdump doesn't understand some of the instructions in the elf file!
« Last Edit: November 13, 2019, 03:20:21 am by brucehoult »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #88 on: November 13, 2019, 12:47:17 am »
I have to say the OOBE with Teensy 4.0 is pretty great. I remember having to work much harder to make Teensy 2.0 work back in 2009 or whenever.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #89 on: November 13, 2019, 01:12:31 am »
Out of curiosity, could you try with -O3? (As it activates pretty aggressive optimizations that can make a difference.)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #90 on: November 13, 2019, 01:17:35 am »
Out of curiosity, could you try with -O3? (As it activates pretty aggressive optimizations that can make a difference.)

Sure.

3713160 primes found in 37381 ms -O3
3713160 primes found in 39171 ms -Os

So in this case -O2 and -O3 are exactly the same
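
To put the four results side by side, the gaps can be expressed relative to the -O2 time. A small sketch using only the timings posted above; `percent_slower` is a made-up helper name.

```cpp
// How much longer `other_ms` is than `baseline_ms`, in percent.
double percent_slower(double baseline_ms, double other_ms) {
    return (other_ms - baseline_ms) / baseline_ms * 100.0;
}
```

With the -O2 time of 37381 ms as the baseline, -Os (39171 ms) comes out roughly 5% slower and -O1 (43516 ms) roughly 16% slower.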
 
The following users thanked this post: SiliconWizard

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: Raspberry Pi 4
« Reply #91 on: November 13, 2019, 01:24:30 am »
I have to say the OOBE with Teensy 4.0 is pretty great. I remember having to work much harder to make Teensy 2.0 work back in 2009 or whenever.

The teensy boards have been pretty much ready to go since T3.1 I think. Once he really had his own designs rather than shrinking others.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #92 on: November 13, 2019, 02:49:35 am »
So in this case -O2 and -O3 are exactly the same

By the way, I consider this a good thing. When I wrote this code (back when I had only x86 and Raspberry Pi 2/3 to test it on) I especially wanted it to be a test of the CPU, not the compiler.
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: Raspberry Pi 4
« Reply #93 on: November 13, 2019, 12:04:06 pm »
Out of curiosity, could you try with -O3? (As it activates pretty aggressive optimizations that can make a difference.)

-O3 can sometimes slow things down. Some of its optimizations trade code size for speed, and the bigger code does not always pay off. So usually a good bet is -O2, since it's pretty much always faster than any optimization level below it.

-Os in gcc means optimize for size. It does tend to make code a good deal faster than -O0, but its main goal is size, not speed.

-Ofast in gcc is aimed purely at speed. It enables everything in -O3 plus optimizations that disregard strict standards compliance (such as -ffast-math), and ignores code size. This should be the fastest as long as the larger code size doesn't cause enough memory traffic to bottleneck the CPU.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #94 on: November 13, 2019, 06:28:17 pm »
It's of course extremely dependent both on the target in question and on the code itself.

In my experience, -O3 yields faster execution than -O2 in many cases, and otherwise is usually at least as fast, and I've never personally run into a case where it was slower, so this is my usual default (unless I work on very small targets, for which optimizing for size would be critical. -O3 does aggressive code inlining in many cases so the code size can inflate significantly. Depends of course on your code structure.) I benchmarked sorting algorithms lately and got consistently +10% faster execution with -O3 than -O2 on PC targets. With some computing intensive stuff (especially with floating point), it can be as high as +30 to +50% faster...

I did some tests with Bruce's code, and I confirm that with his code I also get the same execution time with -O2 and -O3 on my Core i7: 2490 ms. Now I tried with -Ofast, and it's actually slightly slower (which is consistent with my previous benchmarks with -Ofast which I've found often slower than -O3 actually), with 2550 ms. This is not that surprising, as execution time depends on many factors including how code and data are cached.

Of course benchmarking across different targets is a tricky business and the way you do it all depends on your goals. If you want to know the fastest a given algorithm can execute on a given CPU, you could write it directly in optimized assembly, using any specific instruction you can to speed things up. That would be really taking advantage of said CPU. Now if you want more of a typical "feel" of what you'd get with real-life code in a high-level language, using generic C and moderate optimization levels makes sense.

 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #95 on: November 13, 2019, 11:19:22 pm »
I did some tests with Bruce's code, and I confirm that with his code I also get the same execution time with -O2 and -O3 on my Core i7: 2490 ms. Now I tried with -Ofast, and it's actually slightly slower (which is consistent with my previous benchmarks with -Ofast which I've found often slower than -O3 actually), with 2550 ms. This is not that surprising, as execution time depends on many factors including how code and data are cached.

On an i7-8650U (which I have two of .. a NUC and a ThinkPad X1 Carbon) it's actually faster with -O1 (2735ms) than with -O2 or -O3 (3428ms) !!
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: Raspberry Pi 4
« Reply #96 on: November 14, 2019, 03:55:16 am »
With XMOS parts -Os is the recommended O level for anything other than debugging. Speed or code size.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #97 on: November 14, 2019, 04:00:57 am »
I did some tests with Bruce's code, and I confirm that with his code I also get the same execution time with -O2 and -O3 on my Core i7: 2490 ms. Now I tried with -Ofast, and it's actually slightly slower (which is consistent with my previous benchmarks with -Ofast which I've found often slower than -O3 actually), with 2550 ms. This is not that surprising, as execution time depends on many factors including how code and data are cached.

On an i7-8650U (which I have two of .. a NUC and a ThinkPad X1 Carbon) it's actually faster with -O1 (2735ms) than with -O2 or -O3 (3428ms) !!

On my i7-5930K, I get: 2490 ms for -O2 and -O3, but 2605 ms for -O1... (GCC 9.2.0 here if that matters.)
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: Raspberry Pi 4
« Reply #98 on: November 14, 2019, 06:28:46 am »
It's of course extremely dependent both on the target in question and on the code itself.

In my experience, -O3 yields faster execution than -O2 in many cases, and otherwise is usually at least as fast, and I've never personally run into a case where it was slower, so this is my usual default (unless I work on very small targets, for which optimizing for size would be critical. -O3 does aggressive code inlining in many cases so the code size can inflate significantly. Depends of course on your code structure.) I benchmarked sorting algorithms lately and got consistently +10% faster execution with -O3 than -O2 on PC targets. With some computing intensive stuff (especially with floating point), it can be as high as +30 to +50% faster...

I did some tests with Bruce's code, and I confirm that with his code I also get the same execution time with -O2 and -O3 on my Core i7: 2490 ms. Now I tried with -Ofast, and it's actually slightly slower (which is consistent with my previous benchmarks with -Ofast which I've found often slower than -O3 actually), with 2550 ms. This is not that surprising, as execution time depends on many factors including how code and data are cached.

Of course benchmarking across different targets is a tricky business and the way you do it all depends on your goals. If you want to know the fastest a given algorithm can execute on a given CPU, you could write it directly in optimized assembly, using any specific instruction you can to speed things up. That would be really taking advantage of said CPU. Now if you want more of a typical "feel" of what you'd get with real-life code in a high-level language, using generic C and moderate optimization levels makes sense.

Yeah, all of this is indeed heavily platform dependent. That's why I used the word "sometimes". There is no solid rule for what optimization is best for all scenarios.

On modern PCs, getting things fast is mostly about making sure you can work in the cache as much as possible (and smaller code size helps with this too). The CPU can do quite a lot of optimizations on the fly, such as out-of-order execution, branch prediction and hyperthreading, all making sure the execution pipeline is as full as possible. Fast math is also about memory arrangement to make it fit into SIMD instructions. On the other hand, an 8-bit MCU is pretty stupid and requires the compiler to do more work optimizing things, especially when it has very few registers and limited instructions. This is usually where optimizing for size has more of a penalty, with some levels showing huge differences.

But it seems to me that the faster computer hardware gets, the less programmers care about optimizing anything, since "it runs fast enough anyway". Hence why the Windows calculator calc.exe now uses about 20 to 30MB of RAM to run.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #99 on: November 14, 2019, 08:09:14 am »
I did some tests with Bruce's code, and I confirm that with his code I also get the same execution time with -O2 and -O3 on my Core i7: 2490 ms. Now I tried with -Ofast, and it's actually slightly slower (which is consistent with my previous benchmarks with -Ofast which I've found often slower than -O3 actually), with 2550 ms. This is not that surprising, as execution time depends on many factors including how code and data are cached.

On an i7-8650U (which I have two of .. a NUC and a ThinkPad X1 Carbon) it's actually faster with -O1 (2735ms) than with -O2 or -O3 (3428ms) !!

On my i7-5930K, I get: 2490 ms for -O2 and -O3, but 2605 ms for -O1... (GCC 9.2.0 here if that matters.)

At this level semi-random things, such as how code (especially branch targets) happens to fall in cache lines, make a big difference. And ASLR makes it vary from run to run.
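
Because of that run-to-run variation, a single timing is a weak basis for comparing -O levels. One hedged way to handle it is to launch the benchmark several times and compare the minimum and median rather than one sample. The sketch below only does the bookkeeping; `RunStats` and `summarize_runs` are hypothetical names, and the timings are meant to come from separate process launches, since ASLR picks the layout once per launch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Summary of several wall-clock timings of the same binary, in milliseconds.
struct RunStats {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Sorts a copy of the timings and reports minimum, median and maximum.
// Returns all zeros for an empty input.
RunStats summarize_runs(std::vector<double> ms) {
    if (ms.empty()) return {0.0, 0.0, 0.0};
    std::sort(ms.begin(), ms.end());
    const std::size_t n = ms.size();
    const double median = (n % 2 == 1) ? ms[n / 2]
                                       : 0.5 * (ms[n / 2 - 1] + ms[n / 2]);
    return {ms.front(), median, ms.back()};
}
```

A wide gap between `min_ms` and `max_ms` for one build suggests that a difference of a few percent between two -O levels may be noise rather than a real effect.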

I think we can all agree that modern x86 is stupidly fast, and even the slowest microcontrollers are amazing.

I'm trying to estimate how long the university VAX I learned to program on would take to run this. I think it'll be around 24 hours.

It would be interesting to try an ATmega2560. I think it will *just* fit without modification. I'm using 8000 bytes (plus a handful) of global memory, which is less than 8 KB. But I've only got 328s here.

It would fit on an Apple ][ or C64 or Atari XL. But no one has C compilers for them. C compilers for Z80 suck but at least they exist. Anyone have a working Speccy or Amstrad CPC or something?

It'll probably take a week to run.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #100 on: November 14, 2019, 08:16:16 am »
The .pdf (*) says (page 2) "Tightly coupled GPIOs, operating at the same frequency as Arm".

I was hoping to see ~ 1/2 the cpu clock, but it's only 1/4th:

Code: [Select]
#define PIN 13

void setup () { pinMode(PIN, OUTPUT); }

void loop () {
  while (1) {
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
  }
}

Gives 150MHz @600MHz cpu clock. What am I doing wrong?
« Last Edit: November 14, 2019, 02:50:16 pm by GeorgeOfTheJungle »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #101 on: November 14, 2019, 08:21:29 am »
Dammit .. I obviously *need* an Arduino mega 2560. Ordered a clone for $15.

He who dies with the most toys wins.
« Last Edit: November 14, 2019, 08:23:04 am by brucehoult »
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Raspberry Pi 4
« Reply #102 on: November 14, 2019, 08:45:51 am »
Unroll the loop a bit?
Code: [Select]
void loop () {
  while (1) {
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
  }
}
Email: OwOwOwOwO123@outlook.com
 
The following users thanked this post: GeorgeOfTheJungle

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #103 on: November 14, 2019, 08:59:13 am »
Unroll the loop a bit?
Code: [Select]
void loop () {
  while (1) {
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
  }
}

That's what I tried first, but it does nothing. It looks as if the GPIO bus clock is 1/2 the CPU clock? But that's not what the .pdf says.
« Last Edit: November 14, 2019, 02:37:08 pm by GeorgeOfTheJungle »
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Raspberry Pi 4
« Reply #104 on: November 14, 2019, 01:42:58 pm »
Zynq-7010 (Cortex A9) @ 650MHz:
Code: [Select]
$ gcc primes.c -o primes -O3
$ time ./primes
Starting run
3713160 primes found in 39728 ms
-396 bytes of code in countPrimes()

real    0m39.745s
user    0m39.723s
sys     0m0.011s
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
650000

i7-4700MQ @ 2.40GHz:
Code: [Select]
$ clang-7 primes.c -o primes -Ofast
$ time ./primes
Starting run
3713160 primes found in 4707 ms
256 bytes of code in countPrimes()

real 0m4.708s
user 0m4.708s
sys 0m0.000s
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
2400000
$ grep MHz /proc/cpuinfo
cpu MHz : 2394.522
cpu MHz : 2395.176
cpu MHz : 2394.455
cpu MHz : 2394.955
cpu MHz : 2394.427
cpu MHz : 2394.608
cpu MHz : 2394.432
cpu MHz : 2394.821
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Raspberry Pi 4
« Reply #105 on: November 14, 2019, 02:34:10 pm »
Yes, that's the first thing I tried, but doesn't work. It looks as if the gpio bus clock is 1/2 the cpu clock? But that's not what the .pdf says...
Can you look at the disassembly and see if the accesses are single instructions? Also, I see the GPIO is on an AHB bus and behind an adapter of some sort ("AIPS-Lite"). Maybe the core waits for the write response when accessing uncached memory? I would imagine accessing peripheral areas also stalls the entire processor until the access is acknowledged...
 
The following users thanked this post: GeorgeOfTheJungle

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #106 on: November 14, 2019, 02:45:44 pm »
Also I see GPIO is on a AHB bus and behind an adapter of some sort ("AIPS-Lite"). Maybe the core waits for the write response when accessing uncached memory? I would imagine accessing peripheral areas also stalls the entire processor until the access is acknowledged...

Maybe, but then, why do they say "Tightly coupled GPIOs, operating at the same frequency as Arm" in the pdf? What I'm seeing here is similar to what the STM32s do, where the gpio bus runs at a different speed from a different clock than the CPU.

Quote
Can you look at the disassembly and see if the accesses are single instruction?

I'm using the arduino IDE, I don't know how to do that with this.

Edit:
There are 5 "gpio modules" GPIO1..5, maybe they're not all equal? Pin 13 (what I'm using) is on GPIO2, perhaps some other port is faster?
« Last Edit: November 14, 2019, 02:56:42 pm by GeorgeOfTheJungle »
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: Raspberry Pi 4
« Reply #107 on: November 14, 2019, 03:53:02 pm »
Operating frequency of the GPIO peripheral doesn't mean shit. If it takes N cycles to get a write request through the interconnect to the peripheral and the write response back, then that's N cycles the processor can NOT DO ANYTHING because it's required to guarantee order of accesses (this is IO memory and not cached memory). I think 2 cycles for a GPIO write is already very good. If you want it down to 1 cycle the GPIO controller must be integrated into the CPU itself.
 
The following users thanked this post: GeorgeOfTheJungle

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Raspberry Pi 4
« Reply #108 on: November 14, 2019, 03:57:08 pm »
Unroll the loop a bit?
Code: [Select]
void loop () {
  while (1) {
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTSET = CORE_PIN13_BITMASK;
    CORE_PIN13_PORTCLEAR = CORE_PIN13_BITMASK;
  }
}

That's what I tried first, but does nothing. It looks as if the gpio bus clock was 1/2 the cpu clock? But that's not what the .pdf says.
On most simple machines I would expect what you get, if the GPIOs are on the full speed bus. One cycle to get the set instruction. One cycle to write to the GPIO. One cycle to get the clear instruction. One cycle to write to the GPIO. Rinse and repeat. That is four cycles per output period, i.e. 150MHz from a 600MHz clock.
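
The cycle count described here can be turned into an expected output frequency. A minimal sketch, assuming every edge costs a fixed number of CPU cycles and the loop overhead is hidden by unrolling; `toggle_mhz` is a made-up helper name.

```cpp
// Expected square-wave frequency in MHz when each edge is produced by one
// GPIO write that occupies the CPU for `cycles_per_write` cycles.
// One output period needs two writes (set, then clear).
double toggle_mhz(double cpu_mhz, double cycles_per_write) {
    return cpu_mhz / (2.0 * cycles_per_write);
}
```

`toggle_mhz(600, 2)` gives 150, matching the measurement, while a single-cycle write would give 300.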
« Last Edit: November 14, 2019, 03:59:01 pm by coppice »
 
The following users thanked this post: GeorgeOfTheJungle

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: Raspberry Pi 4
« Reply #109 on: November 14, 2019, 04:25:15 pm »
I did some tests with Bruce's code, and I confirm that with his code I also get the same execution time with -O2 and -O3 on my Core i7: 2490 ms. Now I tried with -Ofast, and it's actually slightly slower (which is consistent with my previous benchmarks with -Ofast which I've found often slower than -O3 actually), with 2550 ms. This is not that surprising, as execution time depends on many factors including how code and data are cached.

On an i7-8650U (which I have two of .. a NUC and a ThinkPad X1 Carbon) it's actually faster with -O1 (2735ms) than with -O2 or -O3 (3428ms) !!

On my i7-5930K, I get: 2490 ms for -O2 and -O3, but 2605 ms for -O1... (GCC 9.2.0 here if that matters.)

At this level semi-random things such as how code (especially branch targets) happen to fall in cache lines makes a big difference. And ASLR makes it vary from run to run.

Yup. Also 1/ not all GCC back-ends are born equal, some emit much better code for their given target than others, 2/ even when using "similar" targets (Core i7 here), there can be a huge difference running the exact same object code. The i7-5930K (even though now a bit old) is still a powerhouse, and it supports quad-channel RAM and probably has a lot more cache than the typical CPUs used in laptops (I'd have to check against yours). I compiled it as a 64-bit executable, if that makes a difference; I don't know if you did or if you only tested 32-bit builds.

I also checked with my laptop, which has a (relatively old) i7-2600M, and I get about twice the execution time, but -O1 is still slower than -O2 or -O3 on it. On laptops (on mine for sure) you are likely using some kind of "on-demand" frequency governor, so you never really get the top performance, and performance can vary according to many more factors than on systems running at a fixed frequency... (The fun fact is that at -O1, on my laptop, execution times seemed to vary much more between runs than with -O3.)

It would fit on an Apple ][ or C64 or Atari XL. But no one has C compilers for them. C compilers for Z80 suck but at least they exist. Anyone have a working Speccy or Amstrad CPC or something?

C compilers for Z80 do exist yeah. I remember people writing Gameboy apps in C for instance. Dunno if the compiler sucked, but it sure seemed to work fine.

All I have is a Sinclair QL (68008). There are C compilers for it, but I have never used any. I don't really feel like firing it up again and fiddling with this at the moment, but it should be doable.

Wouldn't one of the emulators for the Spectrum or CPC be fine for this? (I guess there are some emulators that should be relatively accurate timing wise?)
« Last Edit: November 14, 2019, 04:30:46 pm by SiliconWizard »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #110 on: November 14, 2019, 05:23:06 pm »
Quote
Can you look at the disassembly and see if the accesses are single instruction?

I'm using the arduino IDE, I don't know how to do that with this.

The Arduino IDE actually makes it relatively simple to do this. Make a trivial change to your source code -- maybe add and delete a character -- and then hit the "check"/"compile" button. In the panel at the bottom (maybe make it bigger) you'll see a line like the following, with the first (very long) "word" ending in gcc and the path to your eventual executable binary file in the middle (here Blink.ino.elf) after a "-o". Or for certain targets the gcc might be ld instead.

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-gcc -O1 -Wl,--gc-sections,--relax -T/home/bruce/software/arduino-1.8.10/hardware/teensy/avr/cores/teensy4/imxrt1062.ld -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 -o /tmp/arduino_build_829669/Blink.ino.elf /tmp/arduino_build_829669/sketch/Blink.ino.cpp.o /tmp/arduino_build_829669/core/core.a -L/tmp/arduino_build_829669 -larm_cortexM7lfsp_math -lm -lstdc++

Open a terminal window (from your OS, nothing to do with gcc) and copy and paste the bit with gcc or ld and the output file. Don't try to run it yet!

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-gcc  /tmp/arduino_build_829669/Blink.ino.elf

Now just replace the "gcc" bit by "objdump -d":

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-objdump -d  /tmp/arduino_build_829669/Blink.ino.elf

You can run that.

If you can't scroll your terminal window backwards then you might want to put " | more" (or " | less") on the end, or redirect the output to a file with " >/home/bruce/myDisassembly.txt" or whatever other location or name you want. (Your name probably isn't Bruce...)

If the compiler is gcc then you can get an assembly language listing by instead finding the line that compiled your code ("Blink.ino.cpp") to an object file ("-o .../Blink.ino.cpp.o"). You can just copy and paste the whole line into your console/terminal window and re-run it. If you add to the end " -g -Wa,-adhl" then you'll get a listing printed to the terminal with the original lines of C code, the generated assembly language, and the binary (hex) code for the instructions.
 
The following users thanked this post: GeorgeOfTheJungle

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: Raspberry Pi 4
« Reply #111 on: November 14, 2019, 06:17:21 pm »
The .pdf (*) says (page 2) "Tightly coupled GPIOs, operating at the same frequency as Arm".
I was hoping to see ~ 1/2 the cpu clock, but it's only 1/4th:
Here is something on that:
https://community.nxp.com/docs/DOC-342954
 
The following users thanked this post: GeorgeOfTheJungle

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #112 on: November 14, 2019, 06:30:03 pm »
It seems that the writes take two cycles :-(

Code: [Select]
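#define PIN 13
// Direct pointers to the port's set, clear and toggle registers.
volatile unsigned int* _set= (volatile unsigned int*) 0x42004084;
volatile unsigned int* _clr= (volatile unsigned int*) 0x42004088;
volatile unsigned int* _flip= (volatile unsigned int*) 0x4200408c;

void setup () { pinMode(PIN, OUTPUT); }
void loop () {
  // 0x8 is bit 3 of the port, which drives pin 13. Four toggles per pass
  // keep the loop branch from dominating the timing.
  while (1) { *_flip= 0x8; *_flip= 0x8; *_flip= 0x8; *_flip= 0x8; }
}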
« Last Edit: November 25, 2019, 10:44:51 am by GeorgeOfTheJungle »
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #113 on: November 14, 2019, 06:35:31 pm »
The .pdf (*) says (page 2) "Tightly coupled GPIOs, operating at the same frequency as Arm".
I was hoping to see ~ 1/2 the cpu clock, but it's only 1/4th:
Here is something on that:
https://community.nxp.com/docs/DOC-342954

 :-+

Quote
RT1060 provides two set of GPIOs registers to control pads output. GPIO1 to GPIO3 is general GPIO, and GPIO6 to GPIO8 is tightly GPIO, but they share the same pad, that means the gpio pin can select from GPIO1/2/3 to GPIO6/7/8.

Then there's still hope! Because I'm using GPIO2.
« Last Edit: November 14, 2019, 06:43:28 pm by GeorgeOfTheJungle »
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: Raspberry Pi 4
« Reply #114 on: November 14, 2019, 06:36:52 pm »
It would fit on an Apple ][ or C64 or Atari XL. But no one has C compilers for them.
C compilers for the Apple ][ existed. I used one of them. If they existed for the Apple ][, I'm sure they existed for the C64 as well.
C compilers for Z80 suck but at least they exist. Anyone have a working Speccy or Amstrad CPC or something?
C compilers for the Z80 were fine, but most C compilers used for Z80s were actually 8080 compilers, and the restricted instruction set they spewed out certainly hampered performance. Nonetheless, huge amounts of widely used CP/M and embedded Z80 code were developed in C, and ran very well.
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #115 on: November 14, 2019, 06:37:54 pm »
Quote
When using the register DR_TOGGLE and the fast GPIO we will get the best performance of the pin.
From the above link.. Did you try it?
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #116 on: November 14, 2019, 06:40:39 pm »
Quote
When using the register DR_TOGGLE and the fast GPIO we will get the best performance of the pin.
From the above link.. Did you try it?

Yes:
Code: [Select]
volatile unsigned int* _flip= (volatile unsigned int*) 0x4200408c;
But I'm using the wrong GPIO group it seems.
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #117 on: November 14, 2019, 06:46:55 pm »
My bet with fast GPIO group and unrolled loop you get 300MHz toggle :)
 
The following users thanked this post: GeorgeOfTheJungle

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: Raspberry Pi 4
« Reply #118 on: November 14, 2019, 06:57:03 pm »
Dammit .. I obviously *need* an Arduino mega 2560. Ordered a clone for $15.

He who dies with the most toys wins.

If that doesn't work I can always try with some protos that didn't work out. 2560's with 256k RAM. Would be slightly slower than all on chip but they can run 16 or 20 MHz.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #119 on: November 14, 2019, 07:18:18 pm »
My bet with fast GPIO group and unrolled loop you get 300MHz toggle :)

Bad news... I'm using GPIO7 already :-(



https://github.com/PaulStoffregen/cores/blob/master/teensy4/imxrt.h#L5039-L5050

Code: [Select]
volatile unsigned int* _flip= (volatile unsigned int*) 0x4200408c;
« Last Edit: November 14, 2019, 07:45:00 pm by GeorgeOfTheJungle »
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6261
  • Country: fi
    • My home page and email address
Re: Raspberry Pi 4
« Reply #120 on: November 14, 2019, 08:14:10 pm »
Can the MCU on the Teensy 4.0 board really run reliably @1GHz? That's pretty impressive. Is it not getting too hot?
You do need a heatsink on the i.MX chip.

I was hoping to see ~ 1/2 the cpu clock, but it's only 1/4th:
Each bus load/store does take two clocks, because the armv7-m thumb2 load and store (single) instructions take two cycles each, so 1/4th CPU clock on the output pin is expected.

(The Processor Instruction Timings section in the Cortex-M3 Technical Manual gives 2 clocks per load/store instruction, with a footnote saying that "Generally, load-store instructions take two cycles for the first access [and one cycle for each additional access]", so I believe the two cycles per I/O load/store is expected.  Note: while i.MX RT1060 is a Cortex-M7, Cortex-M7 instruction timings are not published.  So, I went with this suggestion.)
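
The arithmetic behind that 1/4 figure can be sketched in a few lines of C. This is only an illustration: the function name is made up, and the 2-cycles-per-store value is the Cortex-M3 figure quoted above, assumed (not measured) to carry over to the Cortex-M7.

```c
#include <stdint.h>

/* Hypothetical helper: expected square-wave frequency on a pin that is
 * flipped by back-to-back stores to a toggle register.
 * One full output period needs two toggles (low->high, then high->low),
 * and each store is assumed to cost cycles_per_store CPU cycles. */
uint32_t toggle_freq_hz(uint32_t cpu_hz, uint32_t cycles_per_store)
{
    return cpu_hz / (2u * cycles_per_store);
}
```

With cycles_per_store = 2 the divisor is 4, so a 600 MHz core clock would give 150 MHz on the pin. Loop overhead and bus wait states are ignored here.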

Teensy 4.0 does have the processor cycle counter enabled and directly accessible as the ARM_DWT_CYCCNT macro, BTW.
« Last Edit: November 14, 2019, 08:24:11 pm by Nominal Animal »
 
The following users thanked this post: GeorgeOfTheJungle

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #121 on: November 14, 2019, 09:40:41 pm »
Yep, it looks like 150MHz is the max toggling freq, a pity..
https://www.nxp.com/docs/en/application-note/AN12240.pdf
 
The following users thanked this post: GeorgeOfTheJungle

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #122 on: November 14, 2019, 10:17:27 pm »
What does the 150MHz toggling look like on a scope? Is it pretty clean? I still haven't thought of a reason to buy a teensy 4. Waiting on the 3.6 form factor with sd slot.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6261
  • Country: fi
    • My home page and email address
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #123 on: November 15, 2019, 01:02:51 am »
Yep, it looks like 150MHz is the max toggling freq
You should be able to reach 240 MHz if you overclock to 960MHz, although you need a heatsink for the i.MX RT1060 chip.  :P

What does the 150MHz toggling look like on a scope?
I wish I had a scope with that kind of bandwidth!

I don't think I have seen any scope screenshots of that at the PJRC forum, but based on the discussions around PWM, timers, SPI/I2C I/O, especially in the looong beta test thread, I think one needs to go an order of magnitude lower in the frequency to generate controllable/useful outputs.  Just my opinion, though.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #124 on: November 15, 2019, 07:11:47 am »
What does the 150MHz toggling look like on a scope? Is it pretty clean? I still haven't thought of a reason to buy a teensy 4. Waiting on the 3.6 form factor with sd slot.

This is with a Micsig TO1074, 130MHz probe, Teensy @960MHz. There's an interrupt that kicks in every 1ms.

[Attached image 872212-0: scope screenshot]

Agilent DSO7104, 1165A probe, Teensy @1008MHz:

[Attached image 872310-1: scope screenshot]

Teensy @600MHz:

[Attached image 872298-2: scope screenshot]

The interrupt:

[Attached image 872304-3: scope screenshot]
« Last Edit: November 15, 2019, 11:17:30 am by GeorgeOfTheJungle »
 
The following users thanked this post: iMo, Nominal Animal

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #125 on: November 15, 2019, 07:51:55 am »
Can you do sample and hold with peak detect?
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #126 on: November 15, 2019, 01:32:58 pm »
The best we could have done 41 years ago:

Code: [Select]
* = $1000
1000 CLRAN0 = C058
1000 SETAN0 = C059

1000        CLC             18
1001 LOOP   
1001        LDA SETAN0      AD 59 C0    //4 cycles
1004        LDA CLRAN0      AD 58 C0    //4 cycles
1007        BCC LOOP        90 F8       //3 cycles -> 11

1009 .END
done.

On a $666 Apple II @1MHz => 1/11e-6 = 90.9 kHz = 0.09 MHz. Today 2640 times better at 1/32nd the price.  :-+
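
The figures can be checked with a small C sketch. The helper names are invented for illustration, and pairing the 2640 factor with a 240 MHz toggle rate (the 960 MHz overclock) is an inference from the numbers, not something stated outright.

```c
/* Hypothetical helper: how many times per second a loop of loop_cycles
 * CPU cycles repeats on a CPU clocked at cpu_hz. The listing above
 * toggles the output once up and once down per pass, so this is also
 * the output frequency. */
double loop_freq_hz(double cpu_hz, double loop_cycles)
{
    return cpu_hz / loop_cycles;
}

/* Ratio between a newer toggle rate and an older one. */
double speedup(double new_hz, double old_hz)
{
    return new_hz / old_hz;
}
```

loop_freq_hz(1e6, 11) comes out at about 90.9 kHz, and speedup(240e6, ...) against that value is 2640.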
« Last Edit: November 25, 2019, 12:34:50 pm by GeorgeOfTheJungle »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #127 on: November 15, 2019, 03:44:27 pm »
The best we could have done 41 years ago:

Code: [Select]
* = $1000
1000 CLRAN0 = C058
1000 SETAN0 = C059

1000        CLC             18
1001 LOOP   
1001        LDA CLRAN0      AD 58 C0    //4 cycles
1004        LDA SETAN0      AD 59 C0    //4 cycles
1007        BCC LOOP        90 F8       //3 cycles -> 11

1009 .END
done.

On a $666 Apple II @1MHz => 1/11e-6 = 90.9 kHz. Today 2640 times better at 1/32nd the price.  :-+

Ahah, yes. I think we often don't even realize how "lucky" we are these days. Too bad that a lot of the extra performance is just wasted because we're lazy and it's become cheap.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #128 on: November 15, 2019, 06:56:41 pm »
Ahah, yes. I think we often don't even realize how "lucky" we are these days. Too bad that a lot of the extra performance is just wasted because we're lazy and it's become cheap.
I've seen attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate... :) but I had never seen a pin bit banged @250 MHz !!
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #129 on: November 15, 2019, 07:01:12 pm »
Blade Runner? ;D
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6261
  • Country: fi
    • My home page and email address
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #130 on: November 15, 2019, 07:11:10 pm »
There's an interrupt that kicks in every 1ms.
That is the systick_isr interrupt that delay(), millis(), and micros() need.  If you don't use those three functions, and use the cycle counter instead, then you can disable the interrupt via SYST_CSR=0;. To re-enable, use SYST_CSR=SYST_CSR_TICKINT|SYST_CSR_ENABLE;.
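
A minimal sketch of wrapping those two assignments in helpers, so a timing-critical section can be bracketed. The helper names are made up. The stand-in register exists only so the snippet compiles on a PC; on a Teensy 4.0 the real SYST_CSR macros come from the core headers, and the bit positions assumed below are the standard ARMv7-M SysTick layout.

```c
#include <stdint.h>

/* Host-side stand-ins (assumption): used only when the Teensy core
 * headers have not already defined these names.
 * ARMv7-M SysTick CSR: ENABLE is bit 0, TICKINT is bit 1. */
#ifndef SYST_CSR_ENABLE
static volatile uint32_t fake_syst_csr;   /* hypothetical stand-in register */
#define SYST_CSR          fake_syst_csr
#define SYST_CSR_ENABLE   (1u << 0)
#define SYST_CSR_TICKINT  (1u << 1)
#endif

/* Stop the 1 ms tick; delay(), millis() and micros() stop working. */
void systick_off(void)
{
    SYST_CSR = 0;
}

/* Restart the tick with the value given in the post above. */
void systick_on(void)
{
    SYST_CSR = SYST_CSR_TICKINT | SYST_CSR_ENABLE;
}
```

Call systick_off() before the bit-banging loop and systick_on() after it. The millisecond count does not advance while the tick is off.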
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: Raspberry Pi 4
« Reply #131 on: November 15, 2019, 07:40:22 pm »
It would fit on an Apple ][ or C64 or Atari XL. But no one has C compilers for them. C compilers for Z80 suck but at least they exist. Anyone have a working Speccy or Amstrad CPC or something?

It'll probably take a week to run.

My Apple II is still operative :-) , not that I'm exactly willing to program that in 6502 assembly LOL. If the bit banging experiment result serves as indication (which it doesn't), it should take 43516*(150e6/90.9e3)/1e3/60/60 = "only" 20 hours.
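
The same back-of-the-envelope estimate as a C sketch. The function name is hypothetical, the 43516 ms input is taken as-is from the line above, and scaling by the bit-banging ratio is, as already admitted, not a real performance model.

```c
/* Hypothetical helper: scale a runtime measured on a fast machine to a
 * slower one using only the ratio of two rates, then convert the result
 * from milliseconds to hours. */
double scaled_hours(double fast_ms, double fast_rate_hz, double slow_rate_hz)
{
    double slow_ms = fast_ms * (fast_rate_hz / slow_rate_hz);
    return slow_ms / 1e3 / 60.0 / 60.0;
}
```

scaled_hours(43516, 150e6, 90.9e3) lands just under 20 hours.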

Millennials, take note: there are things that were much worse when we were young  >:D
« Last Edit: November 15, 2019, 07:50:22 pm by GeorgeOfTheJungle »
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #132 on: November 15, 2019, 07:44:14 pm »
There's an interrupt that kicks in every 1ms.
That is the systick_isr interrupt that delay(), millis(), and micros() need.  If you don't use those three functions, and use the cycle counter instead, then you can disable the interrupt via SYST_CSR=0;. To re-enable, use SYST_CSR=SYST_CSR_TICKINT|SYST_CSR_ENABLE;.

It only lasts 100 ns, but, good to know! Thanks.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #133 on: November 15, 2019, 08:35:14 pm »
It would fit on an Apple ][ or C64 or Atari XL. But no one has C compilers for them. C compilers for Z80 suck but at least they exist. Anyone have a working Speccy or Amstrad CPC or something?

It'll probably take a week to run.

My Apple II is still operative :-) , not that I'm exactly willing to program that in 6502 assembly LOL. If the bit banging experiment result serves as indication (which it doesn't), it should take 43516*(150e6/90.9e3)/1e3/60/60 = "only" 20 hours.

I'd code it up for you to try.

But not today :-)
 

Offline techman-001

  • Frequent Contributor
  • **
  • !
  • Posts: 748
  • Country: au
  • Electronics technician for the last 50 years
    • Mecrisp Stellaris Unofficial UserDoc
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #134 on: November 15, 2019, 08:47:31 pm »
Ahah, yes. I think we often don't even realize how "lucky" we are these days. Too bad that a lot of the extra performance is just wasted because we're lazy and it's become cheap.
I've seen attack ships on fire off the shoulder of Orion. I watched C-beams glitter in the dark near the Tannhäuser Gate... :) but I had never seen a pin bit banged @250 MHz !!

Morons, they could make awesome artificial people, but used self oxidizing metals in their attack ships ... and what's with these "c-beams" don't people know that only "forth-beams" are approved for use in outer space ?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #135 on: November 19, 2019, 08:24:37 pm »
So, I tried my primes test on an ATmega2560. It was a bit hacky. It has 8192 bytes of SRAM. My code uses 8004 bytes of global variables if I change the "int"s to "long"s (which I have to because the numbers being used get as high as 62,710,561 but "int" means 16 bit on avr-gcc). That's tight. Too tight. The first compile says I need 8254 bytes for global variables. Knowing that AVR copies initialized constants from flash to RAM on start up I removed the printing of all the string literals. OMG. Now it's *exactly* 8192 bytes.

Code: [Select]
Global variables use 8192 bytes (100%) of dynamic memory, leaving 0 bytes for local variables. Maximum is 8192 bytes.
Low memory available, stability problems may occur.

Ugh. That means as soon as I run main() the stack will be overwriting some global variables. I don't know if it will be my own arrays and "nSieve" or some library's globals. Best case, maybe it's the end of a buffer used by Serial.

I try replacing "if (nSieve < SZ){" with 10 instead of SZ and run it. No drama! The correct answer is printed. I try 100 instead of 10. Still get the correct answer and no crash. OK, let's try the full thing overnight!

After 12249318 ms (3.4 hours) the correct number of primes, 3713160, is printed.

My apologies to whoever's globals my stack clobbered. But it worked. It wouldn't have worked if it was one of my arrays that got clobbered.
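
Why the "int"s had to become "long"s can be shown with a short C sketch. The helper names are invented for illustration. The one fact relied on beyond the post is that 62,710,561 equals 7919 squared; whether primes.c produces it exactly that way is an assumption.

```c
#include <stdint.h>

/* Returns 1 if v can be stored in a 16-bit signed int (the size of
 * "int" under avr-gcc), 0 otherwise. */
int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Square a value using 32-bit arithmetic; the product stays in range
 * for inputs up to 46340. */
int32_t square32(int32_t p)
{
    return p * p;
}
```

7919 itself fits in 16 bits, but square32(7919) is 62710561, far beyond the 32767 limit, so any variable holding such a square needs a 32-bit type.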
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #136 on: November 19, 2019, 08:32:49 pm »
Zynq-7010 (Cortex A9) @ 650MHz:
Code: [Select]
$ gcc primes.c -o primes -O3
$ time ./primes
Starting run
3713160 primes found in 39728 ms
-396 bytes of code in countPrimes()

real    0m39.745s
user    0m39.723s
sys     0m0.011s
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
650000

i7-4700MQ @ 2.40GHz:
Code: [Select]
$ clang-7 primes.c -o primes -Ofast
$ time ./primes
Starting run
3713160 primes found in 4707 ms
256 bytes of code in countPrimes()

real 0m4.708s
user 0m4.708s
sys 0m0.000s
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
2400000
$ grep MHz /proc/cpuinfo
cpu MHz : 2394.522
cpu MHz : 2395.176
cpu MHz : 2394.455
cpu MHz : 2394.955
cpu MHz : 2394.427
cpu MHz : 2394.608
cpu MHz : 2394.432
cpu MHz : 2394.821

Hi, thanks for the data! The Zynq is especially interesting.

However, to keep things fair between different machines I'm only including results that use the same optimization level on everything, and that is -O1.

If you redo the test with -O1 then I'll include your results in my list.

Thanks!
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #137 on: November 20, 2019, 02:00:32 am »
So, I tried my primes test on an ATmega2560. It was a bit hacky. It has 8192 bytes of SRAM. My code uses 8004 bytes of global variables if I change the "int"s to "long"s (which I have to because the numbers being used get as high as 62,710,561 but "int" means 16 bit on avr-gcc). That's tight. Too tight. The first compile says I need 8254 bytes for global variables. Knowing that AVR copies initialized constants from flash to RAM on start up I removed the printing of all the string literals. OMG. Now it's *exactly* 8192 bytes.

Code: [Select]
Global variables use 8192 bytes (100%) of dynamic memory, leaving 0 bytes for local variables. Maximum is 8192 bytes.
Low memory available, stability problems may occur.

Ugh. That means as soon as I run main() the stack will be overwriting some global variables. I don't know if it will be my own arrays and "nSieve" or some library's globals. Best case, maybe it's the end of a buffer used by Serial.

I try replacing "if (nSieve < SZ){" with 10 instead of SZ and run it. No drama! The correct answer is printed. I try 100 instead of 10. Still get the correct answer and no crash. OK, let's try the full thing overnight!

After 12249318 ms (3.4 hours) the correct number of primes, 3713160, is printed.

My apologies to whoever's globals my stack clobbered. But it worked. It wouldn't have worked if it was one of my arrays that got clobbered.

Was this with printf? I just compiled and started running it on my board (20MHz) and it only ended up as 8082 bytes. I only use an LCD, but I'm running the code now at -O1 and exactly as you originally wrote it.

Code: [Select]
Sketch uses 2832 bytes (1%) of program storage space. Maximum is 262144 bytes.
Global variables use 8082 bytes (12%) of dynamic memory, leaving 57453 bytes for local variables. Maximum is 65535 bytes.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #138 on: November 20, 2019, 02:09:50 am »
That's using Serial.println(long) in an Arduino sketch. An attached LCD is probably ideal for minimal code. Or printing to the UART with minimal abstraction overhead in between, but I'm not set up for that at the moment..
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #139 on: November 20, 2019, 02:20:55 am »
Ah, alternatively I'm not set up for anything else. I don't have any 2560 boards with UARTs or USB connections. I'll post results when done, see if cycles(roughly) match up with yours.
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #140 on: November 20, 2019, 05:57:27 am »
i7-4700MQ @ 2.40GHz:

Scripted it this time. Also killed most background processes.
Code: [Select]
gcc -O1: 3713160 primes found in 5253 ms; 258 bytes of code in countPrimes()
gcc -O2: 3713160 primes found in 4806 ms; -400 bytes of code in countPrimes()
gcc -O3: 3713160 primes found in 4767 ms; -400 bytes of code in countPrimes()
gcc -Ofast: 3713160 primes found in 4832 ms; -432 bytes of code in countPrimes()
clang-7 -O1: 3713160 primes found in 4452 ms; 240 bytes of code in countPrimes()
clang-7 -O2: 3713160 primes found in 4434 ms; 256 bytes of code in countPrimes()
clang-7 -O3: 3713160 primes found in 4442 ms; 256 bytes of code in countPrimes()
clang-7 -Ofast: 3713160 primes found in 4429 ms; 256 bytes of code in countPrimes()

Zynq-7010 @ 650MHz:
Code: [Select]
gcc -O1: 3713160 primes found in 48206 ms; 248 bytes of code in countPrimes()
gcc -O2: 3713160 primes found in 39535 ms; -420 bytes of code in countPrimes()
gcc -O3: 3713160 primes found in 39536 ms; -420 bytes of code in countPrimes()
gcc -Ofast: 3713160 primes found in 39855 ms; -428 bytes of code in countPrimes()
clang-7 -O1: 3713160 primes found in 49687 ms; 232 bytes of code in countPrimes()
clang-7 -O2: 3713160 primes found in 45148 ms; 216 bytes of code in countPrimes()
clang-7 -O3: 3713160 primes found in 45152 ms; 216 bytes of code in countPrimes()
clang-7 -Ofast: 3713160 primes found in 45368 ms; 216 bytes of code in countPrimes()

Looks like gcc at O1 misses quite a few optimizations, but unfortunately clang on ARM seems broken because it can't match gcc even at -Ofast. gcc version is 8.3.0 on both systems.

Script:
Code: [Select]
#!/bin/bash

# warmup: run a few times so caches and CPU clocks settle
echo "warmup..."
for i in {1..3}; do
    gcc primes.c -o primes -O1 && ./primes 2>/dev/null
done

# build and run every compiler / optimization level pair
for CC in gcc clang-7; do
    for OPTLEVEL in O1 O2 O3 Ofast; do
        echo -n "$CC -$OPTLEVEL: "
        $CC primes.c -o primes -"$OPTLEVEL" && ./primes 2>/dev/null
    done
done

I would suggest using a script like the one above and then picking the fastest result for every platform, because the optimization levels probably don't translate well across platforms (there are CPU-specific optimizations). You can see here that the gap between O1 and O2 is much wider on the Zynq than on the i7. Maybe even add -march=native.
« Last Edit: November 20, 2019, 06:04:04 am by OwO »
Email: OwOwOwOwO123@outlook.com
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #141 on: November 20, 2019, 06:24:35 am »
So, I tried my primes test on an ATmega2560. It was a bit hacky. It has 8192 bytes of SRAM. My code uses 8004 bytes of global variables if I change the "int"s to "long"s (which I have to because the numbers being used get as high as 62,710,561 but "int" means 16 bit on avr-gcc). That's tight. Too tight. The first compile says I need 8254 bytes for global variables. Knowing that AVR copies initialized constants from flash to RAM on start up I removed the printing of all the string literals. OMG. Now it's *exactly* 8192 bytes.

Code: [Select]
Global variables use 8192 bytes (100%) of dynamic memory, leaving 0 bytes for local variables. Maximum is 8192 bytes.
Low memory available, stability problems may occur.

Ugh. That means as soon as I run main() the stack will be overwriting some global variables. I don't know if it will be my own arrays and "nSieve" or some library's globals. Best case, maybe it's the end of a buffer used by Serial.

I try replacing "if (nSieve < SZ){" with 10 instead of SZ and run it. No drama! The correct answer is printed. I try 100 instead of 10. Still get the correct answer and no crash. OK, let's try the full thing overnight!

After 12249318 ms (3.4 hours) the correct number of primes, 3713160, is printed.

My apologies to whoever's globals my stack clobbered. But it worked. It wouldn't have worked if it was one of my arrays that got clobbered.

Here is my result with a 20MHz 2560 using the exact code from your .txt. The only change was int to int32_t for the arrays, nSieve, nPrimes, trial, and sqr. 3.73597588 hours, so adjusting SZ to some other number is definitely a shortcut.
« Last Edit: November 20, 2019, 06:31:00 am by maginnovision »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #142 on: November 20, 2019, 06:41:00 am »
I would suggest using a script like the one above and then picking the fastest result for every platform, because the optimization levels probably don't translate well across platforms (there are CPU-specific optimizations). You can see here that the gap between O1 and O2 is much wider on the Zynq than on the i7. Maybe even add -march=native.

I don't have access to most of those machines now, so changing the methodology would not make sense.

I've added your results. I've put the 4700MQ at its advertised 3.4 GHz turbo speed which places it a bit better than 3rd generation 3770 in clock cycles. 2.4 GHz would not make any sense.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #143 on: November 20, 2019, 06:47:02 am »
After 12249318 ms (3.4 hours) the correct number of primes, 3713160, is printed.

Here is my result with a 20MHz 2560 using the exact code from your .txt. The only change was int to int32_t for the arrays, nSieve, nPrimes, trial, and sqr. 3.73597588 hours, so adjusting SZ to some other number is definitely a shortcut.

Wow, interesting: yours took 9.8% longer despite having a 25% faster clock. Did you say that's on a 2560 with more RAM than standard, but slower?

You can't adjust SZ to some other value and get results that are in any way comparable. The execution time is I think something close to cubic in SZ. This test needs 8000 bytes plus a bit, or out of luck.

We have almost a 5000:1 speed range in the results! Should be near 6000:1 with a current generation x86 that turbos to 5.0 GHz.
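
For reference, the 9.8% figure and the corresponding cycle-count ratio can be reproduced with a short C sketch. The helper names are hypothetical, and the 16 MHz clock for the first run is inferred from the "25% faster clock" remark rather than stated.

```c
/* Percentage by which time b exceeds time a (same unit for both). */
double percent_longer(double a_ms, double b_ms)
{
    return (b_ms / a_ms - 1.0) * 100.0;
}

/* How many times more clock cycles run b used than run a, given each
 * run's duration and clock frequency. */
double cycle_ratio(double a_ms, double a_hz, double b_ms, double b_hz)
{
    return (b_ms * b_hz) / (a_ms * a_hz);
}
```

With 12249318 ms at 16 MHz against 3.73597588 hours at 20 MHz, the second run is about 9.8% longer in time and uses roughly 1.37 times as many cycles.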
« Last Edit: November 20, 2019, 06:55:15 am by brucehoult »
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #144 on: November 20, 2019, 07:03:35 am »
After 12249318 ms (3.4 hours) the correct number of primes, 3713160, is printed.

Here is my result with a 20MHz 2560 using the exact code from your .txt. The only change was int to int32_t for the arrays, nSieve, nPrimes, trial, and sqr. 3.73597588 hours, so adjusting SZ to some other number is definitely a shortcut.

Wow, interesting: yours took 9.8% longer despite having a 25% faster clock. Did you say that's on a 2560 with more RAM than standard, but slower?

You can't adjust SZ to some other value and get results that are in any way comparable. The execution time is I think something close to cubic in SZ. This test needs 8000 bytes plus a bit, or out of luck.

We have almost a 5000:1 speed range in the results! Should be near 6000:1 with a current generation x86 that turbos to 5.0 GHz.

Since the extended ram isn't used it'll be running full speed. Total RAM usage was reported as 8082 bytes.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #145 on: November 20, 2019, 07:15:05 am »
Since the extended ram isn't used it'll be running full speed. Total RAM usage was reported as 8082 bytes.

Ahhh .. I see now the Arduino environment is running avr-gcc with -Os. No menu to change that and I can't be bothered to dig in the config files to change it right now. That's probably enough to do it.

I'll just take your result as the authoritative one then :-)
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #146 on: November 20, 2019, 07:18:48 am »
Since the extended ram isn't used it'll be running full speed. Total RAM usage was reported as 8082 bytes.

Ahhh .. I see now the Arduino environment is running avr-gcc with -Os. No menu to change that and I can't be bothered to dig in the config files to change it right now. That's probably enough to do it.

I'll just take your result as the authoritative one then :-)

I did change it to -O1. I'm re-running now without extended ram enabled just to make sure.

Also on my I7-7700K with stock clocks(4.2GHz):

Code: [Select]
Starting run
3713160 primes found in 2453ms
242 bytes of code in countPrimes()

EDIT: Updated after using WSL to recompile and run.
« Last Edit: November 20, 2019, 07:59:52 am by maginnovision »
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #147 on: November 20, 2019, 07:36:09 am »
Ahhh .. I see now the Arduino environment is running avr-gcc with -Os. No menu to change that and I can't be bothered to dig in the config files to change it right now. That's probably enough to do it.

Find and replace -Os in the file platform.txt
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #148 on: November 20, 2019, 07:42:40 am »
Since the extended ram isn't used it'll be running full speed. Total RAM usage was reported as 8082 bytes.

Ahhh .. I see now the Arduino environment is running avr-gcc with -Os. No menu to change that and I can't be bothered to dig in the config files to change it right now. That's probably enough to do it.

I'll just take your result as the authoritative one then :-)

I did change it to -O1. I'm re-running now without extended ram enabled just to make sure.

Yes, I know (or anyway assumed) you did. It's me that accidentally used -Os and whatever does that is buried deep in some script I don't want to find right now.

If you're re-running it, one thing you could change is it's ok for nSieve to be 16 bits. And i as well, but I think you left that one as "int"? It will help a little bit.

Poor little AVR. Though it'll thrash the heck out of 6502 or Z80 or VAX 780 or even probably 68000 or 286. 68020 and 386 would probably be a close-run thing against AVR :-)
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #149 on: November 20, 2019, 08:06:40 am »
The i7-4700MQ result was actually run at 2.4GHz. Here is the result at 3.4GHz:
Code: [Select]
$ gcc primes.c -o primes -O1
$ time ./primes
Starting run
3713160 primes found in 3841 ms; 258 bytes of code in countPrimes()

real 0m3.848s
user 0m3.843s
sys 0m0.001s
$ time ./primes
Starting run
3713160 primes found in 3836 ms; 258 bytes of code in countPrimes()

real 0m3.843s
user 0m3.827s
sys 0m0.011s

That would put it at around 13 billion cycles.
But the full suite of tests is more difficult to run because it runs into thermal throttling issues.
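
The cycle estimate is just run time multiplied by clock rate. A minimal C sketch follows; the function name is made up, and the clock is taken in kHz only because that is the unit the cpufreq files shown earlier use.

```c
#include <stdint.h>

/* Cycles = milliseconds * kilohertz, because the factors of 1000 cancel.
 * 64-bit math is required: the products here exceed 2^32. */
uint64_t cycles_from_ms_khz(uint64_t run_ms, uint64_t clock_khz)
{
    return run_ms * clock_khz;
}
```

cycles_from_ms_khz(3841, 3400000) gives 13,059,400,000, which is the "around 13 billion" quoted above. This assumes the core really held 3.4 GHz for the whole run.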
« Last Edit: November 20, 2019, 08:09:35 am by OwO »
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #150 on: November 20, 2019, 08:20:57 am »
Ok, it's re-running with nSieve as an int. i for the loop was always an int.

Also I checked my xCore200 board with 5 threads, the max I can do pretty quickly... 113655.671  milliseconds(+/- 20ns). Little slower than the XS1 board running 4 threads but it's not that surprising since it ends up just waiting on the final thread but the XS1 gets 125MHz threads. The XS1 would probably be even faster running 8 threads since the earlier ones would finish and move the processing time along to later threads.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #151 on: November 20, 2019, 08:37:15 am »
Hmm .. ok. That puts it closer to Skylake than I expected, but ok.

If a 47W laptop part is throttling then my 15W TDP 8650U is probably throttling even more :-) But it's damn quick. I don't normally observe it throttling in a 3 second single core test. *Maybe* it's going from 4.2 to 3.9. But that makes it even fewer than 11.5b clocks .. that would be 10.7. It takes (in a NUC) about 20 seconds of all-core work to drop back to 3.4, and even after 20 minutes it's still at 2.8. The same 8650U CPU in a ThinkPad drops back much more quickly, and gets to 2.2 at the end of the same workload.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #152 on: November 20, 2019, 08:45:00 am »
Also I checked my xCore200 board with 5 threads, the max I can do pretty quickly... 113655.671  milliseconds(+/- 20ns). Little slower than the XS1 board running 4 threads but it's not that surprising since it ends up just waiting on the final thread but the XS1 gets 125MHz threads. The XS1 would probably be even faster running 8 threads since the earlier ones would finish and move the processing time along to later threads.

I'm not sure I understand how fine level threading will work on an xCore for this code. Does it get the correct number of primes? Are the arrays duplicated?
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #153 on: November 20, 2019, 08:54:08 am »
Also I checked my xCore200 board with 5 threads, the max I can do pretty quickly... 113655.671  milliseconds(+/- 20ns). Little slower than the XS1 board running 4 threads but it's not that surprising since it ends up just waiting on the final thread but the XS1 gets 125MHz threads. The XS1 would probably be even faster running 8 threads since the earlier ones would finish and move the processing time along to later threads.

I'm not sure I understand how fine level threading will work on an xCore for this code. Does it get the correct number of primes? Are the arrays duplicated?

The way I set it up was running it through from start to end in a single thread. Any time it got to a point where a thread would start I'd basically print out the function state data. So each thread would start with those state variables(including arrays) and then break when the proper number of primes was done.

https://gist.github.com/Maginnovision/2f7bd99afeeed351d421573950fbfdee

The main was basically:
Code: [Select]
#ifdef EXPLORER5
    unsafe {
        timing <: 1;
        par {
            EX5countPrime0(&res1);
            EX5countPrime1(&res2);
            EX5countPrime2(&res3);
            EX5countPrime3(&res4);
            EX5countPrime4(&res5);
        }
        timing <: 0;

        res = res1 + res2 + res3 + res4 + res5;
        printf("%d primes(1) found\n", res1);
        printf("%d primes(2) found\n", res2);
        printf("%d primes(3) found\n", res3);
        printf("%d primes(4) found\n", res4);
        printf("%d primes(5) found\n", res5);
        printf("%d primes(total) found\n\n", res);
    }
#endif

The timing was done by a teensy 3.6 monitoring the "timing" gpio.
« Last Edit: November 20, 2019, 08:56:50 am by maginnovision »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #154 on: November 20, 2019, 09:09:14 am »
The way I set it up was running it through from start to end in a single thread. Any time it got to a point where a thread would start I'd basically print out the function state data. So each thread would start with those state variables(including arrays) and then break when the proper number of primes was done.

https://gist.github.com/Maginnovision/2f7bd99afeeed351d421573950fbfdee

Ahhh .. so it wasn't starting the algorithm from zero knowledge .. each thread started from a snapshot made during a prior "learning" run.

So the total amount of work is the same, but it's actually not running the algorithm multithreaded.

What happens if you start all the threads from empty arrays, start each thread at trial = 3 + threadNum, and increment trial by NumThreads each time at try_next? And then sum nPrimes from all the threads at the end?  Or can you do a mutex with a single global "trial" variable? I assume there's an "atomic increment memory" instruction?

 

Offline techman-001

  • Frequent Contributor
  • **
  • !
  • Posts: 748
  • Country: au
  • Electronics technician for the last 50 years
    • Mecrisp Stellaris Unofficial UserDoc
Re: Raspberry Pi 4
« Reply #155 on: November 20, 2019, 03:18:39 pm »
Got my Teensy 4.0 board and set it up.

I get the same results as GeorgeOfTheJungle, at 600 MHz, to the ms, adding my code to an Arduino sketch and adapting main() slightly and calling it from setup():
Code: [Select]
int main(){
  long beg = millis();
  int res = countPrimes();
  long m = millis() - beg;
  Serial.print(res);
  Serial.print(" primes found in ");
  Serial.print(m);
  Serial.println(" ms");
  return 0;
}

3713160 primes found in 37381 ms "faster" (the default)
3713160 primes found in 43516 ms "fast"

Verified that "fast" is -O1 and "faster" is -O2.  Compile line for "fast":

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-gcc -O1 -Wl,--gc-sections,--relax -T/home/bruce/software/arduino-1.8.10/hardware/teensy/avr/cores/teensy4/imxrt1062.ld -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 -o /tmp/arduino_build_829669/Blink.ino.elf /tmp/arduino_build_829669/sketch/Blink.ino.cpp.o /tmp/arduino_build_829669/core/core.a -L/tmp/arduino_build_829669 -larm_cortexM7lfsp_math -lm -lstdc++

The code is 228 bytes long which is in line with some other Thumb2 results I've had but bigger than some. The gcc is pretty old .. 5.4.1 20160919.

Something that confuses me is that the objdump supplied in arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-objdump doesn't understand some of the instructions in the elf file!

Apologies for butting in, but I love benchmarks for their fun value and just had to try your 3713160 primes with Forth.

I used  a generic Forth primes algorithm written by Mark Willis and archived from a post he made on Usenet in 2011.

It compiled to 184 bytes under Mecrisp-Stellaris running on a STM32F103 with 8kB RAM and 128kB Flash clocked at 72 MHz.

3713160 primes took 219927 ms not printing the primes
3713160 primes took 283662 ms after printing, the last  prime being 3713159

... 3712627 3712669 3712679 3712697 3712699 3712711 3712717 3712721 3712739 3712747 3712757 3712769 3712801 3712823 3712831 3712843 3712871 3712873 3712889 3712897 3712909 3712927 3712949 3712979 3712981 3713027 3713041 3713053 3713057 3713069 3713071 3713077 3713081 3713147 3713153 3713159

 ;D
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: Raspberry Pi 4
« Reply #156 on: November 20, 2019, 06:32:09 pm »
3713160 primes took 219927 ms not printing the primes
3713160 primes took 283662 ms after printing, the last  prime being 3713159

... 3712627 3712669 3712679 3712697 3712699 3712711 3712717 3712721 3712739 3712747 3712757 3712769 3712801 3712823 3712831 3712843 3712871 3712873 3712889 3712897 3712909 3712927 3712949 3712979 3712981 3713027 3713041 3713053 3713057 3713069 3713071 3713077 3713081 3713147 3713153 3713159

Looks like you've found all primes less than 3713160, not the first 3713160 primes. The last one should be 62710561, which is almost 17 times bigger.
 
3713159 is the 264262nd prime, so by that measure you've gone about 1/14th of the way.

A full run will take about 30 times longer. It gets harder as you go along.
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #157 on: November 20, 2019, 07:56:58 pm »
The way I set it up was running it through from start to end in a single thread. Any time it got to a point where a thread would start I'd basically print out the function state data. So each thread would start with those state variables(including arrays) and then break when the proper number of primes was done.

https://gist.github.com/Maginnovision/2f7bd99afeeed351d421573950fbfdee

Ahhh .. so it wasn't starting the algorithm from zero knowledge .. each thread started from a snapshot made during a prior "learning" run.

So the total amount of work is the same, but it's actually not running the algorithm multithreaded.

What happens if you start all the threads from empty arrays, start each thread at trial = 3 + threadNum, and increment trial by NumThreads each time at try_next? And then sum nPrimes from all the threads at the end?  Or can you do a mutex with a single global "trial" variable? I assume there's an "atomic increment memory" instruction?

So far these are my results:
Code: [Select]
1 Thread - 383701084 microseconds, 3713160 primes(total) found
That's it, because it's been running for 2 hours without finishing 2 threads, so I'm guessing that doesn't work. I didn't sleep last night because my kids are sick but if you think of anything else to try I'm glad to, but I won't be coming up with a true multithreaded version soon, haha. I also had some significantly different results from the 2560, so I'm running it again to make sure it doesn't give me a different result. Lastly, your primes.txt got changed at some point(probably testing your 2560?) and it now has SZ defined as 100 by default, not 1000.
« Last Edit: November 20, 2019, 07:59:04 pm by maginnovision »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #158 on: November 20, 2019, 08:05:05 pm »
Lastly, your primes.txt got changed at some point(probably testing your 2560?) and it now has SZ defined as 100 by default, not 1000.

Oops. Must have pushed a temporary version by accident. Fixed.
 

Offline techman-001

  • Frequent Contributor
  • **
  • !
  • Posts: 748
  • Country: au
  • Electronics technician for the last 50 years
    • Mecrisp Stellaris Unofficial UserDoc
Re: Raspberry Pi 4
« Reply #159 on: November 20, 2019, 08:31:44 pm »
3713160 primes took 219927 ms not printing the primes
3713160 primes took 283662 ms after printing, the last  prime being 3713159

... 3712627 3712669 3712679 3712697 3712699 3712711 3712717 3712721 3712739 3712747 3712757 3712769 3712801 3712823 3712831 3712843 3712871 3712873 3712889 3712897 3712909 3712927 3712949 3712979 3712981 3713027 3713041 3713053 3713057 3713069 3713071 3713077 3713081 3713147 3713153 3713159

Looks like you've found all primes less than 3713160, not the first 3713160 primes. The last one should be 62710561, which is almost 17 times bigger.
 
3713159 is the 264262nd prime, so by that measure you've gone about 1/14th of the way.

A full run will take about 30 times longer. It gets harder as you go along.

Thanks for the update, <DUH> I assumed that the single parameter in Mark's program was for the number of primes, but it looks like it was for the maximum prime itself. Trying now with 62710561 :)

« Last Edit: November 20, 2019, 08:46:18 pm by techman-001 »
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #160 on: November 20, 2019, 08:36:10 pm »
Code: [Select]
#define SZ 1000
int primes[SZ], sieve[SZ];
int nSieve = 0;

int countPrimes() {
    primes[0] = 2; sieve[0] = 4; ++nSieve;
    int nPrimes = 1, trial = 3+1, sqr = 2;
    while (1) {
        while (sqr * sqr <= trial) ++sqr;
        --sqr;
        for (int i = 0; i < nSieve; ++i) {
            if (primes[i] > sqr) goto found_prime;
            while (sieve[i] < trial) sieve[i] += primes[i];
            if (sieve[i] == trial) goto try_next;
        }
        break;
    found_prime:
        if (nSieve < SZ) {
            primes[nSieve] = trial;
            sieve[nSieve] = trial * trial;
            ++nSieve;
            // printf("Saved %d: %d\n", nSieve, trial);
        }
        ++nPrimes;
    try_next:
        trial += 2;
    }
    return nPrimes;
}

This is what thread 1 would run if the threads were numbered 0..1, and it fails to finish on my PC within a few minutes. Since the normal version finished in 2.435 seconds, it's not just the micro. I let it run for 7 minutes on the PC.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #161 on: November 20, 2019, 09:34:50 pm »
Ah, yeah, it's not going to find any primes at all if you only test even numbers :-)

The same change but with initial trial kept at 3 not 3+1 takes 3.35 sec instead of 3.40 sec on the machine I'm sitting at now.

So, yeah that scheme for threading isn't going to work. Need to stop one thread getting ahead of the others, at least until the arrays are filled.

It might make sense to just use a single thread until SZ primes (1000) have been found and the arrays filled. That's a very small proportion of the total time.
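To make the even-numbers problem concrete: with two threads, starting thread 1 at 3 + 1 and stepping by 2 visits 4, 6, 8, ... and never an odd number. A minimal sketch of one way to hand out only the odd candidates is below. The helper name `candidate` and its parameters are made up for illustration, and it only decides which numbers each thread tests. It does nothing about keeping the primes/sieve arrays complete in every thread, which is the harder part.

```c
#include <assert.h>

/* Hypothetical helper: the k-th candidate tested by thread t out of n threads.
   Thread t starts at 3 + 2*t and steps by 2*n, so the threads together visit
   every odd number >= 3 exactly once and never an even one. */
unsigned candidate(unsigned t, unsigned n, unsigned k) {
    assert(n > 0 && t < n);
    return 3u + 2u * t + 2u * n * k;
}
```

With 2 threads, thread 0 would test 3, 7, 11, ... and thread 1 would test 5, 9, 13, ...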
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #162 on: November 20, 2019, 11:08:14 pm »
Ah, yeah, it's not going to find any primes at all if you only test even numbers :-)

The same change but with initial trial kept at 3 not 3+1 takes 3.35 sec instead of 3.40 sec on the machine I'm sitting at now.

So, yeah that scheme for threading isn't going to work. Need to stop one thread getting ahead of the others, at least until the arrays are filled.

It might make sense to just use a single thread until SZ primes (1000) have been found and the arrays filled. That's a very small proportion of the total time.

I had tried another version which immediately aborted, but I thought the even-numbers version was more indicative of the problem.

Also, I've had 3 results in a row with the 2560 of 9,751,193 millis. So it seems that by default it was using the ext RAM. When we were using this board previously all memory was mapped, so I never thought about what it would do if it wasn't told what to do. So when scaled for clock speed it's not far off your result (but actually a little quicker).

It wouldn't be too hard to lock the other threads using a lock that thread 0 holds until the arrays are filled/done. I think only having the arrays populated will cause a ton of duplicate work: you'd basically have every thread beyond the first doing the same work. If implementing the trial = 3 + threadnumber and trial + totalthreads it would be somewhat duplicated but not entirely.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #163 on: November 20, 2019, 11:25:34 pm »
Apologies for butting in, but I love benchmarks for their fun value and just had to try your 3713160 primes with Forth.

Apologies for butting in, but (idem) with JavaScript :) :
primes.js: 3713160 primes found in 14078 ms

Quote from: primes.js
(function countPrimes (SZ, primes, sieve, nPrimes, nSieve, trial, sqr, t0) {
loop:
    while (1) {
        trial+= 2;
        while (sqr*sqr <= trial) sqr++;
        sqr--;
        for (var j=0; j<nSieve; j++) {
            if (primes[j] > sqr) {
                if (nSieve < SZ) {
                    primes.push(trial);
                    sieve.push(trial*trial);
                    nSieve++;
                    //console.log(nSieve, trial);
                }
                nPrimes++;
                continue loop;
            }
            while (sieve[j] < trial) sieve[j]+= primes[j];
            if (sieve[j] === trial) continue loop;
        }
        break;
    }
    console.log(nPrimes+ " primes found in "+ (Date.now()- t0)+ " ms");
})(1000, [2], [4], 1, 1, 1, 2, Date.now());
« Last Edit: November 22, 2019, 06:12:32 pm by GeorgeOfTheJungle »
The further a society drifts from truth, the more it will hate those who speak it.
 

Offline techman-001

  • Frequent Contributor
  • **
  • !
  • Posts: 748
  • Country: au
  • Electronics technician for the last 50 years
    • Mecrisp Stellaris Unofficial UserDoc
Re: Raspberry Pi 4
« Reply #164 on: November 21, 2019, 12:45:05 am »
3713160 primes took 219927 ms not printing the primes
3713160 primes took 283662 ms after printing, the last  prime being 3713159

... 3712627 3712669 3712679 3712697 3712699 3712711 3712717 3712721 3712739 3712747 3712757 3712769 3712801 3712823 3712831 3712843 3712871 3712873 3712889 3712897 3712909 3712927 3712949 3712979 3712981 3713027 3713041 3713053 3713057 3713069 3713071 3713077 3713081 3713147 3713153 3713159

Looks like you've found all primes less than 3713160, not the first 3713160 primes. The last one should be 62710561, which is almost 17 times bigger.
 
3713159 is the 264262nd prime, so by that measure you've gone about 1/14th of the way.

A full run will take about 30 times longer. It gets harder as you go along.

Thanks for the update, <DUH> I assumed that the single parameter in Mark's program was for the number of primes, but it looks like it was for the maximum prime itself. Trying now with 62710561 :)

Reaching the prime of 62710561  took 2894115 ms on a STM32F103 MCU  (same as in the Blue Pill)  @ 72 MHz  using Forth. I'd expect it to be perhaps 3x quicker using Assembly or C ?

This took about 13.2 times longer than my previous run up to 3713160 (219927 ms).
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #165 on: November 21, 2019, 09:34:27 am »
It is about running the same source on different archs and measuring elapsed time, imho, not about the result :)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #166 on: November 21, 2019, 08:40:27 pm »
It is about running the same source on different archs and measuring elapsed time, imho, not about the result :)

Yes, my idea was to run exactly the same C code and with as similar as possible quality of compiler (e.g. gcc -O1) on a wide range of machines. It's not too easy to make something that takes long enough to measure it on a fast machine but can fit on to a very small machine, AND that can't be optimized to nothing by stupid compiler tricks.

Translating the same algorithm to a similar language with compiler or JIT can be interesting too. I've actually had a Java version at http://hoult.org/primes.java for some time, but not publicized it.

I made things a little bit harder for this by using goto in the C code. They are quite structured so it translates easily to a language with labelled break & continue.

Using a very different language such as Forth is maybe interesting to compare different languages on the same machine. Using a completely different algorithm is .. well, that's another comparison again. Fun to play with sure.

My primes algorithm is probably not the best one in the world :-) Actually, I wrote essentially the same algorithm in FORTRAN IV in 1980 when I was a 17 year old high school student, entered it on punched cards, and ran it on the Burroughs B1700 in a computer bureau in Whangarei one night. In that program I printed all the primes less than 1,000,000. I remember that I used a 3-way "computed goto" :-) So I have a little bit of an attachment to this algorithm. In fact that FORTRAN program was a bit more sophisticated as it started with trial=5 and then incremented it alternately by 2 or 4 (5, 7, 11, 13, 17, 19, 23, 25 ...) so not even testing multiples of 2 or 3.
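The alternating 2/4 increment is easy to show in C. This is only a sketch of the candidate sequence, not a reconstruction of the FORTRAN program; the function name `wheel_candidates` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..n-1] with the candidates 5, 7, 11, 13, 17, 19, 23, 25, ...
   i.e. every number >= 5 that is not a multiple of 2 or 3. */
void wheel_candidates(unsigned *out, size_t n) {
    assert(out != NULL || n == 0);
    unsigned trial = 5, step = 2;
    for (size_t i = 0; i < n; ++i) {
        out[i] = trial;
        trial += step;
        step = 6 - step;   /* alternate the increment: 2, 4, 2, 4, ... */
    }
}
```

The candidates still include composites such as 25, so each one has to be tested against the sieve as usual; the saving is only that a third of the odd numbers are never looked at.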

It was interesting when "sieve" became a commonly used benchmark on microcomputers a few years later it used a completely different implementation, with a bitmap of all the numbers in the range of interest (which would need 7.5 MB of memory to find the same primes as the program we are using here!) and as each prime is found clearing the bits for all multiples of that prime.
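For comparison, a minimal sketch of the bitmap style of sieve, assuming one bit per number and counting rather than printing. The name `count_primes_bitmap` is made up, and this version sets a bit to mark a composite instead of clearing a bit, which is the same idea with the sense inverted. One bit for each number up to 62710561 is where the roughly 7.5 MB figure comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Count the primes strictly below `limit` using one bit per number. */
uint32_t count_primes_bitmap(uint32_t limit) {
    if (limit < 3) return 0;                 /* no primes below 2 */
    size_t nbytes = (size_t)limit / 8 + 1;
    uint8_t *composite = calloc(nbytes, 1);  /* bit set = known composite */
    assert(composite != NULL);
    uint32_t count = 0;
    for (uint32_t n = 2; n < limit; ++n) {
        if (composite[n >> 3] & (1u << (n & 7))) continue;
        ++count;                             /* n is prime */
        /* Start marking at n*n: smaller multiples of n were already
           marked while handling smaller primes. 64-bit to avoid overflow. */
        for (uint64_t m = (uint64_t)n * n; m < limit; m += n)
            composite[m >> 3] |= (uint8_t)(1u << (m & 7));
    }
    free(composite);
    return count;
}
```

The memory use grows with the range searched, whereas the incremental version in this thread only needs the two SZ-entry arrays.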

Both algorithms can I think be justifiably called "Sieve of Eratosthenes". The bitmap one is perhaps closer to what Eratosthenes actually did. Looking now, I see that Wikipedia lists my version as "Incremental sieve" https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Incremental_sieve. Of course Wikipedia wasn't available to me in hick-town NZ in 1980 so I was forced to come up with the algorithm (including the point Wikipedia notes of starting at the square of each new prime) by myself.

Ha!! Wikipedia's reference for "Incremental Sieve", a 2008 paper, calls my version "The Genuine Sieve of Eratosthenes" and the bitmap one "unfaithful sieve"! https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

But anyway, this is all just in fun, so any way people want to take the time to play around with something a bit different is fine by me, and interesting :-)
 
The following users thanked this post: 2N3055, iMo

Online westfw

  • Super Contributor
  • ***
  • Posts: 4199
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #167 on: November 22, 2019, 01:17:07 am »
Quote
I wrote essentially the same algorithm in FORTRAN IV in 1980
Heh.  I wrote a similar program at about the same time, in PDP10 assembler. It was a bit memorable because it was the first time I "noticed" that printing numbers was "expensive" compared to just calculating them...


Hmm...  Oh look - still around, I think!

Code: [Select]
start2: move CNT,count          ;number of primes that we want total
        movei THIS,3
        movei PLACE,1           ;last used spot in the table
NUM.LP: addi THIS,2             ;step to next prime
        setz INDEX,             ;start with first prime in table
TST.LP: move TEST,THIS
        idiv TEST,TABLE(INDEX)
        jumpe REM,NUM.LP        ;evenly divisible, go to next number to test
        move TEST,TABLE(INDEX) 
        imul TEST,TEST          ;square the current probe.
        camg TEST,THIS          ;result bigger than test number -> done
        aoja INDEX,TST.LP
;Current number is Prime!
        addi PLACE,1            ;next spot in the table
        movem THIS,TABLE(PLACE) ;save the prime
        movei 1,101
        move 3,[5,,^d10]
        move 2,this
        nout
         trn
        sojg CNT,NUM.LP         ;try the next odd number
        haltf

        lit
        var

TABLE:  2                       ;first two primes, as a base
        3
;area to be filled in by program...
end start
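For anyone who doesn't read PDP-10 assembler, here is a rough C rendering of the same loop as I read it: trial division of each odd number by the primes found so far, stopping once the square of the divisor exceeds the number. The function name and parameters are made up, the printing (the nout call) is left out, and it assumes the squares fit in an unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* Seed table[] with 2 and 3, then append the next `count` primes.
   Returns the number of primes stored (count + 2). */
size_t fill_primes(unsigned *table, size_t capacity, size_t count) {
    assert(capacity >= count + 2);
    table[0] = 2;
    table[1] = 3;
    size_t place = 1;          /* last used slot, like PLACE */
    unsigned candidate = 3;    /* number under test, like THIS */
    while (count > 0) {
        candidate += 2;        /* step to the next odd number */
        int is_prime = 1;
        for (size_t i = 0; ; ++i) {
            if (candidate % table[i] == 0) { is_prime = 0; break; }
            if (table[i] * table[i] > candidate) break;  /* probe squared > candidate */
        }
        if (is_prime) {
            table[++place] = candidate;
            --count;
        }
    }
    return place + 1;
}
```

Unlike the incremental sieve in the rest of the thread, this does a divide per probe, which is a different cost profile on machines without a fast divider.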
 

Offline senso

  • Frequent Contributor
  • **
  • Posts: 951
  • Country: pt
    • My AVR tutorials
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #168 on: November 22, 2019, 03:24:33 pm »
Gave this a try, for curiosity sake..
Ubuntu inside a VirtualBox VM, running with 2 virtual cores, real CPU is a i7-8750H tweaked a bit with ThrottleStop.

gcc -O1:

Code: [Select]
Starting run
3713160 primes found in 2797 ms
242 bytes of code in countPrimes()

real 0m2,800s
user 0m2,794s
sys 0m0,004s

With gcc -O2 something funny happens, is slower and reports negative size..

Code: [Select]
Starting run
3713160 primes found in 3028 ms
-416 bytes of code in countPrimes()

real 0m3,038s
user 0m3,021s
sys 0m0,008s

gcc -Os:
Code: [Select]
Starting run
3713160 primes found in 3181 ms
-410 bytes of code in countPrimes()

real 0m3,186s
user 0m3,178s
sys 0m0,004s

And gcc -O0:

Code: [Select]
Starting run
3713160 primes found in 8623 ms
392 bytes of code in countPrimes()

real 0m8,643s
user 0m8,612s
sys 0m0,012s

And for fun, latest version of VS Community Edition 2017, C++ compiler, O1 optimization, on native Win10:
Code: [Select]
Starting run
3713160 primes found in 3460 ms
147 bytes of code in countPrimes()

So, Linux on a VM is 600ms faster than the native MS compiler :wtf:
« Last Edit: November 22, 2019, 03:39:30 pm by senso »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #169 on: November 22, 2019, 05:05:26 pm »
Finding the code size from inside the program itself is hacky and depends on countPrimes() and main() staying adjacent in the final binary and in that order. It seems to be reasonably reliable on gcc with -O1, but not higher levels.
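A minimal sketch of that kind of in-program measurement, to show why it can go negative. This is an illustration only and not necessarily what primes.txt does: `guess_code_size`, `marker_first` and `marker_second` are made-up names, and converting a function pointer to an integer is implementation-defined, although it works on the usual gcc targets.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*fn_ptr)(void);

/* Two empty stand-in functions; in the benchmark these would be
   countPrimes() and main(), cast to fn_ptr. */
void marker_first(void) {}
void marker_second(void) {}

/* Guess the size of `fn` as the distance from its address to `next`.
   Only meaningful if the linker placed `next` directly after `fn`. */
long guess_code_size(fn_ptr fn, fn_ptr next) {
    assert(fn != 0 && next != 0);
    return (long)((intptr_t)next - (intptr_t)fn);
}
```

If the compiler or linker puts the second function before the first, or somewhere else entirely, the difference comes out negative or meaningless, which would explain numbers like -416.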

147 bytes of code is impressively small from vc++. Much smaller than any other ISA so I'm not sure that's a genuine number. Would have to look at the disassembly to know.

I also found -O1 faster than -O2 on my i7-8650U machines.
 

Offline gmb42

  • Frequent Contributor
  • **
  • Posts: 294
  • Country: gb
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #170 on: November 22, 2019, 05:18:28 pm »
And for fun, latest version of VS Community Edition 2017, C++ compiler, O1 optimization, on native Win10:
Code: [Select]
Starting run
3713160 primes found in 3460 ms
147 bytes of code in countPrimes()

So, Linux on a VM is 600ms faster than the native MS compiler :wtf:

The VS optimisations flags are a little different to gcc, for VS 2017 they are:

Code: [Select]
/O1 maximum optimizations (favor space) /O2 maximum optimizations (favor speed)
/Ob<n> inline expansion (default n=0)   /Od disable optimizations (default)
/Og enable global optimization          /Oi[-] enable intrinsic functions
/Os favor code space                    /Ot favor code speed
/Ox optimizations (favor speed)         /Oy[-] enable frame pointer omission
/favor:<blend|ATOM> select processor to optimize for, one of:
    blend - a combination of optimizations for several different x86 processors
    ATOM - Intel(R) Atom(TM) processors

So O1 actually minimises space as can be seen in the output that says only
Code: [Select]
147 bytes of code in countPrimes()
On my machine (i7-8700) with /O1:

Code: [Select]
Starting run
3713160 primes found in 3145 ms
146 bytes of code in countPrimes()

and with /O2:

Code: [Select]
Starting run
3713160 primes found in 3115 ms
176 bytes of code in countPrimes()
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #171 on: November 22, 2019, 05:47:18 pm »
On my Mac -Os is smaller (172 / 224) and faster (4.9 / 5.6) than -O1

Quote
$ gcc -Os /primes.c ; time ./a.out
Starting run
3713160 primes found in 4977 ms
172 bytes of code in countPrimes()

real   0m4.981s
user   0m4.973s
sys   0m0.006s

$ gcc -O1 /primes.c ; time ./a.out
Starting run
3713160 primes found in 5603 ms
224 bytes of code in countPrimes()

real   0m5.612s
user   0m5.591s
sys   0m0.016s

And -O0 is slower than JavaScript...

primes.js: 3713160 primes found in 14078 ms

Quote
$ gcc -O0 /primes.c ; time ./a.out
Starting run
3713160 primes found in 18625 ms
400 bytes of code in countPrimes()

real   0m18.640s
user   0m18.597s
sys   0m0.032s
« Last Edit: November 22, 2019, 05:49:46 pm by GeorgeOfTheJungle »
The further a society drifts from truth, the more it will hate those who speak it.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #172 on: November 22, 2019, 05:56:53 pm »
Finding the code size from inside the program itself is hacky and depends on countPrimes() and main() staying adjacent in the final binary and in that order. It seems to be reasonably reliable on gcc with -O1, but not higher levels.

147 bytes of code is impressively small from vc++. Much smaller than any other ISA so I'm not sure that's a genuine number. Would have to look at the disassembly to know.

I also found -O1 faster than -O2 on my i7-8650U machines.

Use nm:

Quote
$ nm ./a.out
0000000100000000 T __mh_execute_header
                 U _clock
0000000100000d8a T _countPrimes
0000000100000e36 T _main
0000000100001030 S _nSieve
0000000100001040 S _primes
                 U _printf
                 U _puts
0000000100001fe0 S _sieve
                 U dyld_stub_binder

0xe36-0xd8a= 172
The further a society drifts from truth, the more it will hate those who speak it.
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #173 on: November 22, 2019, 07:24:26 pm »
Nobody get better than my 2.435s yet? It's not even a fast PC. Maybe it's all the mobile CPUs.
 

Online iMo

  • Super Contributor
  • ***
  • Posts: 4785
  • Country: pm
  • It's important to try new things..
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #174 on: November 22, 2019, 08:45:53 pm »
Pelles C, Win7 64b, i3-6320 3900 MHz
-std:C11 -Tx64-coff -Ot -Ob1 -fp:precise -W1 -Gr

3713160 primes found in 3603 ms
224 bytes of code in countPrimes()


« Last Edit: November 22, 2019, 08:49:14 pm by imo »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #175 on: November 22, 2019, 09:47:02 pm »
Finding the code size from inside the program itself is hacky and depends on countPrimes() and main() staying adjacent in the final binary and in that order. It seems to be reasonably reliable on gcc with -O1, but not higher levels.

Use nm:

*I* know how to find the real code size -- I've been doing that on many platforms for many years, as you can see in the results.

I added the hacky calculation in the program itself because last week every single person who submitted a result they had run on some machine I don't have didn't tell me the code size.
 

Offline maginnovision

  • Super Contributor
  • ***
  • Posts: 1963
  • Country: us
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #176 on: November 22, 2019, 10:15:21 pm »
If you copy paste the code, build with gcc -O1, and run it you should get the right answer. I didn't check my avr binary, but I have checked my xs1 and xs2 binaries I just didn't add them to the posts.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #177 on: November 25, 2019, 11:55:30 am »
Quote
Can you look at the disassembly and see if the accesses are single instruction?

I'm using the arduino IDE, I don't know how to do that with this.

The Arduino IDE actually makes it relatively simple to do this. Make a trivial change to your source code -- maybe add and delete a character and then hit the "check"/"compile" button. In the panel at the bottom (maybe make it bigger) you'll see a line like the following, with the first (very long) "word" ending in gcc and the path to your eventual executable binary file in the middle (here Blink.ino.elf) after a "-o". Or for certain targets the gcc might be ld instead.

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-gcc -O1 -Wl,--gc-sections,--relax -T/home/bruce/software/arduino-1.8.10/hardware/teensy/avr/cores/teensy4/imxrt1062.ld -mthumb -mcpu=cortex-m7 -mfloat-abi=hard -mfpu=fpv5-d16 -o /tmp/arduino_build_829669/Blink.ino.elf /tmp/arduino_build_829669/sketch/Blink.ino.cpp.o /tmp/arduino_build_829669/core/core.a -L/tmp/arduino_build_829669 -larm_cortexM7lfsp_math -lm -lstdc++

Open a terminal window (from your OS, nothing to do with gcc) and copy and paste the bit with gcc or ld and the output file. Don't try to run it yet!

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-gcc  /tmp/arduino_build_829669/Blink.ino.elf

Now just replace the "gcc" bit by "objdump -d":

/home/bruce/software/arduino-1.8.10/hardware/teensy/../tools/arm/bin/arm-none-eabi-objdump -d  /tmp/arduino_build_829669/Blink.ino.elf

You can run that.

If you can't scroll your terminal window backwards then you might want to put " | more" (or " | less") on the end, or redirect the output to a file with " >/home/bruce/myDisassembly.txt" or whatever other location or name you want. (Your name probably isn't Bruce...)

If the compiler is gcc then you can get an assembly language listing by instead finding the line that compiled your code ("Blink.ino.cpp") to an object file ("-o .../Blink.ino.cpp.o"). You can just copy and paste the whole line into your console/terminal window and re-run it. If you add to the end " -g -Wa,-adhl" then you'll get a listing printed to the terminal with the original lines of C code, the generated assembly language, and the binary (hex) code for the instructions.

Thank you Sir!

This:
Code: [Select]
void loop () {
  register uint32_t mask= 0xa;
  register volatile uint32_t* toggle= (volatile uint32_t*) 0x4200408c;
  while (1) {
    *toggle= mask;
    *toggle= mask;
    *toggle= mask;
    *toggle= mask;
  }
}

Gives:
Quote
00000194 <loop>:
     194:   230a4a03    .word   0x230a4a03
     198:   6013         str   r3, [r2, #0]
     19a:   6013         str   r3, [r2, #0]
     19c:   6013         str   r3, [r2, #0]
     19e:   6013         str   r3, [r2, #0]
     1a0:   e7fa         b.n   198 <loop+0x4>
     1a2:   bf00         nop
     1a4:   4200408c    .word   0x4200408c

=> Not much room for improvement I guess...  :(
« Last Edit: November 25, 2019, 12:01:27 pm by GeorgeOfTheJungle »
The further a society drifts from truth, the more it will hate those who speak it.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #178 on: November 25, 2019, 01:47:15 pm »
Quote
00000194 <loop>:
     194:   230a4a03    .word   0x230a4a03
     198:   6013         str   r3, [r2, #0]
     19a:   6013         str   r3, [r2, #0]
     19c:   6013         str   r3, [r2, #0]
     19e:   6013         str   r3, [r2, #0]
     1a0:   e7fa         b.n   198 <loop+0x4>
     1a2:   bf00         nop
     1a4:   4200408c    .word   0x4200408c

And here we see the quite extraordinary phenomenon of a toolchain's "objdump" not understanding code generated by the compiler from the same toolchain! I've seen this with my Teensy 4.0.

The 230a4a03 should I believe (based on the rest of the code) disassemble to:

   194:      4a03      ldr      r2, [pc, #12]
   196:      230a      movs r3, #10

Those are not arcane or new instructions! They are absolutely standard original Thumb instructions present right from the ARM7TDMI in 1994.
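
For anyone who wants to check that by hand, here is a minimal sketch in C of the decoding. It handles only the three 16-bit Thumb encodings that show up in this listing. The names (decode_thumb16, ldr_literal_address, print_word_as_thumb) are made up for illustration and are not part of binutils or any other tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum thumb_kind { THUMB_OTHER, THUMB_LDR_LITERAL, THUMB_MOVS_IMM, THUMB_STR_IMM };

struct thumb_insn {
    enum thumb_kind kind;
    unsigned rt;   /* register loaded, written or stored */
    unsigned rn;   /* base register (STR only, otherwise 0) */
    unsigned imm;  /* immediate, in bytes for LDR/STR */
};

/* Decode just the three 16-bit Thumb encodings used in the listing. */
struct thumb_insn decode_thumb16(uint16_t insn)
{
    struct thumb_insn d = { THUMB_OTHER, 0, 0, 0 };
    unsigned top5 = (unsigned)insn >> 11;

    if (top5 == 0x09) {            /* 01001: LDR Rt, [PC, #imm8*4] */
        d.kind = THUMB_LDR_LITERAL;
        d.rt = (insn >> 8) & 7u;
        d.imm = (insn & 0xffu) * 4u;
    } else if (top5 == 0x04) {     /* 00100: MOVS Rd, #imm8 */
        d.kind = THUMB_MOVS_IMM;
        d.rt = (insn >> 8) & 7u;
        d.imm = insn & 0xffu;
    } else if (top5 == 0x0c) {     /* 01100: STR Rt, [Rn, #imm5*4] */
        d.kind = THUMB_STR_IMM;
        d.rt = insn & 7u;
        d.rn = (insn >> 3) & 7u;
        d.imm = ((insn >> 6) & 0x1fu) * 4u;
    }
    return d;
}

/* Address read by LDR Rt, [PC, #imm]: PC reads as the instruction
   address + 4, rounded down to a multiple of 4. */
uint32_t ldr_literal_address(uint32_t insn_addr, unsigned imm)
{
    assert((insn_addr & 1u) == 0);  /* Thumb instructions are halfword aligned */
    return ((insn_addr + 4u) & ~3u) + imm;
}

/* Print the two halfwords packed into one little-endian 32-bit word:
   the low 16 bits sit at the lower address and execute first. */
void print_word_as_thumb(uint32_t addr, uint32_t word)
{
    uint16_t half[2] = { (uint16_t)(word & 0xffffu), (uint16_t)(word >> 16) };

    for (unsigned i = 0; i < 2; i++) {
        struct thumb_insn d = decode_thumb16(half[i]);
        printf("%8x:\t%04x\t", (unsigned)(addr + 2u * i), (unsigned)half[i]);
        switch (d.kind) {
        case THUMB_LDR_LITERAL: printf("ldr\tr%u, [pc, #%u]\n", d.rt, d.imm); break;
        case THUMB_MOVS_IMM:    printf("movs\tr%u, #%u\n", d.rt, d.imm); break;
        case THUMB_STR_IMM:     printf("str\tr%u, [r%u, #%u]\n", d.rt, d.rn, d.imm); break;
        default:                printf(".short\t0x%04x\n", (unsigned)half[i]); break;
        }
    }
}
```

Calling print_word_as_thumb(0x194, 0x230a4a03) splits the word into 0x4a03 and 0x230a, and ldr_literal_address(0x194, 12) gives 0x1a4, which is where the listing has the .word 0x4200408c.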
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #179 on: November 25, 2019, 02:18:04 pm »
Quote
00000194 <loop>:
     194:   230a4a03    .word   0x230a4a03
     198:   6013         str   r3, [r2, #0]
     19a:   6013         str   r3, [r2, #0]
     19c:   6013         str   r3, [r2, #0]
     19e:   6013         str   r3, [r2, #0]
     1a0:   e7fa         b.n   198 <loop+0x4>
     1a2:   bf00         nop
     1a4:   4200408c    .word   0x4200408c

And here we see the quite extraordinary phenomenon of a toolchain's "objdump" not understanding code generated by the compiler from the same toolchain! I've seen this with my Teensy 4.0.

The 230a4a03 should I believe (based on the rest of the code) disassemble to:

   194:      4a03      ldr      r2, [pc, #12]
   196:      230a      movs r3, #10

Those are not arcane or new instructions! They are absolutely standard original Thumb instructions present right from the ARM7TDMI in 1994.

And I retouched it a bit, because objdump gave me this:

Quote
00000194 <loop>:
     194:   230a4a03    .word   0x230a4a03
     198:   6013         str   r3, [r2, #0]
     19a:   6013         .short   0x6013
     19c:   6013         str   r3, [r2, #0]
     19e:   6013         .short   0x6013
     1a0:   e7fa         b.n   198 <loop+0x4>
     1a2:   bf00         nop
     1a4:   4200408c    .word   0x4200408c
The further a society drifts from truth, the more it will hate those who speak it.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #180 on: November 25, 2019, 02:39:03 pm »
That's just insane.
 

Online Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6261
  • Country: fi
    • My home page and email address
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #181 on: November 25, 2019, 03:38:22 pm »
Technically, gcc/cc1/g++ and gas/gdb/objdump are different packages in the same toolchain, which explains why gdb/objdump sometimes disagree with gcc about what the binary code actually is.

Nevertheless, that is an obvious bug in binutils-gdb, and should be reported to the bugzilla with reproducible examples.

That said, bugs #10288 and #10924 have been open since 2009, and are about ARM7TDMI instruction decoding.  It looks like nobody cared enough to do it properly.  Most likely, companies used just enough resources to get support into GCC, and let the users worry about the rest of the toolchain.
 

Offline GeorgeOfTheJungleTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 2699
  • Country: tr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #182 on: November 25, 2019, 06:51:58 pm »
I had never seen a µC do a jmp in zero cycles before.  :-+

Code: [Select]
while (1) *toggle= mask;
How fast can those Longan RISC-Vs toggle a GPIO? 1/4th of the µC clock? 10MHz was the max I could get on an ESP32@240MHz.
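
To put numbers on that, a rough sketch of the arithmetic in C. It assumes one output period always takes two register writes (one per edge), and the function names are just for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU clocks that elapse during one full period of the output square wave. */
uint32_t clocks_per_period(uint32_t cpu_hz, uint32_t pin_hz)
{
    assert(pin_hz != 0);
    return cpu_hz / pin_hz;
}

/* Output frequency if every register write costs clocks_per_write CPU clocks.
   Assumption: two writes (two edges) make one period. */
uint32_t pin_hz_from_writes(uint32_t cpu_hz, uint32_t clocks_per_write)
{
    assert(clocks_per_write != 0);
    return cpu_hz / (2u * clocks_per_write);
}
```

With the ESP32 figures above, clocks_per_period(240000000, 10000000) is 24, i.e. 12 clocks per write under that assumption. An output at 1/4th of the CPU clock would need a write every 2 clocks.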
The further a society drifts from truth, the more it will hate those who speak it.
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14475
  • Country: fr
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #183 on: November 25, 2019, 07:11:08 pm »
Quote
I had never seen a µC do a jmp in zero cycles before.  :-+

Code: [Select]
while (1) *toggle= mask;

While I'm not sure any "MCU" per se had this, there were certainly CPUs that could loop with zero overhead, using some kind of "REP" prefix instruction. That's not a general "jmp", of course, but it could still be used for many things.
(Don't some of the Microchip PICs have something like this in their instruction set? Maybe the dsPIC?)
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 8646
  • Country: gb
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #184 on: November 25, 2019, 07:13:35 pm »
Quote
I had never seen a µC do a jmp in zero cycles before.  :-+

There have been some DSP-oriented controller cores which offered zero-cycle loop overhead, just like most full-on DSP cores.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #185 on: November 26, 2019, 12:28:39 am »
Quote
I had never seen a µC do a jmp in zero cycles before.  :-+
There have been some DSP-oriented controller cores which offered zero-cycle loop overhead, just like most full-on DSP cores.

The new ARMv8.1-M spec includes zero-overhead loops.

We think there's a better way -- stay tuned :-)
 

Offline Berni

  • Super Contributor
  • ***
  • Posts: 4955
  • Country: si
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #186 on: November 26, 2019, 06:17:07 am »
Oh that's neat. I had no idea that ARM could do that.

Do things like this get used by the C compiler when you write a for loop with a known length?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4036
  • Country: nz
Re: RPi 4 / STM32 / ESP32 / Teensy 4 / RISC-V GAZPACHO
« Reply #187 on: November 26, 2019, 08:21:48 am »
Quote
Oh that's neat. I had no idea that ARM could do that.

I'd expect it's going to be a year or two before they'll be shipping any cores that can do this.

Quote
Do things like this get used by the C compiler when you write a for loop with a known length?

That will be extremely easy to add to gcc and llvm, yes.
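
As a sketch of the kind of loop in question: the trip count below is known before the loop starts, so the back edge is pure bookkeeping. The ARMv8.1-M low-overhead-branch instructions (WLS/DLS to set up the count, LE at the loop end) are aimed at exactly this shape. Whether a given gcc or llvm release actually emits them for this code is not something this example shows, and sum_words is just an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum `count` words. The loop runs exactly `count` times and the count is
   fixed on entry, which makes it a candidate for a hardware loop. */
uint32_t sum_words(const uint32_t *data, size_t count)
{
    assert(data != NULL || count == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += data[i];
    return sum;
}
```

On a core without such instructions the same source still compiles to the usual decrement, compare and branch, so nothing has to change in the C code.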
 

