Author Topic: "Explain Like I'm Five" Gen2 RasPi v. Teensy 4.1 CPU Cores (Read 7798 times)

starhawk · « **on:** May 12, 2020, 05:49:38 am »

I hope I'm in the right place for this... heck if I know :-/ it's late and I'm tired... mods, please move me if I need it, I'm not gonna argue!

*ahem*

I'm generally a hardware guy, as far as computers are concerned, and I'm decent-ish with electronics (my circuit design skills are *considerably* better than my iron-meets-perfboard skills!) -- but one place I fall down hard is processing power, particularly with older stuff (286/etc) and microcontroller gear (Arduino, etc).

So I was reading the Hackaday review of the Teensy 4.1 that just came out, and it occurred to me that the 600MHz cited CPU-core clocking wasn't that far from that of the original Raspberry Pi at 700MHz. I gather that, since the Gen1 RasPi's SoC was ARM11 somethingorother (ARM confuses me *real* fast!), it's remarkably different inside from the Cortex-M7 chip of the Teensy, but the 900MHz Cortex-A7 chip on the RasPi Gen2 boards sounds like it might be a goodly bit closer. I do gather that the "M" vs "A" substitution means something important, but I've no idea what.

That said, as I mentioned before, ARM stuff gets me right boggled basically instantly. It feels like there's approximately a hundred zapterillion variations of that supposedly single architecture out there!

Further confusing me, I know a lot of older cheap no-name Android tablets (the ones that got sold for US$30, or the equivalent thereof, in various drugstores around the country and probably the globe) had an ARM SoC that the Taiwanese CPU, chipset, and peripheral-chip manufacturer VIA Technologies spat out by the zillions, that was *also* meant to run at 600MHz, but was (like everything else VIA cranks out) mad stupid inside and so typically wound up being drastically overclocked, much to the detriment of the batteries in those tablets -- but I know absolutely nothing about the internals of that SoC, or even whether it was a single specific chip or a family thereof, other than it/they were all absolutely horrid to use, even when overclocked to within an inch of outright catching fire.

Can someone explain to me the differences at work between each of these individual chips (treating the VIA family, if it is one, as a singular chip to the greatest extent possible)...? An idea of an approximately-comparable x86 equivalent to the Teensy 4.1, if that's at all imaginable, would be deeply appreciated as well.

ataradov · « **Reply #1 on:** May 12, 2020, 07:10:55 am »

Think of it this way, as a first approximation:
A - can run Linux.
M - can't run Linux.

There are many other differences, but if your application calls for a full OS (rich GUI, full applications), then A is the only option.

If you want a very powerful microcontroller, then M is your best bet.

That Teensy board is probably comparable to a Pentium-166 in real-world performance, if you want to compare to x86.

brucehoult · « **Reply #2 on:** May 12, 2020, 10:28:31 am »

What ataradov said except that the Teensy 4.0 is *insanely* fast for a microcontroller.

First, it has a dual-issue core, compared to the single-issue in the original Raspberry Pi and the Pi 2. Second, it only has 1 MB of RAM, but it is tied very tightly to the CPU core and is like a huge L1 cache on bigger chips. Also it doesn't have any MMU, which also speeds up memory access. At 600 MHz it is easily a match for the 900 MHz Pi 2 -- on my counting primes benchmark the Teensy 4.0 takes 43.5 seconds while the Pi 2 takes 47.9 seconds.

Also, you can overclock the Teensy 4.0 to 960 MHz in which case it takes 27.2 seconds. The 1.2 GHz also dual-issue Pi 3 takes 30.42 seconds. The Teensy 4.0 gets a little toasty doing that, so you might want to add a heatsink if you overclock it (it's a menu option right there in the Arduino IDE).
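To put those timings on a per-clock footing, here is a rough sketch that multiplies each run time by its clock rate. It is not the benchmark itself, only arithmetic on the numbers quoted above, and it assumes each board really ran at its nominal clock for the whole run.

```python
# Benchmark times (seconds) and nominal clocks (MHz) as quoted in this post.
results = {
    "Teensy 4.0 @ 600 MHz": (43.5, 600),
    "Pi 2 @ 900 MHz": (47.9, 900),
    "Teensy 4.0 @ 960 MHz": (27.2, 960),
    "Pi 3 @ 1200 MHz": (30.42, 1200),
}


def gigacycles(seconds, mhz):
    """Clock cycles spent on the run, in billions (seconds * MHz / 1000)."""
    return seconds * mhz / 1000


cycles = {name: gigacycles(t, f) for name, (t, f) in results.items()}

for name, gc in cycles.items():
    print(f"{name}: {gc:.1f} billion cycles")

# How many more cycles the Pi needs than the Teensy for the same work.
pi2_vs_teensy = cycles["Pi 2 @ 900 MHz"] / cycles["Teensy 4.0 @ 600 MHz"]
pi3_vs_teensy = cycles["Pi 3 @ 1200 MHz"] / cycles["Teensy 4.0 @ 960 MHz"]
```

On these figures the Teensy spends about 26.1 billion cycles at either clock, the Pi 2 about 43.1 billion and the Pi 3 about 36.5 billion, so per clock the Teensy comes out roughly 1.65x ahead of the Pi 2 and 1.4x ahead of the Pi 3.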

I have to disagree with ataradov about the comparison to Pentium-166. That's also a dual issue in-order machine, and with small caches.

I don't have one handy these days, but I expect the Teensy 4 at 600 MHz would give a Pentium II 450 MHz a good run for its money.

That's *IF* what you're doing fits into 1 MB RAM.

Note that the first Raspberry Pi had 256 MB of RAM and recent ones have up to 4 GB. A 2 GB Raspberry Pi 4 is exactly twice the price of a Teensy 4.0, has 4 cores each 4x faster than the Teensy 4, has 2048x more RAM, has Ethernet and WIFI and Bluetooth and 4 USB ports and a GPU and HDMI (2).

Nominal Animal · « **Reply #3 on:** May 12, 2020, 01:02:18 pm »

(Brucehoult, Teensy 4.1 is in beta, and physical units are in testers' hands. Not only does it have a lot more pins, but it also supports an optional 8 megabyte PSRAM add-on chip. The memory model has its quirks, and i.MX RT1062 is not the easiest microcontroller I've used, but it is a very interesting, powerful microcontroller platform.)

I do fully agree that Cortex M series is not suited for running Linux (although you "can"). The main benefit of Cortex A series is the Memory Management Unit. Essentially, the MMU controls access to memory, and can usually do address translation or paged memory or something similar. The access control is the most important bit, because it allows the kernel to support userspace processes that can only access a part of the available memory; critical for a fully-fledged operating system: you don't want a random userspace process to be able to brick the device.

uClinux (1999-2018) was a port of the Linux kernel (2.0 to 2.6) to MMU-less processors, including Cortex-M3, -M4, and -M7. Without an MMU each process could access all memory on the device, so there really wasn't any difference between "kernelspace" and "userspace", but development for these was basically the same as for fully-fledged Linux systems. So, you can run (an old port of) Linux on Cortex M series microcontrollers; it's just not that useful now that we have cheap processors with MMUs to use instead.

NiHaoMike · « **Reply #4 on:** May 12, 2020, 01:30:11 pm »

If the main advantage of bare metal or RTOS is low, predictable latency, why aren't most multi core application processors designed so that one of the cores can be reserved for running real time code? (Or do most such processors actually support it but software support is not common?)

brucehoult · « **Reply #5 on:** May 12, 2020, 04:25:07 pm »

Quote from: NiHaoMike on May 12, 2020, 01:30:11 pm

If the main advantage of bare metal or RTOS is low, predictable latency, why aren't most multi core application processors designed so that one of the cores can be reserved for running real time code?

That's exactly what SiFive's FU-540 SoC does.

https://static.dev.sifive.com/FU540-C000-v1.0.pdf

There are four U54 application processor cores implementing RV64IMAFDC, plus one E51 core implementing RV64IMAC.

The E51 has a 16 KB 2-way icache but it has 8 KB SRAM instead of a dcache (you can put code in there too, and execute it without polluting the icache). Up to half (8 KB) of the icache can be converted into another direct-access SRAM area ("ITIM"). The E51 core has a branch predictor with 256 BHT entries (for conditional branches), 30 BTB entries (for indirect branches), and 6 RAS entries (for subroutine returns). On newer versions of the FU-540 (such as the one embedded in the upcoming Microsemi FPGAs) the branch predictor can be disabled for 100% predictable execution times. This feature is not implemented in the version in the HiFive Unleashed. The E51 has PMP (Physical Memory Protection) to control access to memory ranges, but it doesn't have an MMU.

The four U54 cores each have 32 KB 8-way i- and d-caches, FPU, MMU and all that good stuff for Linux.

starhawk · « **Reply #6 on:** May 12, 2020, 06:36:33 pm »

OK, forgive me, but I'm *really* confused between something specifically mentioned in the datasheet for the i.MX RT106x series SoCs, and something asserted here. I'm not arguing, and I want to be clear about that -- I don't know enough to argue -- but I'm having some cognitive dissonance because these two things seem mutually exclusive to me.

(1) from the datasheet (https://www.nxp.com/docs/en/nxp/data-sheets/IMXRT1060CEC.pdf)...
Page 2:

Quote

The SoC-level memory system consists of the following additional components:
[...]
* External Memory Interfaces:
- 8/16-bit SDRAM, up to SDRAM-133/SDRAM-166
- 8/16-bit SLC NAND FLASH, with ECC handled in software
- SD/eMMC
- SPI NOR/NAND FLASH
- Parallel NOR FLASH with XIP support
- Two single/dual channel Quad SPI FLASH with XIP support

The first sounds to me like PC-133 SDRAM like you'd find in an old Win98 box. The second sounds a bit like a controllerless SSD. The third is pretty self-explanatory, as is the fourth. The fifth sounds a bit like an old SmartMedia card (I have a few for my Rio500 MP3 player... I love older electronics like that), and the sixth sounds like one of the two chips for which there's a footprint on the underside of the Teensy, the other being a PSRAM that probably acts similarly enough to fool the SoC into thinking it's a second chip of the same sort (SPI Flash NVRAM) even though it's technically not.

But that distinctly sounds to me, since it explicitly states that these are *external* *memory* *interfaces*, like there's an MMU in there somewhere.

Further muddling things up is the absolutely useless "block diagram" on Page 9 of the datasheet, which looks like someone said "we *have* to have a block diagram" and the replies were "heck no we don't" followed by "you'll be sorry" followed by "we warned you". It's three columns (and one row, at the very bottom) of boxes with category headers and none of it explains what's linked to what how. It's not even pretty to look at! But the eMMC / SD interface is listed under "Connectivity" and there's an "External Memory" category which lists "Dual-channel Quad SPI" and "Octal/Hyper Flash/RAM x2" in one box, and mentions an "External Memory Controller" with "8 bit / 16 bit SDRAM" "Parallel NOR Flash" "NAND FLASH" and "PSRAM" interfaces in another. I'm amused to notice that the PSRAM interface is mentioned here but not on Page 2. Interesting...

On Pages 10-16, there's a table that lists and explains various "modules" within the SoC... the last entry on Page 14 (but by no means last in the whole list!) is for the "SEMC" module -- the "Smart External Memory Controller". While this once again omits mention of PSRAM, there is another interesting addition, at the end of its description: "[...] as well as 8080 display interface."

Hmm... the 8080 was a CPU, of course, the one in the famous Altair 8800 and IMSAI 8080 (hellooooo "Wargames"!) computers, as well as many later S-100 bus machines, at least until the Z80 (which was based on it) came out. But that was the 8-bit era and *those* CPUs are *very* different creatures indeed. (Ironically, they're ones I understand pretty well... especially the 6502...). While there were a number of "display interface" types of chips, most of them were timing controllers -- what back then was called a "CRT Controller" ("CRTC" for short). Such chips required dedicated Video RAM but only sort of accessed it -- they would generate timing signals for a TV set or dedicated display-monitor, and spit out the relevant address signals to dump the contents of Video RAM into whatever support circuitry was in charge of feeding that to the screen and its analog boards proper. Probably the most popular such chip would've been the MC6845 from Motorola... in fact, as far as I'm aware, other than the *extremely* odd 8279 "Keyboard and Display Interface" (which is really for the sort of computer we'd now class as a hexadecimal trainer (think MOS Tech / Commodore KIM-1, or the Sinclair MK 14) -- it's designed to drive a hex keypad and a set of seven-segment LED digits, although the datasheet also "sort of" explains how to re-arrange the signals for the latter into use with a set of individual LEDs, should one desire that instead...), Intel themselves never released a display interface chip... although I have vague memories of coming across something somewhere, TBH, that mentioned both an MC6845 clone and another chip that was described as a "Small System CRT Controller" and sounded very much like a full-on VDG... except that it was basically never used! Apparently nobody bought it...

...but that's well off into the weeds, isn't it?

*ahem*

Can someone explain to me how this "external memory interface" is not a proper MMU?

ataradov · « **Reply #7 on:** May 12, 2020, 06:41:19 pm »

An MMU has nothing to do with external or internal memories.

An MMU is a block that translates virtual addresses into physical addresses. https://en.wikipedia.org/wiki/Memory_management_unit

You need that in order to run multiple applications at the same time linked at the same addresses. Both applications will think that they can access fixed memory, whereas in reality they access different blocks of physical memory.
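A toy sketch of that translation, in Python rather than hardware: each application gets its own page table, so the same virtual address lands in different physical memory. The 4 KB page size, the `ToyMMU` class, and the frame numbers are all made up for illustration; a real MMU walks tables set up by the kernel and also checks permissions.

```python
PAGE_SIZE = 4096  # bytes per page; a common size, chosen here for illustration


class ToyMMU:
    """Maps virtual page numbers to physical frame numbers for one process."""

    def __init__(self, page_table):
        self.page_table = page_table  # {virtual page number: physical frame number}

    def translate(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self.page_table:
            # An unmapped page is how a stray access gets caught.
            raise MemoryError(f"fault: virtual page {page:#x} is not mapped")
        return self.page_table[page] * PAGE_SIZE + offset


# Both applications are linked at the same virtual address (hypothetical value)...
LINK_ADDR = 0x10000
app_a = ToyMMU({0x10: 0x80})
app_b = ToyMMU({0x10: 0x93})

# ...but the same virtual address resolves to different physical addresses.
phys_a = app_a.translate(LINK_ADDR + 0x24)
phys_b = app_b.translate(LINK_ADDR + 0x24)
print(hex(phys_a), hex(phys_b))
```

Neither application can name the other's memory at all, because no virtual address in its table points there.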

The MMU is essentially the only difference between MPUs and MCUs for modern high-end devices. If you look at the design of that NXP device, you will see that all they did is take out the MPU core (Cortex-A) and put the MCU core (Cortex-M) in its place. The rest of the system remained the same.

There is a port of Linux to Cortex-M devices called uClinux. But it is plain painful to use, since you have to link applications at fixed addresses in advance using a very convoluted system. It is pointless and is not practical in any way.

Nominal Animal · « **Reply #8 on:** May 12, 2020, 07:39:03 pm »

Quote from: starhawk on May 12, 2020, 06:36:33 pm

Can someone explain to me how this "external memory interface" is not a proper MMU?

It is just an interface controller. It is a way for the processor to access external memory or storage using those interfaces. It doesn't do what an MMU does, mapping and controlling – managing – access to memory.

An MMU is like a layer between the processor and RAM. External memory interfaces are just a way for the processor to read from and write to external RAM or storage devices.

One is a "connector", the other is an "active filter".

brucehoult · « **Reply #9 on:** May 13, 2020, 01:12:23 am »

Quote from: ataradov on May 12, 2020, 06:41:19 pm

There is a port of Linux to Cortex-M devices called uClinux. But it is plain painful to use, since you have to link applications at fixed addresses in advance using a very convoluted system.

боже! ("Good God!") They couldn't manage load-time relocation?

Ok, sure, if you're doing something embedded that is always running exactly the same set of apps then you could get away with that. But not anything interactive.
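For anyone unfamiliar with the term, here is a generic sketch of load-time relocation. It does not describe what uClinux does or doesn't do: the image is linked as if it lived at address 0, carries a list of offsets where absolute addresses are stored, and the loader adds the real load address to each of those words. The image layout, the `load_with_relocation` helper, and the load address are all hypothetical.

```python
import struct


def load_with_relocation(image, reloc_offsets, load_base):
    """Return a copy of `image` in which every 32-bit little-endian word
    listed in `reloc_offsets` has had `load_base` added to it."""
    patched = bytearray(image)
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, off)
        struct.pack_into("<I", patched, off, (value + load_base) & 0xFFFFFFFF)
    return bytes(patched)


# Hypothetical 12-byte image linked at address 0: word 0 is plain data,
# words 1 and 2 hold absolute addresses (0x8 and 0x4) that need fixing up.
image = struct.pack("<III", 0xDEADBEEF, 0x8, 0x4)
relocs = [4, 8]  # byte offsets of the words that contain addresses

loaded = load_with_relocation(image, relocs, load_base=0x20001000)
words = struct.unpack("<III", loaded)
print([hex(w) for w in words])
```

The data word is left alone and the two address words now point into wherever the loader happened to put the program, which is what lets you start an arbitrary app without deciding its address at link time.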

ataradov · « **Reply #10 on:** May 13, 2020, 01:32:01 am »

Quote from: brucehoult on May 13, 2020, 01:12:23 am

They couldn't manage load-time relocation?

It probably would have added overhead they did not want to deal with. Also, assembling rootfs is not the prettiest thing either. The whole uClinux ecosystem is/was based on some strange file system, if I remember correctly.

But I'm not sure if there is any real interest in the project. I think it is pretty much dead.

At the same time it is cool to see Linux kernel boot on Cortex-M7. A few years back I managed to boot their kernel on SAM V71, and it booted to the point of "Kernel Panic - not syncing: VFS: Unable to mount root fs". So the kernel is fully operational at this point. This means that it can be actually used as an embedded OS. Just don't do any userland; start kernel threads to do your work. You will still get full Linux USB and TCP/IP stacks.

Nice to see that the forum finally supports Russian. It was previously just ignoring it. Typing in values from Russian datasheets was a pain.

brucehoult · « **Reply #11 on:** May 13, 2020, 11:44:15 am »

Quote from: blueskull on May 13, 2020, 08:02:03 am

Another example is K210, which was originally designed as a Linux-capable MPU, and has MMU and 8MB of super fast SRAM, but they eventually gave up implementing Linux due to lack of human resource and the lack of practicality of running any meaningful apps with only 8MB of RAM.

Who gave up? When?

The problem with the K210 turns out to be that they didn't implement the ratified MMU spec in PrivArch 1.10, but an older and quite different draft version 1.9. It took a while to discover this because ... Chinese documentation :-)

It had been expected that it was only necessary to support the ratified spec in the kernel because no 1.9 machine would ever escape into wide use. As this has turned out not to be the case, and the K210 is currently by far the most widely used RV64GC SoC, the work of supporting the 1.9 MMU has been done and it works fine and has in the last month or so started to be upstreamed.

brucehoult · « **Reply #12 on:** May 13, 2020, 11:47:08 am »

Quote from: blueskull on May 13, 2020, 08:02:03 am

lack of practicality of running any meaningful apps with only 8MB of RAM.

Obviously that's not terribly useful as a modern workstation (though my first x86 Linux machine didn't have much more), but an embedded application can do a lot with 8 MB.

tggzzz · « **Reply #13 on:** May 13, 2020, 02:19:58 pm »

Quote from: NiHaoMike on May 12, 2020, 01:30:11 pm

If the main advantage of bare metal or RTOS is low, predictable latency, why aren't most multi core application processors designed so that one of the cores can be reserved for running real time code? (Or do most such processors actually support it but software support is not common?)

The XMOS xCORE+xC series is built just that way.

The hardware is relatively easy (but many companies still manage to produce sub-optimum processors). The software for multicore programming is poorly developed. Integrating the hardware with the software well has only been achieved by XMOS.

You can get a single-chip 32 core 4000MIPS chip from DigiKey, and the IDE will tell you exactly how long the code will take to execute. Coupled with the FPGA-like i/o structures, that enables hard real-time guarantees. I've used one to grab two 62.5Mb/s input streams and count the edges in software, plus front-panel i/o, plus communicating over USB with a PC - and it is guaranteed by design that none of the input transitions will be missed. None of the usual "measure and hope we spot the worst case" that you have to do with most systems.

The i/o can be at up to 250MS/s, and multiple chips can be transparently connected if you need more processing power.

As I said, the key point is the solid integration of hardware properties with software capabilities, plus a toolset that makes use of all that.

