From a software onlooker...
Is it reasonable to expect that the development platform/language/libraries used in writing an application will result in a preference for a CPU in terms of number of cores/threads, etc.?
I hope I am showing my ignorance at a sufficiently low level so that an appropriate education will follow on.
To what end?
* A development platform has a given CPU, configuration, and usually toolchain associated with it.
* A language is not associated with any particular CPU, but will be most popular (most commonly used, best supported) on a subset of them.
* Libraries distributed as compiled binaries are limited to a CPU family at least, if not a specific CPU/MCU part. Library sources can be compiled for any target, subject to the same caveats as above.
* If you're looking at multicore CPUs and threads, you're automatically looking at a "supercomputer"*. A high speed (GHz) CPU core, coupled to multiple levels of caches, repeated a few times (multicore, SMP more specifically), plus at least a few hundred megs of RAM and storage (preferably gigs), and preferably high speed network, peripheral or other connections.
*I like to think of things in ~80s terms: back then, a PC was little more than an embedded CPU.[1] A workstation[2] was powerful enough to do things we consider normal PC activity today, while mainframe-sized supercomputers[3] had the pure cranking power we consider normal today, but for very special-purpose applications due to their cost.
[1] 8 or 16 bit, low MHz, and 64k to 1MB address space (mostly RAM, as PCs need it), with enough peripherals bolted on, and software available, to be reasonably useful. Nowadays, we might compare to an AVR8, PIC, STM8, or MSP430, or, heh, well, MC68k is still available to this day but not so mainstream, and I forget what else is common in this space right now. These are all MCUs, so have a few peripherals in common (interrupt controllers, timers, serial and parallel IO), but have much less related to storage, or display (no FDC/HDC, no graphics), and have much less RAM, but much more ROM typically (a PC might've had 32k or so of EPROMs between its BIOS and expansion cards; this much Flash is common among MCUs).
[2] Workstations were usually 32 bit, sometimes 64 (give or take just when things were introduced), operating in the mid tens of MHz (say 20-50MHz), had megs of RAM, high speed networking and storage (SCSI, Ethernet, etc.), and high resolution graphics (low-color for CAD, or high-color for photo/video). Today, this sort of space is filled by the more upscale ARM MCUs, which often include fractional-meg onboard RAM (but support many megs of external RAM), a meg or so of Flash, and support USB2 or 3, Ethernet, LCD panel graphics and more. They're also often operated with an OS of some sort; note that Cortex-M parts have at most an MPU (memory protection, not address translation), so they're limited to MMU-less Linux variants like uClinux -- full Linux needs the MMU found in Cortex-A parts. Performance is also comparable to PCs of the 90s -- an ARM may be simpler than a Pentium, but at 240MHz versus 90MHz, it's comparable or better!
[3] Take the Cray-1 for example. Liquid-cooled beast of a machine, 64 bit, 80MHz, 8 MB RAM, floating point, vector instructions (meaning, linear algebraic operations like dot and cross products, or matrix arithmetic, are particularly easy), up to 160MFLOPS (with later versions through the 80s pushing over 1GFLOPS). This compares with, probably late 90s/early 00s PCs with graphics accelerators, but those were very special purpose chipsets still, until general purpose vectorization and GPGPUs were introduced in the mid-00s. Or modern CPUs in PCs or cellphones.
Hmm, that's not really a good example of what I wanted to make a point about. I should really be using the Cray X-MP, which was a dual-processor machine. Massively parallel systems didn't take over until the 90s or 00s I think, but have always been around -- LINKS-1 for example, or the Connection Machine. (A CM-5 topped the TOP500 list of supercomputers in 1993, but I don't see anything comparing the CM-1 or other earlier parallel machines?)
Anyways, the interesting thing is not just that computers get bigger and more complex over time, but that the simplest computers have never disappeared. There will always be an application for the dumb 8-bit CPU (or even 4-bit!). It is interesting that 32-bit machines are now so cheap and common that they're displacing 8- and 16-bit machines (e.g., an ARM Cortex-M0 instead of a PIC or AVR), but they haven't fully replaced them, at least not yet. (But you may well opt for, say, an STM32F0 over an ATmega328 for most of your low-power embedded applications, and consider alternatives when and if the product moves into such quantity that a potentially cheaper chip can be used.)
So, if you want to do high level development, on a comfortable, powerful platform, just pick up any old PC -- a PC as such, or a rasPi, or a tablet, or even a cellphone (well, maybe not the easiest interface to develop on...). OS probably Linux, and any languages that do what you want -- C/C++ for boring stuff, say Python for general tasks, Octave for vector math and scripting, etc.
If you want to do development for anything else, you'll probably still be based on a PC, but cross-compiling for that target. A programming dongle (JTAG/SWD, ISP, etc.) is usually needed, though many chips have bootloaders that offer the same function over traditional ports (e.g. USB or serial).
As you can see, a truly general answer is not easy to put together, and probably not all that useful in the end. The field of computing is massive and complex, and most practitioners keep to their own little corner -- see the above replies for example. (Personally, I've mostly worked with 8086 and AVR8.)
If you can't make up your mind, you should consider what ends you want to reach, and try the languages that best support that end. You will inevitably get locked into that pattern, as we all do; that's a bit of a downside, but the alternative is becoming paralyzed and not doing anything at all -- clearly a worse outcome! So just accept it, and specialize in what is most useful to you.
Tim