The purpose is for a design that currently uses F334 M3s to be replaced with cheaper G071 M0+s, or with one H750 replacing all the M3s or M0+s. The single-chip option complicates the entire design, but computing power can then be distributed rather than fixed, plus other bonuses that a multi-M3 or multi-M0+ setup doesn't give. Those, on the other hand, give a much simpler board layout, etc.
From a previous M0 vs M3/M4 comparison I made recently, I had decided on the M3/M4 core. Reason: the main decoding algorithm that I wrote has a tight loop with if-else statements in it, and the ARM M3/M4 has something called conditional execution. This basically inlines an if-else statement with an "if-then-else" (IT) instruction, which can perform these control-flow tasks without program jumps. I put this small piece of code into the online Compiler Explorer and saw that the M0 code looked like a horrid mess with 4 jumps per iteration, while the M4 code was basically only 6 instructions per iteration.
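As a rough illustration (this is a made-up stand-in, not the actual decoder), here is the kind of tight if-else loop in question. On ARMv7-M (M3/M4) the compiler can turn the branch into an IT block or conditional instructions, while on ARMv6-M (M0/M0+) it has to emit real jumps each iteration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decode step: shift in one output bit per input element.
   The if-else in the loop body is exactly the pattern that ARMv7-M can
   handle with an IT (If-Then) block instead of branches. */
uint32_t decode(const uint8_t *bits, int n)
{
    uint32_t acc = 0;
    for (int i = 0; i < n; i++) {
        if (bits[i] & 1)
            acc = (acc << 1) | 1;   /* shift in a 1 */
        else
            acc = (acc << 1);       /* shift in a 0 */
    }
    return acc;
}
```

Pasting something like this into Compiler Explorer with `-mcpu=cortex-m0plus` vs `-mcpu=cortex-m4` (at `-O2`) shows the difference in generated branches directly.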
Another advantage is that the M3/M4 supports a bigger ISA. The M0 is limited to the original, mostly 16-bit Thumb instruction set, while the M3/M4 has the full Thumb-2 set, so the M0 may need to use multiple instructions to execute something that a single 32-bit opcode can handle. Note that the M0 is a different ARM architecture than the M3, and the M4 & M7 yet again (ARMv6-M, ARMv7-M, ARMv7E-M, respectively).
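A concrete example of that single-opcode win (my example, not from the posts above): a widening 32x32 -> 64-bit multiply. ARMv7-M has UMULL, so the M3/M4 does this in one instruction; ARMv6-M only has a 32x32 -> 32 MULS, so on the M0/M0+ the compiler has to synthesize it from several multiplies:

```c
#include <assert.h>
#include <stdint.h>

/* On an M3/M4 this compiles to a single UMULL instruction.
   On an M0/M0+ the same C line becomes a sequence of 32-bit
   multiplies and adds (or a runtime library call). */
uint64_t widening_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```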
Nonetheless, these may be very theoretical reasons to explain the much higher CoreMark score for the M3/M4 cores. I would recommend comparing them for your particular application.
However, if your MCU firmware is mostly juggling I/O registers rather than number crunching or running complex protocol stacks, then I think you could get by with an M0 chip. (That's what I would do if I were to redesign the aforementioned design into an FPGA with an M0 softcore and a hardware accelerator for the encoder/decoder.)
M4, yes, sorry for that. Anyhow, as noted, ARM white papers have more detailed "fancy" numbers per MHz, but as mentioned, that tells me just that. Consider just the case of interrupts: blocking interrupts/DMA etc. combined with code structure, not to mention that interrupt handling takes a different number of clock cycles depending on the core, makes the whole task of estimating very problematic. I had hoped to find some code cases from ARM, ST etc., but found nothing. I doubt benchmarking code is comparable to a real application.
My imagined (lame) idea was that if one had a routine for the M0+ and then the same routine for, e.g., the M4 and M7, one could compare them. But that suggests a detailed study of the disassembled code and absolute control of the compilation, or thereabouts.
Short of testing that specifically with your OWN code and specific environment, benchmarks are pretty much all you can count on to get realistic figures. Of course they don't say it all, but CoreMark is rather balanced and not all that badly designed for this purpose. You were strictly talking about "computing capacity" (whatever you mean by that), so again a benchmark of this kind seems somewhat relevant. An industry-standard benchmark is certainly better than just theoretical and potentially uninformed hypotheses.
Now there are of course myriad particular cases that would make a general-purpose benchmark less relevant, but then it's completely up to you. Either you use a general-purpose tool, or your very own. What else do you want? Theoretical values are pretty much meaningless, as every implementation will define real-life performance, including the use of caches and other factors. The approximate 1.7x factor between STM32's M0 and M4 for "computing-intensive" tasks looks about right to me. Of course this is just a very general figure. If you're heavily using floating point, for instance, the difference would be much more drastic: the F334 has an FPU, the G071 doesn't, AFAIK... Another architectural point: the M4 has speculative branch fetching, the M0+ doesn't.
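To make the floating-point point concrete, here is an illustrative float-heavy kernel (my own toy example, not the actual firmware): a single-pole IIR low-pass step. With the F334's single-precision FPU this is a few VMUL/VADD instructions per sample; on the FPU-less G071 every float operation becomes a software float library call, typically tens of cycles each, so the gap goes far beyond the ~1.7x integer figure:

```c
#include <assert.h>

/* One-pole IIR low-pass: y += alpha * (x - y) per sample.
   Hardware FPU (M4F): ~3-4 cycles per float op.
   Software floats (M0+): each op is a library call. */
float iir_lowpass(const float *x, int n, float alpha)
{
    float y = 0.0f;
    for (int i = 0; i < n; i++)
        y = y + alpha * (x[i] - y);
    return y;
}
```

Note the F334's FPU is single-precision only, so this kind of comparison only holds if the code sticks to `float` (accidental promotion to `double` falls back to software routines even on the M4F).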
As to interrupts, I would expect the M0+ to have slightly less latency actually, because it's only a 2-stage pipeline core. That would have to be confirmed though, as stating this just from the architecture would typically be a case of what I said above: a theoretical and potentially uninformed hypothesis.