Author Topic: Advantages of 32bit micros  (Read 17413 times)


Online Howardlong

  • Super Contributor
  • ***
  • Posts: 5315
  • Country: gb
Re: Advantages of 32bit micros
« Reply #50 on: January 14, 2016, 06:36:30 pm »
Completely nuts, but yesterday I built a narrowband FM transmitter on the 2m band using a breadboarded PIC16F1718 and an 8MHz crystal. No more than about 40-50 lines of code. I'm looking to put it on a $0.53 (1ku) 8-pin PIC16F18313 when I get delivery next week.

The on-chip NCO has just about enough resolution for reasonable NBFM once the 11th harmonic is selected with some passive filtering. I built it for use as a low-power, calibrated, modulated signal source, but it can equally be used as an exciter for any number of other projects.
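To put a rough number on "just about enough resolution", here is a back-of-envelope sketch for a generic N-bit phase-accumulator NCO, whose overflow rate is f_clk × increment / 2^N. All figures are assumptions rather than values from the post: a 32 MHz NCO clock (e.g. the 8 MHz crystal through a 4x PLL), a 20-bit accumulator, a 145 MHz target and ±2.5 kHz deviation. Check the datasheet for any extra output divider, since some NCO modes halve the overflow rate.

```c
#include <stdint.h>

/* ASSUMPTION: 20-bit phase accumulator, as on the PIC16F1xxx NCO. */
#define NCO_BITS 20u

/* Frequency step of the fundamental per increment LSB, in Hz. */
double nco_step_hz(double f_clk_hz) {
    return f_clk_hz / (double)(1ul << NCO_BITS);
}

/* Increment whose fundamental lands closest to target_hz / harmonic. */
uint32_t nco_increment(double f_clk_hz, double target_hz, unsigned harmonic) {
    double fundamental = target_hz / harmonic;
    return (uint32_t)(fundamental / nco_step_hz(f_clk_hz) + 0.5); /* round */
}

/* Whole increment steps that fit inside +/- deviation_hz at the harmonic:
 * every LSB of the increment moves the harmonic by step * harmonic. */
unsigned nco_deviation_steps(double f_clk_hz, double deviation_hz, unsigned harmonic) {
    return (unsigned)(deviation_hz / (nco_step_hz(f_clk_hz) * harmonic)); /* truncate */
}
```

With those assumptions the fundamental moves in steps of about 30.5 Hz, which become about 336 Hz at the 11th harmonic, so a ±2.5 kHz deviation spans only 7 whole steps either side.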

If I'm honest I'd have preferred doing it on a 16 or 32 bit device, but when it comes to really low pin count and random peripherals like this it's hard to beat the 8 bit PICs.
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4196
  • Country: us
Re: Advantages of 32bit micros
« Reply #51 on: January 14, 2016, 07:09:35 pm »
Quote
Actually this can be made much shorter! You can store a constant relative to the PC (program counter) so reading the register value is like
ld r2,[pc+20]
Thumb2 allows some immediate 8 bit values:
or r2,r2,bit_value
And then store the new value.
st [pc+20],r2
Well, no.  There is a (rather small) range limit on the offset you can use with pc-relative addressing, and it definitely doesn't extend to accessing the peripheral address space from the program flash space.
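The range limit is easy to check. This sketch encodes the rule for the 16-bit Thumb `LDR Rt, [PC, #imm]` (literal) encoding: the base is the instruction address + 4 rounded down to a word boundary, and the offset is a forward, word-aligned value of 0..1020. The flash and peripheral addresses used below are typical Cortex-M values chosen only as an illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can a 16-bit Thumb LDR (literal) at instr_addr load the word at literal_addr? */
bool thumb16_ldr_literal_reaches(uint32_t instr_addr, uint32_t literal_addr) {
    uint32_t base = (instr_addr + 4u) & ~3u;   /* Align(PC, 4); PC reads as addr + 4 */
    if (literal_addr < base)
        return false;                          /* this encoding only adds */
    uint32_t offset = literal_addr - base;
    return offset <= 1020u && (offset & 3u) == 0;  /* imm8 scaled by 4 */
}
```

For example, `thumb16_ldr_literal_reaches(0x08000100, 0x40020014)` is false: code in flash cannot reach a peripheral register directly, so the compiler instead places the register's address in a literal pool next to the function and loads it with a second instruction.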
And while Thumb-2 gives you some immediate values, it does so by expanding the instruction size from 16 to 32 bits, so it still takes the same space as the "mov const to reg, or register" sequence (though it does save the register). And CM0/CM0+, where you care most about code space efficiency, doesn't have the Thumb-2 data-processing instructions :-(
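Which immediates the 32-bit ORR/AND/MOV encodings accept is one of those details buried in the manuals. This sketch checks a value against the Thumb-2 "modified immediate" forms as described by ThumbExpandImm in the ARMv7-M architecture manual; it is an illustration, not an assembler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legal forms: 0x000000XY, 0x00XY00XY, 0xXY00XY00, 0xXYXYXYXY (XY != 0 for
 * the replicated ones), or an 8-bit value with its top bit set shifted
 * left by 1..24 (the "rotated" form). */
bool thumb2_modified_imm_ok(uint32_t v) {
    uint32_t xy = v & 0xFFu;
    uint32_t xy_hi = (v >> 8) & 0xFFu;

    if (v <= 0xFFu)                                   return true;
    if (xy != 0 && v == xy * 0x00010001u)             return true;
    if (xy_hi != 0 && v == xy_hi * 0x01000100u)       return true;
    if (xy != 0 && v == xy * 0x01010101u)             return true;

    for (unsigned s = 1; s <= 24; s++) {
        uint32_t b = v >> s;
        /* b must be 1bcdefgh and no set bits may be lost below bit s */
        if (b <= 0xFFu && (b & 0x80u) && (b << s) == v)
            return true;
    }
    return false;
}
```

One consequence: any single-bit mask `1u << n` is encodable, so a one-pin ORR never needs a separate constant, while an arbitrary value such as 0x12345678 does.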
(Mind, you have to dig pretty deep through several levels of manuals to figure this out.  For a "RISC" architecture, there's a fair amount of ugliness hidden behind the "clean" instruction formats described at the assembly level.  Especially once you take into account the several currently-active "architecture variants."  Grr.)
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1570
  • Country: de
Re: Advantages of 32bit micros
« Reply #52 on: January 14, 2016, 07:21:18 pm »
Well, no.  There is a (rather small) range limit on the offset you can use with pc-relative addressing, and it definitely doesn't extend to accessing the peripheral address space from the program flash space.
As there are 13 general purpose registers available for Cortex M3, you could sacrifice two registers to serve as base for offset based access to e.g. the SRAM or peripheral registers.
I'm not sure whether GCC actually does this for Cortex cores, but for PPC cores, e.g., there was/is a convention that certain registers are reserved at runtime because they hold the base addresses of the "small data areas".
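A portable way to get some of the "one base register, small offsets" benefit without reserving a register globally is to group hot globals into one struct. This is a minimal sketch; the struct and field names are hypothetical, and whether the compiler really keeps the pointer in one register is up to its optimiser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "small data area": frequently used globals in one block. */
struct small_data {
    uint32_t tick_count;
    uint16_t adc_last;
    uint8_t  state;
    uint8_t  flags;
    int32_t  filter_acc[16];
};

static struct small_data sda;

/* Thumb-2 32-bit LDR/STR reach base + 0..4095, so keep the block inside 4 KiB. */
_Static_assert(sizeof(struct small_data) <= 4096, "SDA exceeds imm12 reach");

uint32_t sda_tick(void) {
    struct small_data *p = &sda;   /* one base address, constant offsets below */
    p->tick_count++;
    return p->tick_count;
}
```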
Trying is the first step towards failure - Homer J. Simpson
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4067
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Advantages of 32bit micros
« Reply #53 on: January 14, 2016, 07:25:07 pm »
You don't need to keep the offset; you can store the full address.
The compiler does not need them all. You can "demand" a few for your own use, even in C, by using the register keyword (the optimiser will do this automatically for you at -O1).
 

Offline c4757p

  • Super Contributor
  • ***
  • Posts: 7799
  • Country: us
  • adieu
Re: Advantages of 32bit micros
« Reply #54 on: January 14, 2016, 08:23:59 pm »
register is just a suggestion, it's generally ignored - and you have to put the compiler into -Opotato to make it not do this automatically anyway.

Anyone who uses register is micro-optimizing. Structure your code better and the compiler will do it for you.
No longer active here - try the IRC channel if you just can't be without me :)
 

Online Howardlong

  • Super Contributor
  • ***
  • Posts: 5315
  • Country: gb
Re: Advantages of 32bit micros
« Reply #55 on: January 14, 2016, 10:40:02 pm »
Anyone who uses register is micro-optimizing. Structure your code better and the compiler will do it for you.

I'd agree that randomly using register is wishful thinking.

In general you need a reasonably good understanding of the CPU and bus/memory architecture of your device to do a better job than the compiler (for example, it doesn't know which buses are slow and where there may be contention), but occasionally it makes sense to help it out. Manual loop unrolling, using idioms and interleaving assignments to avoid stalls can sometimes beat the compiler, because the compiler has to make generalised assumptions about the data, while you often know more about it than you can tell the compiler. But yes, it is very much the exception rather than the rule, and you are very much at risk of creating unmaintainable code.
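As an illustration of "manual loop unrolling and interleaving", here is a dot product unrolled by 4 with independent accumulators, so loads and multiplies from different lanes can overlap instead of each add waiting for the previous one. It is a sketch under the assumption that you know n is usually a multiple of 4; whether it beats the compiler depends on the core, the memory system and the compiler flags, so measure before keeping it.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward version: one accumulator, a single dependency chain. */
int32_t dot_simple(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Unrolled by 4 with four independent accumulators plus a tail loop.
 * Integer addition is associative, so the result matches dot_simple as
 * long as no partial sum overflows. */
int32_t dot_unrolled4(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)          /* leftover 0..3 elements */
        acc0 += (int32_t)a[i] * b[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```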

 

Offline andyturk

  • Frequent Contributor
  • **
  • Posts: 895
  • Country: us
Re: Advantages of 32bit micros
« Reply #56 on: January 15, 2016, 01:14:08 am »
I'd agree that randomly using register is wishful thinking.

Yeah, you gotta do something like this to make it stick:
Code:
      static inline __attribute__((always_inline)) ContextSwitchFrame* save() {
        // GCC's local register variable syntax requires the register keyword;
        // used as an asm operand, result is then guaranteed to live in r0.
        register ContextSwitchFrame* result asm("r0");

        // r4-r11 must be saved according to the Procedure Call Standard for the Arm Architecture (AAPCS)
        // http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
        //
        // r12 is the intra-procedure call scratch register, and is not saved
        // r2 & r3 are filled with two other values that must be saved with the context
        // r2-r11 (10 registers) can then be saved with a single instruction

        asm volatile("mrs   r0, psp              \n"
                     "mrs   r2, control          \n"
                     "mov   r3, lr               \n"
                     "stmdb r0!, {r2-r11}        \n"
                     : // output (r0, via the register variable above)
                       "=r"(result)
                     : // input
                     : // clobber (r0 is the output operand, so it must not be listed here)
                       "r2", "r3", "memory");
        return result;
      }
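It helps to spell out what that frame looks like in memory. STMDB stores the lowest-numbered register at the lowest address, so after `stmdb r0!, {r2-r11}` the returned pointer addresses the saved r2. The struct below is a hypothetical C view of `ContextSwitchFrame` inferred from the asm, not the original author's definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout matching "stmdb r0!, {r2-r11}" in save(). */
typedef struct {
    uint32_t control;   /* r2: CONTROL register (mrs r2, control) */
    uint32_t lr;        /* r3: copy of lr (EXC_RETURN if called from an exception handler) */
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;  /* callee-saved registers */
} ContextSwitchFrame;

_Static_assert(sizeof(ContextSwitchFrame) == 10 * sizeof(uint32_t),
               "frame must match the 10 registers stored by STMDB");
```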
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4196
  • Country: us
Re: Advantages of 32bit micros
« Reply #57 on: January 15, 2016, 02:42:18 am »
Quote
As there are 13 general purpose registers available for Cortex M3, you could sacrifice two registers to serve as base for offset based access to e.g. the SRAM or peripheral registers.
The maximum offset for baseReg/offset addressing is only 4095; often peripherals are further apart than that (I had two CM3/4 datasheets handy.  The TI ARM3s have their GPIO ports spaced 4k apart.  An Atmel SAM3X has all the GPIO ports within a single 4k block.  huh.) (This is a good example of how the 32bit address space is both blessing and curse.  You COULD fit a lot of peripherals inside 4k, but chips generally don't.)
Using a 12bit offset (instead of 0..1020), or registers beyond r0..r7 pushes you into thumb2 territory.  32bits long and not available on CM0...
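The effect of peripheral spacing on the number of base constants can be sketched with a greedy count: walk the sorted register addresses and start a new base whenever the next one is more than 4095 bytes past the current base. The address lists used below are illustrative (ports 4 KiB apart versus ports packed into one block), not copied from any datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Thumb-2 32-bit LDR/STR (immediate) reach base + 0..4095 only. */
#define IMM12_MAX 4095u

/* Number of base addresses (literal constants / base registers) needed to
 * reach every address in a sorted, ascending list. */
size_t bases_needed(const uint32_t *addrs, size_t n) {
    if (n == 0)
        return 0;
    size_t count = 1;
    uint32_t base = addrs[0];
    for (size_t i = 1; i < n; i++) {
        if (addrs[i] - base > IMM12_MAX) {  /* out of reach: new base */
            base = addrs[i];
            count++;
        }
    }
    return count;
}
```

Three ports spaced exactly 0x1000 apart need three bases, because 4096 is one byte past the reach, while four ports packed within 0x600 bytes share one.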
 

Offline theatrus

  • Frequent Contributor
  • **
  • Posts: 352
  • Country: us
Re: Advantages of 32bit micros
« Reply #58 on: January 15, 2016, 03:50:08 pm »

register is just a suggestion, it's generally ignored - and you have to put the compiler into -Opotato to make it not do this automatically anyway.

Anyone who uses register is micro-optimizing. Structure your code better and the compiler will do it for you.

There is one C keyword which can be useful and needs to be specified in certain cases, because by default the C standard makes the compiler assume that pointers may alias: restrict.

For some C code, forcing a global no-aliasing assumption via a compiler flag would be disastrous, so restrict lets you make that promise pointer by pointer. https://en.m.wikipedia.org/wiki/Restrict
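A minimal sketch of restrict in use: the qualifiers promise that, during the call, the three objects are only accessed through their own pointers. Without that promise the compiler must assume a store to `dst[i]` could change `*gain` and re-read it every iteration; with it, `*gain` may be loaded once and kept in a register. The function name is illustrative.

```c
#include <stddef.h>

/* dst, src and gain must not overlap; calling this with overlapping
 * arguments is undefined behaviour. */
void scale(int *restrict dst, const int *restrict src,
           const int *restrict gain, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * *gain;
}
```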
Software by day, hardware by night; blueAcro.com
 

Offline nctnico

  • Super Contributor
  • ***
  • Posts: 26755
  • Country: nl
    • NCT Developments
Re: Advantages of 32bit micros
« Reply #59 on: January 15, 2016, 03:58:10 pm »
Quote
Actually this can be made much shorter! You can store a constant relative to the PC (program counter) so reading the register value is like
ld r2,[pc+20]
Thumb2 allows some immediate 8 bit values:
or r2,r2,bit_value
And then store the new value.
st [pc+20],r2
Well, no.  There is a (rather small) range limit on the offset you can use with pc-relative addressing, and it definitely doesn't extend to accessing the peripheral address space from the program flash space.
Small oversight indeed... but this is how a decent C compiler for ARM would solve it: by putting a constant holding the register's address somewhere close to the function (a literal pool):
ld r1,[pc+20] #get the register's address (the offset 20 is fictional)
ld r2,[r1] #read value
or r2,r2,bit_value #modify
st [r1],r2 #write value

That is 8 bytes worth of instructions.
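For reference, the C that typically produces that sequence is a read-modify-write through a volatile pointer. In this sketch the register is passed in as a pointer so it also runs on a host; on a target you would pass a fixed peripheral address instead. As westfw notes later in the thread, nothing here is atomic.

```c
#include <stdint.h>

/* Read-modify-write of a memory-mapped register. On Cortex-M this usually
 * compiles to: load the register's address from a literal pool, LDR, ORR,
 * STR. An interrupt between the load and the store can lose an update. */
static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask) {
    *reg = *reg | mask;
}
```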
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1570
  • Country: de
Re: Advantages of 32bit micros
« Reply #60 on: January 15, 2016, 05:57:53 pm »
Quote
As there are 13 general purpose registers available for Cortex M3, you could sacrifice two registers to serve as base for offset based access to e.g. the SRAM or peripheral registers.
The maximum offset for baseReg/offset addressing is only 4095; often peripherals are further apart than that (I had two CM3/4 datasheets handy.  The TI ARM3s have their GPIO ports spaced 4k apart.  An Atmel SAM3X has all the GPIO ports within a single 4k block.  huh.) (This is a good example of how the 32bit address space is both blessing and curse.  You COULD fit a lot of peripherals inside 4k, but chips generally don't.)
Using a 12bit offset (instead of 0..1020), or registers beyond r0..r7 pushes you into thumb2 territory.  32bits long and not available on CM0...
Indeed, I was too lazy to look up the corresponding opcodes for Thumb-2. In the PPC world, cores like the e200z7 have load/store instructions with a signed 16-bit (15 bits plus sign) offset, so with one small data area (SDA) pointer you can address 64k.
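The 64k figure follows from the signed displacement: one base register reaches base - 32768 to base + 32767. Pointing the base 32 KiB into the area therefore covers all 64 KiB; as I understand it, this is how the PowerPC EABI places `_SDA_BASE_`, but treat that as an assumption. The addresses below are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 16-bit displacement (15 bits plus sign): -32768 .. +32767. */
bool d16_reaches(uint32_t base, uint32_t target) {
    int64_t d = (int64_t)target - (int64_t)base;
    return d >= -32768 && d <= 32767;
}
```

With an area starting at 0x40000000 and the base at 0x40008000, both 0x40000000 and 0x4000FFFF are reachable, but 0x40010000 is not.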
Trying is the first step towards failure - Homer J. Simpson
 

Offline bson

  • Supporter
  • ****
  • Posts: 2265
  • Country: us
Re: Advantages of 32bit micros
« Reply #61 on: January 15, 2016, 09:53:58 pm »
The maximum offset for baseReg/offset addressing is only 4095; often peripherals are further apart than that
Just bite the bullet and give each device instance its own configurable base address.  With a 12-bit offset limitation the compiler has to emit these constants anyway.  4 bytes is a small price to pay for the flexibility and namespace hygiene it offers.  Something like this (just a rough stab at an example I typed up for this post):

Code:
// Example

#ifndef __NXP_GPIO_H__
#define __NXP_GPIO_H__

#include <cstdint>

namespace nxp_io {

// Non-banded GPIO.  Subclass and replace to utilize banding when
// available.  All inlinable, simple ops, no virtual functions (so no vtable).

class Gpio {
public:
  // Register overlay.  volatile so every access really reaches the hardware;
  // three adjacent 16-bit fields need no padding, so no packing attribute.
  struct bits {
    volatile uint16_t _enable;  // Pin enables
    volatile uint16_t _ub_in;   // Unbuffered input
    volatile uint16_t _l_out;   // Latching out
  };

protected:
  bits& _base;

public:
  explicit Gpio(uintptr_t base_addr)
    : _base(*reinterpret_cast<bits*>(base_addr)) {
  }

  void set(uint8_t bitnum) {    // rmw
    _base._l_out = static_cast<uint16_t>(_base._l_out | (1u << bitnum));
  }
  void clear(uint8_t bitnum) {  // rmw
    _base._l_out = static_cast<uint16_t>(_base._l_out & ~(1u << bitnum));
  }

  // Note this is const.  Some devices are volatile in that reading them
  // clears non-persistent status bits, but gpio reads are const.  The
  // result is racy if another thread or context (e.g. an interrupt) is
  // setting/clearing at the same time, but that's reasonably expected behavior.
  bool test(uint8_t bitnum) const { return (_base._ub_in & (1u << bitnum)) != 0; }
};


} // ns nxp_io

#endif // __NXP_GPIO_H__



#include "board_conf.h"
// the above implies:
//    using namespace nxp_io;

// the io_config contains a set of enums with base addresses, processor dependent and
// imported from the board config.  Timer is assumed to be another driver class
// declared along the same lines.

Gpio port_a(io_config::GPIO_A_BASE_ADDR);
Timer timer_1(io_config::TIMER_1_BASE_ADDR, 1000); // 1kHz prescale


int main() {
  for (;;) {
    port_a.set(io_config::led_blinker);
    timer_1.delay(500);         // In prescale units
    port_a.clear(io_config::led_blinker);
    timer_1.delay(500);
  }

  // not reached
}
 

Offline Bruce Abbott

  • Frequent Contributor
  • **
  • Posts: 627
  • Country: nz
    • Bruce Abbott's R/C Models and Electronics
Re: Advantages of 32bit micros
« Reply #62 on: January 15, 2016, 11:41:08 pm »
a decent C compiler for ARM would solve it by putting a constant with the register's address somewhere
A decent ARM chip is so fast and has so much memory that the 'inefficiency' isn't an issue.

You could try tweaking a poor compiler to make better code, or bite the bullet and write it all in optimized assembler, but the whole point of a compiler is to get away from that sort of thing. Just accept that RISC architecture needs more instructions to do the job, and don't worry about it.

8-bit PICs are a bit more code-efficient than AVRs, but each instruction cycle takes 4 clocks, so they are 4 times slower at the same clock speed. ARM needs even more cycles to do the job, but AVR tops out at 20MHz, so even a 'slow' ARM can beat it in most applications.
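The throughput argument is simple arithmetic: instructions per second is clock divided by clocks per instruction, and task time is instruction count times clocks per instruction divided by clock. The figures in this sketch are illustrative assumptions, not measurements: a PIC16 at 32 MHz with 4 clocks per instruction cycle, an AVR at its 20 MHz ceiling and a Cortex-M0 at 48 MHz, both at roughly 1 clock per instruction, ignoring flash wait states.

```c
/* Millions of instructions per second. */
double mips(double clock_hz, double clocks_per_instr) {
    return clock_hz / clocks_per_instr / 1e6;
}

/* Microseconds to execute n_instr instructions. */
double task_us(double n_instr, double clock_hz, double clocks_per_instr) {
    return n_instr * clocks_per_instr / clock_hz * 1e6;
}
```

Under those assumptions the PIC16 runs at 8 MIPS and the AVR at 20 MIPS. If the ARM needs, say, 150 instructions where the AVR needs 100, it still finishes in about 3.1 µs versus 5 µs.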
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4196
  • Country: us
Re: Advantages of 32bit micros
« Reply #63 on: January 16, 2016, 02:24:34 am »
Quote
ld r1,[pc+20] #get pointer to register's address 20 is just fictional
ld r2,[r1] #read value
or r2,r2,bit_value #modify
st [r1],r2 #write value

That is 8 bytes worth of instructions.
The "or" instruction is 4 bytes, and the first load has a 4-byte constant (in program space), so I'd call it 14 bytes...
And it's almost exactly what I had in my first message about this, except it doesn't do any atomicity handling.
Real code (with real peripherals) is more likely to look like this (10-12 bytes):
Code:
ldr     r3, [pc, #8]             ; 32-bit constant: gpio base.
mov     r2, #bitvalue            ; could be 2 or 4 bytes.
str     r2, [r3, #GPIO_BITSET]   ; store to bitset reg.


Quote
4 bytes is a small price to pay for the flexibility and namespace hygiene it offers.
Quote
A decent ARM chip is so fast and has so much memory that the 'inefficiency' isn't an issue.
Quote
the whole point of a compiler is to get away from that sort of thing.
Yes, to all of those.  And my point (if I have one) is not to complain about the ARM architecture or 32-bit efficiency; I'm just pointing out that the 32-bit address space does have pretty direct consequences for "code density."   It's the sort of thing that makes an 8-bit programmer shake their head, and it becomes a bit worrisome when ARM manufacturers start offering really cheap "8-bit replacements" that DON'T have "so much memory", and whose libraries still assume you do.  And it's a bit annoying to have an assembly language that is so ... difficult to predict, in what is supposed to be a RISC environment  (admittedly, things get cleaner if you use real 32-bit ARM instructions, and not Thumb/Thumb-2 "compressed" instructions.)

The other interesting thing is ... you have to think about how you actually DO GPIO/etc.   That single-instruction bitset in AVR is just fine for a constant bitvalue on a constant pin.  But it doesn't work if either one is a variable.  By the time you get to an arduino-like digitalWrite(pin, val) function where everything is variable and there's some mapping involved, the AVR code has gotten "bloated and slow" and complicated, while the ARM code is ... about the same (and it's now smaller than the AVR code.)
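Here is a sketch of why the fully variable case stays cheap on a part with set/clear registers: a table maps the pin number to a port and mask, and the write is a single store to either the set or the clear register, with no read-modify-write and no per-pin branching. The register layout, table contents and `digital_write` name are hypothetical; the ports are ordinary variables so the sketch runs on a host.

```c
#include <stdint.h>

/* Hypothetical GPIO block with write-1-to-set / write-1-to-clear registers. */
typedef struct {
    volatile uint32_t out;   /* output data */
    volatile uint32_t set;   /* writing 1s sets those bits in out */
    volatile uint32_t clr;   /* writing 1s clears those bits in out */
} gpio_t;

typedef struct { gpio_t *port; uint32_t mask; } pin_map_t;

static gpio_t port_a, port_b;        /* stand-ins for memory-mapped ports */

static const pin_map_t pin_map[] = { /* illustrative Arduino-style pin table */
    { &port_a, 1u << 0 }, { &port_a, 1u << 5 },
    { &port_b, 1u << 3 }, { &port_b, 1u << 12 },
};

void digital_write(unsigned pin, int val) {
    if (pin >= sizeof pin_map / sizeof pin_map[0])
        return;                      /* unknown pin: ignore */
    const pin_map_t *p = &pin_map[pin];
    if (val)
        p->port->set = p->mask;      /* one store, no read-modify-write */
    else
        p->port->clr = p->mask;
}
```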
 

Offline dannyf

  • Super Contributor
  • ***
  • Posts: 8221
  • Country: 00
Re: Advantages of 32bit micros
« Reply #64 on: January 16, 2016, 12:21:40 pm »
Quote
A decent ARM chip is so fast and has so much memory that the 'inefficiency' isn't an issue.

That's generally true for modern mcus in most applications, including the 8-bit mcus, in the sense that 1) speed / memory is generally not the constraining factor; and 2) if it is, you can always step up to a bigger chip.

It is a fool's errand to pick a generic mcu based on its core or data width. What distinguishes one mcu from another is its peripherals, vendor support (hardware + software), corporate sourcing strategies and price if you are unfortunate.

This "ARM" thing is a fad. A good fad from an ARM investor's point of view, :)
================================
https://dannyelectronics.wordpress.com/
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1570
  • Country: de
Re: Advantages of 32bit micros
« Reply #65 on: January 16, 2016, 01:35:36 pm »
The other interesting thing is ... you have to think about how you actually DO GPIO/etc.   That single-instruction bitset in AVR is just fine for a constant bitvalue on a constant pin.  But it doesn't work if either one is a variable.  By the time you get to an arduino-like digitalWrite(pin, val) function where everything is variable and there's some mapping involved, the AVR code has gotten "bloated and slow" and complicated, while the ARM code is ... about the same (and it's now smaller than the AVR code.)
As pointed out before, on Cortex M3 and above you can (usually) use bit banding on GPIOs. Well, if the GPIO registers are in the bit banding area - but they usually are.
However my understanding of bit banding is that internally, it's still a read/modify access on the whole register while using set/clear registers will really only affect the bit.
So while the read/write instruction for bit banding is still single-cycle, internally a bit banding write results in two back-to-back bus accesses and a read typically creates a stall cycle.
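For reference, the bit-band alias address is computed with ARM's documented formula: alias = alias_base + byte_offset × 32 + bit × 4, where the SRAM region at 0x20000000 aliases to 0x22000000 and the peripheral region at 0x40000000 aliases to 0x42000000, each covering the first 1 MB. The example register address used below is arbitrary.

```c
#include <stdint.h>

/* Alias word address for bit `bit` of the byte/word at addr, or 0 if addr
 * is outside the two 1 MB bit-band regions. */
uint32_t bitband_alias(uint32_t addr, unsigned bit) {
    uint32_t region = addr & 0xF0000000u;
    if (region != 0x20000000u && region != 0x40000000u)
        return 0;
    uint32_t byte_offset = addr - region;
    if (byte_offset >= 0x00100000u)  /* only the first 1 MB is bit-banded */
        return 0;
    return region + 0x02000000u + byte_offset * 32u + bit * 4u;
}
```

On a target, `*(volatile uint32_t *)bitband_alias(reg_addr, n) = 1;` then sets bit n with a single store. For example, bit 5 of 0x40020014 maps to 0x42400294.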
Trying is the first step towards failure - Homer J. Simpson
 

Offline Kalvin

  • Super Contributor
  • ***
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: Advantages of 32bit micros
« Reply #66 on: January 16, 2016, 01:47:53 pm »
The other interesting thing is ... you have to think about how you actually DO GPIO/etc.   That single-instruction bitset in AVR is just fine for a constant bitvalue on a constant pin.  But it doesn't work if either one is a variable.  By the time you get to an arduino-like digitalWrite(pin, val) function where everything is variable and there's some mapping involved, the AVR code has gotten "bloated and slow" and complicated, while the ARM code is ... about the same (and it's now smaller than the AVR code.)
As pointed out before, on Cortex M3 and above you can (usually) use bit banding on GPIOs. Well, if the GPIO registers are in the bit banding area - but they usually are.
However my understanding of bit banding is that internally, it's still a read/modify access on the whole register while using set/clear registers will really only affect the bit.
So while the read/write instruction for bit banding is still single-cycle, internally a bit banding write results in two back-to-back bus accesses and a read typically creates a stall cycle.

Cortex M4: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/Behcjiic.html
"The processor does not stall during bit-band operations unless it attempts to access the System bus while the bit-band operation is being carried out."

Cortex M3: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100165_0201_00_en/ric1417773736773.html
"The processor does not stall during bit-band operations unless it attempts to access the System bus while the bit-band operation is being carried out."
 

Offline 0xdeadbeef

  • Super Contributor
  • ***
  • Posts: 1570
  • Country: de
Re: Advantages of 32bit micros
« Reply #67 on: January 16, 2016, 05:02:07 pm »
Hm, I'm not quite sure why you quote that as it's less precise than what I wrote above. But just to show that I can quote infocenter.arm.com as well (but beware: that site tends to semi-crash my Firefox):

Quote
The bit-band operation to achieve the same result requires only a single instruction and one or two processor clock cycles.

For a read, the processor executes a load from the alias address, which the hardware converts into a load from the bit-band address. In most cases this includes a stall cycle while the processor waits for the loaded data value to be returned on the bus. However, if the next instruction to be executed is a NOP, this data transfer phase of the load completes while the NOP is executing. The masking and shifting takes place in the hardware with no additional latency, so the required bit appears in bit[0] of the target register when the load completes. Therefore a bit-band read, like a normal memory load, typically executes in two cycles in a zero wait-state memory system.

For a write, the processor executes a store to the alias address. For bufferable memory, a normal store completes in one clock cycle of the processor pipeline while the write buffer manages completion of the bufferable store on the bus. However, for a bit-band write, the hardware converts this into two back-to-back bus accesses, a load from the bit-band address immediately followed by a store to the same address. Manipulation of the required bit is handled in hardware between the load and the store operation with no additional latency. So the processor executes the bit-band store in one cycle, and memory is updated with the modified value two cycles later.
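To make the quoted description concrete, here is a software model of what the bus logic does on a bit-band access: a write loads the whole containing word, changes one bit and stores the word back, and a read returns the selected bit in bit 0. Unlike this C model, the hardware's load/store pair cannot be interrupted by other code on the same core, but it is still a read-modify-write of the whole word, whereas a set/clear register only touches the bits written as 1.

```c
#include <stdint.h>

/* Model of a bit-band write: two bus accesses to the containing word. */
void bitband_write_model(volatile uint32_t *word, unsigned bit, uint32_t value) {
    uint32_t w = *word;                              /* access 1: load  */
    w = (w & ~(1u << bit)) | ((value & 1u) << bit);  /* modify one bit  */
    *word = w;                                       /* access 2: store */
}

/* Model of a bit-band read: the selected bit appears in bit 0. */
uint32_t bitband_read_model(const volatile uint32_t *word, unsigned bit) {
    return (*word >> bit) & 1u;
}
```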
Trying is the first step towards failure - Homer J. Simpson
 

Offline Kalvin

  • Super Contributor
  • ***
  • Posts: 2145
  • Country: fi
  • Embedded SW/HW.
Re: Advantages of 32bit micros
« Reply #68 on: January 16, 2016, 05:20:28 pm »
Sorry, I should have bolded the text I was referring to.

The other interesting thing is ... you have to think about how you actually DO GPIO/etc.   That single-instruction bitset in AVR is just fine for a constant bitvalue on a constant pin.  But it doesn't work if either one is a variable.  By the time you get to an arduino-like digitalWrite(pin, val) function where everything is variable and there's some mapping involved, the AVR code has gotten "bloated and slow" and complicated, while the ARM code is ... about the same (and it's now smaller than the AVR code.)
As pointed out before, on Cortex M3 and above you can (usually) use bit banding on GPIOs. Well, if the GPIO registers are in the bit banding area - but they usually are.
However my understanding of bit banding is that internally, it's still a read/modify access on the whole register while using set/clear registers will really only affect the bit.
So while the read/write instruction for bit banding is still single-cycle, internally a bit banding write results in two back-to-back bus accesses and a read typically creates a stall cycle.

Cortex M4: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/Behcjiic.html
"The processor does not stall during bit-band operations unless it attempts to access the System bus while the bit-band operation is being carried out."

Cortex M3: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100165_0201_00_en/ric1417773736773.html
"The processor does not stall during bit-band operations unless it attempts to access the System bus while the bit-band operation is being carried out."

I understood you to be discussing setting and clearing individual bits using the bit-banding operations (read/modify access), hence the quotations from the reference material: that should be a single-cycle action without a stall.

Reading the bit value from the bit-banding area and then using its value in the next instruction will create a stall.
« Last Edit: January 16, 2016, 05:36:55 pm by Kalvin »
 

