Author Topic: [solved] AVR-GCC generates ASM stuffed with bloatware use of reserved registers  (Read 3278 times)

Offline RoGeorge (Topic starter)

  • Super Contributor
  • ***
  • Posts: 7154
  • Country: ro
Using Eclipse with avr-gcc (GCC) 5.4.0 and these compiling parameters (copied from Eclipse project settings):
-Wall -g2 -gdwarf-2 -O0 -fpack-struct -fshort-enums -ffunction-sections -fdata-sections -std=gnu99 -funsigned-char -funsigned-bitfields -mmcu=attiny13 -DF_CPU=9600000UL

I was trying to poke bit 4 in PORTB to 0 then 1 at max possible speed.  No optimization -O0 is intentional.  The MCU is an ATtiny13.  Whatever C syntax I try for flipping the PB4 bit, the compiler generates a lot of extra assembly code at each C line.

What's more intriguing is that these extra ASM instructions handle some registers that are marked as reserved in the ATtiny13 documentation (for example reserved r24 and r25 - see page 7 of 22 in https://ww1.microchip.com/downloads/en/DeviceDoc/2535S.pdf ). The PORTB is r18.  For example (from the compiled .lss):
Code: [Select]
PORTB |= _BV(PB4); // set PB4 LED on
  98: 88 e3        ldi r24, 0x38 ; 56
  9a: 90 e0        ldi r25, 0x00 ; 0
  9c: 28 e3        ldi r18, 0x38 ; 56
  9e: 30 e0        ldi r19, 0x00 ; 0
  a0: f9 01        movw r30, r18
  a2: 20 81        ld r18, Z
  a4: 20 61        ori r18, 0x10 ; 16
  a6: fc 01        movw r30, r24
  a8: 20 83        st Z, r18
PORTB &= ~_BV(PB4); // PB4 LED off
  aa: 88 e3        ldi r24, 0x38 ; 56
  ac: 90 e0        ldi r25, 0x00 ; 0
  ae: 28 e3        ldi r18, 0x38 ; 56
  b0: 30 e0        ldi r19, 0x00 ; 0
  b2: f9 01        movw r30, r18
  b4: 20 81        ld r18, Z
  b6: 2f 7e        andi r18, 0xEF ; 239
  b8: fc 01        movw r30, r24
  ba: 20 83        st Z, r18

No matter what C code I tried for poking the PB4 bit, the compiler never made use of the CBI/SBI assembler instructions:  https://onlinedocs.microchip.com/pr/GUID-0B644D8F-67E7-49E6-82C9-1B2B9ABE6A0D-en-US-19/index.html?GUID-DC827DBD-2D0E-4697-83A9-047661370309 Looking at the disassembled binary loaded into the MCU, all those reserved registers are indeed loaded with some numbers and handled apparently for no reason.  :-//



The code is a mess; I'm adding it here only to spare you the time of unzipping the attached project.  The commented-out C lines are other tries that also failed to produce the expected ASM instructions CBI/SBI register, bit.
Code: [Select]
// #define F_CPU (9600000UL)
// #define __AVR_ATtiny13__ __AVR_ATtiny13__

#include <avr/io.h>
//#include <avr/builtins.h>
#include <avr/iotn13.h>
#include <avr/interrupt.h>
#include <avr/sleep.h>
// #include <util/delay.h>

#define DELAY 200U
#define MAXCOUNT 16

static volatile int counter;

void WDT_off(void)
{
// cli();
//__watchdog_reset();
/* Clear WDRF in MCUSR */
MCUSR &= ~(1<<WDRF);
/* Write logical one to WDCE and WDE */
/* Keep old prescaler setting to prevent unintentional time-out */
WDTCR |= (1<<WDCE) | (1<<WDE);
/* Turn off WDT */
WDTCR = 0x00;
// __enable_interrupt();
}

ISR(WDT_vect) {
//blink
}

ISR(ANA_COMP_vect, ISR_BLOCK) {
//blink
//if (ACSR & _BV(ACO))
/*
if (ACSR & (1<<ACO))
PORTB |= _BV(PB4); // set PB4 LED on
else
PORTB &= ~_BV(PB4); // PB4 LED off
*/
/*
counter++;
if (counter >= MAXCOUNT)
{
PORTB ^= _BV(PB4); //toggle PB4
counter = 0;
}
*/
}

int main(void)
{
cli();
WDT_off();
DDRB |= _BV(PB4); //PB4 as Output

// ACSR – Analog Comparator Control and Status Register (default all 0)
// Bit 7 – ACD: Analog Comparator Disable
// Bit 6 – ACBG: Analog Comparator Bandgap Select
// Bit 5 – ACO: Analog Comparator Output (read only)
// Bit 4 – ACI: Analog Comparator Interrupt Flag
// Bit 3 – ACIE: Analog Comparator Interrupt Enable
// Bit 2 – Res: Reserved Bit
// Bits 1, 0 – ACIS1, ACIS0: Analog Comparator Interrupt Mode Select
// 00 - interrupt on toggle
// 01 - reserved
// 10 - interrupt on falling edge
// 11 - interrupt on raising edge

ACSR |= _BV(ACIE); // enable Analog Comparator interrupts
// disable watchdog timer
// ? clear AC int flag (ACI)
// enable AC interrupts

// enable global interrupts
//sei();
cli();
//sleep_mode();

// forever loop
for(;;)
{
//sleep_enable();

PORTB |= _BV(PB4); // set PB4 LED on
PORTB &= ~_BV(PB4); // PB4 LED off

PORTB |= _BV(PB4); // set PB4 LED on
PORTB &= ~_BV(PB4); // PB4 LED off

PORTB |= _BV(PB4); // set PB4 LED on
PORTB &= ~_BV(PB4); // PB4 LED off

///////////////////
struct bits {
  uint8_t b0:1;
  uint8_t b1:1;
  uint8_t b2:1;
  uint8_t b3:1;
  uint8_t b4:1;
  uint8_t b5:1;
  uint8_t b6:1;
  uint8_t b7:1;
} __attribute__((__packed__));
#define SBIT(port,pin) ((*(volatile struct bits*)&port).b##pin)
//bit defines
#define WR     SBIT(PORTB,4)
#define WR_DDR SBIT(DDRB,4)
//usage
//WR_DDR = 1;
WR = 1;
WR = 0;
//will result in sbi/cbi
WR = 1;
WR = 0;

WR = 1;
WR = 0;

///////////////////
// asm volatile (
// "SBI r18, 4" "\n\t");
// "CBI(PORTB, 4);");
/*
SBI(PORTB, 4);
CBI(PORTB, 4);

SBI(PORTB, 4);
CBI(PORTB, 4);
*/

///////////////////
/*
PORTB |= (1 << PB4);     // set pin 4 of Port B high
PORTB &= ~(1 << PB4);    // set pin 4 of Port B low

PORTB |= (1 << PORTB3);  // set pin 3 of Port B high
PORTB &= ~(1 << PORTB3);  // set pin 3 of Port B low

PORTB ^= (1 << PB4);  // toggle pin 4 of Port B
PORTB ^= (1 << PORTB4);  // toggle pin 4 again (PORTB4 == PB4)
*/

///////////////////
/*
bit_is_set(PORTB, 4);
bit_is_clear(PORTB, 4);

bit_is_set(PORTB, 4);
bit_is_clear(PORTB, 4);

bit_is_set(PORTB, 4);
bit_is_clear(PORTB, 4);
*/

}
}

The -O0 compiling option was chosen because I want no C line mangled away by optimization.  Later I need to step through the C code line by line with a hardware debugger.  I was expecting to see something like this in assembler, corresponding to set/reset of PORTB bit4:
Code: [Select]
sbi r18, 4
cbi r18, 4
...

Any chance of making the compiler produce something like that when poking at port bits?  And why is the compiler using those reserved registers r24 and r25?
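
A note on the numbers in the listing above, offered as a sketch rather than a definitive answer. The r18, r24, r25 and r30 in the disassembly are CPU working registers (the AVR core has 32 of them, r0..r31), which is a different namespace from the I/O register addresses in the datasheet's register summary. PORTB lives at I/O address 0x18; avr-libc accesses it through data space at 0x18 + 0x20 = 0x38, which is the 0x38 being loaded into the register pairs before the ld/st through Z. The helper names `io_to_data_addr` and `sbi_cbi_reachable` are made up for illustration; the 0x20 offset and the 0x00..0x1F limit for SBI/CBI are stated from the AVR instruction set and avr-libc headers as commonly documented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset between an AVR I/O address and its data-space address
   (avr-libc names this __SFR_OFFSET on classic AVR parts). */
#define SFR_OFFSET 0x20u

/* Hypothetical helper: data-space address that LD/ST use for an I/O register. */
uint16_t io_to_data_addr(uint8_t io_addr)
{
    assert(io_addr <= 0x3Fu);   /* IN/OUT address space is 0x00..0x3F */
    return (uint16_t)(io_addr + SFR_OFFSET);
}

/* Hypothetical helper: SBI/CBI can only reach the lower 32 I/O registers. */
bool sbi_cbi_reachable(uint8_t io_addr)
{
    return io_addr <= 0x1Fu;
}
```

With these, `io_to_data_addr(0x18)` gives 0x38, matching `ldi r24, 0x38`, and `sbi_cbi_reachable(0x18)` is true, so a single `sbi 0x18, 4` is a legal encoding for setting PORTB bit 4.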
« Last Edit: July 19, 2024, 08:33:32 pm by RoGeorge »
 

Offline jfiresto

  • Frequent Contributor
  • **
  • Posts: 909
  • Country: de
You may have to live with disappointment. The peak performance of avr-gcc code generation was somewhere around 3.4.6 – since then it has gotten dispiritingly bad.

-John
 

Offline RoGeorge (Topic starter)

  • Super Contributor
  • ***
  • Posts: 7154
  • Country: ro
Turns out the bloatware instructions happen only when forcing the no-optimization flag -O0.  Any other optimization level generates the expected SBI/CBI assembler instructions for set/clear bit, and also doesn't make use of the reserved registers.

I guess I'll have to give up on the idea of no code optimization.
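
Giving up -O0 does not have to be all-or-nothing. A minimal sketch, assuming GCC's per-function `optimize` attribute behaves on avr-gcc 5.4.0 the way the GCC manual describes (untested here, so the .lss should be checked): the project stays at -O0 for stepping, and only the routine that pokes the pin is compiled at -O1. The GCC manual itself recommends this attribute for debugging rather than production use. The names `pulse_led` and `fake_portb` are made up, and the `#else` branch exists only so the sketch also compiles on a PC.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#define LED_PORT PORTB
#define LED_BIT  PB4
#else
/* Host-side stand-in so the sketch also compiles off-target. */
volatile uint8_t fake_portb;
#define LED_PORT fake_portb
#define LED_BIT  4
#endif

/* Ask GCC to compile only this function at -O1, whatever level the
   command line selects (GCC-specific attribute). noinline keeps it a
   separate function so the attribute applies to a known body of code. */
__attribute__((optimize("O1"), noinline))
void pulse_led(void)
{
    LED_PORT |= (uint8_t)(1u << LED_BIT);   /* hoped-for output: sbi */
    LED_PORT &= (uint8_t)~(1u << LED_BIT);  /* hoped-for output: cbi */
}
```

Whether the two lines really collapse to sbi/cbi depends on the compiler version, which is why the listing is the final word.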

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
Quote
You may have to live with disappointment. The peak performance of avr-gcc code generation was somewhere around 3.4.6 – since then it has gotten dispiritingly bad.

Yep. AVR, MSP430 and I believe any other simple core's generated code became much worse from GCC 4 onwards. As GCC focussed more and more on complex cores, it lost the plot for simple cores. I was involved in GCC for the MSP430. After being pretty content working with various revisions of GCC 3, we couldn't find any way to coax better results out of GCC 4 and later.

GCC won't build on modern systems, producing a mass of errors. I wonder how much effort it would take to make it build again? I am not sure if you just need this older GCC, or if older versions of GDB or binutils are also needed for compatibility.
« Last Edit: July 19, 2024, 08:40:05 pm by coppice »
 

Offline djacobow

  • Super Contributor
  • ***
  • Posts: 1175
  • Country: us
  • takin' it apart since the 70's
You might want to try -Og

Quote
Optimize debugging experience. -Og should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation and a good debugging experience. It is a better choice than -O0 for producing debuggable code because some compiler passes that collect debug information are disabled at -O0.

Like -O0, -Og completely disables a number of optimization passes so that individual options controlling them have no effect. Otherwise -Og enables all -O1 optimization flags except for those that may interfere with debugging:

-fbranch-count-reg  -fdelayed-branch
-fdse  -fif-conversion  -fif-conversion2
-finline-functions-called-once
-fmove-loop-invariants  -fmove-loop-stores  -fssa-phiopt
-ftree-bit-ccp  -ftree-dse  -ftree-pta  -ftree-sra

Personally, I don't think I have ever run avr-gcc without -Os since size is by far my biggest concern most of the time.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
Quote
I was trying to poke bit 4 in PORTB to 0 then 1 at max possible speed.

Ok.

Quote
No optimization -O0 is intentional.

Very strange decision, especially if code size or speed is any factor at all. I would NEVER knowingly use less than gcc -O1 on any machine unless I was debugging the compiler itself (which I do sometimes).

Quote
Whatever C syntax I try for flipping the PB4 bit, the compiler generates a lot of extra assembly code at each C line.

That is a direct result of choosing to use -O0. Don't do that.

Equally, there is no need to go mad with -O3. Using -O1 is plenty in almost all cases and avoids generating stupid slow code that is slower than Javascript (on desktop PCs, obv).
 
The following users thanked this post: Psi, SiliconWizard

Online Psi

  • Super Contributor
  • ***
  • Posts: 10425
  • Country: nz
-O0 is horrible.
Greek letter 'Psi' (not Pounds per Square Inch)
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Quote
-O0 is horrible.

GNU makes ... it horrible.
-O0 is much better on CygnusC (m68k, C/89)
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Quote
You may have to live with disappointment. The peak performance of avr-gcc code generation was somewhere around 3.4.6 – since then it has gotten dispiritingly bad.

That's the same for hc11.
Good between v2.95 and v3.4.6
Otherwise dispiritingly bad.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Quote
GCC won't build on modern systems

umm, I built gcc-v3.3.* { hc11, m68k, m88k } on a 2020 GNU/Linux machine.
Lot of patches(3), and { C++, Ada, Java, ObjC } are dispiritingly broken and in such a traumatic way that I don't even try to fix them, for me they are all dead branches ...
... but C/89 - with the minimal setup(1) - can be resumed.

(1) I only compiled cc1, without support for any OS, I manually removed all the code, I didn't have time to fix it, nor did I care
The rest is broken.

GNU sucks a lot about the code-quality of Gcc!

Seriously, I'm quite a neat freak, but gcc is really a mass of helper scripts (including those for auto-doc, all broken(2)), and the C parts are written in an obscenely complex and brutal way, often without respecting even minimal rules on how code should be written.

I think they do it on purpose, as then you have to hire a GNU consultant, and pay him over 60 USD per hour.

(2) the auto-doc parts are all miserably broken, and the absurd thing... you can't configure the gcc-build to exclude those parts, you have to patch automake and autoconf to bypass them... a really idiotic thing that wastes a lot of time.

(3) Makefiles also need to be patched: the language of GNU Make has changed a bit and some things in old Makefiles are no longer supported, but it doesn't take long to fix them
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
Quote
Quote
GCC won't build on modern systems

umm, I built gcc-v3.3.* { hc11, m68k, m88k } on a 2020 GNU/Linux machine.
Lot of patches(3), and { C++, Ada, Java, ObjC } are dispiritingly broken and in such a traumatic way that I don't even try to fix them, for me they are all dead branches ...
... but C/89 - with the minimal setup(1) - can be resumed.

The official Ubuntu site has installation media going back to Ubuntu 4.10 (Warty Warthog) from 2004, specifically install CD images for x86, amd64, and PowerPC (Mac G3, G4, G5) and a live CD for 32 bit x86.

You should have little difficulty installing that into a VM/emulator and I'd expect to have no problems building gcc 2.95 or 3.x on it.

https://old-releases.ubuntu.com/releases/4.10/
 
The following users thanked this post: ledtester

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Quote
little difficulty

Sure, VM and miniroot are a workaround  :D
In my case, I was talking about compiling everything from sources, with no binaries of any kind to support it, on a modern GNU/Linux.

edit:
I remember m88k had a piece of the machine layer that is completely broken, or rather, it uses headers that there is no way to make work without spending months on it, so I rewrote them using more modern headers as a reference; it's just a question of how to manage the stack.

« Last Edit: July 20, 2024, 06:42:53 am by DiTBho »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
Quote
Quote
little difficulty

Sure, VM and miniroot are a workaround  :D

It's not a work-around, the only professional thing to do to ensure you can support and enhance or bugfix old software years or decades later is to archive the entire original environment it was built in, including not only compilers etc but also the OS.

Quote
In my case, I was talking about compiling everything from sources, with no binaries of any kind to support it, on a modern GNU/Linux.

Yes, but that is trying to shovel excrement uphill. You might with a lot of time and effort be able to make it work, but it's 1) very likely not worth it, and 2) may be simply too hard to make work at all, and 3) will not result in being able to generate the same artefacts again.
 

Offline RoGeorge (Topic starter)

  • Super Contributor
  • ***
  • Posts: 7154
  • Country: ro
Quote
No optimization -O0 is intentional.

Very strange decision, especially if code size or speed is any factor at all. I would NEVER knowingly use less than gcc -O1 on any machine unless I was debugging the compiler itself (which I do sometimes).

Quote
Whatever C syntax I try for flipping the PB4 bit, the compiler generates a lot of extra assembly code at each C line.

That is a direct result of choosing to use -O0. Don't do that.

Equally, there is no need to go mad with -O3. Using -O1 is plenty in almost all cases and avoids generating stupid slow code that is slower than Javascript (on desktop PCs, obv).

The reason for using -O0 was to be sure that each C line has a 1:1 direct translation into some assembly code.  The hope was to make C behave almost like ASM code, and let me take control of the optimization, so as to control a loop's timing by manually counting the clocks for each instruction and manually arranging the instructions accordingly.  The -O0 was intended only for a very small time-critical loop.
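
The clock counting described here can be sketched in plain C for the two cases seen in this thread. This is an illustration under stated assumptions, not measured data: the per-instruction cycle counts are the ones usually documented for the classic AVR core (ldi, movw, ori: 1 cycle; ld/st through Z: 2 cycles; sbi/cbi: 2 cycles), and the core is assumed to really run at the 9.6 MHz named by -DF_CPU (that is, with the CKDIV8 fuse unprogrammed). All function and constant names are made up.

```c
#include <stdint.h>

#define F_CPU_HZ 9600000UL   /* from -DF_CPU=9600000UL in the build flags */

/* Cycle counts assumed for the classic AVR core; verify against the
   instruction set manual for the actual part. */
enum { CYC_LDI = 1, CYC_MOVW = 1, CYC_ORI = 1,
       CYC_LD_Z = 2, CYC_ST_Z = 2, CYC_SBI = 2 };

/* Cycles for one "PORTB |= _BV(PB4)" as emitted at -O0 in the first
   post's listing: 4x ldi, movw, ld, ori, movw, st. */
unsigned cycles_o0_set_bit(void)
{
    return 4 * CYC_LDI + CYC_MOVW + CYC_LD_Z + CYC_ORI + CYC_MOVW + CYC_ST_Z;
}

/* Cycles for the same C line when it compiles to a single sbi. */
unsigned cycles_sbi(void)
{
    return CYC_SBI;
}

/* Convert a cycle count to nanoseconds at F_CPU_HZ (integer, rounded down). */
uint32_t cycles_to_ns(unsigned cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000000000ULL) / F_CPU_HZ);
}
```

Under those assumptions the -O0 sequence costs 11 cycles per C line against 2 for a lone sbi, so the port write itself is roughly five times slower before any loop overhead is counted.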

In other words, I was lazy, and trying to do from C what I should have done using assembly (lazy to search for how to properly combine C with assembly, while still retaining the ability of stepping through the bigger C program with a hardware debugger).  Bad idea indeed to use -O0.

Will write that critical loop in assembler, just to make sure the optimization flags of future compiler versions won't accidentally change the timing.
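
One middle road between pure C and a separate assembler file is inline assembly for just the two port writes, which pins the instruction regardless of the optimization level. A minimal sketch, assuming avr-libc's `_SFR_IO_ADDR` macro and the AVR "I" constraint (a constant in 0..63) work at -O0 as they do in the avr-libc inline assembler examples; this has not been checked against avr-gcc 5.4.0 here. `PB4_HIGH`, `PB4_LOW`, `pulse_pb4` and `sim_portb` are made-up names, and the `#else` branch is only a PC-side stand-in.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
/* Emit exactly one sbi/cbi. "I" demands a compile-time constant 0..63;
   sbi/cbi themselves only accept I/O addresses 0x00..0x1F. */
#define PB4_HIGH() __asm__ __volatile__("sbi %0, %1" :: \
        "I" (_SFR_IO_ADDR(PORTB)), "I" (PB4))
#define PB4_LOW()  __asm__ __volatile__("cbi %0, %1" :: \
        "I" (_SFR_IO_ADDR(PORTB)), "I" (PB4))
#else
/* Host-side stand-in so the sketch also compiles off-target. */
volatile uint8_t sim_portb;
#define PB4_HIGH() (sim_portb |= (uint8_t)(1u << 4))
#define PB4_LOW()  (sim_portb &= (uint8_t)~(1u << 4))
#endif

/* Hypothetical routine: emit n high/low pulses on PB4. */
void pulse_pb4(uint8_t n)
{
    while (n--) {
        PB4_HIGH();
        PB4_LOW();
    }
}
```

Only the two port writes are pinned this way; the loop around them is still compiled according to the optimization level, so a fully cycle-exact loop would still need to be written in assembler.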

Side note, for this small test, the -O1 produces the smallest size:  196 bytes
Code: [Select]
ATtiny13_dw_AC_2024-07-15_v1.elf:     file format elf32-avr

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000066  00000000  00000000  00000074  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000000  00800060  00800060  00000138  2**0
                  ALLOC, LOAD, DATA
  2 .comment      0000002f  00000000  00000000  00000138  2**0
                  CONTENTS, READONLY
  3 .stack.descriptors.hdr 0000000e  00000000  00000000  00000167  2**0
                  CONTENTS, READONLY
  4 .stack.descriptors 00000013  00000000  00000000  00000175  2**0
                  CONTENTS, READONLY
  5 .debug_aranges 00000038  00000000  00000000  00000188  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_info   00000524  00000000  00000000  000001c0  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_abbrev 0000033b  00000000  00000000  000006e4  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_line   0000016a  00000000  00000000  00000a1f  2**0
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_frame  0000006c  00000000  00000000  00000b8c  2**2
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_str    000000f6  00000000  00000000  00000bf8  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_loc    0000005e  00000000  00000000  00000cee  2**0
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_ranges 00000028  00000000  00000000  00000d4c  2**0
                  CONTENTS, READONLY, DEBUGGING
 13 .text         00000002  000000c2  000000c2  00000136  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .note.gnu.avr.deviceinfo 0000003c  00000000  00000000  00000d74  2**2
                  CONTENTS, READONLY, DEBUGGING
 15 .text.WDT_off 00000010  000000b2  000000b2  00000126  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 16 .text.__vector_8 00000014  0000008a  0000008a  000000fe  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 17 .text.__vector_5 00000014  0000009e  0000009e  00000112  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 18 .text.main    00000024  00000066  00000066  000000da  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 19 .bss.counter  00000002  00800060  00800060  00000138  2**0
                  ALLOC

Disassembly of section .text:

00000000 <__vectors>:
   0: 0c c0        rjmp .+24      ; 0x1a <__ctors_end>
   2: 5f c0        rjmp .+190    ; 0xc2 <__bad_interrupt>
   4: 5e c0        rjmp .+188    ; 0xc2 <__bad_interrupt>
   6: 5d c0        rjmp .+186    ; 0xc2 <__bad_interrupt>
   8: 5c c0        rjmp .+184    ; 0xc2 <__bad_interrupt>
   a: 49 c0        rjmp .+146    ; 0x9e <__vector_5>
   c: 5a c0        rjmp .+180    ; 0xc2 <__bad_interrupt>
   e: 59 c0        rjmp .+178    ; 0xc2 <__bad_interrupt>
  10: 3c c0        rjmp .+120    ; 0x8a <__vector_8>
  12: 57 c0        rjmp .+174    ; 0xc2 <__bad_interrupt>

00000014 <.dinit>:
  14: 00 60        ori r16, 0x00 ; 0
  16: 00 62        ori r16, 0x20 ; 32
  18: 80 00        .word 0x0080 ; ????

0000001a <__ctors_end>:
  1a: 11 24        eor r1, r1
  1c: 1f be        out 0x3f, r1 ; 63
  1e: cf e9        ldi r28, 0x9F ; 159
  20: cd bf        out 0x3d, r28 ; 61

00000022 <__do_copy_data>:
  22: e4 e1        ldi r30, 0x14 ; 20
  24: f0 e0        ldi r31, 0x00 ; 0
  26: 40 e0        ldi r20, 0x00 ; 0
  28: 17 c0        rjmp .+46      ; 0x58 <__do_clear_bss+0x8>
  2a: b5 91        lpm r27, Z+
  2c: a5 91        lpm r26, Z+
  2e: 35 91        lpm r19, Z+
  30: 25 91        lpm r18, Z+
  32: 05 91        lpm r16, Z+
  34: 07 fd        sbrc r16, 7
  36: 0c c0        rjmp .+24      ; 0x50 <__do_clear_bss>
  38: 95 91        lpm r25, Z+
  3a: 85 91        lpm r24, Z+
  3c: ef 01        movw r28, r30
  3e: f9 2f        mov r31, r25
  40: e8 2f        mov r30, r24
  42: 05 90        lpm r0, Z+
  44: 0d 92        st X+, r0
  46: a2 17        cp r26, r18
  48: b3 07        cpc r27, r19
  4a: d9 f7        brne .-10      ; 0x42 <__DATA_REGION_LENGTH__+0x2>
  4c: fe 01        movw r30, r28
  4e: 04 c0        rjmp .+8      ; 0x58 <__do_clear_bss+0x8>

00000050 <__do_clear_bss>:
  50: 1d 92        st X+, r1
  52: a2 17        cp r26, r18
  54: b3 07        cpc r27, r19
  56: e1 f7        brne .-8      ; 0x50 <__do_clear_bss>
  58: e9 31        cpi r30, 0x19 ; 25
  5a: f4 07        cpc r31, r20
  5c: 31 f7        brne .-52      ; 0x2a <__do_copy_data+0x8>
  5e: 03 d0        rcall .+6      ; 0x66 <_etext>
  60: 00 c0        rjmp .+0      ; 0x62 <_exit>

00000062 <_exit>:
  62: f8 94        cli

00000064 <__stop_program>:
  64: ff cf        rjmp .-2      ; 0x64 <__stop_program>

Disassembly of section .text:

000000c2 <__bad_interrupt>:
  c2: 9e cf        rjmp .-196    ; 0x0 <__TEXT_REGION_ORIGIN__>

Disassembly of section .text.WDT_off:

000000b2 <WDT_off>:
void WDT_off(void)
{
// cli();
//__watchdog_reset();
/* Clear WDRF in MCUSR */
MCUSR &= ~(1<<WDRF);
  b2: 84 b7        in r24, 0x34 ; 52
  b4: 87 7f        andi r24, 0xF7 ; 247
  b6: 84 bf        out 0x34, r24 ; 52
/* Write logical one to WDCE and WDE */
/* Keep old prescaler setting to prevent unintentional time-out */
WDTCR |= (1<<WDCE) | (1<<WDE);
  b8: 81 b5        in r24, 0x21 ; 33
  ba: 88 61        ori r24, 0x18 ; 24
  bc: 81 bd        out 0x21, r24 ; 33
/* Turn off WDT */
WDTCR = 0x00;
  be: 11 bc        out 0x21, r1 ; 33
  c0: 08 95        ret

Disassembly of section .text.__vector_8:

0000008a <__vector_8>:
// __enable_interrupt();
}

ISR(WDT_vect) {
  8a: 1f 92        push r1
  8c: 0f 92        push r0
  8e: 0f b6        in r0, 0x3f ; 63
  90: 0f 92        push r0
  92: 11 24        eor r1, r1
//blink
}
  94: 0f 90        pop r0
  96: 0f be        out 0x3f, r0 ; 63
  98: 0f 90        pop r0
  9a: 1f 90        pop r1
  9c: 18 95        reti

Disassembly of section .text.__vector_5:

0000009e <__vector_5>:

ISR(ANA_COMP_vect, ISR_BLOCK) {
  9e: 1f 92        push r1
  a0: 0f 92        push r0
  a2: 0f b6        in r0, 0x3f ; 63
  a4: 0f 92        push r0
  a6: 11 24        eor r1, r1
{
PORTB ^= _BV(PB4); //toggle PB4
counter = 0;
}
*/
}
  a8: 0f 90        pop r0
  aa: 0f be        out 0x3f, r0 ; 63
  ac: 0f 90        pop r0
  ae: 1f 90        pop r1
  b0: 18 95        reti

Disassembly of section .text.main:

00000066 <main>:

int main(void)
{
cli();
  66: f8 94        cli
WDT_off();
  68: 24 d0        rcall .+72      ; 0xb2 <WDT_off>
DDRB |= _BV(PB4); //PB4 as Output
  6a: bc 9a        sbi 0x17, 4 ; 23
// 00 - interrupt on toggle
// 01 - reserved
// 10 - interrupt on falling edge
// 11 - interrupt on raising edge

ACSR |= _BV(ACIE); // enable Analog Comparator interrupts
  6c: 43 9a        sbi 0x08, 3 ; 8
// ? clear AC int flag (ACI)
// enable AC interrupts

// enable global interrupts
//sei();
cli();
  6e: f8 94        cli
// forever loop
for(;;)
{
//sleep_enable();

PORTB |= _BV(PB4); // set PB4 LED on
  70: c4 9a        sbi 0x18, 4 ; 24
PORTB &= ~_BV(PB4); // PB4 LED off
  72: c4 98        cbi 0x18, 4 ; 24

PORTB |= _BV(PB4); // set PB4 LED on
  74: c4 9a        sbi 0x18, 4 ; 24
PORTB &= ~_BV(PB4); // PB4 LED off
  76: c4 98        cbi 0x18, 4 ; 24

PORTB |= _BV(PB4); // set PB4 LED on
  78: c4 9a        sbi 0x18, 4 ; 24
PORTB &= ~_BV(PB4); // PB4 LED off
  7a: c4 98        cbi 0x18, 4 ; 24
//bit defines
#define WR     SBIT(PORTB,4)
#define WR_DDR SBIT(DDRB,4)
//usage
//WR_DDR = 1;
WR = 1;
  7c: c4 9a        sbi 0x18, 4 ; 24
WR = 0;
  7e: c4 98        cbi 0x18, 4 ; 24
//will result in sbi/cbi
WR = 1;
  80: c4 9a        sbi 0x18, 4 ; 24
WR = 0;
  82: c4 98        cbi 0x18, 4 ; 24

WR = 1;
  84: c4 9a        sbi 0x18, 4 ; 24
WR = 0;
  86: c4 98        cbi 0x18, 4 ; 24
  88: f3 cf        rjmp .-26      ; 0x70 <main+0xa>

-O0 results in 484 bytes and a lot of stuffing ASM nonsense.  All the remaining -O options produce the same binary of 208 bytes (so -Os produces more bytes than -O1 for this particular case :)).  Also, no matter the C syntax used for poking at PB4, for anything other than -O0 the compiler always generates proper SBI/CBI assembly code for setting/resetting a single bit.
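
As a cross-check of the 196-byte figure quoted for -O1, the sizes of the sections flagged CODE in the section table above can simply be added up. The sketch below does that with the hex sizes copied from the table; the array and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes (hex, copied from the objdump section table) of the CODE sections. */
const uint16_t code_section_sizes[] = {
    0x66, /* .text (vectors and startup code) */
    0x02, /* .text (__bad_interrupt)          */
    0x10, /* .text.WDT_off                    */
    0x14, /* .text.__vector_8                 */
    0x14, /* .text.__vector_5                 */
    0x24, /* .text.main                       */
};

/* Sum of all CODE section sizes, in bytes. */
unsigned total_code_bytes(void)
{
    unsigned total = 0;
    size_t count = sizeof code_section_sizes / sizeof code_section_sizes[0];
    for (size_t i = 0; i < count; i++)
        total += code_section_sizes[i];
    return total;
}
```

The sum is 102 + 2 + 16 + 20 + 20 + 36 = 196 bytes, which agrees with the size reported for the -O1 build.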

The lesson from this is to forget about -O0, and never use it.  Thank you all for your help.



As for the AVR-GCC v3 (which might produce smaller code, as many have noticed or complained over the years), I searched for it (the binaries, in the hope that I could still install it in a VM) for about half an hour, and couldn't find it anywhere.  It never occurred to me at that moment to look in the repositories of old distros.  Do they still keep the repos after 10-20 years?  I didn't even try building avr-gcc v3 from sources, so I just gave up and kept whatever was already installed in my setup (v5.4.0).

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
Quote
The reason for using -O0 was to be sure that each C line has a 1:1 direct translation into some assembly code.  The hope was to make C behave almost like ASM code, and let me take control of the optimization, so as to control a loop's timing by manually counting the clocks for each instruction and manually arranging the instructions accordingly.

Ugh. This is an extremely bad idea. If you want asm then write asm.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
Quote
how to properly combine C with assembly

Simples. Up to 18 bytes of function arguments are passed in r8..r25, allocated from r25 working down, with each argument starting in an even-numbered register (so a one-byte first argument lands in r24).  Two byte arguments are stored with the MSB in the higher-numbered register. Function results are returned in r24, r24..r25 or r22..r25. Registers r18..r27 and r30..r31 can be freely overwritten. Registers r2..r17 and r28..r29 must be preserved (PUSH at the start of the function, POP at the end) even if used to pass arguments, and r1 must read zero again on return.  Further arguments are stored on the stack, but you're not gonna need that, right?

Just like any other ISA with a GNU toolchain you write like:

Code: [Select]
        // int foo(int a, int b){return a+b;}
        .globl foo
foo:
        add r24,r22
        adc r25,r23
        ret

Pedantically you should add ".type   foo, @function" but I never do lol.

Bung that in a file ending with `.s` (or `.S` if you want to use C/C++ comments and the preprocessor in general for #define, #if etc) and add it to the gcc command line.  "avr-gcc -O myprog.c foo.S -o myprog".  Boom!
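
For completeness, the C side of that arrangement might look like the sketch below. It assumes the assembly routine above is saved as foo.S and linked in on the AVR build; `add_offsets` is a made-up caller, and the `#ifndef __AVR__` definition of `foo` is only a PC-side stand-in so the sketch links without the assembly file.

```c
/* Prototype for the routine defined in foo.S; this is all the C side needs. */
int foo(int a, int b);

#ifndef __AVR__
/* Host-side stand-in; on the AVR build this is left out and the
   assembly version from foo.S is linked instead. */
int foo(int a, int b)
{
    return a + b;
}
#endif

/* Hypothetical caller. With the avr-gcc convention, base arrives in
   r24:r25, step in r22:r23, and the result comes back in r24:r25. */
int add_offsets(int base, int step)
{
    return foo(base, step);
}
```

Because the prototype is ordinary C, a hardware debugger can still step through the C caller line by line and simply steps over (or into) the hand-written routine.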
 
The following users thanked this post: RoGeorge, I wanted a rude username

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Quote
It's not a work-around

These are points of view, for me, as much as I am forced to do for work-related things exactly what you are talking about, having to resort to frozen "ecosystems" ... is a clear sign that there is something wrong in the IT industry  :-//

I mean, I don't want to appear arrogant, but you are talking to a person who created his own C-like compiler, and I didn't write it so badly that I had to force my future self to have to freeze the rootfs in which it was compiled, and chroot into in order to recompile it again.

It was a project objective right from the start, which cost me more time, but was of vital importance, and I can compile myC on a 2005 stage4 (Gentoo Beta-1!!!), the one I still have on my old laptop, just as I can compile on a 2024 stage4, and there is no problem, since it doesn't even use any tricks that are "compiler-dependent" or "ecosystem-dependent".

With gcc...it's not possible. I understand that it is an immensely more complex and larger project than myC, but I also wonder... what reason should there be to persist in wanting to omit a flag that enables or disables the automatic generation of documentation? It was reported several times by several people from several GNU/Linux distros that this part is problematic; talking about Gentoo, it causes compilation to fail, which in turn causes emerge to fail, and people wasted months of time breaking ebuilds in a vague attempt to find temporary fixes, until the sad decision to make a patch to forcibly remove it!

... patch that has always been despised because it shouldn't remove any doc-gen (like man1, man2, ...)

Gcc is a great compiler, but it sucks as project for how it is written and organized, and it sucks not only on GNU/Linux, but it sucks even more on BeOS{x86, ppc} and zOS{x86}, where you necessarily have to use old versions v2.95 and v3.1. On Haiku{x86, arm, riscv}, modern BeOS, things are a little better, but the compatibility patches obscenely break between gcc v9 and later. And it's a recurring pattern that repeats itself. Over and over.

Quote
the only professional thing to do to ensure you can support and enhance or bugfix old software years or decades later is to archive the entire original environment it was built in, including not only compilers etc but also the OS.

Yup. Compiling the linux kernel has the same problem as compiling gcc: both are "highly ecosystem dependent", so what you described is exactly what Windriver does: they provide VMs or miniroots of the various compilers { Gcc, Diab, ...}, including the whole ecosystem.

For very old projects, like Tornado (decommissioned), they only provide a VM, simply because it is more convenient and quicker for those who have to take up very old projects again. We're talking about things before 2001, with all the tools (especially ICEs, profilers, etc) and configuration already ready.

Also, due to the terrible nature of Eclipse, another project that sucks, Windriver prefers to release VMs containing the VxWorks sources along with the toolchains { Gcc, Diab, ...} with their native ecosystem. All ready, all configured, whoever receives the DVDROM or USB stick doesn't have to waste a minute.

Usually, there is a table that shows, for each target, which VMs support it, organized by the semester of release.
E.g. 2004-1 covers everything up to July, while 2004-2 covers everything up to December.

Nice idea! Their archives range from 2010 to 2024, and from 1997 to 2009 for older stuff.

You choose the VM, start it, and you are faced with a graphical environment with the IDE and everything else.

-

However, on Gentoo you can't do that without breaking the underlying philosophy. And in the end, for what I needed, it took me only 3 months, with an excellent result: everything integrated, I can emerge and integrate it on any future stage4, at least until 2027

Quote
Yes, but that is trying to shovel excrement uphill. You might with a lot of time and effort be able to make it work, but it's 1) very likely not worth it, and 2) may be simply too hard to make work at all, and 3) will not result in being able to generate the same artefacts again.

Umm, it's as if I soaked the gcc sources in *bleach* for a long time. Bleach... cleans and disinfects but cooks the fabrics to the point that clean cuts are needed to save what can be saved.

In fact, as I'll get to in a bit... the cuts were deep and very little was saved ...

Code: [Select]
C3750 mybuilder-2023 # ./mybuild info all
BC success: m6811-elf, binutils v2.34, gcc v3.3.6-s12x, mini mode
BC success: m68k-elf, binutils v2.34, gcc v9.3.0, mini mode
BC success: m88k-coff, binutils v2.16.1, gcc v3.1.1, mini mode
BC success: mips-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: mipsel-elf, binutils v2.34, gcc v4.7.0, mini mode
BC success: mips64-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: powerpc-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: powerpc-eabi, binutils v2.24, gcc v9.3.0, mini mode
BC success: arm-eabi, binutils v2.24, gcc v9.3.0, mini mode
BC success: arm-thumb-elf, binutils v2.24, gcc v4.7.0, mini mode
BC success: sh2-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: sh4-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: riscv32-elf, binutils v2.39, gcc v9.3.0, mini mode
BC success: hppa-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: mcore-elf, binutils v2.24, gcc v9.3.0, mini mode
BC success: i960-coff, binutils v2.16.1, gcc v3.1.1, mini mode
BC success: msp430-elf, binutils v2.39, gcc v9.3.0, mini mode
(we are talking about these targets, and these specific versions ... and note ... m88k and i960 are not even elf)

It took me several years to fix the sources, which I not only cleaned but also hacked in depth to better fit my personal philosophy (e.g. I want gcc to be "position independent" with respect to where you install the binaries). It then took about 3 months to create the builder (Bash + Lua scripts + C utils) and test the compilers.

Wasn't it worth it? Well... also consider that the goals here are:
  • have all those compilers on HPPA and on MIPS, where you generally don't find any old versions
  • have all those compilers integrated into the current rootfs, with no need for a VM or a chroot to a miniroot
  • compile them in much less time, about 1/20 of the time required by the vanilla versions
  • compile and install only the minimum that is really needed to compile C files

Goals achieved, but you are absolutely right: there is a steep price to pay :o :o :o :o

Not only your time and effort, but ... as you can see the builder reports "gcc0", which, given how the gcc project is organized, means that it compiles a version of cc1 with a minimal collector.

In short, it won't work out of the box, and it will never work with anything except c/89.
Forget { Ada, Fortran, objC, ... } and other cores ...

And it also means that you then have to write your own "gcc" which passes "gcc0" (which in turn calls cc1) the right parameters and also manages the preprocessor; so I wrote a collector, which works about the same as gcc's, but is much simpler.

This, in addition to brutally discontinuing all support for the libC library, which matters little to me, since I have my own.
« Last Edit: July 20, 2024, 12:01:02 pm by DiTBho »
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
-O0 is horrible.
-O0 may be horrible for tight code, but -O<anything else> is horrible for debugging.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
-O0 is horrible.
-O0 may be horrible for tight code, but -O<anything else> is horrible for debugging.

I completely fail to understand this point of view, but I've only been programming for 42 or so years so it's possible that I'm just thick. Horrible for debugging, how?

You're surely not single-stepping through your code instruction by instruction or line by line on CPUs that execute at minimum 20 or 30 million instructions per second (a $0.10 part), and 10+ billion instructions per second on a laptop/PC?

That didn't even make sense to me back when CPUs did 250k instructions per second.
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
-O0 is horrible.
-O0 may be horrible for tight code, but -O<anything else> is horrible for debugging.

I completely fail to understand this point of view, but I've only been programming for 42 or so years so it's possible that I'm just thick. Horrible for debugging, how?

You're surely not single-stepping through your code instruction by instruction or line by line on CPUs that execute at minimum 20 or 30 million instructions per second (a $0.10 part), and 10+ billion instructions per second on a laptop/PC?

That didn't even make sense to me back when CPUs did 250k instructions per second.
When you turn up the -O setting a lot of things only exist in registers. Currently debuggers are really bad at following these, so they can't tell you the value of various things when you stop the code to see what's up. With a good compiler that keeps things really tight, debuggers can show you hardly anything unless you force them to. If you can't live with -O0 for speed reasons in an MCU, you end up doing something like creating a few volatile variables where you can store the values you want to inspect.
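
A minimal sketch of that workaround, assuming a hypothetical `scale_sample()` function and a hypothetical `dbg_scaled` watch variable (neither comes from anyone's real project). The intermediate value is copied into a file-scope `volatile`, so the store has to be performed even at -O2 and the debugger can read it back from RAM.

```c
#include <stdint.h>

/* Hypothetical watch variable: 'volatile' obliges the compiler to perform
   every store, so the value stays readable in RAM at any -O level. */
volatile uint16_t dbg_scaled;

/* Hypothetical example function: scale an 8-bit sample by 3/4. */
uint8_t scale_sample(uint8_t sample)
{
    uint16_t scaled = (uint16_t)sample * 3u;  /* intermediate, normally register-only */
    dbg_scaled = scaled;                      /* copy it out for the debugger */
    return (uint8_t)(scaled / 4u);
}
```

The cost is one extra store per call, which is usually far cheaper than dropping the whole build to -O0.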
 

Offline RoGeorgeTopic starter

  • Super Contributor
  • ***
  • Posts: 7154
  • Country: ro
how to properly combine C with assembly

Simples. Up to 18 bytes of function arguments are passed in r8..r25, with the first argument going in r25 and working down.  Two byte arguments are stored with the MSB in the higher-numbered register. Function results are returned in r25, r24..r25 or r22..r25. Registers r18..r25 can be freely overwritten. Registers r17 and below must be preserved (PUSH at the start of the function, POP at the end) even if used to pass arguments.  Further arguments are stored on the stack
...

That spares me a lot of online searching, thank you.  Didn't try it yet, but that's the next thing to do.

All I wanted to do was to count the flips of the Analog Comparator output, but as fast as possible.  An interrupt is triggered by the AC output, then inside the interrupt a counter divides by 16 or so.  Then the divided frequency flips a DO (for a speaker).  The idea was to connect the input of the AC to an electret microphone, and so make an ultrasound probe, good for identifying singing coils and the like (the AC output divided by 16 will fall into the audio range, and thus make the ultrasound audible).

With C only, I had to use the "naked" attribute for the AC interrupt routine, to strip away the additional saving/restoring of registers.  This trick made it fast enough to reliably divide up to ~200 kHz audio input.
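
The divider logic can be sketched in plain C, independent of the AVR headers. All names here (`on_comparator_edge`, `speaker_pin_state`, `DIVIDE_BY`) are hypothetical, and this is not the code that was actually flashed. On the ATtiny13 the function body would sit in the analog comparator interrupt handler, and the toggle would write the port pin instead of a variable.

```c
#include <stdint.h>

#define DIVIDE_BY 16u              /* assumed divisor, "16 or so" */

static uint8_t edge_count;         /* comparator edges seen since the last toggle */
uint8_t speaker_pin_state;         /* stand-in for the real output pin (0 or 1) */

/* Call once per comparator edge; on the MCU this would be the ISR body. */
void on_comparator_edge(void)
{
    if (++edge_count >= DIVIDE_BY) {
        edge_count = 0;
        speaker_pin_state ^= 1u;   /* flip the output once every 16 edges */
    }
}
```

Note that toggling once every 16 edges gives a square wave whose full period spans 32 edges, so with one edge per input cycle a 40 kHz tone would come out at 40000 / 32 = 1250 Hz.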

The step by step debugging might still make sense, no matter how many millions of instructions the MCU executes in a second.  For example, when the entire code is only ~10 lines and still not behaving as expected.

In my case the culprit was the clock prescaler.  I thought the datasheet said the prescaler is 1 after reset, though mine was 8.  :-//  So all the clocks were 8 times slower than expected.  Will dig later to see why.  (Nah, after reset the prescaler is 8, as specified elsewhere in the full datasheet.)  But once the prescaler is set to 1, it all starts to work fast enough.  Then, once I had stripped away the interrupt overhead too, the program worked up to 200 kHz analog input, which is probably more than an electret microphone can sense.
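
For reference, the effect of the prescaler is simple arithmetic: the system clock is the oscillator frequency divided by 2^CLKPS. Here is a small sketch with a hypothetical helper name, `effective_clock_hz`, assuming the 9.6 MHz oscillator implied by the F_CPU setting in the first post.

```c
#include <stdint.h>

/* Hypothetical helper: system clock after the prescaler.
   clkps is the 4-bit prescaler select value; 0 means divide by 1,
   3 means divide by 8, and 8 (divide by 256) is the largest valid setting. */
uint32_t effective_clock_hz(uint32_t oscillator_hz, uint8_t clkps)
{
    if (clkps > 8u) {
        return 0;                  /* reserved setting: report as invalid */
    }
    return oscillator_hz >> clkps; /* divide by 2^clkps */
}
```

With these assumptions, `effective_clock_hz(9600000UL, 3)` gives 1200000, so a divide-by-8 prescaler turns the 9.6 MHz oscillator into a 1.2 MHz system clock, while `clkps` = 0 gives the full 9.6 MHz.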



When you turn up the -O setting a lot of things only exist in registers. Currently debuggers are really bad at following these, so they can't tell you the value of various things when you stop the code to see what's up.

I guess that is true for software (stub) debuggers, where some "spyware" code has to be compiled together with the project.  For hardware debuggers, there is no need for additional code, and no registers or flags are altered by the debugger.  The MCU has dedicated hardware for inspecting or modifying its state, aside from the registers used in normal execution.

The ATtiny13 I am using has support for debugWire, which means it has some additional registers to set a hardware breakpoint.  There is internal hardware that compares the PC with the breakpoint address, and halts the execution at a match.  Once the code execution is halted, the Reset pin can talk a serial protocol.  Everything inside the MCU can be read or written.  All the registers, memory addresses, flags, ports can be read or modified through the Reset pin only, using the debugWire protocol.  The MCU has additional hardware inside, in order to be able to do that.  Even the binary is written into the MCU's flash through the Reset pin alone, by debugWire.

There is a fuse bit DWEN (DebugWire ENable) that has to be asserted to put the MCU in DW mode.  This will disable the classic ISP mode (the mode that was using MOSI/MISO).

Hardware debugging can be very handy, and it's way better than printf debugging or stub-based software debuggers.  The funny thing is, the most popular AVR board, the "Arduino UNO", typically uses an ATmega328 MCU (corrected from ATmega48), which is debugWIRE capable, yet the Arduino IDE v1.x never made use of it.  The new Arduino IDE 2.x has support for hardware debugging, but not for the Arduino UNO (or Nano), because the hardware on the UNO/Nano cannot switch from ISP mode to debugWire.

It's a pity nobody went the extra mile to add hardware debugging for the Arduino UNO and the Nano.
« Last Edit: July 21, 2024, 05:31:51 pm by RoGeorge »
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
When you turn up the -O setting a lot of things only exist in registers.

Yes.

If you have a CPU with an adequate number of registers then all local variables (and arguments) in almost all functions will exist only in registers and that's a GOOD thing.

AVR, RISC-V, arm64, MIPS, SPARC and many others (including x86_64 with APX) have 32 general-purpose registers, minus a handful with fixed uses. AVR effectively has a few less than the others because you probably have some 16 (or even 32) bit variables that use multiple registers, but if there are byte variables too then you'll fit 16+ local variables in registers.

On these CPUs typically the only use of the stack is pushing a bunch of registers at function entry and popping them back at function exit. In between, all the code uses variables that are in registers, except only for explicit array/struct or globals/static access.

Things are tighter on CPUs with only 16 registers -- 32 bit Arm, 64 bit x86, MSP430, SuperH, M68k, VAX -- especially if you lose three or four registers to fixed uses, but you can still mostly fit most functions' arguments and locals into registers. (Sadly, compilers for M68k and VAX ... both dating from the late 1970s, before there were good register allocation algorithms ... tend to use stack-based arguments and locals despite having a good number of registers)

If you've got 8 or fewer registers then, yeah, things are pretty dire.

Quote
Currently debuggers are really bad at following these, so they can't tell you the value of various things when you stop the code to see what's up.

That shouldn't be the case, and I haven't seen that myself. If the variable is live then the debugger most certainly should be able to tell you its value, even if it is in different registers (or RAM) at different points. That's what those fancy debug metadata files are for.

The only exception is if you're trying to look at a stale value of a variable after its last use and the compiler reuses that register for a different variable. In that case the variable literally doesn't exist and the debugger should be telling you the variable is out of scope, or display a dash or something, not show you an incorrect value. But if you're stopped at (or before) an instruction that uses that variable then the debugger definitely should be showing you its value.

Complaining that a debugger doesn't show you the value of a variable after its last use is like complaining that the debugger can't show you the values variables had in the previous N loop iterations :-)  (If you want that, btw, then use nontrivial recursion not a loop)

I guess that's another advantage of printf debugging. The printf is a use of the variable and forces the compiler to preserve its value at least until that point.
 

Online brucehoult

  • Super Contributor
  • ***
  • Posts: 4926
  • Country: nz
The funny thing is, the most popular AVR, the "Arduino UNO" is typically using an ATmega48 MCU, which is debugWIRE capable

That would be very strange. I've never seen one of those Unos. People buying an Uno expect 2k SRAM and 32k flash in an ATMega328. If they end up with an ATMega48 with 512 bytes of RAM and 4k of flash then their sketches aren't going to fit! Heck, that's even less flash than the (bare) ATTiny85s I use for smaller projects.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Complaining that a debugger doesn't show you the value of a variable after its last use is like complaining that the debugger can't show you the values variables had in the previous N loop iterations :-)  (If you want that, btw, then use nontrivial recursion not a loop)

I guess that's another advantage of printf debugging. The printf is a use of the variable and forces the compiler to preserve its value at least until that point.

Otherwise you have to "instrument the code": isolate the variables you want to observe in a way that forces the compiler to preserve their values at least until that point, and plan a script that stops the debugger at the right place to read the instrumented variables.

There is commercial software that does it automatically.
 

Offline DiTBho

  • Super Contributor
  • ***
  • Posts: 4549
  • Country: gb
Sadly, compilers for M68k and VAX ... both dating from the late 1970s, before there were good register allocation algorithms ... tend to use stack-based arguments and locals despite having a good number of registers

Going back to the early 90s, I'm sure that the old CygnusC/m68k (MSDOS v5 era) did exactly this; I remember that it tended to use the stack massively, and on SBCs that had asynchronous RAM in zig-zag in-line packages (4 bits per chip), much slower than RAM is today. Back in the day... when the CPU clock was 8 MHz, and 12.5 MHz was considered "turbo mode".

Being "monad-based", I have the same problem with myC, because I haven't yet written a decent register allocation algorithm.
Even the unraveling of a math expression is done with an RPN approach, using only the stack.
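
To illustrate what an RPN approach means here, the following is only a toy sketch, not myC's actual code; `rpn_eval` and the single-character token format are made up for the example. The expression is given in postfix order and evaluated with nothing but an operand stack, so no register allocation is involved.

```c
#include <ctype.h>
#include <stddef.h>

#define RPN_STACK_MAX 16

/* Evaluate a postfix expression of single-digit operands and + - * operators,
   e.g. "23+4*" for (2 + 3) * 4. Returns 0 on success, -1 on malformed input. */
int rpn_eval(const char *expr, long *result)
{
    long stack[RPN_STACK_MAX];
    size_t depth = 0;

    for (; *expr != '\0'; expr++) {
        if (isdigit((unsigned char)*expr)) {
            if (depth == RPN_STACK_MAX) return -1;      /* stack overflow */
            stack[depth++] = *expr - '0';               /* push operand */
        } else if (*expr == '+' || *expr == '-' || *expr == '*') {
            if (depth < 2) return -1;                   /* not enough operands */
            long rhs = stack[--depth];                  /* pop right operand */
            long lhs = stack[--depth];                  /* pop left operand */
            long value = (*expr == '+') ? lhs + rhs
                       : (*expr == '-') ? lhs - rhs
                       : lhs * rhs;
            stack[depth++] = value;                     /* push result */
        } else if (*expr != ' ') {
            return -1;                                  /* unknown token */
        }
    }
    if (depth != 1) return -1;                          /* leftover or missing operands */
    *result = stack[0];
    return 0;
}
```

Every intermediate value goes through memory, which is exactly why code generated this way is easy to observe but slow compared with a proper register allocator.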

I find it *very* difficult to follow these parts in books dealing with languages and translators, at an advanced level.
You can have all the registers you want, but making good use of them is not at all simple, especially if you also have observability as your goal.

Blessed are those who understand how it works, and know how to do it well ...  :-//
« Last Edit: July 21, 2024, 05:05:39 am by DiTBho »
 

