Author Topic: GCC ARM32 compiler too clever, or not clever enough?  (Read 13608 times)

Online peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
GCC ARM32 compiler too clever, or not clever enough?
« on: April 15, 2022, 08:54:45 pm »
This code

Code: [Select]
struct tm boot_time;
char dtbuf[20];
getrtc(&boot_time);
snprintf(dtbuf, 16, "%02d%02d%04d %02d%02d%02d",
         boot_time.tm_mday, boot_time.tm_mon + 1, boot_time.tm_year + 1900,
         boot_time.tm_hour, boot_time.tm_min, boot_time.tm_sec);
dtbuf[15] = 0; // just in case

is generating a warning:

'%02d' directive writing between 2 and 11 bytes into a region of size between 0 and 14 [-Wformat-overflow=]

Lots of people have been down this path, and clearly the problem is printing an int (which in ARM32 GCC is defined as 16 bits, believe it or not, not 32) with %02d, because it can generate 5 digits, even though the actual value cannot be bigger than, say, 31 for day of month.

So I changed it to limit the values explicitly

Code: [Select]
(boot_time.tm_mday)%32, (boot_time.tm_mon+1)%13, (boot_time.tm_year+1900)%2099, \
(boot_time.tm_hour)%24, (boot_time.tm_min)%60, (boot_time.tm_sec)%60 );

but the compiler is not realising that e.g. the MOD 32 is limiting the range to 2 digits.

What is the cleanest way to fix this (without a command line option to suppress the warning)?

The final string being generated is

15042022 204415

i.e. ddmmyyyy hhmmss

EDIT: I found that using %02u instead of %02d suppresses the warnings, but I can't see how. Using a much bigger buffer probably also works.
« Last Edit: April 15, 2022, 09:06:44 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
« Reply #1 on: April 15, 2022, 09:24:41 pm »
An int in ARM GCC is 32 bits; you have a broken compiler if that is not the case for yours.

The warning comes from the format checker. It only knows the types and has no information from the optimizer about the possible range of the values, so it assumes the worst-case scenario.
Alex
 

Online peter-h (Topic starter)

« Reply #2 on: April 15, 2022, 09:33:33 pm »
How does %02u stop the warning? From an int, that could be 128, no? 3 digits.

Re the int, very good question. "int" is defined in this file

Code: [Select]
/*
 * Copyright (c) 2004, 2005 by
 * Ralf Corsepius, Ulm/Germany. All rights reserved.
 *
 * Permission to use, copy, modify, and distribute this software
 * is freely granted, provided that this notice is preserved.
 */

#ifndef _SYS__INTSUP_H
#define _SYS__INTSUP_H

#include <sys/features.h>

#if __GNUC_PREREQ (3, 2)
/* gcc > 3.2 implicitly defines the values we are interested */
#define __STDINT_EXP(x) __##x##__
#else
#define __STDINT_EXP(x) x
#include <limits.h>
#endif

/* Determine how intptr_t and intN_t fastN_t and leastN_t are defined by gcc
   for this target.  This is used to determine the correct printf() constant in
   inttypes.h and other  constants in stdint.h.
   So we end up with
   ?(signed|unsigned) char == 0
   ?(signed|unsigned) short == 1
   ?(signed|unsigned) int == 2
   ?(signed|unsigned) short int == 3
   ?(signed|unsigned) long == 4
   ?(signed|unsigned) long int == 6
   ?(signed|unsigned) long long == 8
   ?(signed|unsigned) long long int == 10
 */
#pragma push_macro("signed")
#pragma push_macro("unsigned")
#pragma push_macro("char")
#pragma push_macro("short")
#pragma push_macro("__int20")
#pragma push_macro("__int20__")
#pragma push_macro("int")
#pragma push_macro("long")
#undef signed
#undef unsigned
#undef char
#undef short
#undef int
#undef __int20
#undef __int20__
#undef long
#define signed +0
#define unsigned +0
#define char +0
#define short +1
#define __int20 +2
#define __int20__ +2
#define int +2
#define long +4
etc
etc

(last-1 line) which AFAIK came with Cube IDE from ST.

The reason this was not discovered for years was that we never use int except in trivial for() loops; always use uint8_t, uint16_t, uint32_t, int32_t, char, etc.
« Last Edit: April 15, 2022, 09:36:10 pm by peter-h »
 

Offline ataradov

« Reply #3 on: April 15, 2022, 09:46:55 pm »
I have not tried to calculate all the possible length variations, but unsigned values will not have a sign, so they are one byte shorter. Adding up a bunch of values that are each shorter by one byte saves a lot of bytes.

No idea what that file is, but int is always 4 bytes on ARM. It is trivial to check.
 

Online peter-h (Topic starter)

« Reply #4 on: April 15, 2022, 10:09:34 pm »
OK so it is warning on the total snprintf output size versus the [20] buffer, or the 16 byte limit, not on the fact that %02d could itself output more than 2 digits?

« Last Edit: April 15, 2022, 10:16:10 pm by peter-h »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
« Reply #5 on: April 15, 2022, 10:32:41 pm »
I've never encountered a C compiler for a 32-bit target that would have 16-bit 'int'. Unless maybe through some odd compiler flag.

That said, yes it's on the total size of the buffer, it just tried to estimate the max length based on the format, adding up the max for all parts of it. From my experience with GCC, it's not completely foolproof.

Note (probably obvious, but just thought I'd mention it) that %02d (or u) doesn't guarantee that the integer will be formatted with 2 chars only. It just fills the string with leading zeros if the resulting string would be shorter than 2 chars (excluding the sign, if I'm correct), but if it's longer, then it does nothing that I know of. So a typical 32-bit int with this format could yield up to 10 chars if positive, or 11 if negative. Correct me if I'm wrong.

Also, I'm not sure (or rather, I'm almost sure of the opposite) that using a modulo would change anything here as you noticed, as the result will be promoted to int anyway when being passed as an argument to snprintf, so I don't think the integrated static analyzer is smart enough to go deeper than this.
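
The point about %02d being a minimum width can be checked directly. A minimal sketch, assuming a 32-bit int and a C99 snprintf; the helper names width_02d and width_02u are made up for illustration. Calling snprintf with a NULL buffer and size 0 only measures the output length.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Number of characters "%02d" produces for v, not counting the NUL.
   A NULL buffer with size 0 makes snprintf measure without writing. */
static int width_02d(int v)
{
    int n = snprintf(NULL, 0, "%02d", v);
    assert(n >= 0); /* a negative return would indicate an output error */
    return n;
}

/* Same measurement for the unsigned conversion "%02u". */
static int width_02u(unsigned v)
{
    int n = snprintf(NULL, 0, "%02u", v);
    assert(n >= 0);
    return n;
}
```

With a 32-bit int, width_02d(7) is 2 (zero padded), width_02d(128) is 3, width_02d(INT_MIN) is 11 and width_02u(UINT_MAX) is 10, which matches the "between 2 and 11 bytes" wording of the warning.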

 

Offline ataradov

« Reply #6 on: April 15, 2022, 10:34:30 pm »
Yes, the warning is for the whole string. That's why it says that at some point you may need to write at least two bytes, but based on the previous stuff written into the buffer you will only have 0 to 14 bytes left. And having 0 bytes left is an issue here.

This is actually a very good feature of the compiler. It is one of those things that makes things safer for no run-time cost.
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
« Reply #7 on: April 16, 2022, 12:21:58 am »
Quote from: peter-h
So I changed it to limit the values explicitly

Code: [Select]
(boot_time.tm_mday)%32, (boot_time.tm_mon+1)%13, (boot_time.tm_year+1900)%2099, \
(boot_time.tm_hour)%24, (boot_time.tm_min)%60, (boot_time.tm_sec)%60 );

but the compiler is not realising that e.g. the MOD 32 is limiting the range to 2 digits.

It doesn't limit it to 2 characters.

C and C++ integer division is not Euclidean division, it is round-to-zero division. This means that in signed context the remainder might end up being negative. E.g. `-45 / 32` produces `-1` and `-45 % 32` produces `-13`.

Fields in `struct tm` are declared as `int`, so in your case the compiler has to assume that the negative argument might produce 3 characters in the output, not 2. This is apparently why using format `u` instead of `d` suppresses the warning.
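
A small sketch of the difference, for anyone who wants to try it. The function names rem_trunc and rem_euclid are hypothetical; the behaviour of % shown here is what C99 and later specify (division truncates toward zero).

```c
#include <assert.h>

/* C's % follows truncating division, so the result takes the sign
   of the dividend: a negative a gives a negative (or zero) remainder. */
static int rem_trunc(int a, int b)
{
    assert(b != 0);
    return a % b;
}

/* Euclidean-style remainder for a positive divisor: always in [0, b-1]. */
static int rem_euclid(int a, int b)
{
    assert(b > 0);
    int r = a % b;
    return (r < 0) ? r + b : r;
}
```

Here rem_trunc(-45, 32) is -13, which "%02d" prints as three characters, while rem_euclid(-45, 32) is 19.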
« Last Edit: April 16, 2022, 01:43:17 am by TheCalligrapher »
 

Offline ataradov

« Reply #8 on: April 16, 2022, 12:47:23 am »
%02d guarantees a minimum of 2 characters; there is no limit on the maximum. If the value is bigger, then the entire value is printed. That's why the warning clearly says "writing between 2 and 11 bytes".
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
« Reply #9 on: April 16, 2022, 01:29:18 am »
Quote from: peter-h
How does %02u stop the warning? From an int, that could be 128, no? 3 digits.

%02u can print 10 digits, %02d can print a - and then 10 digits.

Quote
Re the int, very good question. "int" is defined in this file

Code: [Select]
/* Determine how intptr_t and intN_t fastN_t and leastN_t are defined by gcc
   for this target.  This is used to determine the correct printf() constant in
   inttypes.h and other  constants in stdint.h.
   So we end up with
   ?(signed|unsigned) char == 0
   ?(signed|unsigned) short == 1
   ?(signed|unsigned) int == 2
   ?(signed|unsigned) short int == 3
   ?(signed|unsigned) long == 4
   ?(signed|unsigned) long int == 6
   ?(signed|unsigned) long long == 8
   ?(signed|unsigned) long long int == 10
 */

:

#define int +2
#define long +4

(last-1 line) which AFAIK came with Cube IDE from ST.

I don't know where the heck that comes from but it's clear it's defining something that is essentially like an enum, probably to select something from a table/array at some point.

It is clearly *not* the size of data types in bytes.

"short int" being a bigger number than "int" (and different to "short") and "long" and "long int" not being the same gives that away.
 

Online peter-h (Topic starter)

« Reply #10 on: April 16, 2022, 06:40:19 am »
Thank you all. I am happy with %02u :)

The values in tm cannot be negative in this case.

Re the 16 bit "int", I don't know how to find out where this is defined, other than by swiping an instance of "int" and doing a right click to see the Declaration, and that takes me to that file of unknown origin. This project was set up by someone else ~3 years ago.

I tend to use uint32_t etc anyway.

Looking at say uint32_t, these are declared in _stdint.h which is this one

Code: [Select]
/*
 * Copyright (c) 2004, 2005 by
 * Ralf Corsepius, Ulm/Germany. All rights reserved.
 *
 * Permission to use, copy, modify, and distribute this software
 * is freely granted, provided that this notice is preserved.
 */

#ifndef _SYS__STDINT_H
#define _SYS__STDINT_H

#include <machine/_default_types.h>

#ifdef __cplusplus
extern "C" {
#endif

#ifdef ___int8_t_defined
#ifndef _INT8_T_DECLARED
typedef __int8_t int8_t ;
#define _INT8_T_DECLARED
#endif
#ifndef _UINT8_T_DECLARED
typedef __uint8_t uint8_t ;
#define _UINT8_T_DECLARED
#endif
#define __int8_t_defined 1
#endif /* ___int8_t_defined */

#ifdef ___int16_t_defined
#ifndef _INT16_T_DECLARED
typedef __int16_t int16_t ;
#define _INT16_T_DECLARED
#endif
#ifndef _UINT16_T_DECLARED
typedef __uint16_t uint16_t ;
#define _UINT16_T_DECLARED
#endif
#define __int16_t_defined 1
#endif /* ___int16_t_defined */

#ifdef ___int32_t_defined
#ifndef _INT32_T_DECLARED
typedef __int32_t int32_t ;
#define _INT32_T_DECLARED
#endif
#ifndef _UINT32_T_DECLARED
typedef __uint32_t uint32_t ;
#define _UINT32_T_DECLARED
#endif
#define __int32_t_defined 1
#endif /* ___int32_t_defined */

#ifdef ___int64_t_defined
#ifndef _INT64_T_DECLARED
typedef __int64_t int64_t ;
#define _INT64_T_DECLARED
#endif
#ifndef _UINT64_T_DECLARED
typedef __uint64_t uint64_t ;
#define _UINT64_T_DECLARED
#endif
#define __int64_t_defined 1
#endif /* ___int64_t_defined */

#ifndef _INTMAX_T_DECLARED
typedef __intmax_t intmax_t;
#define _INTMAX_T_DECLARED
#endif

#ifndef _UINTMAX_T_DECLARED
typedef __uintmax_t uintmax_t;
#define _UINTMAX_T_DECLARED
#endif

#ifndef _INTPTR_T_DECLARED
typedef __intptr_t intptr_t;
#define _INTPTR_T_DECLARED
#endif

#ifndef _UINTPTR_T_DECLARED
typedef __uintptr_t uintptr_t;
#define _UINTPTR_T_DECLARED
#endif

#ifdef __cplusplus
}
#endif

#endif /* _SYS__STDINT_H */


and the _default_types.h file (included in above) is

Code: [Select]
/*
 *  $Id$
 */

#ifndef _MACHINE__DEFAULT_TYPES_H
#define _MACHINE__DEFAULT_TYPES_H

#include <sys/features.h>

/*
 * Guess on types by examining *_MIN / *_MAX defines.
 */
#if __GNUC_PREREQ (3, 3)
/* GCC >= 3.3.0 has __<val>__ implicitly defined. */
#define __EXP(x) __##x##__
#else
/* Fall back to POSIX versions from <limits.h> */
#define __EXP(x) x
#include <limits.h>
#endif

/* Check if "long long" is 64bit wide */
/* Modern GCCs provide __LONG_LONG_MAX__, SUSv3 wants LLONG_MAX */
#if ( defined(__LONG_LONG_MAX__) && (__LONG_LONG_MAX__ > 0x7fffffff) ) \
  || ( defined(LLONG_MAX) && (LLONG_MAX > 0x7fffffff) )
#define __have_longlong64 1
#endif

/* Check if "long" is 64bit or 32bit wide */
#if __EXP(LONG_MAX) > 0x7fffffff
#define __have_long64 1
#elif __EXP(LONG_MAX) == 0x7fffffff && !defined(__SPU__)
#define __have_long32 1
#endif

#ifdef __cplusplus
extern "C" {
#endif

#ifdef __INT8_TYPE__
typedef __INT8_TYPE__ __int8_t;
#ifdef __UINT8_TYPE__
typedef __UINT8_TYPE__ __uint8_t;
#else
typedef unsigned __INT8_TYPE__ __uint8_t;
#endif
#define ___int8_t_defined 1
#elif __EXP(SCHAR_MAX) == 0x7f
typedef signed char __int8_t ;
typedef unsigned char __uint8_t ;
#define ___int8_t_defined 1
#endif

#ifdef __INT16_TYPE__
typedef __INT16_TYPE__ __int16_t;
#ifdef __UINT16_TYPE__
typedef __UINT16_TYPE__ __uint16_t;
#else
typedef unsigned __INT16_TYPE__ __uint16_t;
#endif
#define ___int16_t_defined 1
#elif __EXP(INT_MAX) == 0x7fff
typedef signed int __int16_t;
typedef unsigned int __uint16_t;
#define ___int16_t_defined 1
#elif __EXP(SHRT_MAX) == 0x7fff
typedef signed short __int16_t;
typedef unsigned short __uint16_t;
#define ___int16_t_defined 1
#elif __EXP(SCHAR_MAX) == 0x7fff
typedef signed char __int16_t;
typedef unsigned char __uint16_t;
#define ___int16_t_defined 1
#endif

#ifdef __INT32_TYPE__
typedef __INT32_TYPE__ __int32_t;
#ifdef __UINT32_TYPE__
typedef __UINT32_TYPE__ __uint32_t;
#else
typedef unsigned __INT32_TYPE__ __uint32_t;
#endif
#define ___int32_t_defined 1
#elif __EXP(INT_MAX) == 0x7fffffffL
typedef signed int __int32_t;
typedef unsigned int __uint32_t;
#define ___int32_t_defined 1
#elif __EXP(LONG_MAX) == 0x7fffffffL
typedef signed long __int32_t;
typedef unsigned long __uint32_t;
#define ___int32_t_defined 1
#elif __EXP(SHRT_MAX) == 0x7fffffffL
typedef signed short __int32_t;
typedef unsigned short __uint32_t;
#define ___int32_t_defined 1
#elif __EXP(SCHAR_MAX) == 0x7fffffffL
typedef signed char __int32_t;
typedef unsigned char __uint32_t;
#define ___int32_t_defined 1
#endif

#ifdef __INT64_TYPE__
typedef __INT64_TYPE__ __int64_t;
#ifdef __UINT64_TYPE__
typedef __UINT64_TYPE__ __uint64_t;
#else
typedef unsigned __INT64_TYPE__ __uint64_t;
#endif
#define ___int64_t_defined 1
#elif __EXP(LONG_MAX) > 0x7fffffff
typedef signed long __int64_t;
typedef unsigned long __uint64_t;
#define ___int64_t_defined 1

/* GCC has __LONG_LONG_MAX__ */
#elif  defined(__LONG_LONG_MAX__) && (__LONG_LONG_MAX__ > 0x7fffffff)
typedef signed long long __int64_t;
typedef unsigned long long __uint64_t;
#define ___int64_t_defined 1

/* POSIX mandates LLONG_MAX in <limits.h> */
#elif  defined(LLONG_MAX) && (LLONG_MAX > 0x7fffffff)
typedef signed long long __int64_t;
typedef unsigned long long __uint64_t;
#define ___int64_t_defined 1

#elif  __EXP(INT_MAX) > 0x7fffffff
typedef signed int __int64_t;
typedef unsigned int __uint64_t;
#define ___int64_t_defined 1
#endif

#ifdef __INT_LEAST8_TYPE__
typedef __INT_LEAST8_TYPE__ __int_least8_t;
#ifdef __UINT_LEAST8_TYPE__
typedef __UINT_LEAST8_TYPE__ __uint_least8_t;
#else
typedef unsigned __INT_LEAST8_TYPE__ __uint_least8_t;
#endif
#define ___int_least8_t_defined 1
#elif defined(___int8_t_defined)
typedef __int8_t __int_least8_t;
typedef __uint8_t __uint_least8_t;
#define ___int_least8_t_defined 1
#elif defined(___int16_t_defined)
typedef __int16_t __int_least8_t;
typedef __uint16_t __uint_least8_t;
#define ___int_least8_t_defined 1
#elif defined(___int32_t_defined)
typedef __int32_t __int_least8_t;
typedef __uint32_t __uint_least8_t;
#define ___int_least8_t_defined 1
#elif defined(___int64_t_defined)
typedef __int64_t __int_least8_t;
typedef __uint64_t __uint_least8_t;
#define ___int_least8_t_defined 1
#endif

#ifdef __INT_LEAST16_TYPE__
typedef __INT_LEAST16_TYPE__ __int_least16_t;
#ifdef __UINT_LEAST16_TYPE__
typedef __UINT_LEAST16_TYPE__ __uint_least16_t;
#else
typedef unsigned __INT_LEAST16_TYPE__ __uint_least16_t;
#endif
#define ___int_least16_t_defined 1
#elif defined(___int16_t_defined)
typedef __int16_t __int_least16_t;
typedef __uint16_t __uint_least16_t;
#define ___int_least16_t_defined 1
#elif defined(___int32_t_defined)
typedef __int32_t __int_least16_t;
typedef __uint32_t __uint_least16_t;
#define ___int_least16_t_defined 1
#elif defined(___int64_t_defined)
typedef __int64_t __int_least16_t;
typedef __uint64_t __uint_least16_t;
#define ___int_least16_t_defined 1
#endif

#ifdef __INT_LEAST32_TYPE__
typedef __INT_LEAST32_TYPE__ __int_least32_t;
#ifdef __UINT_LEAST32_TYPE__
typedef __UINT_LEAST32_TYPE__ __uint_least32_t;
#else
typedef unsigned __INT_LEAST32_TYPE__ __uint_least32_t;
#endif
#define ___int_least32_t_defined 1
#elif defined(___int32_t_defined)
typedef __int32_t __int_least32_t;
typedef __uint32_t __uint_least32_t;
#define ___int_least32_t_defined 1
#elif defined(___int64_t_defined)
typedef __int64_t __int_least32_t;
typedef __uint64_t __uint_least32_t;
#define ___int_least32_t_defined 1
#endif

#ifdef __INT_LEAST64_TYPE__
typedef __INT_LEAST64_TYPE__ __int_least64_t;
#ifdef __UINT_LEAST64_TYPE__
typedef __UINT_LEAST64_TYPE__ __uint_least64_t;
#else
typedef unsigned __INT_LEAST64_TYPE__ __uint_least64_t;
#endif
#define ___int_least64_t_defined 1
#elif defined(___int64_t_defined)
typedef __int64_t __int_least64_t;
typedef __uint64_t __uint_least64_t;
#define ___int_least64_t_defined 1
#endif

#if defined(__INTMAX_TYPE__)
typedef __INTMAX_TYPE__ __intmax_t;
#elif __have_longlong64
typedef signed long long __intmax_t;
#else
typedef signed long __intmax_t;
#endif

#if defined(__UINTMAX_TYPE__)
typedef __UINTMAX_TYPE__ __uintmax_t;
#elif __have_longlong64
typedef unsigned long long __uintmax_t;
#else
typedef unsigned long __uintmax_t;
#endif

#ifdef __INTPTR_TYPE__
typedef __INTPTR_TYPE__ __intptr_t;
#ifdef __UINTPTR_TYPE__
typedef __UINTPTR_TYPE__ __uintptr_t;
#else
typedef unsigned __INTPTR_TYPE__ __uintptr_t;
#endif
#elif defined(__PTRDIFF_TYPE__)
typedef __PTRDIFF_TYPE__ __intptr_t;
typedef unsigned __PTRDIFF_TYPE__ __uintptr_t;
#else
typedef long __intptr_t;
typedef unsigned long __uintptr_t;
#endif

#undef __EXP

#ifdef __cplusplus
}
#endif

#endif /* _MACHINE__DEFAULT_TYPES_H */


How should one define an int to be an int32_t?
« Last Edit: April 16, 2022, 06:43:23 am by peter-h »
 

Offline ataradov

« Reply #11 on: April 16, 2022, 06:49:39 am »
You can't define an "int" to be anything other than what the compiler needs it to be without breaking a bunch of stuff. It is a fundamental basic type.

But on ARM32 it is already equivalent to int32_t.
 

Online peter-h (Topic starter)

« Reply #12 on: April 16, 2022, 09:30:36 am »
Well, what does

   volatile uint32_t fred=sizeof(int);

return for fred?

4 :)
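
The same check can be made at compile time, so the build fails instead of needing a debugger to read a variable. A minimal sketch, assuming a C11 compiler; int_width_bits is a hypothetical helper name, and the bit count assumes int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Compilation stops here if int is not 32 bits wide on this target. */
static_assert(sizeof(int) * CHAR_BIT == 32, "expected a 32-bit int");
static_assert(INT_MAX == 2147483647, "expected INT_MAX == 2^31 - 1");

/* Run-time view of the same fact, e.g. for printing over a debug UART. */
static size_t int_width_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

On a target where int really were 16 bits, the first static_assert would stop the build with the given message.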

Yet, in this POS called Cube IDE, it takes you somewhere else. A right-click on it here

[screenshot not preserved]

and going here

[screenshot not preserved]

takes you to _intsup.h and

Code: [Select]
#define int +2
which is some irrelevant BS.

Anyway, learnt something else today :)
« Last Edit: April 16, 2022, 09:36:42 am by peter-h »
 

Online nctnico

  • Super Contributor
  • ***
  • Posts: 26757
  • Country: nl
    • NCT Developments
« Reply #13 on: April 16, 2022, 12:57:49 pm »
BTW: snprintf is guaranteed to write a terminating 0 at the end of the string, provided the size argument is non-zero (which is why I typically use snprintf instead of strcpy / strcat). There is no need to put a zero at the end of the string yourself.
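
A minimal sketch of that guarantee, assuming a conforming C99 snprintf; bounded_copy is a hypothetical wrapper name used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Copies text into out[size], truncating if necessary.
   Returns the length the complete output would have had. */
static int bounded_copy(char *out, size_t size, const char *text)
{
    assert(out != NULL && size > 0 && text != NULL);
    return snprintf(out, size, "%s", text);
}
```

Copying a 20-character string into a 16-byte buffer leaves the first 15 characters followed by a 0 in out[15], and the return value is 20, so a result greater than or equal to size signals truncation.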
There are small lies, big lies and then there is what is on the screen of your oscilloscope.
 
The following users thanked this post: peter-h

Online peter-h (Topic starter)

« Reply #14 on: April 16, 2022, 01:08:08 pm »
You mean if you specify a max length of 16 then snprintf is going to always drop a 0 into buf[15] regardless of whether the buffer overflowed? That's clever.
 

Offline ataradov

« Reply #15 on: April 16, 2022, 04:08:15 pm »
Quote from: peter-h
You mean if you specify a max length of 16 then snprintf is going to always drop a 0 into buf[15] regardless of whether the buffer overflowed? That's clever.

And it always returns the number of bytes it would have written if it had had enough space (not counting the terminating 0). This way, if you are using dynamic allocation and don't know the final length of the string, you can make a reasonable guess, do an snprintf, check whether it fit and, if it did not, allocate more memory and try again.
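
The guess-then-retry pattern described above could look roughly like this. It is a sketch, not a library routine: format_alloc and the initial guess of 16 bytes are assumptions made for the example, and the caller is expected to free() the result.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a heap buffer: try a small guess first, then grow to the
   exact size reported by vsnprintf. Returns NULL on failure. */
static char *format_alloc(const char *fmt, ...)
{
    assert(fmt != NULL);

    size_t cap = 16; /* initial guess */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    va_list ap, ap_retry;
    va_start(ap, fmt);
    va_copy(ap_retry, ap); /* the argument list may be needed twice */
    int needed = vsnprintf(buf, cap, fmt, ap);
    va_end(ap);

    if (needed >= 0 && (size_t)needed >= cap) {
        /* Did not fit: the return value says how much room is required. */
        cap = (size_t)needed + 1;
        char *bigger = realloc(buf, cap);
        if (bigger == NULL) {
            va_end(ap_retry);
            free(buf);
            return NULL;
        }
        buf = bigger;
        needed = vsnprintf(buf, cap, fmt, ap_retry);
    }
    va_end(ap_retry);

    if (needed < 0) { /* output error */
        free(buf);
        return NULL;
    }
    return buf;
}
```

On a small microcontroller a fixed buffer plus a check of the return value is usually the more sensible choice; the dynamic version is shown only because it is the case the return value was designed for.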
 

Offline ataradov

« Reply #16 on: April 16, 2022, 04:11:52 pm »
Quote from: peter-h
Yet, in this POS called Cube IDE, it takes you somewhere else. A right-click on it here

It is not a POS; you are just asking it to do nonsense and it does it for you. All IDEs with this feature would do the same. Using an IDE does not free you from the responsibility to think about whether the things it shows you make sense.

It is not illegal to define "int" to be anything you want. But if you actually include that file in your code, everything will break. The IDE does not know whether this file is ultimately included in that place, so it just searches all the files for the definition.

You can type #define int "Cube Rulez" in some file, and it will get you there as well.
 

Offline SiliconWizard

« Reply #17 on: April 16, 2022, 05:10:46 pm »
Quote from: brucehoult
Quote from: peter-h
How does %02u stop the warning? From an int, that could be 128, no? 3 digits.

%02u can print 10 digits, %02d can print a - and then 10 digits.

Quote
Re the int, very good question. "int" is defined in this file

Code: [Select]
/* Determine how intptr_t and intN_t fastN_t and leastN_t are defined by gcc
   for this target.  This is used to determine the correct printf() constant in
   inttypes.h and other  constants in stdint.h.
   So we end up with
   ?(signed|unsigned) char == 0
   ?(signed|unsigned) short == 1
   ?(signed|unsigned) int == 2
   ?(signed|unsigned) short int == 3
   ?(signed|unsigned) long == 4
   ?(signed|unsigned) long int == 6
   ?(signed|unsigned) long long == 8
   ?(signed|unsigned) long long int == 10
 */

:

#define int +2
#define long +4

(last-1 line) which AFAIK came with Cube IDE from ST.

I don't know where the heck that comes from

Me neither. But redefining "int" as "+2"? What the f*cking heck! :-DD
 

Offline ataradov

« Reply #18 on: April 16, 2022, 05:19:36 pm »
Quote from: SiliconWizard
Me neither. But redefining "int" as "+2"? What the f*cking heck! :-DD
This is often done in places where you need to fill out some structures based on the type name. This file is not meant to be included in the normal program, it is meant to be included in the structure initialization or something like this. All of the things would be undefed for the rest of the flow.

This is an internal file not meant to be used by normal programs.
 

Offline SiliconWizard

« Reply #19 on: April 16, 2022, 05:27:02 pm »
Wherever you'd do that is freaking retarded. There is never a *good* reason for redefining a language's keyword. Never. It's just atrocious and shows you're doing something pretty wrong.
All the more if it's strictly some "internal" use - meaning restricted to a limited set of your own files - where you'd be free to do things "right".

Many people seem to hate C macros. As often discussed, I personally do not and find them pretty flexible and useful, but this one type of use: nope. Never. Please.

And here we even have a perfect illustration of even the OP not quite knowing what int is in their context and what is happening. :-DD
 

Offline ataradov

« Reply #20 on: April 16, 2022, 05:35:22 pm »
Again, this is an internal GCC thing. You have to use stuff like this sometimes to make compilers work. You should not look at those files unless you are a compiler developer.
 

Offline SiliconWizard

« Reply #21 on: April 16, 2022, 05:36:47 pm »
Quote from: ataradov
Again, this is an internal GCC thing.

Is it? Can you point us to this?

Oh, and for the OP, the way to make sure of a given type's size would be to use sizeof and compile instead of trying to guess through incomprehensible header files.
« Last Edit: April 16, 2022, 05:39:08 pm by SiliconWizard »
 

Offline ataradov

« Reply #22 on: April 16, 2022, 05:52:46 pm »
Quote from: SiliconWizard
Is it? Can you point us to this?
The file name starts with "_", which generally denotes internal implementation files. The file specifically belongs to newlib, not even GCC itself.

But also, "#pragma push_macro" and "#pragma pop_macro" limit all the changes to the scope of this file.

It looks like this file figures out how size-independent format specifiers like PRId64 correspond to the actual standard format specifiers like "%lld". It is messy, but the whole of C is messy at the low level.

The actual useful output of this file is defines like __FAST8, __LEAST16, etc. It does not leave any other stuff in the outside scope.
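
For the curious, the trick can be reproduced in a few lines. This is only a sketch in the spirit of the quoted excerpt, not a copy of the newlib file: it assumes GCC or Clang (for `#pragma push_macro` and the predefined `__INT32_TYPE__`), and the `DEMO_`/`demo_` names are invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* Save any existing macro definitions of these names, then turn the type
   keywords into small numbers for the preprocessor only. */
#pragma push_macro("signed")
#pragma push_macro("unsigned")
#pragma push_macro("char")
#pragma push_macro("short")
#pragma push_macro("int")
#pragma push_macro("long")
#undef signed
#undef unsigned
#undef char
#undef short
#undef int
#undef long
#define signed +0
#define unsigned +0
#define char +0
#define short +1
#define int +2
#define long +4

/* __INT32_TYPE__ expands to the spelling of the 32-bit type, e.g. "int"
   or "long int", which now evaluates to 2 or 4+2 inside #if. */
#if (__INT32_TYPE__ == 2)
#define DEMO_INT32_SPELLING "int"
#elif (__INT32_TYPE__ == 4) || (__INT32_TYPE__ == 6)
#define DEMO_INT32_SPELLING "long"
#else
#define DEMO_INT32_SPELLING "other"
#endif

/* Restore the previous state: from here on, int is a plain keyword again. */
#pragma pop_macro("long")
#pragma pop_macro("int")
#pragma pop_macro("short")
#pragma pop_macro("char")
#pragma pop_macro("unsigned")
#pragma pop_macro("signed")

static_assert(sizeof(int) * CHAR_BIT >= 16, "int is a real type again");

static const char *demo_int32_spelling(void)
{
    return DEMO_INT32_SPELLING;
}

static int demo_int_still_a_type(void)
{
    int x = 40; /* compiles only because the macros were popped */
    return x + 2;
}
```

The only thing that survives past the pop_macro lines is the DEMO_INT32_SPELLING define, which is the same idea as the header leaving behind only its own result macros. That is also why an IDE that jumps to `#define int +2` is showing something that never affects normal code.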
« Last Edit: April 16, 2022, 05:54:30 pm by ataradov »
 

Offline SiliconWizard

« Reply #23 on: April 16, 2022, 06:01:43 pm »
Quote from: ataradov
Quote from: SiliconWizard
Is it? Can you point us to this?
The file name starts with "_", which generally denotes internal implementation files. The file specifically belongs to newlib, not even GCC itself.

Ah, that would already make more sense, as I've never seen such a thing in GCC.

But as I said, just use sizeof if in doubt, and call it a day. newlib's code is even messier than GCC's code.
 

Online peter-h (Topic starter)

« Reply #24 on: April 16, 2022, 08:38:48 pm »
Quote
This is an internal file not meant to be used by normal programs.

Well, yes, I already knew that the IDE often identifies relationships differently from the real build process. It works mostly ok for functions (it finds the .h file - unless there is more than one and then it offers you the whole list, which happens a lot with ST stuff where you have multiple CPUs #defined in .h files) and if you do it on the invocation of a function, it finds the actual function. And it correctly digs out the "source" for a uint32_t, but it breaks with a lot of other stuff. Another one is the References feature, which is supposed to find all code referencing that name; this often fails and I have to use a Search. And the Search itself sometimes fails and I have to use a Windows text search :)

Basically this IDE stuff is a tool which got put together by real men, for real men, and anybody who doesn't like it gets beaten up :)

I just don't have time to become an expert in all this. I've been doing hardware+asm for 40+ years. I have a rock solid working 32F417 project in Cube IDE, now v1.9.0, and Cube does what I need. Every so often it does something really weird. But much of this was done here
https://www.eevblog.com/forum/microcontrollers/is-st-cube-ide-a-piece-of-buggy-crap/msg4083625/#msg4083625
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
« Reply #25 on: April 16, 2022, 11:06:18 pm »
Quote from: peter-h
Basically this IDE stuff is a tool which got put together by real men, for real men, and anybody who doesn't like it gets beaten up :)
Which you summed up beautifully in "it's a piece of shit".

Neither VS Code nor full Visual Studio does this. They'll tell you they can't find a definition for 'int'.
Moreover, they understand compiler defines and #ifdef, if used correctly, and filter the result accordingly.

With Eclipse, it was always hit and miss.

Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline brucehoult

« Reply #26 on: April 17, 2022, 12:59:05 am »
Quote from: peter-h
I just don't have time to become an expert in all this.

You're digging into corners that experts don't dig into. At least not microcontroller programming experts. Runtime library and compiler experts maybe, but that's a very different field, with (mostly) different experts.

C has always had the "sizeof" built in function. Just use it.
 

Offline TheCalligrapher

« Reply #27 on: April 17, 2022, 01:23:04 am »
Quote from: brucehoult
C has always had the "sizeof" built in function. Just use it.

`sizeof` is an operator, not a function.
 
The following users thanked this post: newbrain

Offline brucehoult

« Reply #28 on: April 17, 2022, 02:53:25 am »
Quote from: TheCalligrapher
Quote from: brucehoult
C has always had the "sizeof" built in function. Just use it.

`sizeof` is an operator, not a function.

I win my bet.
 

Online peter-h (Topic starter)

« Reply #29 on: April 17, 2022, 08:11:31 am »
Now look at what I've done :) :) :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #30 on: April 17, 2022, 03:11:15 pm »
Why do you use the DDMMYYYY hhmmss format? ISO 8601 defines YYYY-MM-DD hh:mm:ss (although 8601-1:2019 made the T prefix for time mandatory for some reason, making it YYYY-MM-DDThh:mm:ss, which I do not recommend).

The reason the YYYY-MM-DD hh:mm:ss is superior is that not only is it unambiguous, it is internationally known; but most importantly, it sorts correctly without any extra work.  (A plain ASCII/ISO Latin/Windows 457/1252/Unicode/UTF-8 sort will sort them correctly.)

Furthermore, if your date and time fields may have extra flag bits, use the & (binary and) operator to pick only the useful bits, and emit them as unsigned integers:
Code: [Select]
        // minimum size for dtbuf is 4+1+2+1+2+1+2+1+2+1+2 +1 = 20 chars
        snprintf(dtbuf, sizeof dtbuf, "%04u-%02u-%02u %02u:%02u:%02u",
            (boot_time.tm_year & 4095) + 1900,
            (boot_time.tm_mon & 15) + 1,
            (boot_time.tm_mday & 31),
            (boot_time.tm_hour & 31),
            (boot_time.tm_min & 63),
            (boot_time.tm_sec & 63));
(I don't recall the exact details, but I believe there may have been some RTC chips where one of the fields' most significant bit acted as a user-DST flag or something.  I do not think that is the case here, though.)

As pointed out above by nctnico, snprintf() always terminates the buffer with a nul char (\0 - that's what the final +1 is in the dtbuf size calculation).  If the string does not fit the given buffer, it will simply terminate it early, and return the actual number of chars it would have stored if there had been room (but excluding the final nul char, so you actually need room for one more).

The above code, assuming a sane snprintf() implementation, will never cause a buffer overrun; not even if you accidentally or on purpose change the size of the dtbuf array.  In your own version, there is a risk, if you reduce the size of the array to below 16, but forget to update the size in the snprintf() correspondingly.
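
A minimal sketch of the truncation check described above, assuming a C99-conforming snprintf(). The helper name format_timestamp() and its 0/-1 return convention are made up for illustration, and the casts to unsigned are an added assumption so that the arguments match %u:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats *t as "YYYY-MM-DD hh:mm:ss" into buf.
   Returns 0 on success, -1 if the output was truncated or snprintf failed. */
int format_timestamp(char *buf, size_t size, const struct tm *t)
{
    int len = snprintf(buf, size, "%04u-%02u-%02u %02u:%02u:%02u",
                       ((unsigned)t->tm_year & 4095u) + 1900u,
                       ((unsigned)t->tm_mon & 15u) + 1u,
                       (unsigned)t->tm_mday & 31u,
                       (unsigned)t->tm_hour & 31u,
                       (unsigned)t->tm_min & 63u,
                       (unsigned)t->tm_sec & 63u);

    /* snprintf returns the length the complete string would have had,
       not counting the terminating nul, so len >= size means truncation. */
    if (len < 0 || (size_t)len >= size)
        return -1;
    return 0;
}
```

Called as format_timestamp(dtbuf, sizeof dtbuf, &boot_time), it should succeed with a 20-char buffer and report truncation with a 19-char one, while the buffer stays nul-terminated in both cases.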
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #31 on: April 17, 2022, 06:30:49 pm »
You are absolutely right; I forgot that "standard" :)

Code: [Select]
struct tm boot_time;
  char dtbuf [20];
getrtc (&boot_time);
snprintf( dtbuf,sizeof(dtbuf),"%04u-%02u-%02u %02u:%02u:%02u", \
(boot_time.tm_year+1900)%2100, (boot_time.tm_mon+1)%13, (boot_time.tm_mday)%32, \
(boot_time.tm_hour)%24, (boot_time.tm_min)%60, (boot_time.tm_sec)%60 );

2022-04-17 18:34:32

If I did

Code: [Select]
snprintf( dtbuf,19,"%04u-%02u-%02u %02u:%02u:%02u"
the compiler did notice that.



The structure tm has no extra bits; it is just clean ints. The RTC registers get converted to tm with another chunk of code.

I am not implementing daylight savings; that (local time) is a complete nightmare in an embedded system. Even having GPS location is no good. Phones do it by obtaining the GSM tower local time (which obviously fails when near some borders) :) PCs and such implement it by periodically downloading the huge table of rules.
« Last Edit: April 17, 2022, 06:39:59 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #32 on: April 19, 2022, 07:14:33 pm »
Code: [Select]
snprintf( dtbuf,sizeof(dtbuf),"%04u-%02u-%02u %02u:%02u:%02u", \
(boot_time.tm_year+1900)%2100, (boot_time.tm_mon+1)%13, (boot_time.tm_mday)%32, \
(boot_time.tm_hour)%24, (boot_time.tm_min)%60, (boot_time.tm_sec)%60 );
Those particular modulo operators can be slow, and the result is still signed (assuming signed struct tm members as is typical), which is why I do recommend using binary and (&) and powers-of-two-less-one instead.

While it may not affect you right now, also note that due to leap seconds, the valid range of the seconds field is actually 0 to 60, inclusive.

Thus, I do recommend you change that into say
Code: [Select]
snprintf(dtbuf, sizeof dtbuf, "%04u-%02u-%02u %02u:%02u:%02u", \
(boot_time.tm_year+1900)&4095, (boot_time.tm_mon+1)&15, (boot_time.tm_mday)&31, \
(boot_time.tm_hour)&31, (boot_time.tm_min)&63, (boot_time.tm_sec)&63);
If you look at the compiled code, you'll see that these simplify to just one or two instructions per member.  Invalid values will output slightly different values then, but it will always stay in the expected format.
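
To illustrate the difference between the two approaches, here is a small sketch (function names are hypothetical, and the cast to unsigned is an added assumption so the argument matches %u). It reports how many characters snprintf() would emit for one field with a mask versus a signed modulo:

```c
#include <stdio.h>

/* Width of a field printed as (value & 63) with %02u.
   The mask limits the argument to 0..63, so the width is always 2. */
int masked_width(int value)
{
    char buf[8];
    return snprintf(buf, sizeof buf, "%02u", (unsigned)value & 63u);
}

/* Width of the same field printed as (value % 60) with %02d.
   In C99 and later the remainder takes the sign of the dividend, so a
   negative input can produce three characters, for example "-59". */
int modulo_width(int value)
{
    char buf[8];
    return snprintf(buf, sizeof buf, "%02d", value % 60);
}
```

With valid inputs both give two characters; only a duff negative value makes the modulo version grow, which is the range the compiler has to assume when it warns.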

(There is a reason I do not put parentheses around the variable argument of sizeof.  The parentheses are required if the argument is a type, but not when it is a variable or an expression.  That reason is that because sizeof is an operator that only examines its argument without dereferencing any pointers or applying any side effects, expressions like sizeof (*(ptr++)) do not behave like a function call, say sizefunc(*(ptr++)), would: sizeof (*(ptr++)) is identical to sizeof *ptr and sizeof ptr[0].  Similarly, sizeof *(char *)0 is perfectly valid and evaluates to 1, and is not a null pointer dereference bug even though it looks like one.  To me, omitting the parentheses here is a style detail that helps me remember this quirk in its behaviour.  I like anything and everything that reduces my cognitive load without any side effects (except for some people having esthetic or stylistic objections to it): it is useful to me.)
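
A short sketch of the sizeof behaviour described in the parenthetical above, for operands that are not variable length arrays. The function names are made up for illustration, and some compilers will warn about the unevaluated side effect:

```c
#include <stddef.h>

/* Returns the value of i after 'sizeof (i++)'. The operand is not a
   variable length array, so it is not evaluated and i keeps its value. */
int counter_after_sizeof(void)
{
    int i = 0;
    size_t size = sizeof (i++);   /* operand is only inspected for its type */
    (void)size;
    return i;
}

/* Size of the pointed-to type, taken through a null pointer.
   Nothing is dereferenced at run time. */
size_t size_via_null_pointer(void)
{
    return sizeof *(char *)0;
}
```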
 
The following users thanked this post: peter-h, newbrain

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #33 on: April 19, 2022, 07:33:29 pm »
Thank you again.

I will do this, but a bit of explanation for why I did the previous way:

This code runs only once, at product startup, to write the date/time into a file boot.txt. So not often. Anyway, any "printf" call is about a million clock cycles ;) as I have often found out to my cost in the Z80 etc days (could be literally 100ms with an IAR compiler) so anybody using it can't be expecting "speed".

The slow part of a MOD is a division which IIRC is 16 clocks on the arm32, which is nothing. A mult (int32 or float) is 1 clock.

The values in tm come from code I wrote myself, and it should not be possible to have duff values there to start with.

But you are right and I will do the AND ops because the objective here is to ensure that only 2 digits ever come out. The MOD does not ensure valid values because month lengths vary.

Quote
due to leap seconds, the valid range of the seconds field is actually 0 to 60, inclusive

That will break a lot of systems because loading 60 into an RTC chip will likely do something unpredictable. I reckon it will instantly roll over to 00.
« Last Edit: April 19, 2022, 08:27:45 pm by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #34 on: April 20, 2022, 02:50:42 am »
While it may not affect you right now, also note that due to leap seconds, the valid range of the seconds field is actually 0 to 60, inclusive.

There have been 27 leap seconds added since 1972 (at the end of June 30 or December 31), but the last time one was needed was December 2016, with the ones before that June 2015 and June 2012.

Earth's rotation actually speeded up quite a bit in 2020 and if this continues then rather than needing another leap second any time soon there seems to be a real possibility that we might -- for the first time -- need a NEGATIVE leap second sometime in the next five to ten years. Or not.

If it happens then the time will go straight from 23:59:58 to 00:00:00 on some June 30 or December 31.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #35 on: April 20, 2022, 06:38:46 am »
That's a lesson for anyone writing bad code to run something at time x :)
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #36 on: April 20, 2022, 11:55:21 pm »
That reason is that because sizeof is an operator that only examines its argument without dereferencing any pointers or applying any side effects, expressions like sizeof (*(ptr++)) do not behave like a function call, say sizefunc(*(ptr++)), would: sizeof (*(ptr++)) is identical to sizeof *ptr and sizeof ptr[0].

While it is true that `sizeof` is an operator, the rest is not entirely accurate. Starting from C99 C language supports variably modified types. When `sizeof` is applied to an argument of variably modified type, the operand is evaluated using the usual rules.

Code: [Select]
#include <stdio.h>

int main(void)
{
  const int n = 1;
  int a[10][n], (*ptr)[n] = a;
 
  printf("%p\n", (void *) ptr);
  sizeof *ptr++;
  printf("%p\n", (void *) ptr);
}

Code: [Select]
0x7fff86d1e3d0
0x7fff86d1e3d4

http://coliru.stacked-crooked.com/a/4056da2ea66b8e8f
 

Offline ve7xen

  • Super Contributor
  • ***
  • Posts: 1192
  • Country: ca
    • VE7XEN Blog
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #37 on: April 21, 2022, 01:03:27 am »
73 de VE7XEN
He/Him
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #38 on: April 21, 2022, 07:34:09 am »
Some of those are indeed funny (and really stupid, like every Feb being 28 days) but others are likely to be incredibly difficult to code around, like not every month is wholly within the same year, which presumably is a reference to leap seconds, and this could play havoc with trying to start some task at (or more or less at) a given date/time - because that date+time may never actually happen. I think the bottom line is that one cannot use equality on seconds to trigger something. If checking frequently enough, one can use equality on minutes (and upwards) and then discard seconds or use a >= on the seconds. I wonder if the unix time functions (e.g. # of seconds between two date+time values) actually code this correctly.
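
A rough sketch of the ">= instead of ==" idea, assuming the scheduler is polled with a time_t value. The struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical one-shot trigger: fires the first time 'now' reaches or
   passes 'due', so a skipped second cannot make it miss. */
struct one_shot {
    time_t due;
    bool   fired;
};

bool one_shot_poll(struct one_shot *s, time_t now)
{
    /* difftime(now, due) < 0 means 'now' is still earlier than 'due' */
    if (s->fired || difftime(now, s->due) < 0.0)
        return false;
    s->fired = true;   /* remember, so a repeated second does not fire twice */
    return true;
}
```

The fired flag is what replaces the equality test: the event happens once, on the first poll at or after the due time.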
« Last Edit: April 21, 2022, 07:35:45 am by peter-h »
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #39 on: April 21, 2022, 08:28:32 am »
Starting from C99 C language supports variably modified types. When `sizeof` is applied to an argument of variably modified type, the operand is evaluated using the usual rules.
Even more reason to consider sizeof "special", in the sense that one does not want to have any side effects in sizeof expressions, wouldn't you agree?

(One also needs to avoid side effects in assert() statements, but that is because the statement is not compiled at all when NDEBUG is defined.  That's the rough description, with exact details for example here.)

Now, not all of us need this kind of reminder, but my own code is more robust and reliable (lower bug density) when I do have them.  I do believe it may be a useful tool for others as well, which is why I described it in my post; it is not an arbitrary stylistic choice for me.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #40 on: April 21, 2022, 05:11:05 pm »
Just don't use VLAs, and sizeof is purely a compile-time operator. Or is there any other kind of "variably modified type" in C that I missed?
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2803
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #41 on: April 22, 2022, 09:59:06 am »
If you just want to get your code running...

Code: [Select]
struct tm boot_time;
char dtbuf[20];
getrtc(&boot_time);
dtbuf[0] = '0' + boot_time.tm_mday / 10 % 10;
dtbuf[1] = '0' + boot_time.tm_mday % 10;

dtbuf[2] = '0' + (boot_time.tm_mon + 1) / 10 % 10;
dtbuf[3] = '0' + (boot_time.tm_mon + 1) % 10;
...
dtbuf[?] = 0;

Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: peter-h

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #42 on: April 23, 2022, 02:39:04 am »
Just don't use VLAs, and sizeof is purely a compile-time operator. ?

A better advice would be: just don't write `sizeof` argument expressions with side effects. There's normally no credible reason to do so.

Or is there any other kind of "variably modified type" in C that I missed

No, there isn't.

However, people sometimes let VLA slip right under their noses. In C

Code: [Select]
const int N = 10;
int a[N];

is actually a VLA, even though it is a regular array in C++. This surprising difference between C and C++ sometimes pops up when porting code between languages or when writing cross-compilable code.
« Last Edit: April 23, 2022, 04:23:28 am by TheCalligrapher »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #43 on: April 23, 2022, 03:30:54 am »
Just don't use VLAs, and sizeof is purely a compile-time operator. ?

A better advice would be: just don't write `sizeof` argument expressions with side effects. There's normally no credible reason to do so.

Well I get the point, but it's a bit strict. For instance, something rather common is to use sizeof on a dereferenced pointer to get the size of the base type. Particularly useful in macros which can apply to any pointer. Example, say 'p' is a pointer to some base type: 'sizeof(*p)'.

'*p' is probably considered having a side-effect (even though in this context, the dereference will never be evaluated.)
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #44 on: April 23, 2022, 04:25:13 am »
Just don't use VLAs, and sizeof is purely a compile-time operator. ?

A better advice would be: just don't write `sizeof` argument expressions with side effects. There's normally no credible reason to do so.

Well I get the point, but it's a bit strict. For instance, something rather common is to use sizeof on a dereferenced pointer to get the size of the base type. Particularly useful in macros which can apply to any pointer. Example, say 'p' is a pointer to some base type: 'sizeof(*p)'.

'*p' is probably considered having a side-effect (even though in this context, the dereference will never be evaluated.)

Um... "Side effect" is a rather well-defined concept in C and C++ evaluation model. And no, `*p` does not have any side effects. (Unless, `p` is `volatile`, I'd guess...)
 
The following users thanked this post: newbrain

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #45 on: April 23, 2022, 04:45:07 pm »
Just don't use VLAs, and sizeof is purely a compile-time operator. ?

A better advice would be: just don't write `sizeof` argument expressions with side effects. There's normally no credible reason to do so.

Well I get the point, but it's a bit strict. For instance, something rather common is to use sizeof on a dereferenced pointer to get the size of the base type. Particularly useful in macros which can apply to any pointer. Example, say 'p' is a pointer to some base type: 'sizeof(*p)'.

'*p' is probably considered having a side-effect (even though in this context, the dereference will never be evaluated.)

Um... "Side effect" is a rather well-defined concept in C and C++ evaluation model. And no, `*p` does not have any side effects. (Unless, `p` is `volatile`, I'd guess...)

You guess. ::)
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #46 on: April 24, 2022, 11:52:17 am »
You guess. ::)
No need to guess, ISO/IEC 9899:2011, 5.1.2.3 Program execution, §2:
Quote
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects[...]
So, *p, if evaluated, yields of course a side effect when p is a volatile pointer (as does the simple statement p;).
As an argument to sizeof, it does not, as it's not evaluated.

Really, there's only one exception to the 'sizeof argument is not evaluated' and it's VLAs, 6.5.3.4 The sizeof and _Alignof operators, §2:
Quote
[...]If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an
integer constant.

There is a reason I do not put parentheses around the variable argument of sizeof.  [...]  To me, omitting the parentheses here is a style detail that helps me remember this quirk in its behaviour.  I like anything and everything that reduces my cognitive load without any side effects
QFT.
In fact, I've always been doing exactly the same, for the very same reason.



Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #47 on: April 24, 2022, 05:24:30 pm »
Yes, my point. :)

As to the example:
Code: [Select]
const int N = 10;
int a[N];

it can be a trap for many indeed. Yes, here a is a VLA, while it may look stupid to many.
In particular, as a VLA, you can't provide an initializer.

Code: [Select]
const int N = 10;
int a[N] = { 0 };

is wrong.

But as to sizeof (and stack manipulation), the reasonable optimizing C compilers I've tried all treat 'a' as a regular array when optimizations are enabled, and as a VLA when they aren't (which produces horrible code.)

That "trap" is nasty for those that think constants defined as const variables are "cleaner" than macros. Unfortunately, they are not always a replacement.
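
One common way around the trap, sketched here as an assumption rather than anything proposed in the thread, is an enumeration constant. It is an integer constant expression in C, so the array is an ordinary one and the initializer is accepted:

```c
#include <stddef.h>

enum { N = 10 };   /* integer constant expression, unlike 'const int N' in C */

/* a is an ordinary array here, so '= { 0 }' is allowed and sizeof a
   is a compile-time constant. Returns how many elements are zero. */
size_t count_zero_elements(void)
{
    int a[N] = { 0 };
    size_t zeros = 0;
    for (size_t i = 0; i < sizeof a / sizeof a[0]; i++)
        if (a[i] == 0)
            zeros++;
    return zeros;
}
```

A #define would work as well; the enum form just keeps the name visible to the debugger and scoped like any other identifier.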
« Last Edit: April 24, 2022, 05:31:00 pm by SiliconWizard »
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #48 on: April 24, 2022, 06:59:43 pm »
That "trap" is nasty for those that think constants defined as const variables are "cleaner" than macros. Unfortunately, they are not always a replacement.
Yes, C is full of traps for C++ programmers - and vice versa.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #49 on: April 27, 2022, 04:48:55 am »
Code: [Select]
int a[N] = { 0 };
That's just horrible, because whether the whole array or just the first element get zeroed is compiler dependent, no?

I seem to remember the zeroing works for structures.

That's even if N has been correctly evaluated at compile-time... I didn't think that worked at all anyway.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #50 on: April 27, 2022, 04:56:17 am »
C11 standard quote:
Quote
If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

So, this initialization would set the remaining elements to 0, even for the automatic variables.

Note that technically 0 is not required here, but the default initializer for static objects is 0 (section 6.7.9 of the same standard).
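
A tiny sketch of what the quoted rule means for an automatic array (the function name is made up for illustration):

```c
/* Returns 1 if a[0] holds its explicit initializer and every
   remaining element was implicitly initialized to zero. */
int rest_is_zero(void)
{
    int a[8] = { 42 };      /* only a[0] has an explicit initializer */
    for (int i = 1; i < 8; i++)
        if (a[i] != 0)
            return 0;
    return a[0] == 42;
}
```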
« Last Edit: April 27, 2022, 05:02:13 am by ataradov »
Alex
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8112
  • Country: fi
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #51 on: April 27, 2022, 04:51:35 pm »
That's just horrible, because whether the whole array or just the first element get zeroed is compiler dependent, no?
...
That's even if N has been correctly evaluated at compile-time... I didn't think that worked at all anyway.

May I suggest you do yourself a favor, and instead of assuming every time, just... look it up. Either directly from the standard, or if you for some reason don't like the language of standard, just google it.

Checking how it actually works usually takes 1-5 minutes; much much less than working under incorrect assumptions and misunderstanding other's code, or worse, writing incorrect code for years. (In this particular case, you wouldn't be writing incorrect code per se, because you would be just avoiding a perfectly valid construct. But OTOH, you could misunderstand other's code and waste time looking for a bug somewhere where it cannot be. Or, you could waste time writing explicit loop doing an initialization which would have worked with much less typing, and smaller risk of creating bugs.)

I have developed the habit of looking it up every time I'm even a bit unsure. The great thing about C is, it's quite well standardized, and those parts that are "implementation defined" or "undefined", are clearly said so.
« Last Edit: April 27, 2022, 04:53:20 pm by Siwastaja »
 
The following users thanked this post: newbrain, cfbsoftware, Jacon, SiliconWizard

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #52 on: April 27, 2022, 05:27:08 pm »
Code: [Select]
int a[N] = { 0 };
That's just horrible, because whether the whole array or just the first element get zeroed is compiler dependent, no?

Absolutely not, but others already replied. =)
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1132
  • Country: de
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #53 on: April 27, 2022, 07:56:26 pm »
Code: [Select]
int a[N] = { 0 };
That's just horrible, because whether the whole array or just the first element get zeroed is compiler dependent, no?

Absolutely not, but others already replied. =)

Of course it is sufficient if it calculates a result as if it were zeroed, e.g. https://godbolt.org/z/T5KboPcYr (the zero return value comes from the last zero-initialized byte in s[N+1]).
« Last Edit: April 27, 2022, 07:59:10 pm by gf »
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #54 on: April 28, 2022, 02:33:51 am »
Code: [Select]
int a[N] = { 0 };
That's just horrible, because whether the whole array or just the first element get zeroed is compiler dependent, no?

No.

1. All initialization in C follows the "all-or-nothing" principle. It is not possible to initialize only a part of an aggregate (array or structure) in C. When you supply an explicit initializer for just a little part of an aggregate, you can rest assured that the remaining parts of the aggregate will be initialized to zero.

2. In C `= { 0 }` is an idiomatic universal zero-initializer. It works with everything. It sets everything to zero. (Except one can't use it with VLAs. Which is an oversight.)
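
For the VLA exception, the usual workaround is memset(). A minimal sketch, assuming the compiler supports VLAs and n is greater than zero (the function name is hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* Sums the elements of a VLA that was zeroed with memset, because
   '= { 0 }' is not accepted for variable length arrays in C99/C11.
   n must be greater than zero. */
long sum_zeroed_vla(size_t n)
{
    int a[n];                     /* VLA: no initializer allowed */
    memset(a, 0, sizeof a);       /* sizeof a is computed at run time */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```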
« Last Edit: April 28, 2022, 02:37:05 am by TheCalligrapher »
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #55 on: April 28, 2022, 02:39:34 am »
Of course it is sufficient if it calculates a result as if it were zeroed, e.g. https://godbolt.org/z/T5KboPcYr

The example is actually illegal in both standard C and standard C++. C++ does not allow `char s[N+1]` for such an `N`. C does not allow `= { 0 }` for such an `s`.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #56 on: April 28, 2022, 04:10:41 am »
Of course it is sufficient if it calculates a result as if it were zeroed, e.g. https://godbolt.org/z/T5KboPcYr

The example is actually illegal in both standard C and standard C++. C++ does not allow `char s[N+1]` for such an `N`. C does not allow `= { 0 }` for such an `s`.

Clang diagnoses this correctly. Maybe gcc regards it as an extension.

It's even funnier if you make it N instead of N+1. Then gcc returns a random uninitialised byte from the stack -- which in the case of x86 is a byte from the return address.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #57 on: April 28, 2022, 04:42:03 am »
Since C11, a standards-compliant C compiler does not need to support variable-length arrays anyway.
(Those that do not support them, should pre-define the __STDC_NO_VLA__ preprocessor macro.)

I don't mind.  In practice, VLAs tend to have OS-dependent practical limits (specifically, stack size or stack size growth mechanism) that make them not very useful anyway.

It is, however, important to differentiate between variable-length arrays/variably modified types, and flexible array members.  Sometimes new C programmers do confuse the two (variably modified types and flexible array members), by trying to declare a variable of a type containing a flexible array member, which is not allowed.  (You can only have pointers to such types.)

Flexible array members are supported in all C standard revisions since C99, but not in any C++ standard revisions.  For example,
Code: [Select]
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memcpy, strlen */

struct bst_node {
    struct bst_node *le;
    struct bst_node *gt;
    char data[];  /* Flexible array member */
};

struct bst_node *bst_node_new_mem(const char *src, size_t len)
{
    struct bst_node *new_node;

    new_node = malloc(sizeof (struct bst_node) + len + 1);
    if (!new_node)
        return NULL;

    new_node->le = NULL;
    new_node->gt = NULL;
    if (len > 0) {
        memcpy(new_node->data, src, len);
    }
    new_node->data[len] = '\0';

    return new_node;
}

struct bst_node *bst_node_new(const char *src)
{
    return bst_node_new_mem(src, (src) ? strlen(src) : 0);
}
defines a bst_node type with a flexible array member data, so that instances of that type have no limitations on the data string length other than what the implementation (hardware architecture, operating system, etc.) sets anyway.  The bst_node_new() and bst_node_new_mem() show how to properly create variables of such a type.
« Last Edit: April 28, 2022, 04:43:35 am by Nominal Animal »
 
The following users thanked this post: newbrain, thinkfat

Offline gf

  • Super Contributor
  • ***
  • Posts: 1132
  • Country: de
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #58 on: April 28, 2022, 08:41:04 am »
Of course it is sufficient if it calculates a result as if it were zeroed, e.g. https://godbolt.org/z/T5KboPcYr

The example is actually illegal in both standard C and standard C++. C++ does not allow `char s[N+1]` for such an `N`. C does not allow `= { 0 }` for such an `s`.

Yes, you are right, I was thoughtless, it's just a GCC extension. Still s[N] is zero - from the initialization - and the function returns zero. Without initialization, it returns a random value from the stack, which is OK as well, when accessing an uninitialized variable.
I just wanted to demonstrate that the compiler does not need to generate any code which actually allocates and initializes the array variable. It just has to calculate the same result as if it were allocated and initialized (and it must preserve all visible side effects, like volatile access, for instance).

It's even funnier if you make it N instead of N+1. Then gcc returns a random uninitialised byte from the stack -- which in the case of x86 is a byte from the return address.

That's UB and therefore OK, since "return s[N]" references a value beyond the end of the array, if sizeof(s) == N.
« Last Edit: April 28, 2022, 09:43:51 am by gf »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #59 on: April 28, 2022, 05:06:46 pm »
Of course it is sufficient if it calculates a result as if it were zeroed, e.g. https://godbolt.org/z/T5KboPcYr

The example is actually illegal in both standard C and standard C++. C++ does not allow `char s[N+1]` for such an `N`. C does not allow `= { 0 }` for such an `s`.

I can't tell for C++ (don't know it enough), but I'll trust you on that.

For C, as I said before, it certainly is illegal.
Using GCC with the same code (just changed the include for C), GCC gives not just a warning, but an error, as it should:
Code: [Select]
#include <string.h>

static int f(const char *p, size_t N)
{
    char s[N+1] = {0};
    memcpy(s, p, N);
    return s[N];
}

int g()
{
    static const char s[] = "abcd";
    return f(s, sizeof(s));
}

Quote
VLA1.c: In function 'f':
VLA1.c:5:5: error: variable-sized object may not be initialized
    5 |     char s[N+1] = {0};
      |     ^~~~
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #60 on: April 28, 2022, 07:09:23 pm »
Clang diagnoses this correctly. Maybe gcc regards it as an extension.

Neither Clang nor GCC is even trying to diagnose anything correctly until one supplies `-std=...` and `-pedantic` in the command line. Basically, C or C++ begin with these command-line switches. Without them the aforementioned implementations have only superficial connection to C or C++.
 
The following users thanked this post: newbrain

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #61 on: April 28, 2022, 08:27:09 pm »
I don't mind.  In practice, VLAs tend to have OS-dependent practical limits (specifically, stack size or stack size growth mechanism) that make them not very useful anyway.

^^^ The same strange fallacy, based on a complete lack of understanding of what a VLA is and what its purpose is. VLAs have absolutely no issues with "practical limits" and no connection to "stack size or stack size growth mechanism" whatsoever.

VLA is and has always been nothing more than a piece of `size_t` sidecar data accompanying each variably modified type. A single `size_t` per type is too negligible to bump into any practical limits.

VLA is an absolutely necessary feature of the language since it covers a very critical omission in the C array support model. For example:

Code: [Select]
void foo(size_t n, size_t m, int a[n][m])
{
  /* access a[i][j] using natural syntax */
}

Without VLA we are forced to employ low-grade kludges or ugly workarounds or to hack up a replacement.
« Last Edit: April 28, 2022, 08:34:00 pm by TheCalligrapher »
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1132
  • Country: de
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #62 on: April 28, 2022, 09:30:25 pm »
Yet another use case is still local variables having a size which is not known at compile time. Using malloc() + free() instead would be even more expensive.
Even the old Unixes from the 1970s or 1980s already did "invent" alloca(), for allocating blocks of variable size on the stack, with function-local lifetime. Obviously there was a need for it (although there was of course no direct language support yet in K&R C).
 

Offline TheCalligrapher

  • Regular Contributor
  • *
  • Posts: 151
  • Country: us
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #63 on: April 28, 2022, 09:58:10 pm »
Yet another use case is still local variables having a size which is not known at compile time.

Full example:

Code: [Select]
#include <stdio.h>

void foo(size_t n, size_t m, int a[n][m])
{
  for (size_t i = 0; i < n; ++i)
  {
    for (size_t j = 0; j < m; ++j)
      printf("%d ", a[i][j]);
    printf("\n");
  }
}

int main()
{
  int a[3][3] = { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } };
  foo(3, 3, a);
 
  int b[2][5] = { { 0, 1, 2, 3, 4 }, { 5, 6, 7, 8, 9 } };
  foo(2, 5, b);
}

http://coliru.stacked-crooked.com/a/186e848798907c99

This example is critically dependent on VLA functionality, yet it does not have any "local variables having a size which is not known at compile time".

This is what VLA is for. First and foremost it is an extension of type system, not a feature for creating run-time-sized arrays on the stack.
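To underline the type-system point, here is a minimal sketch (the names `sum2d` and `fill_and_sum` are invented for this example) where the matrix itself lives on the heap. The only local object is a pointer to a variably modified row type, so no run-time-sized array is placed on the stack, yet the natural `a[i][j]` syntax still works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sums all elements of an n-by-m matrix passed as a variably modified type. */
long sum2d(size_t n, size_t m, int a[n][m])
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            total += a[i][j];
    return total;
}

/* Allocates an n-by-m matrix on the heap, fills it with 0, 1, 2, ...
   and returns the sum of its elements, or -1 if allocation fails. */
long fill_and_sum(size_t n, size_t m)
{
    assert(n > 0 && m > 0);

    int (*a)[m] = malloc(n * sizeof *a);   /* pointer to a row of m ints */
    if (a == NULL)
        return -1;

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < m; ++j)
            a[i][j] = (int)(i * m + j);

    long total = sum2d(n, m, a);
    free(a);
    return total;
}
```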
« Last Edit: April 28, 2022, 11:13:08 pm by TheCalligrapher »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #64 on: April 29, 2022, 02:36:14 am »
I don't mind.  In practice, VLAs tend to have OS-dependent practical limits (specifically, stack size or stack size growth mechanism) that make them not very useful anyway.

^^^ The same strange fallacy based on complete lack of understanding what VLA is and what its purpose is. VLA have absolutely no issues with "practical limits" and no connection to "stack size or stack size growth mechanism" whatsoever.
Fallacy my ass.  Even the C standard itself makes a distinction between "variable length array" and "variably modified types", even though they are controlled by the exact same mechanism, and closely related.  You took a shortcut, and claim the two are the exact same thing.

When you declare an array size in a function prototype, for example
    double  determinant(size_t rows, size_t cols, double matrix[rows][cols]);
the third parameter has a variably modified type.  While you claim this is essential, it is extremely rarely used in existing C or POSIX C code, because explicitly specifying the data stride is just much more versatile:
    #define  DET(m)  (determinant(sizeof (m) / sizeof *(m), sizeof *(m) / sizeof **(m), sizeof *(m) / sizeof **(m), 1, (const double *)(m)))
    double  determinant(size_t rows, size_t cols, ssize_t rowstride, ssize_t colstride, const double *origin);
since the latter supports both row-major and column-major data orders (and trivial transpose), submatrices, and so on.  The DET() preprocessor macro calls the function with the correct parameters, when a two-dimensional double array variable is specified; this avoids problems with the array dimensions (a typical problem, when the programmer changes the array dimensions in one place, but forgets to update them everywhere).
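A rough sketch of the stride idea, as an illustration only: the helper names `strided_at` and `strided_trace` are made up, and `ptrdiff_t` is used in place of POSIX `ssize_t` so that only standard C headers are needed. Strides are counted in elements, not bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Element (r, c) of a strided view; strides are in units of doubles. */
double strided_at(const double *origin, ptrdiff_t rowstride,
                  ptrdiff_t colstride, size_t r, size_t c)
{
    assert(origin != NULL);
    return origin[(ptrdiff_t)r * rowstride + (ptrdiff_t)c * colstride];
}

/* Sum of the main diagonal of a rows-by-cols view. */
double strided_trace(size_t rows, size_t cols, ptrdiff_t rowstride,
                     ptrdiff_t colstride, const double *origin)
{
    size_t n = rows < cols ? rows : cols;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += strided_at(origin, rowstride, colstride, i, i);
    return sum;
}
```

For a `double m[2][3]`, strides (3, 1) give the normal row-major view, swapping them to (1, 3) gives the transpose without copying, and passing `&m[0][1]` as the origin selects a submatrix.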

For what it is worth, I have no problems with either of the above, but the former does fall under the same optional support depending on __STDC_NO_VLA__.  For linear algebra, I have a very nice library approach that hides the complexity in two structures – one describing the matrices, and the other containing the matrix data in refcounted form – with views and submatrices indistinguishable from primary matrices; the outline of which I've described in e.g. here.  I am also very familiar with BLAS and LAPACK (and the Intel and AMD math libraries that implement them) and GSL, and have contributed to dozens of projects from the GNU C library to the Linux kernel, and have examined the sources of hundreds if not thousands of open source C projects, so I do claim I know pretty well what kind of code is actually used in practice, and why.  Theory is one thing; practical matters are much more important in real life.

What has become much more common in existing C or POSIX C code, is declaring variables of variably modified types, "local variable length arrays":
    void foo(size_t rows, size_t cols) {
        double  cache[rows][cols];
exactly because people think "malloc() is expensive and prone to bugs".  This is the problematic case, and is not supported by the C compiler if it defines __STDC_NO_VLA__.

For multi-threaded POSIX C programs, the stack often has a relatively small fixed size limit; which means that single-threaded code (due to the automatically growing stack) has completely different stack size limit than multithreaded code, making silly developers think "oh, multithreaded code is just too hard and unpredictable".

The stack access scheme for these has the same issue as large local variable arrays on stack on many OSes, including Linux.  When the stack has no defined upper size limit, in virtual memory systems it is implemented using guard pages.  (When there is a strict upper limit, the entire stack area is reserved in virtual memory, just not populated with RAM yet.  Because virtual memory is not "free", this scheme actually uses much more total RAM than the guard page approach.)  Essentially, after the stack, you have an area of memory that is allocated but inaccessible.  The first access to this area causes the OS kernel to populate the guard pages (backed by RAM, so that they can be made accessible), and set up new guard pages just after.  The guard page mechanism avoids the need to set up page tables for example for a gigabyte of stack; if the new guard pages cannot be set up, the kernel simply notifies the process about a segmentation violation, which typically causes the process to abort.
The problem appears when a function has more than the guard pages' worth of local variables.  If the existing stack is also nearly full at the time the function is called, the first access to stack (local variables) can be beyond the guard pages, and lead to immediate segmentation violation.
I have seen this in practical code, when buffer sizes are increased, and the programmer or the code assumes that local variables do not have any practical size limits – but they do.
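One common way to keep the stack cost bounded is sketched below. This is an illustration under assumptions, not a recommendation for any particular system: the threshold `SCRATCH_ON_STACK` and the function name `sum_via_scratch` are invented, and a sensible limit depends on the target. Small requests use a fixed-size local buffer, larger ones fall back to malloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_ON_STACK 256  /* assumed safe element count; tune per target */

/* Copies n ints into scratch space and stores their sum in *out.
   Returns 0 on success, -1 if the heap allocation fails. */
int sum_via_scratch(const int *src, size_t n, long *out)
{
    assert(src != NULL && out != NULL);

    int  small[SCRATCH_ON_STACK];   /* fixed, known stack cost */
    int *buf = small;

    if (n > SCRATCH_ON_STACK) {
        buf = malloc(n * sizeof *buf);  /* large requests go to the heap */
        if (buf == NULL)
            return -1;
    }

    memcpy(buf, src, n * sizeof *buf);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += buf[i];

    if (buf != small)
        free(buf);
    *out = total;
    return 0;
}
```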

I do not care what the C standard says about this, because this is practical, and as a system admin, I needed this – be able to limit stack and heap sizes separately – to catch leaky code early enough.  (Consider simulations running for days, in a distributed HPC cluster.)
For me, the C standards since C99 are like Nobel Peace Prize: a fantasy of what the world is, and an attempt to manipulate the world toward that view.  It is useful, but not always practical; and we live in the practical world, not in that fantasyland.
« Last Edit: April 29, 2022, 02:38:40 am by Nominal Animal »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #65 on: May 01, 2022, 06:14:48 pm »
Gosh I've been away travelling and look at the trouble I have started again :)

Are you experts telling me that

uint8_t fred[1000] = {0x22};

will set all 1000 bytes to 0x22, and this is not compiler dependent?

I wonder how this is implemented. In a small case

uint8_t fred[] = {"this is a few bytes"};

the string, with an 0x00 at the end, is stored in the initialised data section (DATA), and there is a loop in startupxxx.s which copies it over (FLASH -> RAM).

But a case like [1000] above, that would be dumb, and the compiler should need to use a loop. But maybe it doesn't.
« Last Edit: May 01, 2022, 06:17:11 pm by peter-h »
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #66 on: May 01, 2022, 06:33:31 pm »
will set all 1000 bytes to 0x22, and this is not compiler dependent?
No, it will set the first byte to 0x22 and the rest to 0

How it is done depends on the type of the variable. Global variables (static storage) will just go into a data section as one big array, the same way as if you explicitly specified all those extra 0s.

For an automatic variable there may be different approaches the compiler can take, including memset() to 0 and initializing just the first byte. Same will happen with your string example for a local variable - it will not just go into the data section. The variable will be located on the stack and will be initialized (memcpy) at the time of declaration (provided there are no possible optimizations).

And again, compiler explorer is your friend. Here https://godbolt.org/z/rTjrsTjxr you can see that Clang does call a memset() followed by an initialization of the first member.
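A small sketch that checks this at run time for an automatic array (the helper names are made up for the example). The partial initializer sets only the first byte to 0x22 and the rest to 0; memset() is one way to get all 1000 bytes set to 0x22.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts how many bytes of buf are equal to value. */
size_t count_equal(const uint8_t *buf, size_t len, uint8_t value)
{
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == value)
            ++count;
    return count;
}

/* Partial initializer: fred[0] is 0x22, fred[1..999] are 0. */
size_t partial_init_count(void)
{
    uint8_t fred[1000] = {0x22};
    return count_equal(fred, sizeof fred, 0x22);
}

/* memset() fills every byte with 0x22. */
size_t memset_count(void)
{
    uint8_t fred[1000];
    memset(fred, 0x22, sizeof fred);
    return count_equal(fred, sizeof fred, 0x22);
}
```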
« Last Edit: May 01, 2022, 06:45:11 pm by ataradov »
Alex
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #67 on: May 01, 2022, 07:32:27 pm »
If the 1000 byte array is static and uninitialised then it is in COMMON which is not set to anything at startup.
If the 1000 byte array is static and initialised then it is in DATA which at startup is initialised from values in FLASH.
If any variable is initialised to 0 then it goes into BSS which is zeroed at startup.

I've never tried the case of a 1000 byte array set with { 0 } - I guess it would go into BSS, but would it, given the { 0 } sets only the first element?

The above is what I have found with ST ARM32 GCC.
« Last Edit: May 01, 2022, 07:42:31 pm by peter-h »
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #68 on: May 01, 2022, 09:32:59 pm »
There is no COMMON. There is only data and BSS. All static variables that are initialized go into the data segment, everything else goes into BSS.

I already quoted a part of the C standard that says that all static variables that are not explicitly initialized would be set to 0 (or NULL). It is impossible to have a static variable that has an undefined value.
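A minimal sketch of that rule, with made-up variable names: none of these objects has an initializer, and all of them start out as zero (or a null pointer) on a conforming implementation whose startup code and linker script have not been altered.

```c
#include <assert.h>
#include <stddef.h>

static int     counter;     /* no initializer: starts as 0 */
static double  scale;       /* starts as 0.0 */
static char   *name;        /* starts as a null pointer */
static int     table[8];    /* all eight elements start as 0 */

/* Returns 1 after checking that every static above is still zero. */
int statics_start_zeroed(void)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i)
        assert(table[i] == 0);
    assert(counter == 0);
    assert(scale == 0.0);
    assert(name == NULL);
    return 1;
}
```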
Alex
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #69 on: May 01, 2022, 09:40:05 pm »
As far as C is concerned, global data is either initialized with defined values if provided, or filled with zeros otherwise.

But what peter is seeing is implementation-specific stuff.

GCC puts data in a pretty large number of different sections actually. 'common' is one of them, but there are literally tens of them.

It is the linker script's job to aggregate those sections into fewer sections. In my linker scripts (as with almost all default scripts I've seen), the 'common' subsection (and a number of others) is put within the .bss section, so that becomes zeroed out at startup. But specific linker scripts can do otherwise for a number of reasons. Note that in this case, the result is not compliant with the standard anymore.

Not placing the 'common' section emitted by GCC into the BSS section in the linker script allows non-explicitly initialized global variables to be left untouched, which can save some time at startup.
Again, it will give a non-conforming behavior though.
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #70 on: May 01, 2022, 10:29:36 pm »
uint8_t fred[1000] = {0x22};

will set all 1000 bytes to 0x22, and this is not compiler dependent?
If the 1000 byte array is static and uninitialised then it is in COMMON which is not set to anything at startup.

peter-h, this is not the first time I say it, but at the risk of becoming unpleasant, I'll repeat it one more time:
Please, read and study the language.
Find a good book, the standard, an online course or a tutor* and clean up all the misconceptions you seem to have.
(these are not the first, you often seem surprised by not so obscure features of the language).
Then, how a compiler implements something mostly becomes a moot point (but I love Godbolt too), unless you're going for the utmost optimization or you find a bug (rare, but it happened to me recently).

I'm sorry if I seem patronizing or insulting, please understand this is absolutely not my intention.


*to each their own preferred method of learning.
After a bad experience with a Schildt book, I relearned C directly from the C99 standard - but I understand it's not everyone's cup of tea.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #71 on: May 02, 2022, 06:42:37 am »
I don't get offended if someone just tells me I am wrong :) But I am not clever enough / have enough time, to read 1000 page compiler standards. Especially when the behaviour is so system dependent.

I looked at the setup of my project, which is based on the ST Cube IDE setup and code examples.

The linker script does indeed lump together BSS and COMMON (which may be something I did) and this is all zeroed at startup (in the asm code):

Code: [Select]
/* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)
    *(.common)
    *(.common*)
    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

Code: [Select]
  ldr  r2, =_sbss
  b  LoopFillZerobss
/* Zero fill the bss segment. */ 
FillZerobss:
  movs  r3, #0
  str  r3, [r2], #4
LoopFillZerobss:
  ldr  r3, = _ebss
  cmp  r2, r3
  bcc  FillZerobss

(yes I know the above asm is strange but that's how ST did it for their 32F407/417).

Now, the Q is what ends up in BSS and what ends up in COMMON. So I did these statics:

Code: [Select]
int fred111;
int fred222=0;
int fred333=1;

Looking at the .map file, I see

fred111 is in BSS
fred222 is in BSS
fred333 is in DATA (which is in FLASH, and gets copied to RAM at init)

The above is probably correct as per what people above are saying, but is different to what I determined a year or two ago when I was doing this testing, but I don't know what has changed. Nothing in my project, but with Cube 1.9 GCC has gone from v9 to v10. The previous behaviour was documented by me as follows, based on exactly these tests:

With ARM GCC, statics are categorised thus:
int fred; goes into COMMON
int fred=0; goes into BSS (which by definition is zeroed by startup code)
int fred=1; goes into DATA (statics initialised to nonzero)


So something has changed from GCC v9 to v10 in that v10 doesn't seem to be generating anything for COMMON. Or it could be Cube 1.9 running GCC with a different script; I never looked at that stuff.
« Last Edit: May 02, 2022, 06:48:49 am by peter-h »
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #72 on: May 02, 2022, 06:55:19 am »
This behaviour has not changed in a long time, probably ever.

The reason for common to exist is some real legacy stuff. It allows multiple compilation units to declare an extern variable, and I think there is a reason why this is necessary in fortran.

For C there is a flag "-fno-common" to just place things into bss. And I think C++ always acts as if this flag is specified because having those commons may break name mangling or something like this.

And yes, for compatible behaviour you have to place all common symbols into BSS. Don't use this for uninitialized variables, it might break things badly, possibly producing duplicate allocations for the same variable.
« Last Edit: May 02, 2022, 06:57:38 am by ataradov »
Alex
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #73 on: May 02, 2022, 07:51:42 am »

Quote
Don't use this for uninitialized variables,

Why not? You don't care what the value is anyway :)

Uninitialised statics do end up in BSS currently, so

int fred1;
int fred2=0;

are equivalent.

That said, the vast majority of C code and variable declarations are inside functions, where initialisation is different and fred1 will probably be genuinely undefined. I think statics are used mostly for globals which most people will initialise explicitly at declaration. In the old Z80 etc days, statics produced much faster code than stack based variables but I don't think that's true anymore.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #74 on: May 02, 2022, 08:10:12 am »
Why not? You don't care what the value is anyway :)
If you don't collect common objects, you risk running into strange situations with memory allocations. You can still have them uninitialized, just collect them into a separate section.

Also, it looks like at some point they enabled "-fno-common" by default, so commons are no longer issued by recent compilers unless you force them. And this is a good thing.

EDIT: According to their bugzilla, it is the default since GCC 10. So you can just forget that commons existed.
« Last Edit: May 02, 2022, 08:11:57 am by ataradov »
Alex
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #75 on: May 02, 2022, 08:30:09 am »
That explains it then... v10 did this. I see no COMMON generated now.
 

Offline gf

  • Super Contributor
  • ***
  • Posts: 1132
  • Country: de
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #76 on: May 02, 2022, 12:53:54 pm »
If the compiler generates common blocks, the same global variable, e.g. int a[100], can be defined in multiple C files; the linker then shares this variable between the corresponding object files and does not complain about multiple definitions of the symbol.
The portable way is rather to define int a[100] in only one source file, while all other source files just declare extern int a[100].
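Squeezed into a single listing, the portable pattern could look like the sketch below. The file names in the comments and the function `store_and_load` are hypothetical; in a real project the three parts would sit in separate files.

```c
#include <assert.h>
#include <stddef.h>

/* --- shared.h (hypothetical): declaration only, included everywhere --- */
extern int a[100];

/* --- shared.c (hypothetical): the one and only definition --- */
int a[100];

/* --- user.c (hypothetical): needs only the extern declaration --- */
int store_and_load(size_t i, int value)
{
    assert(i < sizeof a / sizeof a[0]);
    a[i] = value;
    return a[i];
}
```

With this layout the program links the same way with or without `-fno-common`, because there is exactly one definition of `a`.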
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #77 on: May 02, 2022, 01:29:31 pm »
In the old Z80 etc days, statics produced much faster code than stack based variables but I don't think that's true anymore.

z80 is just SO ANNOYING to program. It shouldn't be worse than 6502, but it pretty much is, because it's just so inconsistent.

On z80 some things you can only do with 8 bit registers. Some things you can only do with register pairs. It's super fast to push any register pair onto the stack (11 cycles) or pop it (10 cycles) (+4 for IX/IY in both cases, as usual, because of the extra byte of opcode). Spilling a register pair to static memory takes significantly longer, at 16 cycles for each of load&store for HL and 20 cycles each for other pairs. In 8 bit registers only A can be loaded/stored to a static location, and that takes 13 cycles -- plus 4 more to get it to/from where you really want it. For sequential accesses you can load/store A using (HL), (BC), or (DE) in 7 cycles, then increment/decrement the pointer in 6 cycles -- so no advantage over a static location. The same with load/store of B,C,D,E,H,L using (HL) only. The indirect load/store and inc/dec is only 2 bytes of code vs 3 for load/store to a static location (A only, remember), so there is that. But in general you should push/pop pairs whenever possible, and load/store pairs to static locations when not.

Access to something in the middle of the stack is just awful!! First of all, it's definitely 1 byte at a time. But there is no (SP+offset) addressing. There is (IX+$nn) and (IY+$nn) load/store addressing for all of A,B,C,D,E,H,L, but they're 19 cycles per byte! And you need to somehow get SP into IX or IY first. You can move HL,IX,IY *into* SP, but not the reverse. You can add SP to IX or IY in 15 cycles (11 for HL) but that means you need to zero them or get some other constant offset into them first. You can do LD IX,$nnnn in 14 cycles (or IY, or 10 for HL,BC,DE). You can do "XOR A;LD IXL,A;LD IXH,A" in 4+8+8=20, so that's a non-starter.

So to load BC with bytes from offsets 10 and 11 from SP you have a choice of "LD HL,$000A;ADD HL,SP;LD B,(HL);INC HL;LD C,(HL)" for 7 bytes and 41 cycles or "LD IX,$0000;ADD IX,SP;LD B,(IX+$0A);LD C,(IX+$0B)" for 12 bytes and 67 cycles.

On 6502 you can do the equivalent thing, transferring two bytes from offsets 10 and 11 in the (256 byte) hardware stack to two Zero Page locations (let's say 6&7) using "TSX;LDA $010A,X;STA $06;LDA $010B,X;STA $07" which is 11 bytes and 16 clock cycles -- and I didn't have to think at all about what is the best way to do it ... it's basically the obvious, only way.

If you're not using the very limited hardware stack, but making your own using a pair of Zero Page locations (let's say 8&9) then you'd have "LDY #$0A;LDA ($08),Y;STA $06;INY;LDA ($08),Y;STA $07" for (again) 11 bytes but this time 20 clock cycles (21 or 22 in the somewhat unlucky event one or both LDAs cross a page boundary with the indexing)

The z80 code has the advantage it can do up to 64k offset into the stack while the 6502 code only does up to a 255 byte offset. That would seldom be a factor.

The 6502 code has the advantage that you effectively have 256 8-bit registers, or 128 16-bit/pointer registers vs the z80's 11 8-bit registers or 5 16-bit/pointer registers.

Another example: add two 8 bit quantities and put the result in a 3rd:

6502: "CLC;LDA $05;ADC $06;STA $07" 7 bytes and 11 cycles

z80 #1: "LD A,B;ADD A,C;LD D,A" 3 bytes and 12 cycles. Very good!

z80 #2: "LD A,($0005);LD B,A;LD A,($0006);ADD A,B;LD ($0007),A" 11 bytes and 47 clock cycles. Ugh!

The z80 can have really fast and compact code if you manage to keep everything in its very limited register set. But if you run out and start having to load and store things to RAM then it gets pretty awful pretty quickly.
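For anyone wanting to check the totals, here is a small tally in C. It is only a sketch: the per-instruction cycle counts in the tables are the standard documented timings as I understand them (the Z80 ones match the figures quoted in this post), and the array and function names are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Adds up a list of per-instruction cycle counts. */
unsigned total_cycles(const unsigned *cycles, size_t count)
{
    assert(cycles != NULL);
    unsigned total = 0;
    for (size_t i = 0; i < count; ++i)
        total += cycles[i];
    return total;
}

/* LD HL,$000A; ADD HL,SP; LD B,(HL); INC HL; LD C,(HL) */
const unsigned z80_via_hl[] = { 10, 11, 7, 6, 7 };

/* LD IX,$0000; ADD IX,SP; LD B,(IX+$0A); LD C,(IX+$0B) */
const unsigned z80_via_ix[] = { 14, 15, 19, 19 };

/* TSX; LDA $010A,X; STA $06; LDA $010B,X; STA $07 */
const unsigned m6502_stack[] = { 2, 4, 3, 4, 3 };
```

Summing each table with `total_cycles()` gives 41, 67 and 16 cycles respectively.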
 

Online Siwastaja

  • Super Contributor
  • ***
  • Posts: 8112
  • Country: fi
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #78 on: May 02, 2022, 01:47:15 pm »
I don't get offended if someone just tells me I am wrong :) But I am not clever enough / have enough time, to read 1000 page compiler standards. Especially when the behaviour is so system dependent.

What the heck is "1000 page compiler standard"? And no, the behavior is not system dependent.

See, it's not that difficult:
https://www.google.com/search?q=wut+hapens+in+C+array+initialize+first+thing+only

First result:
https://stackoverflow.com/questions/42218928/why-does-this-initialize-the-first-element-only

The top accepted answer:
Quote
rest of array is initialized by default value, and for number types this default value is 0.

Did this "waste" a lot of your precious time?
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #79 on: May 02, 2022, 04:10:24 pm »
Half the stuff on stack overflow is dead ends, or code which doesn't actually work :)

Quote
z80 is just SO ANNOYING to program

Sure, but it was out in 1976. That's almost "half a century" ago :) Lots of people did amazing things with it. Even I did some good stuff with it :) I wrote literally megabytes (of binary) in Z80 asm.

The Z280 mostly solved the issues you mention by extending the register set and having a cache so most instructions took just 3 clocks. Zilog told me I was the first significant Z280 design-in in Europe. The stack relative addressing was done with

Code: [Select]
ld hl, stack_offset
add hl, sp
ld a, (hl)    ; 8 bits, or for 16 bits:
ld e, (hl)
inc hl
ld d, (hl)
; etc

I find the 32F417 runs probably 100x faster than a Z80, but probably 99% of the time it isn't needed. The thing which created really big problems with the Z80 and others wasn't raw speed; it was the 64k addressing limit. The Z180/64180 largely solved that for code but not for data. Same with the Z280. That 64k limit crippled many products because Ethernet was basically impossible; the current 32F4 project is my first ever with ETH, and even that was possible only because it uses the ST libs (which somebody else implemented); it is too complex for me to understand in enough detail. Life has just become so much more complex :)
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #80 on: May 02, 2022, 06:32:53 pm »
Oh well. The Z80 was not that bad. Many people - in particular those closer to compiler design, which I think Bruce is - much prefer "orthogonal" instruction sets, because that's fewer exceptions (so easier to learn and remember) and they are easier to deal with when writing compilers. But yeah, it was still usable. Oh and it was just basically an 8080 with extensions and some improvements. Blame Intel for the instruction set. ;D
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #81 on: May 02, 2022, 06:45:38 pm »
That explains it then... v10 did this. I see no COMMON generated now.

I think GCC now defaults to no-common, which would explain it. That was already pointed out in other threads (but not about data sections specifically.)

Note that if you actually want some global variables never initialized or zeroed upon startup, you can always put them explicitly in a dedicated, custom section when you declare them. That would be the safest way of doing it, instead of twisting linker scripts to make behavior not compliant with the standard.
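A hedged sketch of what that could look like with GCC or Clang. The section name `.noinit`, the magic value and the function name are all assumptions for illustration; the `section` attribute is a compiler extension, and the linker script still has to place that section in RAM outside the ranges that the startup code copies or zeroes.

```c
#include <stdint.h>

#define NOINIT_MAGIC 0x5AFEC0DEu   /* arbitrary marker value for this sketch */

/* Kept out of .data and .bss so the startup code leaves them alone.
   ".noinit" is a hypothetical section name the linker script must place. */
static uint32_t noinit_magic __attribute__((section(".noinit")));
static uint32_t reset_count  __attribute__((section(".noinit")));

/* Call once early after reset. Returns how many resets have been seen
   since the RAM contents were last lost (e.g. at power-up). */
uint32_t note_reset(void)
{
    if (noinit_magic != NOINIT_MAGIC) {
        /* Contents are garbage or stale: treat as a cold start. */
        noinit_magic = NOINIT_MAGIC;
        reset_count = 0;
    }
    return ++reset_count;
}
```

The magic word is there because such variables really do hold undefined contents after power-up, so the code has to decide for itself whether they are valid.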
« Last Edit: May 02, 2022, 10:12:56 pm by SiliconWizard »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #82 on: May 02, 2022, 10:05:08 pm »
Quote
z80 is just SO ANNOYING to program

Sure, but it was out in 1976. That's almost "half a century" ago :) Lots of people did amazing things with it.

Yup and 6502 was out in 1975, at 1/8th the price :-) ($25 vs $200) Zilog did drop the price over time and by 1980 when the ZX80 came out there wasn't much difference.

But by 1980 there was also the 8086, 68000, and 6809 to deal with... (at high prices at that point)

Quote
Code: [Select]
ld hl, stack_offset
add hl, sp
ld a, (hl)    ; 8 bits, or for 16 bits:
ld e, (hl)
inc hl
ld d, (hl)
; etc

Yup, this sequence was established as the best one somewhere in the middle of my 2 AM post :-)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #83 on: May 02, 2022, 10:23:57 pm »
Oh well. The Z80 was not that bad. Many people - in particular those closer to compiler design, which I think Bruce is - much prefer "orthogonal" instruction sets, because that's fewer exceptions (so easier to learn and remember) and they are easier to deal with when writing compilers.

Compilers can be taught the quirks, and modern computers are fast enough that a compiler can afford to generate a few variations on what variable goes in what register/memory and what instructions are selected and evaluate the size/speed of each. And then no one even has to think about it again. Which is basically what happened with the -- equally annoying, especially in early versions -- x86 family.

Assembly language programmers have to consider the quirks every second of the day. Either that or (more realistically) adopt fixed idioms for common things, even if in any given situation they are probably leaving a lot of size/performance on the table compared to the best compiler generated code.

And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #84 on: May 03, 2022, 07:01:23 am »
The old compilers were not very clever. We used IAR; cost over GBP 1000 in 1985. Some of the code was appalling, especially runtimes which were written in C (most compilers were written in C) and with no attempt to optimise. For example sscanf can be dramatically optimised, but IAR didn't bother so it might take 100ms to read in a single precision float.

But, to be fair, most software in a given product does not have to run fast. But it all takes roughly the same time to write. So writing in say C saves a great deal of time.

ISRs and such were always coded in asm, when I was doing that stuff.

I still sell a Z180 based box and on that we used a Hitech C compiler; the famous Clyde Smith-Stubbs in Australia. It was pretty good and did various optimisations. That company sold out to Microchip many years ago and I believe the product line has now been killed off by Microchip.

I had Z80 compilers in 1980 for Fortran, Ada, Cobol, Pascal, Coral etc. The Ada or Coral ones were reportedly used on military contracts.
« Last Edit: May 03, 2022, 09:22:47 am by peter-h »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #85 on: May 03, 2022, 08:52:17 am »
The old compilers were not very clever. We used IAR; cost over GBP 1000 in 1985. Some of the code was appalling, especially runtimes which were written in C (most compilers were written in C) and with no attempt to optimise. For example sscanf can be dramatically optimised, but IAR didn't bother so it might take 100ms to read in a single precision float.

But, to be fair, most software in a given product does not have to run fast. But it all takes roughly the same time to write. So writing in say C saves a great deal of time.

Most of the code doesn't have to run fast, but on a machine limited to 64 KB all of the code should be as compact as possible -- which is the reason it was popular at the time to write most of a program in some kind of interpreted byte-code (whether UCSD Pascal or something like Woz's "Sweet 16") or threaded code. Then only the speed-critical code needed to be written in assembly language. The size of the interpreter needed to be carefully considered if overall gains were to be made. That's where Sweet 16 was pretty good, with the interpreter taking only about 300 bytes and the code running about 10x slower than native.
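To give a flavour of the byte-code idea, here is a toy stack interpreter in C. It is purely illustrative: the four opcodes are invented for this sketch and have nothing to do with the actual Sweet 16 or UCSD p-code instruction sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented opcodes: OP_PUSH is followed by one operand byte. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Runs a byte-code program and returns the value left on top of the stack. */
int run_bytecode(const uint8_t *code, size_t len)
{
    int    stack[16];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next byte to interpret */

    while (pc < len) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(pc < len && sp < 16);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            --sp;
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] *= stack[sp - 1];
            --sp;
            break;
        case OP_HALT:
        default:
            pc = len;   /* stop on HALT or on an unknown opcode */
            break;
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

The program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }` computes (2 + 3) * 4 in nine bytes, which is the kind of density that made this approach attractive on 64 KB machines.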

I had a look today at z80 code from the current version of the SDCC compiler.

Code: [Select]
unsigned fib(unsigned n){
  if (n < 2) return n;
  return fib(n-1) + fib(n-2);
}

This produced:

Code: [Select]
      000000                         47 _fib::
      000000 EB               [ 4]   48         ex      de, hl
                                     49 ;fib.c:2: if (n < 2) return n;
      000001 7B               [ 4]   50         ld      a, e
      000002 D6 02            [ 7]   51         sub     a, #0x02
      000004 7A               [ 4]   52         ld      a, d
      000005 DE 00            [ 7]   53         sbc     a, #0x00
      000007 D8               [11]   54         ret     C
                                     55 ;fib.c:3: return fib(n-1) + fib(n-2);
      000008 6B               [ 4]   56         ld      l, e
      000009 62               [ 4]   57         ld      h, d
      00000A 2B               [ 6]   58         dec     hl
      00000B D5               [11]   59         push    de
      00000C CDr00r00         [17]   60         call    _fib
      00000F EB               [ 4]   61         ex      de, hl
      000010 D1               [10]   62         pop     de
      000011 1B               [ 6]   63         dec     de
      000012 1B               [ 6]   64         dec     de
      000013 EB               [ 4]   65         ex      de, hl
      000014 D5               [11]   66         push    de
      000015 CDr00r00         [17]   67         call    _fib
      000018 E1               [10]   68         pop     hl
      000019 19               [11]   69         add     hl, de
      00001A EB               [ 4]   70         ex      de, hl
                                     71 ;fib.c:4: }
      00001B C9               [10]   72         ret

That's .... not awful. I can do better, but it's not awful.  It's nice of the compiler to list the execution time of each instruction in brackets.

Note the ABI here: a 16 bit argument is passed in HL and a 16 bit result is returned in DE.

The part between the two recursive calls can obviously be improved by dropping the two "EX DE,HL":

Code: [Select]
pop hl
dec hl
dec hl
push de

A little more can be saved by pushing the once-decremented version of argument rather than the original argument, thus saving a DEC later on. But that's honestly about it.

Code: [Select]
_fib::
        ex      de, hl
;fib.c:2: if (n < 2) return n;
        ld      a, e
        sub     a, #0x02
        ld      a, d
        sbc     a, #0x00
        ret     C
;fib.c:3: return fib(n-1) + fib(n-2);
        ex      de, hl
        dec     hl
        push    hl
        call    _fib
        pop     hl
        dec     hl
        push    de
        call    _fib
        pop     hl
        add     hl, de
        ex      de, hl
;fib.c:4: }
        ret

I can't see anything else to improve, given the ABI. If argument and return value were both in HL then all three "EX DE,HL" could be dropped and "EX (SP),HL" used between the two recursive calls...

Code: [Select]
_fib::
        ld      a, l
        sub     a, #0x02
        ld      a, h
        sbc     a, #0x00
        ret     C
        dec     hl
        push    hl
        call    _fib
        ex     (sp), hl
        dec     hl
        call    _fib
        pop     de
        add     hl, de
        ret

The z80 completely kills the 6502 on code size on 16 bit code like in this function. Still a lot more clock cycles though. I haven't worked out exactly how many -- not enough to make up for the clock speed ratio I think, so z80 wins here.

The z80's 20 bytes of code here (28 in the compiler-generated version, going by the addresses in the listing above) also completely kills x86_64, ARMv7, and RISC-V, all of which are around 50 bytes of code (±2 or so!) on this, at least out of gcc.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #86 on: May 03, 2022, 09:43:31 am »
Code size was however not a big issue on the Z80/Z180/Z280 because the IAR compiler supported a "large model" and you just paged-in any number of pages - I think up to 1MB total code size. The limit was that no function could be bigger than the page size. Typically the page size was 4k which left you 60k for the "base" code+RAM, so you could have 4k base code, 56k RAM, and 4k page size (each page banked-in when a function was called) giving you 56k RAM and 1MB code.

Of course 56k RAM is not enough for many modern apps e.g. ETH, USB, etc. I think the "embedded world" polarises between "16/32k RAM is plenty" and "56k RAM is nowhere near enough" :)

Also RAM used solely by ISRs could be banked-in/out by the ISR, so you could have loads of 16/32k buffers in RAM. Just not a contiguous area.
« Last Edit: May 03, 2022, 09:47:11 am by peter-h »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #87 on: May 03, 2022, 05:40:40 pm »
And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

Dunno about TP 1.0, but I used TP 3.0 on CP/M back then and it was fine. Sure the compiler was very simple and generated meh code, but it was perfectly usable.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #88 on: May 03, 2022, 10:06:17 pm »
And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

Dunno about TP 1.0, but I used TP 3.0 on CP/M back then and it was fine. Sure the compiler was very simple and generated meh code, but it was perfectly usable.

I didn't say it wasn't usable, I used it a lot. I said the code produced was a lot worse than a human could do, if they had time. Slow and big. But of course massively faster than any interpreted language. A truly great product in its time. And cheap.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #89 on: May 04, 2022, 02:23:55 am »
And compilers now are of course far better than they were in the 70s and early 80s, if you even had access to one then. The code produced by, say, Turbo Pascal 1.0 is pretty awful.

Dunno about TP 1.0, but I used TP 3.0 on CP/M back then and it was fine. Sure the compiler was very simple and generated meh code, but it was perfectly usable.

I didn't say it wasn't usable, I used it a lot. I said the code produced was a lot worse than a human could do, if they had time.

The compiler was very simple. It all fitted with the editor within some 30KB or so. =) And yes, I did write a significant amount of Z80 assembly back then for when the compiler would just not cut it.

To be fair, the code produced by most compilers until about the 2000's was significantly worse than what could be done by a human directly in assembly, generally speaking. I've written some code in assembly still in the early 2000's on x86, for speedups of 2, 3, 4 times, with still very simple hand-written assembly, nothing really fancy. In 2022, I would have a hard time seriously beating a C compiler doing that, or it would take a lot of time and effort. Optimizing compilers have become very good in the last 20 years.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #90 on: June 20, 2022, 08:57:12 am »
Another Q on compiler "optimisation":

What are the rules for removing code, after a construct like

while (true)
{
 some code
}

which will obviously never execute.

Some of it can be quite subtle, and removal of one thing can lead to removal of everything it calls, and so on. The compiler must build a tree of all related code and work up that tree and if it finds a branch gone it then goes back down and removes all the others that are affected.

But it doesn't always seem to happen. A colleague is working on the same Cube IDE (32F417) project but on a linux machine (I use win7-64) and he's just had half his code go missing, just by commenting out one FreeRTOS task :) Presumably we have different compiler options somewhere...

 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #91 on: June 20, 2022, 10:19:28 am »
Certainly the compiler can (and should!) remove unreachable code. And variables that are used only by unreachable code. And variables that are read but never set, or set but never read.

If a compiler removes code that surprises you, and the compiler is a current version of gcc or llvm, then there is a 99.999% chance that it is you that doesn't understand your program, not the compiler.
 
The following users thanked this post: Siwastaja

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #92 on: June 20, 2022, 07:01:23 pm »
Another Q on compiler "optimisation":

What are the rules for removing code, after a construct like

while (true)
{
 some code
}

which will obviously never execute.

Are you sure? ;D

Some of it can be quite subtle, and removal of one thing can lead to removal of everything it calls, and so on. The compiler must build a tree of all related code and work up that tree and if it finds a branch gone it then goes back down and removes all the others that are affected.

Any code that is statically analyzed as unreachable during compilation will just not yield emitted code from the compiler. That usually happens even at the first level of optimization.
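
As a rough illustration (a sketch only, not taken from the project under discussion; `ticks`, `main_loop` and `wait_for` are made-up names): in the first function nothing can leave the loop, so the statement after it is statically unreachable and the compiler is free to emit nothing for it. In the second, the `break` makes the code after the loop reachable, so it has to stay.

```c
#include <stdbool.h>

/* Hypothetical counter, e.g. something an ISR might also touch. */
volatile int ticks;

/* No break, return or goto leaves this loop, so the assignment after it
   can never run and no code needs to be emitted for it. */
void main_loop(void)
{
    while (true) {
        ticks++;
    }
    ticks = 0;  /* statically unreachable */
}

/* Here the loop has an exit, so the code after it is reachable and kept. */
int wait_for(int target)
{
    while (true) {
        if (ticks >= target)
            break;
        ticks++;
    }
    return ticks;
}
```

If something after a `while (true)` loop seems to have vanished, checking whether the loop really has an exit path is probably the first thing to look at.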

Now for any function call that would be unreachable, the function call itself will not be emitted, but the code of the function itself may still remain, even if it's never called anywhere, as long as said function has external linkage (in other words, if it's not a static function - static-qualified functions that are not called in their compilation unit will get pruned, but compilers usually give you a warning about those anyway.)

As we already talked about, the code of functions that have external linkage and that are never called anywhere will get removed, not by the compiler, but by the linker, and *only* if you have set the corresponding options (which consist of instructing the compiler to put each function in a separate section, and instructing the linker to prune unused sections.) Otherwise, it'll remain in the final object code as dead beef.
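
To make the external-linkage case concrete, here is a small sketch. The function names are invented, and the command lines in the comment are only an assumption about a typical GCC/binutils build; the two options that matter are `-ffunction-sections` for the compiler and `--gc-sections` for the linker.

```c
/* Assumed build lines, adapt to your toolchain:
 *   gcc -Os -ffunction-sections -fdata-sections -c module.c
 *   gcc ... -Wl,--gc-sections module.o -o app
 * Without these options api_unused() below normally stays in the image.
 */

/* External linkage and called by the rest of the program: always kept. */
int api_used(int x)
{
    return x + 1;
}

/* External linkage but never called anywhere: the compiler still has to
 * emit it, because another translation unit might call it. Only the linker
 * can drop it, and only if the function sits in its own section. */
int api_unused(int x)
{
    return x * 2;
}
```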

OTOH, code removed by a compiler while it should NOT get removed (meaning it is called or there is an execution path that should execute it) should never happen. If it does, this is a compiler bug, and then just open a ticket.
« Last Edit: June 20, 2022, 07:03:26 pm by SiliconWizard »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #93 on: June 20, 2022, 07:19:41 pm »
OK; the key word is "statically", which is what got me into trouble last time, creating a function located by the linker at a given address and then creating a jump to that address, but the compiler obviously didn't realise what was going on, and on a static analysis the function was not called by anything. So I had to do some hacks to stop that function being removed. In the end all it took was a
dw function-name
statement in an assembler file, to prevent the removal. Doing it from C was difficult because the referring C code was also not called by anything so the whole lot was still getting removed ;) But assembler code is not removed.

How does one implement function tables (not sure of the right word) where you use an index to jump to one of a list of functions? I have never done this in C and normally use a case statement, but if you had lots of cases then a table makes sense. I used to do this extensively in assembler. Obviously the index needs to be range checked :)
« Last Edit: June 20, 2022, 07:57:14 pm by peter-h »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #94 on: June 20, 2022, 07:47:12 pm »
There are a number of ways of doing that.
If you just want to access functions through an index, declare an array of function pointers. Of course, that means that all functions you want to access this way should have the same prototype. Otherwise, I don't see the point or the feasibility.

Then it can be something like: (assuming your functions have the prototype defined below, adapt to your use case:)
Code: [Select]
typedef int (*myFunctions_t)(int n, char *s);

int foo1(int n, char *s) { /* ... */ return 0; }
int foo2(int n, char *s) { /* ... */ return 0; }
/* ... more functions with the same prototype ... */

myFunctions_t myFunctionTable[] = {
    foo1,
    foo2,
    /* ... */
};
And calling one of those:
Code: [Select]
int ret = myFunctionTable[index](n, s);
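
Since the index needs to be range checked, a slightly fuller sketch of the same idea might look like the following. The handler names and the `dispatch()` wrapper are hypothetical, chosen only for illustration; the table size is derived from the array itself so the check cannot drift out of sync with the table.

```c
#include <stddef.h>

typedef int (*handler_t)(int arg);

/* Hypothetical handlers; all share one prototype. */
static int cmd_stop(int arg)  { (void)arg; return 0; }
static int cmd_start(int arg) { return arg + 1; }
static int cmd_scale(int arg) { return arg * 10; }

static const handler_t handlers[] = { cmd_stop, cmd_start, cmd_scale };

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Returns 0 and stores the handler's result in *out on success,
   or -1 if the index is outside the table. */
int dispatch(size_t index, int arg, int *out)
{
    if (index >= HANDLER_COUNT)
        return -1;
    *out = handlers[index](arg);
    return 0;
}
```

Because the table is `const`, on a microcontroller it would normally end up in flash rather than RAM.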
 
The following users thanked this post: peter-h

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #95 on: June 20, 2022, 08:23:22 pm »
I occasionally use an array of structures containing the name, optionally a type identifier, and a function pointer:
Code: [Select]
enum {
    FUNCTYPE_NONE = 0,
    FUNCTYPE_FLOAT_UNARY,  /* Returns a float, takes one float as an argument */
    FUNCTYPE_FLOAT_BINARY, /* Returns a float, takes two floats as arguments */
};

typedef struct {
    const char  *name;
    int    type;
    union {
        float (*float_unary)(float);          /* .type = FUNCTYPE_FLOAT_UNARY */
        float (*float_binary)(float, float);  /* .type = FUNCTYPE_FLOAT_BINARY */
    };
} function_descriptor;

const function_descriptor  func[] =
{
    { .name = "sin", .type = FUNCTYPE_FLOAT_UNARY, .float_unary = sinf },
    /* Other functions omitted */
    { .name = NULL, .type = FUNCTYPE_NONE }
};
#define  funcs  ((sizeof func / sizeof func[0]) - 1)
Valid indexes are 0 through funcs-1, inclusive; or you can loop until .name==NULL or .type==FUNCTYPE_NONE.

This is useful when one needs to execute a function when given its name as a string:
Code: [Select]
int  exec_float_unary(const char *name, float *result, float arg);
int  exec_float_binary(const char *name, float *result, float leftarg, float rightarg);
where the return value is 0 if successful and nonzero for error codes, result points to where the result is stored, and arg, leftarg, and rightarg are arguments passed to the function.
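
One possible way to fill in those two functions is sketched below. This is only an illustration of the lookup, not the code from the calculator mentioned; `find_func()`, `negate()` and `add()` are made up here, the latter two standing in for `sinf()` and friends so the sketch does not need libm.

```c
#include <stddef.h>
#include <string.h>

enum { FUNCTYPE_NONE = 0, FUNCTYPE_FLOAT_UNARY, FUNCTYPE_FLOAT_BINARY };

typedef struct {
    const char *name;
    int         type;
    union {
        float (*float_unary)(float);
        float (*float_binary)(float, float);
    };
} function_descriptor;

/* Stand-ins for real math functions (hypothetical). */
static float negate(float x)      { return -x; }
static float add(float a, float b) { return a + b; }

static const function_descriptor func[] = {
    { .name = "neg", .type = FUNCTYPE_FLOAT_UNARY,  .float_unary  = negate },
    { .name = "add", .type = FUNCTYPE_FLOAT_BINARY, .float_binary = add },
    { .name = NULL,  .type = FUNCTYPE_NONE }
};

/* Linear search for a name with the wanted type; NULL if not found. */
static const function_descriptor *find_func(const char *name, int type)
{
    for (const function_descriptor *f = func; f->name != NULL; f++)
        if (f->type == type && strcmp(f->name, name) == 0)
            return f;
    return NULL;
}

int exec_float_unary(const char *name, float *result, float arg)
{
    const function_descriptor *f = find_func(name, FUNCTYPE_FLOAT_UNARY);
    if (f == NULL)
        return -1;              /* unknown name or wrong type */
    *result = f->float_unary(arg);
    return 0;
}

int exec_float_binary(const char *name, float *result, float leftarg, float rightarg)
{
    const function_descriptor *f = find_func(name, FUNCTYPE_FLOAT_BINARY);
    if (f == NULL)
        return -1;
    *result = f->float_binary(leftarg, rightarg);
    return 0;
}
```

Checking the type before touching the union is what keeps a unary name from being called with two arguments.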

(It is pretty obvious that the most recent case I used this at was a calculator/expression evaluator...)

In case of an extensible calculator/expression evaluator -type thingy on a fully featured OS, I like to use
Code: [Select]
static size_t               funcs_max = 0;
static size_t               funcs = 0;
static function_descriptor *func = NULL;

int register_float_unary(const char *name, float (*func)(float));
int register_float_binary(const char *name, float (*func)(float, float));
so that plugins can provide new functions ELF-magically using
Code: [Select]
__attribute__ ((__constructor__))
static void register_functions(void)
{
    register_float_unary("sin", sinf);
    register_float_unary("cos", cosf);
    register_float_unary("tan", tanf);
    register_float_binary("atan2", atan2f);
}
where the constructor function attribute causes the linker to add the address of that function into an ELF section, so that either the startup code (if static) or dynamic linker (if dynamic) will execute it when the ELF object is loaded.
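
A stripped-down sketch of how the registration side could work, assuming a hosted system with `realloc()` and GCC or clang for the attribute. The `unary_entry` type and the `half()`/`twice()` functions are invented for the example, and only the unary case is shown.

```c
#include <stdlib.h>

/* Simplified descriptor for this sketch: name plus one function pointer. */
typedef struct {
    const char *name;
    float     (*float_unary)(float);
} unary_entry;

static size_t       funcs_max = 0;
static size_t       funcs = 0;
static unary_entry *func = NULL;

/* Append one entry, growing the array as needed.
   Returns 0 on success, -1 if memory could not be allocated. */
int register_float_unary(const char *name, float (*fn)(float))
{
    if (funcs >= funcs_max) {
        size_t       new_max = funcs_max ? 2 * funcs_max : 8;
        unary_entry *tmp = realloc(func, new_max * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        func = tmp;
        funcs_max = new_max;
    }
    func[funcs].name = name;
    func[funcs].float_unary = fn;
    funcs++;
    return 0;
}

/* Hypothetical functions provided by a "plugin". */
static float half(float x)  { return 0.5f * x; }
static float twice(float x) { return 2.0f * x; }

/* Runs before main() for a static link, or when the shared object
   containing it is loaded. */
__attribute__ ((__constructor__))
static void register_functions(void)
{
    register_float_unary("half", half);
    register_float_unary("twice", twice);
}
```

The table is filled before `main()` starts, without `main()` ever naming the plugin's functions.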
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #96 on: August 21, 2022, 03:52:02 pm »
How about this one. Just wasted a bit of time on it.

struct fred joe = {0};

converts into a memset(&joe,0,sizeof(joe)); or some such.

This happens even if stdlib or whatever is not #included.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #97 on: August 21, 2022, 04:20:13 pm »
How about this one. Just wasted a bit of time on it.

struct fred joe = {0};

converts into a memset(&joe,0,sizeof(joe)); or some such.

This happens even if stdlib or whatever is not #included.
Yep, I've mentioned this in other threads, when discussing freestanding environments.

It is documented by GCC:
Quote
Most of the compiler support routines used by GCC are present in libgcc, but there are a few exceptions. GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. Finally, if __builtin_trap is used, and the target does not implement the trap pattern, then GCC emits a call to abort.

I added the links, to the Linux man-pages project, but don't let the Linux in the name distract you.  I use them as a C reference because they're actively maintained by Michael Kerrisk, and do describe (in the Conforming to sections) which standards define them, plus the possible Notes and Bugs sections often include useful information not mentioned elsewhere.

Compiler support routines include things like __udivdi3 (64-bit integer division on 32-bit systems).
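
A small sketch of the kind of source that can end up calling these functions even though no header is included and no call is written. Whether a call is actually emitted depends on the target, the object size and the optimisation level, so treat the comments as "may", not "will"; `struct fred`'s members and the function names are made up.

```c
struct fred {
    int  id;
    char buf[64];
};

void make_default(struct fred *out)
{
    struct fred joe = {0};   /* may be implemented as a call to memset() */
    joe.id = 42;
    *out = joe;              /* structure assignment may become memcpy() */
}

/* A plain byte-clearing loop: at higher optimisation levels GCC can
   recognise the idiom and replace the loop with a call to memset(). */
void clear_bytes(unsigned char *p, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        p[i] = 0;
}
```

For the loop case, GCC's `-fno-tree-loop-distribute-patterns` option is the one usually cited for turning that idiom recognition off; as far as I know it does not affect the initialiser and structure assignment cases, which is why providing the four functions is the more robust route.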
« Last Edit: August 21, 2022, 04:21:44 pm by Nominal Animal »
 
The following users thanked this post: peter-h

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #98 on: August 21, 2022, 10:08:54 pm »
This is quite a gotcha for embedded work.

I know it was pointed out here in the past but it didn't quite sink in that this is hard to avoid. I think zero optimisation (-O0) might work though. I had a look through some of my code and I used that as an attrib on a function which was copying data with a loop, and the compiler discovered that the loop could be done with memcpy, at least partially.

It might mean compiling the whole of a project, or a module, with optimisation set to zero.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #99 on: August 21, 2022, 10:28:19 pm »
This is quite a gotcha for embedded work.

I know it was pointed out here in the past but it didn't quite sink in that this is hard to avoid. I think zero optimisation (-O0) might work though. I had a look through some of my code and I used that as an attrib on a function which was copying data with a loop, and the compiler discovered that the loop could be done with memcpy, at least partially.

It might mean compiling the whole of a project, or a module, with optimisation set to zero.

What on earth are you trying to achieve here?

You're willing to kill the size and speed of your entire program in order to avoid having a 10 or 20 byte memset() function in it?
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #100 on: August 22, 2022, 05:32:43 am »
The functions are nothing special, and it is enough to just implement them (memcpy, memmove, memset, memcmp, and optionally abort) yourself.
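
A minimal sketch of what such implementations can look like, favouring small size over speed. The `boot_` prefix is used here only so the sketch can be compiled and tested next to a normal libc; in a boot block without libc they would carry the standard names (memset, memcpy, memcmp) so that the compiler-generated calls resolve to them.

```c
#include <stddef.h>

/* Fill n bytes at dst with the byte value c. */
void *boot_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

/* Copy n bytes from src to dst; the ranges must not overlap. */
void *boot_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char       *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Compare n bytes; negative, zero or positive like the standard memcmp(). */
int boot_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (; n--; p++, q++)
        if (*p != *q)
            return *p < *q ? -1 : 1;
    return 0;
}
```

One thing to watch when these carry the real names: at higher optimisation levels GCC can recognise the loop inside your own memset() as a memset idiom and replace it with a call to memset(), i.e. to itself. Building that one file with `-fno-tree-loop-distribute-patterns` (or at a low optimisation level) is the usual way around it.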
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #101 on: August 22, 2022, 06:11:07 am »
I have done that for some of them, where I was calling them explicitly (with a prefix on the name for clarity). The challenge is this: in the main code, if the compiler invokes e.g. memset, it can use the stdlib one because it is available. But in the boot block one doesn't have stdlib so that substitution needs to be blocked or the function is provided locally.

If say the boot block is a single file and I put a static function in there called memset will the compiler use that one, even if it is not explicitly referenced? The compiler normally warns that function x is not used and strips it out. How do you avoid that?

It would be good if the .map file cross ref table showed where e.g. memcpy is used, but it doesn't.
« Last Edit: August 22, 2022, 06:43:36 am by peter-h »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #102 on: August 22, 2022, 09:09:08 am »
libc functions are weak linked, so if you have ANY function with the same name yourself it will use yours in preference.

If you don't care about speed you can just use a simple loop (or REP MOVSB etc, where available), make memcpy and memmove the same function, etc. (though I believe the compiler only calls memcpy and makes sure there is no overlap)
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #103 on: August 22, 2022, 09:50:12 am »
If say the boot block is a single file and I put a static function in there called memset will the compiler use that one, even if it is not explicitly referenced?
Yes, because even though the compiler itself generates the call, it is as if the structure assignment (or whatever code is being replaced) source code was replaced with the actual function call.

The compiler normally warns that function x is not used and strips it out. How do you avoid that?
When the compiler does use the function, it will not complain nor strip the implementation out, because it can see the use case.

It would be good if the .map file cross ref table showed where e.g. memcpy is used, but it doesn't.
You can see the symbol reference in the object file, though, using either readelf or objdump.

(though I believe the compiler only calls memcpy and makes sure there is no overlap)
I haven't explored all the cases where the compiler generates these calls, but I suggest a slightly different approach:

Implement the memory copy loop in two different functions, one that does the loop in increasing addresses, and the other in decreasing addresses.  memcpy() can always call the first one.  memmove() uses the increasing address one if source address is above the destination address, and the decreasing address one if source address is below the destination address.  You can use either one when the source and destination addresses match.  (Since memmove must act as if the data was first copied to a temporary array, and then copied back, we really shouldn't try and optimize the source==destination case out.)

The increasing address copy can then be used as memrepeat(data+1, data, (sizeof data)-(sizeof data[0])) to fill the array data with the contents of its first element.  Especially useful when the elements are structures or unions.
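
In C, the above could be sketched roughly as follows. The `my_` prefixes and helper names are placeholders so the sketch can coexist with a libc, and the pointer comparison goes through `uintptr_t` on the assumption of a flat address space, which holds on the ARM targets discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes in increasing address order. */
static void copy_up(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}

/* Copy n bytes in decreasing address order. */
static void copy_down(unsigned char *d, const unsigned char *s, size_t n)
{
    d += n;
    s += n;
    while (n--)
        *--d = *--s;
}

/* Overlap-safe move: go upwards when the source is above the destination,
   downwards otherwise (either direction works when they are equal). */
void *my_memmove(void *dst, const void *src, size_t n)
{
    if ((uintptr_t)src > (uintptr_t)dst)
        copy_up(dst, src, n);
    else
        copy_down(dst, src, n);
    return dst;
}

/* Always copies upwards, so an overlapping call with dst just after src
   replicates the leading element across the rest of the array. */
void *my_memrepeat(void *dst, const void *src, size_t n)
{
    copy_up(dst, src, n);
    return dst;
}
```

With `int data[4] = {7, 0, 0, 0};`, the call `my_memrepeat(data + 1, data, sizeof data - sizeof data[0]);` leaves every element equal to 7.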
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #104 on: August 22, 2022, 11:21:37 am »
(though I believe the compiler only calls memcpy and makes sure there is no overlap)
I haven't explored all the cases where the compiler generates these calls, but I suggest a slightly different approach:

Implement the memory copy loop in two different functions, one that does the loop in increasing addresses, and the other in decreasing addresses.  memcpy() can always call the first one.  memmove() uses the increasing address one if source address is above the destination address, and the decreasing address one if source address is below the destination address.

That's not a different approach, it's the same approach.

Except memcpy() should be whichever of increasing or decreasing addresses is the fastest, and memmove() either uses the slower one or tail-calls memcpy when it can. Often a decreasing loop is the fastest.

You *could* if code space is really really tight have memset() poke the constant into the first/last bytes of the memory range and then call memcpy() with an overlapping range. However this is a pretty slow way to implement memset(), and probably setting up the call to memcpy() is just as much code as the (twice as fast) loop to just write the same data from a register repeatedly. False savings.
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3801
  • Country: gb
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #105 on: August 22, 2022, 11:36:48 am »
Basically this IDE stuff is a tool which got put together by real men, for real men

for Homo real Neanderthalensis beings for real Neanderthals, you mean.
The opposite of courage is not cowardice, it is conformity. Even a dead fish can go with the flow
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #106 on: August 22, 2022, 12:47:48 pm »
(though I believe the compiler only calls memcpy and makes sure there is no overlap)
I haven't explored all the cases where the compiler generates these calls, but I suggest a slightly different approach:

Implement the memory copy loop in two different functions, one that does the loop in increasing addresses, and the other in decreasing addresses.  memcpy() can always call the first one.  memmove() uses the increasing address one if source address is above the destination address, and the decreasing address one if source address is below the destination address.
That's not a different approach, it's the same approach.
Okay.  I guess I got hung up on the mention of same function and avoid overlap, since I use a pair of functions that differ only in the loop direction.

Except memcpy() should be whichever of increasing or decreasing addresses is the fastest, and memmove() either uses the slower one or tail-calls memcpy when it can. Often a decreasing loop is the fastest.
True.

You *could* if code space is really really tight have memset() poke the constant into the first/last bytes of the memory range and then call memcpy() with an overlapping range. However this is a pretty slow way to implement memset(), and probably setting up the call to memcpy() is just as much code as the (twice as fast) loop to just write the same data from a register repeatedly. False savings.
Fully agreed.

(While memrepeat() does exactly that, it is only useful when the data to be filled is composite, a structure or union, or possibly a floating-point value on architectures without floating-point registers, and larger than a machine register in size.  memset() is easily implemented more efficiently than that, and is common enough to spend the dozen or two bytes for its implementation.)

Even things like checking whether the range is aligned or not, and doing the loop using native word size elements (with the fill byte duplicated across the bytes in that word) is not usually worth it at run time, generally speaking, for any of these functions.  Using C11 _Alignof (or the earlier GCC/clang/ICC __alignof__ operator) one can create an inline wrapper that can select between an optimized (native word alignment and size) or a byte-per-byte version, which may be useful on some architectures; but then that wrapper is visible in the header file and linkage won't be to memset()/memcpy()/memmove()/memcmp() but to the optimized or per-byte version instead.

Basically this IDE stuff is a tool which got put together by real men, for real men
for Homo real Neanderthalensis beings for real Neanderthals, you mean.
Dude, some of my distant ancestors were Neanderthals, and they had bigger brains than we do.  The ooga-booga caveman is a poor stereotype.  Get an expert on them drunk, and they'll admit that everything points to them having been more intelligent than us (heck, that people who lived a few thousand years ago were not only more intelligent than us but also had bodies closer to olympic athletes than the potato sacks we are), which itself is sufficient incentive for most humans to paint them ugly.
If you exclude childhood deaths, even their life expectancy was similar to ours, and much longer than most humans during the agricultural era.  Yes, they had hard lives in that most male skeletons found show signs of old fractures having fully healed, but that also shows that such injuries were dealt with and not fatal.
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3801
  • Country: gb
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #107 on: August 22, 2022, 01:31:33 pm »
Any code that is statically analyzed as unreachable during compilation will just not yield emitted code from the compiler. That usually happens even at the first level of optimization.

In avionics there is a specific activity (which someone is paid for) to check for dead (unreachable) code.
It's boring and annoying, but it's a task that has to be done, so I developed a tool to help people check for it in C.
I then recycled the project and merged it into myC.
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3801
  • Country: gb
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #108 on: August 22, 2022, 01:39:25 pm »
z80 is just SO ANNOYING to program. It shouldn't be worse than 6502, but it pretty much is, because it's just so inconsistent.

if you look at SmartC (in 90s Byte Magazine), or SDCC, well ... it's full of similar comments  ;D
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3801
  • Country: gb
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #109 on: August 22, 2022, 01:51:40 pm »
The z80 can have really fast and compact code if you manage to keep everything in its very limited register set. But if you run out and start having to load and store things to RAM then it gets pretty awful pretty quickly.

that's the same with 68hc11, just a little mitigated by the gcc-v3.4.6 trick of using internal ram as registers, but you still have push, pop, etc.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #110 on: August 22, 2022, 02:15:53 pm »
(though I believe the compiler only calls memcpy and makes sure there is no overlap)
I haven't explored all the cases where the compiler generates these calls, but I suggest a slightly different approach:

Implement the memory copy loop in two different functions, one that does the loop in increasing addresses, and the other in decreasing addresses.  memcpy() can always call the first one.  memmove() uses the increasing address one if source address is above the destination address, and the decreasing address one if source address is below the destination address.
That's not a different approach, it's the same approach.
Okay.  I guess I got hung up on the mention of same function and avoid overlap, since I use a pair of functions that differ only in the loop direction.
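For anyone who prefers to see the two-direction idea in C first, here is a minimal sketch. The names my_memcpy, my_memmove, copy_up and copy_down are made up for illustration so they don't clash with libc. It is a plain byte loop, not a tuned implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte copy in increasing address order. */
static void copy_up(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: byte copy in decreasing address order. */
static void copy_down(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--)
        dst[n] = src[n];
}

/* memcpy-like: caller promises no overlap, so always copy upwards. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    copy_up(dst, src, n);
    return dst;
}

/* memmove-like: pick the direction that never overwrites unread source bytes. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    if ((uintptr_t)src >= (uintptr_t)dst)
        copy_up(dst, src, n);     /* source above destination: go upwards */
    else
        copy_down(dst, src, n);   /* source below destination: go downwards */
    return dst;
}
```

Shifting a buffer two bytes to the right, e.g. my_memmove(buf + 2, buf, 4), takes the downward path, so the overlapping bytes are read before they are overwritten.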

Ideally you write memcpy() and memmove() in assembly language as a single function with two entry points just a couple of instructions apart:

Code: [Select]
memmove: # a0,a1,a2 = dst,src,len
    add a2,a2,a1 # 1 past end of src
    blt a0,a1,memcpy$1
    bge a0,a2,memcpy$1
    j slow_copy
memcpy:
    add a2,a2,a1 # 1 past end of src
memcpy$1:
    :

Well, that's if there's a significant speed difference so you want to do memcpy() any time it's actually possible. If they're the same speed then just a simple test will do:

Code: [Select]
memmove: # a0,a1,a2 = dst,src,len
    bge a0,a1,copy_downwards
memcpy:
    # copy upwards
    :

Quote
Even things like checking whether the range is aligned or not, and doing the loop using native word size elements (with the fill byte duplicated across the bytes in that word) is not usually worth it at run time, generally speaking, for any of these functions.

If most copies are either very small (less than 16 bytes, say), or else larger than L2 cache (e.g. 256k or 8M or something like this) then it may be faster or at least as fast to just do a byte by byte copy. But for things from a few hundred bytes to a few hundred KB it's probably worth taking a few instructions to figure out the best tactic. If you have plenty of code space available.

I REALLY LOVE that pretty soon on most RISC-V CPUs you won't have to care and can just unconditionally do memcpy() as:

Code: [Select]
memcpy:
     mv      a3,a0
1:
     vsetvli a4,a2,e8,m4
     vle8.v  v0,(a1)
     add     a1,a1,a4
     sub     a2,a2,a4
     vse8.v  v0,(a3)
     add     a3,a3,a4
     bnez    a2,1b
     ret

... and that's going to be optimal for any size or alignment.

On the Allwinner D1 at 1 GHz, that code takes a constant 31 ns for any size from 0 to 64 bytes copied. The standard Debian glibc memcpy() takes from 50 ns for 0 bytes copied up to 112 ns for 64 bytes copied.

Similar applies to ARMv9, of course.
 
The following users thanked this post: Nominal Animal

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #111 on: August 22, 2022, 02:52:31 pm »
Quote
libc functions are weak linked

Except in Cube libc.a they are not weak :) (I solved that with objcopy -weaken...)

The other funny thing is that your own versions of the functions need the -O0 attribute otherwise the compiler will again replace them with the stdlib ones, won't it?

Doing them in asm is the safe thing.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Online eutectique

  • Frequent Contributor
  • **
  • Posts: 369
  • Country: be
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #112 on: August 22, 2022, 03:25:26 pm »
I have shown you that this is not the case: https://www.eevblog.com/forum/microcontrollers/is-st-cube-ide-a-piece-of-buggy-crap/msg4363990/#msg4363990

If you provide your implementation of a library function, it will be linked in. Regardless of optimisation flags. I've done it without weakening the symbols, which would be a terrible hack, IMHO.

Perhaps, the order of objects in your linker command is wrong?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #113 on: August 22, 2022, 03:31:21 pm »
Quote
libc functions are weak linked

Except in Cube libc.a they are not weak :) (I solved that with objcopy -weaken...)

The other funny thing is that your own versions of the functions need the -O0 attribute otherwise the compiler will again replace them with the stdlib ones, won't it?

Yup, that's easy to get with any simple copying or memset-like loop.  I think I've only seen it with -O2 or -O3 on GCC, and NOT -O1 or even -Os.

Anyway, in GCC you can disable it with "-fno-tree-loop-distribute-patterns" added to CFLAGS. Or I guess with a pragma in the code.

Code: [Select]
#pragma GCC push_options
# pragma GCC optimize ("no-tree-loop-distribute-patterns")
:
:
#pragma GCC pop_options

Also, putting asm volatile ("") somewhere inside the loop should disable the optimisation on any compiler.
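A minimal sketch of the empty-asm trick, assuming GCC or clang (the __asm__ spelling is a GNU extension). The name my_memset is hypothetical. The idea is that the loop body then contains something the optimiser cannot reason about, so it should no longer be recognised as a memset pattern and replaced with a library call.

```c
#include <stddef.h>

/* Hypothetical replacement fill routine, deliberately a simple byte loop. */
void *my_memset(void *dst, int value, size_t n)
{
    unsigned char *p = dst;

    while (n--) {
        *p++ = (unsigned char)value;
        /* Empty volatile asm: emits no instructions, but it is opaque to
           the optimiser, so the loop should stay a loop. */
        __asm__ volatile ("");
    }
    return dst;
}
```

It is worth checking the result with objdump or readelf on the object file to confirm there is no call to memset left in it.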

Quote
Doing them in asm is the safe thing.

yup.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #114 on: August 22, 2022, 03:35:21 pm »
Doing them in asm is the safe thing.
Yes, I too agree.  I usually use extended asm, which works in GCC and clang (and Intel CC on x86/x86-64).

I would not worry overmuch optimizing them for speed, either; they're not that critical.

It does annoy me that there aren't memcpyi()/memmovei()/memseti()/memcmpi(), memcpyl()/memmovel()/memsetl()/memcmpl(), and memcpyll()/memmovell()/memsetll()/memcmpll() variants for the cases when the compiler knows at compile time that both pointers and the length are aligned to a multiple of int, long, or long long.  It would be trivial for e.g. memcpyi(), memcpyl(), and memcpyll() to be weak aliases for memcpy(), so the default cost would be a few symbol aliases in the symbol table!

(I do believe many libraries already use an ELF resolver via the ifunc function attribute, so that they can select the best variants at runtime based on CPUID on x86-64, which traditionally has fluctuated between REP MOVSB being recommended or recommended against, depending on the exact processor.)

As it is, creating optimized (aligned pointers and length a multiple of said alignment) versions can be done, but the compiler will still always use the generic memcpy()/memmove()/memset()/memcmp() ones, so it is just as well to use a completely different name or interface to the optimized/aligned "versions" of those, since only the explicit calls by us human programmers will ever use them anyway.
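As a rough illustration of "a completely different name or interface", here is a sketch of a word-wise copy. The name copy_words and its interface are my own invention. It assumes the caller guarantees both pointers are aligned for uint32_t, the regions do not overlap, and the length is given in words rather than bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aligned copy: length is a count of 32-bit words, not bytes.
   Only explicit calls use it; the compiler will not pick it on its own. */
void copy_words(uint32_t *restrict dst, const uint32_t *restrict src,
                size_t nwords)
{
    while (nwords--)
        *dst++ = *src++;
}
```

Taking uint32_t pointers puts the alignment requirement in the type, so a misuse with a char buffer at least needs an explicit cast.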
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #115 on: August 22, 2022, 05:57:03 pm »
Compilers will inline code for memcpy() and memmove() in a number of cases already and do so relatively cleverly. But for cases where they need to *call* the functions, yes this is often suboptimal.
That reminds me of a different discussion where I considered memory copy to be in general suboptimal for many programming languages and down to the CPUs themselves which often have very limited (or even bad) block copy functionalities. Some people spend generous amounts of time trying to optimize memory copy for their specific cases on specific targets and benchmarks show sometimes wide differences, so that's not a secondary problem IMHO.

That said, back to the memcpy() issues mentioned earlier, I guess this will be obvious to most here, but I've seen quite a few people not knowing the difference between memcpy() and memmove() - and often not even knowing the latter - and introducing nice bugs due to this.
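The classic case where the difference bites is shifting data within one buffer. A small sketch (insert_byte is a made-up helper): the source and destination ranges overlap, so memmove() is required. Using memcpy() here is undefined behaviour, even if it happens to work on some libc.

```c
#include <string.h>

/* Hypothetical helper: insert c at index pos in a buffer that currently
   holds len bytes. The caller must ensure there is room for len + 1 bytes. */
void insert_byte(char *buf, size_t len, size_t pos, char c)
{
    /* Source [pos, len) and destination [pos + 1, len + 1) overlap,
       so this has to be memmove(), not memcpy(). */
    memmove(buf + pos + 1, buf + pos, len - pos);
    buf[pos] = c;
}
```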
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #116 on: August 22, 2022, 06:57:44 pm »
Quote
Compilers will inline code for memcpy() and memmove() in a number of cases already and do so relatively cleverly. But for cases where they need to *call* the functions, yes this is often suboptimal.

The former is fine but the latter is a bastard if you don't catch it, and you don't have stdlib in your project.

I have made a note in my doc for the product I am working on to check

Quote
You can see the symbol reference in the object file, though, using either readelf or objdump.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #117 on: August 22, 2022, 07:51:39 pm »
Uh?

Maybe we should stick to programming. Just a thought. ::)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #118 on: August 23, 2022, 01:29:50 am »
Uh?

Maybe we should stick to programming. Just a thought. ::)

I think he meant d11n.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #119 on: August 23, 2022, 02:09:52 am »
Uh?

Maybe we should stick to programming. Just a thought. ::)

I think he meant d11n.

That was in reply to a post that since disappeared, so maybe it would be best for the thread if we deleted our last 3 posts as well. ;D
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #120 on: November 04, 2022, 10:00:42 pm »
Back on this topic :)

Does anyone know under which conditions ARM32 GCC (currently v10) removes (or doesn't remove) unused/unreachable code?

I mean individual functions which are not called by anything, and are not pointed to by a function table.

AFAICT they are all getting removed. This is fine.

But I am working in ST Cube IDE (which is basically a makefile generator, with an editor, and 100x more features than I know how to configure :) ) and in the project I have tons of ST "HAL" libs which, if compiled, would really bloat the project.

There is a different scenario where a precompiled object file (.o) or a library (.a) gets removed if none of the functions within the module is being referenced, and this is much less granular than removing unreachable code during compilation. I think that is because it is being done by the linker, which cannot strip out a function out of a .o or .a file.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #121 on: November 04, 2022, 10:19:13 pm »
There is a different scenario where a precompiled object file (.o) or a library (.a) gets removed if none of the functions within the module is being referenced, and this is much less granular than removing unreachable code during compilation. I think that is because it is being done by the linker, which cannot strip out a function out of a .o or .a file.
If you use -ffunction-sections when compiling, then the linker can do it; just tell the linker to garbage-collect sections (--gc-sections).  It does exactly what you want, at function granularity.

When compiled, each function gets put into a section named .text.function_name.  During linking, the linker will examine which function symbols are reachable starting from the ELF entry point (includes both function calls and taking the address of a function; it does this by examining the symbol references for each function).  All functions that might get called, get mapped to the common .text section, and the rest of the functions discarded.

See for example this eLinux.org page from 2011.  (Also, Teensyduino (Arduino add-on for Teensy microcontrollers from PJRC) uses this by default, so I do believe others use it extensively too.)

For GCC, the options needed are -ffunction-sections (during compiling) and -Wl,--gc-sections (during linking; often -Wl,--gc-sections,--relax is used).
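A small way to try this out, as a sketch: the file name demo.c and the function names are made up, and the commands in the comment assume a GNU toolchain (substitute arm-none-eabi-gcc as needed). The file is meant to be linked into a program whose main() calls only used_function().

```c
/*
 * Hypothetical demo.c. Possible commands, assuming GNU gcc/ld:
 *
 *   gcc -Os -ffunction-sections -c demo.c
 *   objdump -h demo.o      # expect .text.used_function and
 *                          # .text.unused_function as separate sections
 *   gcc -Wl,--gc-sections -Wl,--print-gc-sections main.o demo.o -o demo
 *                          # the linker lists the sections it discards
 */

/* Referenced from main(): its section stays in the final binary. */
int used_function(int x)
{
    return x * 2;
}

/* Not referenced anywhere: its own section can be garbage collected.
   Without -ffunction-sections it would share .text with used_function
   and be kept along with it. */
int unused_function(int x)
{
    return x * 3 + 1;
}
```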
« Last Edit: November 04, 2022, 10:21:28 pm by Nominal Animal »
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3801
  • Country: gb
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #122 on: November 04, 2022, 10:25:47 pm »
ummm, dead code should be detected independently of the compiler.
I have developed a tool that does this for both C and my-C.
Stood and Understand can also catch dead code.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #123 on: November 04, 2022, 10:30:10 pm »
There is a different scenario where a precompiled object file (.o) or a library (.a) gets removed if none of the functions within the module is being referenced, and this is much less granular than removing unreachable code during compilation. I think that is because it is being done by the linker, which cannot strip out a function out of a .o or .a file.
If you use -ffunction-sections when compiling, then the linker can do it; just tell the linker to garbage-collect sections (--gc-sections).  It does exactly what you want, at function granularity.

When compiled, each function gets put into a section named .text.function_name.  During linking, the linker will examine which function symbols are reachable starting from the ELF entry point (includes both function calls and taking the address of a function; it does this by examining the symbol references for each function).  All functions that might get called, get mapped to the common .text section, and the rest of the functions discarded.

See for example this eLinux.org page from 2011.  (Also, Teensyduino (Arduino add-on for Teensy microcontrollers from PJRC) uses this by default, so I do believe others use it extensively too.)

For GCC, the options needed are -ffunction-sections (during compiling) and -Wl,--gc-sections (during linking; often -Wl,--gc-sections,--relax is used).

Yep.

I find it unfortunate that we have to put functions in individual sections (which makes the object files bigger, not that it is a huge deal, but yeah) to get this behavior. I can't really find a rationale for not making it the default behavior of the linker.

One question I have (I admit I don't use these options very often actually) is that, what happens if some function is never called directly in the code, but passed as a function pointer somewhere and called indirectly? Is the linker clever enough not to prune this function in this case? I suppose that taking a pointer to it should be enough to determine that said function is not dead code, but just wondering. Too lazy right now to test it. ;D
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #124 on: November 04, 2022, 11:06:11 pm »
Quote
what happens if some function is never called directly in the code, but passed as a function pointer somewhere and called indirectly? Is the linker clever enough not to prune this function in this case? I suppose that taking a pointer to it should be enough to determine that said function is not dead code, but just wondering. Too lazy right now to test it. ;D

I had this once (e.g. RAM based code which is jumped to) and there were very few ways to make it not disappear. I did it by adding the entry point (function) address to a table of words in the .s (assembler) startup file. That works perfectly.

I am more interested in the compiler removing unused code, and I think this is default behaviour.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #125 on: November 04, 2022, 11:35:48 pm »
I find it unfortunate that we have to put functions in individual sections (which makes the object files bigger, not that it is a huge deal, but yeah) to get this behavior. I can't really find a rationale for not making it the default behavior of the linker.

The tradition in *nix is that the object file formats and the linker have no concept at all of body of a function, but only entry points. A logical function can jump around arbitrarily within the section, can have multiple entry points, and so forth.

Consider this:

Code: [Select]
00000000000101e8 <isEven>:
   101e8:       c119                    beqz    a0,101ee <isEven+0x6>
   101ea:       357d                    addiw   a0,a0,-1
   101ec:       a019                    j       101f2 <isOdd>
   101ee:       4505                    li      a0,1
   101f0:       8082                    ret

00000000000101f2 <isOdd>:
   101f2:       c119                    beqz    a0,101f8 <isOdd+0x6>
   101f4:       357d                    addiw   a0,a0,-1
   101f6:       bfcd                    j       101e8 <isEven>
   101f8:       8082                    ret

Is that two functions, or one function with two entry points?

Source, compiled with just -Os...

Code: [Select]
typedef unsigned int uint;

int isOdd(uint n);
int isEven(uint n);

int isOdd(uint n) {
  return n == 0 ? 0 : isEven(n - 1);
}

int isEven(uint n) {
  return n == 0 ? 1 : isOdd(n - 1);
}
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #126 on: November 04, 2022, 11:58:15 pm »
One question I have (I admit I don't use these options very often actually) is that, what happens if some function is never called directly in the code, but passed as a function pointer somewhere and called indirectly? Is the linker clever enough not to prune this function in this case? I suppose that taking a pointer to it should be enough to determine that said function is not dead code, but just wondering.
Yes, the linker is clever enough not to prune a function which is only called via a function pointer.

This is because of how the linker actually does this.  Because each function is in its own section, it examines the symbols referred to in each section.  It does not matter if the function symbol is called, or the address of the function is taken, because both cause the function symbol to be added to the symbol table the same way.  Then, the linker creates a disjoint-set data structure of all sections, doing a Join between a pair of sections whenever a symbol in one section is used in another section.  The linker keeps all sections that belong to the same set as the ELF start address or start function is in, and discards all others.

(Note: I didn't actually check that this is exactly what the GNU linker is doing; this is based on the documentation and observable behaviour, and how I'd implement it.  Using a graph instead of a disjoint-set would give the exact same results, but would be less efficient.)

(Edited: I checked.  See binutils/ld/ldlang.c:lang_gc_sections() and binutils/ld/ldlang.c:lang_end(), as well as other code using link_info.gc_sections.  It uses hash lookups, implementing the logic above, but in a different manner.)
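To make the "keep what is reachable from the entry point" idea concrete, here is a toy model in C. This is emphatically not the ld code: struct section, MAX_REFS and mark_reachable are invented for illustration, and the sketch uses a simple recursive mark over directed references rather than a disjoint-set or hash lookups.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REFS 4

/* Hypothetical model of an input section. refs[] holds the indices of the
   sections whose symbols this section refers to (call or address-of, it
   makes no difference here). */
struct section {
    const char *name;
    int refs[MAX_REFS];
    int nrefs;
    bool keep;
};

/* Mark every section reachable from index start. Anything still unmarked
   afterwards is what a --gc-sections style pass would discard. */
void mark_reachable(struct section *secs, int start)
{
    if (secs[start].keep)
        return;                       /* already visited: stops cycles */
    secs[start].keep = true;
    for (int i = 0; i < secs[start].nrefs; i++)
        mark_reachable(secs, secs[start].refs[i]);
}
```

With sections for main, isEven, isOdd and an uncalled function, marking from main keeps the first three, since isEven and isOdd reference each other. The uncalled one stays unmarked even if it refers to isEven itself, because references are only followed outwards from the entry point.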

The tradition in *nix is that the object file formats and the linker have no concept at all of body of a function, but only entry points. A logical function can jump around arbitrarily within the section, can have multiple entry points, and so forth.
Yes, and the compiler can still generate the exact same code even with -ffunction-sections -Wl,--gc-sections, because the jump address in the ELF file is a symbol table reference, not a plain numeric address.
In other words, the two will always contain a symbolic reference to the other, so they're always either both included or both excluded.

The fact that the jump is to a different section does not matter here.  What isn't guaranteed, is that the two functions will reside at nearby addresses in the final binary.
« Last Edit: November 05, 2022, 12:14:41 am by Nominal Animal »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #127 on: November 05, 2022, 12:32:47 am »
The tradition in *nix is that the object file formats and the linker have no concept at all of body of a function, but only entry points. A logical function can jump around arbitrarily within the section, can have multiple entry points, and so forth.
Yes, and the compiler can still generate the exact same code even with -ffunction-sections -Wl,--gc-sections, because the jump address in the ELF file is a symbol table reference, not a plain numeric address.
In other words, the two will always contain a symbolic reference to the other, so they're always either both included or both excluded.

The fact that the jump is to a different section does not matter here.  What isn't guaranteed, is that the two functions will reside at nearby addresses in the final binary.

The point is not that sections can be put next to each other. The point is that a section can not be subdivided by the linker. If a compiler or programmer puts several logically distinct things in the same section then they are inextricably joined together forever and included or excluded as a whole.

This is different to how traditional Mac or Windows object file formats work, where every top level data item or function is always its own section.
 
The following users thanked this post: Nominal Animal

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #128 on: November 05, 2022, 02:23:09 am »
The point is not that sections can be put next to each other. The point is that a section can not be subdivided by the linker. If a compiler or programmer puts several logically distinct things in the same section then they are inextricably joined together forever and included or excluded as a whole.

This is different to how traditional Mac or Windows object file formats work, where every top level data item or function is always its own section.
Right, I misunderstood your point.  I was just trying to say that while function-sections does increase the ELF object file size, it shouldn't –– technically, does not need to –– affect the generated code, or increase the final binary size, at all.  This is just how ELF format files can be used to do this thing.
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #129 on: November 05, 2022, 02:43:44 am »
Yep I understand that. I just find the choice they made a bit odd, but that's not the sole oddity that I find in Unix stuff. Everything has a baggage. ;D
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #130 on: November 05, 2022, 07:04:39 am »
Does GCC v10 remove unused code at compile-time, as a default?
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #131 on: November 05, 2022, 07:16:38 am »
Depends on how specifically it is "unused". Things like
Code: [Select]
int foo()
{
  if (1)
    return 1;
  else
    return 123;
}
would be optimized away at any optimization level by default.

At -O1 unused static functions would be removed even without garbage collection, they won't even get to the linking stage.
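A sketch combining both cases (the names helper and foo_demo are made up). The static function is only referenced from a branch that can never execute, so once the branch is folded away nothing refers to it. With optimisation enabled I would expect no helper symbol in the object file; running nm on the .o is an easy way to check on a given toolchain.

```c
/* Only referenced from the dead branch below. Being static, the compiler
   knows no other translation unit can call it. */
static int helper(int x)
{
    return x * 3;
}

int foo_demo(int x)
{
    if (1)
        return 1;
    else
        return helper(x);   /* unreachable: folded away by the compiler */
}
```

If helper() were not static, the compiler would have to emit it regardless, since another file might call it. Removing it is then the linker's job.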
Alex
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #132 on: November 05, 2022, 08:21:45 am »
I am referring (see earlier post) to the vast amount of code in the ST "HAL" code which comes with ST Cube IDE.

For example you get a file containing a dozen "SPI" functions. Some polled, some _IT (interrupt), some _DMA (for DMA). Possibly none of these get called.

I can do a test of project size, with a function having #if 0 / # endif around it, and it seems these do get removed.

This is the sort of stuff which is 99% unused



Experimentally, if I exclude a .c file from the build, and it isn't used e.g. the ...i2c.c file, then the FLASH code size does not change. I wonder what rules control this.

I am using -Og. I am fairly sure -O0 does the same otherwise compiling with that (which I have tried) would massively bloat the project. It increases code size about 20-30%. But maybe that 20-30% is that the whole ST lib is in there?

However I have just compiled with -O0 and the size went up as expected from 351k to 502k. A quick look at some random unused function finds it in the .map file!



So, yeah, -O0 does not remove unreachable code!

And I have that checkbox ticked



but probably not the required linker option.
« Last Edit: November 05, 2022, 08:45:46 am by peter-h »
 

Online DiTBho

  • Super Contributor
  • ***
  • Posts: 3801
  • Country: gb
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #133 on: November 05, 2022, 12:00:35 pm »
(for me it's all wrong approach)
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #134 on: November 05, 2022, 12:04:17 pm »
Not everybody is as clever as you, especially not me :)
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #135 on: November 05, 2022, 04:22:43 pm »
IRQ handlers and their dependencies are not "unused". Pointers to them are used in the vector table, so the compiler has to include them. Those things will not be removed under any optimization options, since the compiler does not know under which conditions those pointers are used.
 
The following users thanked this post: DiTBho

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #136 on: November 05, 2022, 04:58:06 pm »
If you exclude them you will get a linker error, which I was not getting.

My conclusion is that -O0 includes all sources loaded into the Cube editor structure.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #137 on: November 05, 2022, 05:10:38 pm »
Well, yes, -O0 would include all the code. As I said, the only things that may get optimized at -O0 is the dead code within the function itself and some expression folding might happen too.

That's the difference between optimization levels.

I really don't get what is your end goal here.
« Last Edit: November 05, 2022, 05:12:22 pm by ataradov »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #138 on: November 05, 2022, 05:52:17 pm »
I just want to be sure that unwanted code is really being excluded.

The project was set up by someone else years ago, with a ton of libraries, and bits of these are used around the place, but probably less than 10% of them. And I don't want to go around putting

#if 0
#endif

around all that stuff.

I might also need more of it one day.

And while I do all my code "bare metal" (because I want to understand what I am doing) the HAL etc stuff is useful for reference, so I don't want to just delete the files.
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #139 on: November 05, 2022, 05:56:13 pm »
Just enable optimization and compiler/linker will remove that for you, you will go nuts removing that stuff by hand.
 
The following users thanked this post: peter-h

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #140 on: November 05, 2022, 08:28:16 pm »
Does GCC v10 remove unused code at compile-time, as a default?

It does remove unused code that is statically (at compile-time) detectable as unused in a single compilation unit. For code inside a function, yes. As long as optimizations are enabled. For full functions, yes if they are defined static, no otherwise (which is why you need the above sections 'trick' to get rid of this kind of dead code.)

Most modern C compilers will act pretty much the same. MSVC may be a bit different as it uses a different object model and may be able to prune unused code across compilation units without requiring any specific options (as we explained earlier.)
« Last Edit: November 05, 2022, 08:30:21 pm by SiliconWizard »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #141 on: November 05, 2022, 08:58:42 pm »
I do have the "place functions in their own sections" enabled as shown above, but nothing else.

I am not concerned about unreachable code within a function; hopefully none of my code has that :) And anyway the compiler tends to warn about that.

I just want it to reliably remove entire unused functions; that removes the vast majority of the ST libraries.

Some functions are called via function tables. This is normal in USB and ETH code. But these will be correctly preserved because the function name (its address) is in that table.
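For illustration, a minimal sketch of that situation. All names here (handler_fn, on_setup, on_data, handlers, dispatch) are hypothetical, loosely modelled on the callback tables in USB/ETH stacks. Neither handler is ever called by name; the only references to them are the addresses stored in the table.

```c
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Never called directly anywhere in the code. */
static int on_setup(int v) { return v + 1; }
static int on_data(int v)  { return v * 2; }

/* Taking the addresses here creates symbol references, which is what keeps
   the handlers alive, provided the table itself is referenced. */
static const handler_fn handlers[] = { on_setup, on_data };

int dispatch(size_t event, int arg)
{
    if (event >= sizeof handlers / sizeof handlers[0])
        return -1;                 /* unknown event */
    return handlers[event](arg);   /* indirect call through the table */
}
```

The flip side is that if nothing reachable refers to the table (or to dispatch), the whole lot becomes a candidate for removal together.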
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #142 on: November 05, 2022, 09:03:27 pm »
Then the "place functions in their own sections" option will do the job. But I don't know how it translates as far as compile and link options go. If it only acts on the compiler options, that's not enough. The linker also needs to get the right option, and I don't know if the option you mention (which I'm assuming is a check box in some IDE) activates both.

You can additionally use the same kind of option for data, so that the linker will prune any data that is unused. Otherwise you may still have unused data in the final binary. I don't know how much "data" the ST libraries use and if that will make a difference, but that's worth a try.
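The data equivalent can be tried with something like the sketch below. The names are made up, and the section names in the comments are what I would expect from GCC with -fdata-sections, so treat them as an assumption to verify with objdump -h.

```c
#include <stdint.h>

/* Referenced by checksum(), so it must stay. */
const uint8_t used_table[4] = { 1, 2, 3, 4 };

/* Not referenced anywhere. With -fdata-sections it should land in its own
   section (something like .rodata.unused_table), which --gc-sections can
   then drop. Without the option it shares a section with used_table and
   is kept along with it. */
const uint8_t unused_table[256] = { 0xAA };

unsigned checksum(void)
{
    unsigned sum = 0;

    for (unsigned i = 0; i < sizeof used_table; i++)
        sum += used_table[i];
    return sum;
}
```

Comparing the .map file or the size output with and without -fdata-sections should show whether the 256-byte table made it into the image.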
« Last Edit: November 05, 2022, 09:15:00 pm by SiliconWizard »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #143 on: November 05, 2022, 09:14:59 pm »
There should not be much RAM usage because I have spent half my life over the past 2 years checking that - the 32F417 doesn't have a whole load left by the time you have MbedTLS in there :)

Curiously, I have found there is a "master HAL module enable .h file" with this

Code: [Select]
/* ########################## Module Selection ############################## */
/**
  * @brief This is the list of modules to be used in the HAL driver
  */
#define HAL_MODULE_ENABLED 
#define HAL_ADC_MODULE_ENABLED
/* #define HAL_CAN_MODULE_ENABLED */
/* #define HAL_CAN_LEGACY_MODULE_ENABLED */ 
/* #define HAL_CRC_MODULE_ENABLED */ 
/* #define HAL_CRYP_MODULE_ENABLED */ 
#define HAL_DAC_MODULE_ENABLED
/* #define HAL_DCMI_MODULE_ENABLED */
#define HAL_DMA_MODULE_ENABLED
/* #define HAL_DMA2D_MODULE_ENABLED */
#define HAL_ETH_MODULE_ENABLED
#define HAL_FLASH_MODULE_ENABLED
/* #define HAL_NAND_MODULE_ENABLED */
/* #define HAL_NOR_MODULE_ENABLED */
/* #define HAL_PCCARD_MODULE_ENABLED */
#define HAL_SRAM_MODULE_ENABLED
/* #define HAL_SDRAM_MODULE_ENABLED */
/* #define HAL_HASH_MODULE_ENABLED */ 
#define HAL_GPIO_MODULE_ENABLED
#define HAL_I2C_MODULE_ENABLED
#define HAL_I2S_MODULE_ENABLED   
#define HAL_IWDG_MODULE_ENABLED
/* #define HAL_LTDC_MODULE_ENABLED */
#define HAL_PWR_MODULE_ENABLED   
#define HAL_RCC_MODULE_ENABLED
#define HAL_RNG_MODULE_ENABLED
#define HAL_RTC_MODULE_ENABLED
/* #define HAL_SAI_MODULE_ENABLED */   
/* #define HAL_SD_MODULE_ENABLED */ 
//#define HAL_SPI_MODULE_ENABLED
#define HAL_TIM_MODULE_ENABLED   
#define HAL_UART_MODULE_ENABLED
/* #define HAL_USART_MODULE_ENABLED */
/* #define HAL_IRDA_MODULE_ENABLED */
/* #define HAL_SMARTCARD_MODULE_ENABLED */
/* #define HAL_WWDG_MODULE_ENABLED */ 
#define HAL_CORTEX_MODULE_ENABLED
#define HAL_PCD_MODULE_ENABLED
/* #define HAL_HCD_MODULE_ENABLED */

and commenting out e.g. HAL_SPI_MODULE_ENABLED makes zero difference to the binary size. So the removal of unused functions is working as it should be.

One still ought to disable unused modules because if one needs to compile with -O0 for debugging, the code bloat could be a problem.
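For anyone wondering how such a header takes effect, the usual mechanism is plain conditional compilation. The sketch below uses invented DEMO_ macro and function names rather than ST's, and I am assuming the HAL sources are guarded in roughly this way.

```c
/* Hypothetical stand-in for the module-selection header; the macro names
   below are invented and are not ST's. */
#define DEMO_UART_MODULE_ENABLED
/* #define DEMO_SPI_MODULE_ENABLED */

#ifdef DEMO_UART_MODULE_ENABLED
int demo_uart_send(int byte) { return byte & 0xFF; }
#endif

#ifdef DEMO_SPI_MODULE_ENABLED
/* Not even compiled while the macro is commented out, so it cannot add to
   the binary at any optimisation level. */
int demo_spi_transfer(int byte) { return ~byte & 0xFF; }
#endif

/* Counts how many of the demo modules were compiled in. */
int demo_modules_compiled(void)
{
    int count = 0;
#ifdef DEMO_UART_MODULE_ENABLED
    count++;
#endif
#ifdef DEMO_SPI_MODULE_ENABLED
    count++;
#endif
    return count;
}
```

With section garbage collection active the result is the same either way for unused code, which matches the zero size difference seen above.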
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 11228
  • Country: us
    • Personal site
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #144 on: November 05, 2022, 09:18:46 pm »
Quote
I just want it to reliably remove entire unused functions; that removes the vast majority of the ST libraries.

Whatever settings you use in the IDE, make sure they translate to the compiler getting "-fdata-sections -ffunction-sections" and the linker getting "--gc-sections". Those are the only conditions for things to be removed.

If you want even more optimization look at LTO. On big projects it easily saves another 10-30%.

And while you refuse to believe that, you should never compile with -O0 for anything. Use -Og for debugging, it still includes most of the stuff from -O1 with some optimizations that affect code generation disabled.
« Last Edit: November 05, 2022, 09:21:29 pm by ataradov »
Alex
 
The following users thanked this post: SiliconWizard

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #145 on: November 05, 2022, 09:34:45 pm »
As a side note for optimizations, you can use any level of opt. including -O3 and still generate debugging info using -g separately. Of course, optimized code can be harder to debug, but if you get specific issues in optimized code, that's a way of debugging it. Then be prepared to occasionally step into some assembly.
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #146 on: November 05, 2022, 10:01:54 pm »
I don't use ST Cube IDE, but exactly as Ataradov mentioned, in a Makefile-based system you would only need
    CFLAGS := -Wall -Og -flto -fdata-sections -ffunction-sections plus whatever else you need
    LDFLAGS := -Wl,--gc-sections,--relax plus whatever else you need
assuming your implicit compilation rule is something like
    %.o: %.c
            $(CC) $(CFLAGS) -c $^
and linkage uses something like
    firmware.bin: $(expression-expanding-to-object-file-list)
            $(CC) $(CFLAGS) $^ $(LDFLAGS) -o $@

When compiling Linux applications, I use almost exactly the same, except I prefer -O2 or -Os instead of -Og.  (And like SiliconWizard said above, you can use e.g. -Os -g or -O2 -g to get debugging information.  The -Og simply selects optimizations that should not impact debuggability, whereas -Os -g optimizes for size but includes debugging information; and -O2 -g applies a lot of optimizations (it's the highest optimization level I ever use) but still includes debugging information.)
« Last Edit: November 05, 2022, 10:03:52 pm by Nominal Animal »
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #147 on: November 05, 2022, 10:05:33 pm »
Quote
And while you refuse to believe that, you should never compile with -O0 for anything. Use -Og for debugging, it still includes most of the stuff from -O1 with some optimizations that affect code generation disabled.

Why get aggressive? What do I "refuse to believe"? I am trying to learn. This isn't some keyboard hitting contest.

As I said, I normally use -Og, but often a variable is shown as "optimised out", which usually seems to mean the variable lives in a register or some such. One can work around that, but -O0 avoids it.

Quote
The -Og simply selects optimizations that should not impact debuggability

That is probably true in GCC, except for the above.

Quote
-Os -g optimizes for size but includes debugging information

By "debugging information" do you mean all variables are visible?
« Last Edit: November 05, 2022, 10:17:42 pm by peter-h »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6173
  • Country: fi
    • My home page and email address
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #148 on: November 05, 2022, 11:04:47 pm »
Quote
By "debugging information" do you mean all variables are visible?

No. By "debugging information", I mean the binary carries the extra data (symbol names, line numbers, variable locations) that a debugger needs to examine it.

Useful optimization – be it for size or efficiency or other reasons – always involves removal and transformation of expressions, and that means a variable may not exist at all in the final binary.  Typical example of this is as follows:
Code: [Select]
uint_fast32_t  distance(int_fast32_t x, int_fast32_t y)
{
    int_fast64_t  n2 = (int_fast64_t)x * x + (int_fast64_t)y * y;
    return uisqrt64(n2);
}
When optimizations are enabled, and especially if uisqrt64() is a static function eligible for inlining here, there is no reason to expect n2 to be observable in the code.

While there are many ways to force the variable to exist when debugging, that is always a tradeoff between code optimization and debugging.
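One of those ways, as a sketch only: mark the intermediate as volatile. The uisqrt64() below is a hypothetical bit-by-bit stand-in so the example compiles on its own; it is not the function from the snippet above.

```c
#include <stdint.h>

/* Hypothetical integer square root standing in for uisqrt64(). */
static uint_fast32_t uisqrt64(uint_fast64_t n)
{
    uint_fast64_t root = 0;
    uint_fast64_t bit  = (uint_fast64_t)1 << 62;   /* highest even power of two */

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n   -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint_fast32_t)root;
}

uint_fast32_t distance_debug(int_fast32_t x, int_fast32_t y)
{
    /* volatile forces a real store and reload of n2, so a debugger can
       usually show it even in an optimised build, at the cost of extra
       memory traffic. */
    volatile int_fast64_t n2 = (int_fast64_t)x * x + (int_fast64_t)y * y;
    return uisqrt64((uint_fast64_t)n2);
}
```

This is exactly the tradeoff mentioned: the variable becomes observable, and the function gets slightly slower and larger.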

I personally handle this dichotomy by writing any key pieces of code separately, and testing them thoroughly; documenting the test results.  For example, when dealing with uni-variate float functions, I often test the function with all finite inputs, and compare them to the expected results calculated at at least double precision.  For multi-variate functions, I test the pathological cases, and a few billion random cases (using high bits of Xorshift64* seeded from getrandom() on Linux, or from clock_gettime()/gettimeofday() multiplying both integral and fractional seconds by large primes and XORing the result together on others).
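A rough sketch of that kind of randomized comparison, with a made-up single-precision polynomial as the function under test and a fixed seed instead of getrandom(). The names poly_f, poly_ref and count_mismatches are hypothetical.

```c
#include <stdint.h>

/* Xorshift64* step; the state must be non-zero. Only the high 32 bits of
   each output are returned. */
static uint32_t prng_u32(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return (uint32_t)((x * UINT64_C(2685821657736338717)) >> 32);
}

/* Hypothetical function under test: a small polynomial in single precision. */
static float poly_f(float x)
{
    return 1.0f + x * (0.5f + x * 0.25f);
}

/* The same polynomial in double precision, used as the reference. */
static double poly_ref(double x)
{
    return 1.0 + x * (0.5 + x * 0.25);
}

/* Tests `cases` random inputs in [0, 1) and returns how many results
   differ from the reference by more than `tolerance`. */
unsigned long count_mismatches(uint64_t seed, unsigned long cases, double tolerance)
{
    uint64_t state = seed ? seed : 1;   /* a zero state would stay zero forever */
    unsigned long bad = 0;

    for (unsigned long i = 0; i < cases; i++) {
        float x = (float)(prng_u32(&state) >> 8) / 16777216.0f;
        double diff = (double)poly_f(x) - poly_ref((double)x);
        if (diff < 0.0)
            diff = -diff;
        if (diff > tolerance)
            bad++;
    }
    return bad;
}
```

A real test run would use far more cases and log the worst offenders rather than just counting them.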

With this approach, my typical bugs are corner cases I didn't think of, and I get a headpalm and have a fix ready in a minute.  I rarely need to use a debugger on my code.  (I can, and have, including GDB accessors and helpers written in Python, when necessary; I'm just saying that having variables be accessible to me in a debugger is not important to me.)  I do, however, quite often examine the assembly code generated, to see if the writer of the problematic function and the compiler agree as to what it should really do.
 
The following users thanked this post: peter-h

Online eutectique

  • Frequent Contributor
  • **
  • Posts: 369
  • Country: be
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #149 on: November 05, 2022, 11:07:43 pm »
Quote
By "debugging information" do you mean all variables are visible?

All variables that make it into the elf file. And not only variables, but macros as well, and perhaps more. Here is the list of debugging options with explanations.
 

Online eutectique

  • Frequent Contributor
  • **
  • Posts: 369
  • Country: be
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #150 on: November 05, 2022, 11:41:06 pm »
Quote
So, yeah, -O0 does not remove unreachable code!

Yes, it does. The record in the map file

Code: [Select]
.text.HAL_I2C_DisableListen_IT
                0x0000000000000000       0x3a ./dev/libsdk.a(stm32l4xx_hal_i2c.o)
 .text.HAL_I2C_Master_Abort_IT
                0x0000000000000000       0x84 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)
 .text.HAL_I2C_EV_IRQHandler
                0x0000000000000000       0x10 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)

merely tells you that the linker saw the functions, that their sizes are 0x3a, 0x84, and 0x10, and that it did not link them into the final elf file: the address is 0x0000000000000000.

Conversely, if a function does make it into the elf, the record in the map file looks like this:

Code: [Select]
.text.HAL_I2C_Init
                0x000000000800a4fc       0xbc ./dev/libsdk.a(stm32l4xx_hal_i2c.o)
                0x000000000800a4fc                HAL_I2C_Init
 .text.HAL_I2CEx_ConfigAnalogFilter
                0x000000000800a5b8       0x5c ./dev/libsdk.a(stm32l4xx_hal_i2c_ex.o)
                0x000000000800a5b8                HAL_I2CEx_ConfigAnalogFilter
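
The two kinds of record can also be told apart mechanically. Below is a minimal sketch of a helper that reads the address/size line of a record; the function name is hypothetical, and it assumes nothing is deliberately placed at address 0, which holds here because the flash addresses start at 0x08000000.

```c
#include <stdio.h>

/* Parse the address/size line of a map-file record, e.g.
   "   0x0000000000000000       0x84 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)".
   Returns 1 if the section was discarded (address 0), 0 if it was linked,
   and -1 if the line does not start with an address/size pair. */
int map_line_discarded(const char *line, unsigned long long *size)
{
    unsigned long long address = 0, bytes = 0;

    /* %llx accepts an optional 0x prefix, like strtoull() with base 16. */
    if (sscanf(line, " %llx %llx", &address, &bytes) != 2)
        return -1;
    if (size)
        *size = bytes;
    return address == 0;
}
```

Feeding it the second line of each record above gives 1 for the three discarded functions and 0 for HAL_I2C_Init.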


You can always check which symbols are in the elf, and their sizes, with the nm utility:

Code: [Select]
> arm-none-eabi-nm --print-size --size-sort censored/censored.elf | grep HAL_I2C
0800a614 00000058 T HAL_I2CEx_ConfigDigitalFilter
0800a5b8 0000005c T HAL_I2CEx_ConfigAnalogFilter
08008afc 0000006c T HAL_I2C_MspInit
0800a4fc 000000bc T HAL_I2C_Init

or to display it in decimal:

Code: [Select]
> arm-none-eabi-nm --print-size --size-sort --radix=d censored/censored.elf | grep HAL_I2C
134260244 00000088 T HAL_I2CEx_ConfigDigitalFilter
134260152 00000092 T HAL_I2CEx_ConfigAnalogFilter
134253308 00000108 T HAL_I2C_MspInit
134259964 00000188 T HAL_I2C_Init

You cannot have addresses in hex and sizes in decimal in the same listing.


As was already noted, LTO gives even more space savings. But beware: the vector table is likely to vanish from the elf, because it is not referenced by anything. To avoid this, add __attribute__((used)) to it.
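
A heavily shortened sketch of what that looks like. The handler and table names are hypothetical; a real Cortex-M vector table also needs the initial stack pointer, many more entries, and a section attribute matching the linker script.

```c
#include <stdint.h>

/* Hypothetical handler type and handlers, invented for this example. */
typedef void (*vector_t)(void);

static volatile uint32_t tick_count;

void demo_reset_handler(void)   { tick_count = 0; }
void demo_systick_handler(void) { tick_count++; }

/* Nothing in the C code refers to this array; only the hardware reads it.
   __attribute__((used)) tells the compiler (and LTO) to emit it anyway. */
__attribute__((used))
static const vector_t demo_vectors[] = {
    demo_reset_handler,
    demo_systick_handler,
};

uint32_t demo_ticks(void) { return tick_count; }
```

As far as I know the attribute only covers the compiler side. Keeping the section through --gc-sections is normally the job of a KEEP() statement in the linker script.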
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #151 on: November 06, 2022, 12:58:49 am »
Quote
My conclusion is that -O0 includes all sources loaded into the Cube editor structure.

NEVER use -O0 (or, equivalently, no -O option at all). It is just awful. It is not just lack of optimisation, it is active pessimisation on modern ISAs.

C code compiled with -O0 runs slower than JavaScript!
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #152 on: November 06, 2022, 01:00:06 am »
Quote
C code compiled with -O0 runs slower than JavaScript!

Maybe not quite, but it's really bad indeed!
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4003
  • Country: nz
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #153 on: November 06, 2022, 01:10:33 am »
Quote
Quote
C code compiled with -O0 runs slower than JavaScript!

Maybe not quite, but it's really bad indeed!

No, seriously, it does, except on very short-running programs [1]. v8 and Nitro are amazing with their multiple levels of JIT and runtime code profiling.

[1] and even then v8 is always going to beat combined gcc/llvm compile time plus run time.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #154 on: November 06, 2022, 08:27:43 am »
If -O0 does not remove unreachable code, that means the level of bloat is just incredible. However, very relevantly, my product does still run then, perfectly, which just shows that this doesn't matter much in my case. Except the larger code would actually cause problems for unrelated reasons.

Yes I should have spotted that in

Code: [Select]
.text.HAL_I2C_Master_Abort_IT
                0x0000000000000000       0x84 ./dev/libsdk.a(stm32l4xx_hal_i2c.o)

the zero load address means it is not there. I thought the size of 0x84 meant it was present.

Learn something every day.

For javascript, I use freelancer.com, with wildly varying results ;)
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14309
  • Country: fr
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #155 on: November 06, 2022, 07:44:41 pm »
Seriously, I too really, really advise against using -O0 for any production code. If, for any reason, your code doesn't "work" with any optimization level using industry-standard tools such as GCC with official versions, then there's definitely something wrong with it and letting it pass with bandaids (like, oh, it appears to work with -O0) doesn't sound good.

Now, the potential "bugs" that seem mitigated with -O0 may be in 3rd-party libraries that you seem to be very reliant on, and I don't blame you for not being willing (or not having the time) to debug other people's code. But sometimes you just don't have a choice.

Now if it's just for debugging purposes, you'd do as many do: build with appropriate options for debugging, and make optimized builds for releases.
But if something badly breaks once you use any kind of optimization, this is definitely not a good sign as to the robustness of the code.
 

Online peter-hTopic starter

  • Super Contributor
  • ***
  • Posts: 3671
  • Country: gb
  • Doing electronics since the 1960s...
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #156 on: November 06, 2022, 09:14:16 pm »
Quote
letting it pass with bandaids (like, oh, it appears to work with -O0) doesn't sound good.

As my project stands right now, the only -O0 bits are specific functions which are in a "boot block" and have no access to stdlib, and have loops which were getting replaced with memcpy() etc. Now, I know this can be blocked (and has been with -fno-tree-loop-distribute-patterns), but I left these so I don't have to re-test them, and in case that compiler option got dropped off one day by accident.
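
For reference, the same option can be attached to a single function instead of the whole build. This is only a sketch with a hypothetical boot_copy() routine; the optimize attribute is GCC-specific, and the GCC manual describes it as intended for debugging rather than production use, so treat it with the same caution as the command-line option.

```c
#include <stddef.h>

/* GCC-specific: the optimize attribute applies the named option to this one
   function. Clang does not implement it, so it is compiled out there. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_PATTERNS \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_PATTERNS
#endif

/* Hypothetical boot-block copy routine with no library dependency. */
NO_LOOP_PATTERNS
void boot_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}
```

Checking the generated assembly for a call to memcpy is still the only way to be sure the loop survived as a loop.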

Quote
may be in 3rd-party libraries that you seem to be very reliant on, and I don't blame you for not being willing (or not having the time) to debug other people's code. But sometimes you just don't have a choice.

I now use very few of the bloated ST HAL functions. I've been busy either removing these or replacing them with local versions, stripped down to only the bare minimum required. The SPI code has been eliminated and replaced mostly with a single generic DMA function.

Quote
and optimized builds for releases.

That is commonly done, but the amount of testing needed (this is now a complex product, with 2 years of my code, plus ETH, LWIP, MBEDTLS, USB CDC & MSC, FATFS, http server, https client, etc) is more than I want to do, and -Og is so damn close to anything else that it is hardly worth the time.

And since, as they say, 99% of CPU time is spent in 1% of the code, one can always put a -O3 attribute on a specific function. But, as with assembler, the biggest speedups come not from making code faster but from doing it differently altogether. As I posted before, I once speeded up an IAR Z180 float sscanf about 1000x by specialising it for the actual input format which was always xx.yyyy (it was an HPGL to Postscript converter). I also wrote it in assembler, but that was secondary. Biggest speedups come from cunning use of hardware e.g. I have got a tracking waveform generator which runs almost no software (just DMA, timers, etc).
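
To show the "do it differently" idea in C rather than Z180 assembler, here is a sketch of a parser that only accepts the fixed xx.yyyy shape. It is not the original code: the function name is hypothetical, and it returns a scaled integer instead of a float to avoid floating point altogether.

```c
/* Parse a number known to look like "xx.yyyy": decimal digits, a dot, then
   exactly four decimal digits. Returns the value scaled by 10000 (so
   "12.3456" gives 123456), or -1 if the text does not match.
   Hypothetical sketch; no sign, exponent, or overflow handling. */
long parse_fixed4(const char *s)
{
    long value = 0;
    int digits = 0;

    while (*s >= '0' && *s <= '9') {          /* integer part */
        value = value * 10 + (*s++ - '0');
        digits++;
    }
    if (digits == 0 || *s++ != '.')
        return -1;
    for (int i = 0; i < 4; i++) {             /* exactly four fraction digits */
        if (*s < '0' || *s > '9')
            return -1;
        value = value * 10 + (*s++ - '0');
    }
    return value;
}
```

A general sscanf has to cope with signs, exponents, whitespace and locale rules on every call; dropping all of that is where most of the gain comes from.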

Some HAL code does not work with -O3, but I don't think I have any of that now. A lot of it was in the "min CS=1 time" department where if you called 2 functions in rapid succession, the time between them was too short. There is also an awful lot of stuff on github which was somebody's "work in progress on some 16MHz AVR, before he got bored" which runs at 168MHz only by luck. Especially if it involves driving chips like SPI FLASH chips. Basically there is a lot of software out there which has to be used with great caution. I spent much of yesterday digging into some 3 year old code driving a STLED316 display driver, which falls over unless you have a ~10us gap between setting the digit cursor and sending the digit data. So I made it 20us and used a dedicated timing function (written in asm) to do it. It previously worked by accident.
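
The arithmetic behind such a delay is simple enough to sketch. The helper name is hypothetical and the actual wait loop (cycle counter, timer, or hand-written asm) is not shown; the point is only that the gap should be derived from the clock rather than from how long some code happens to take.

```c
#include <stdint.h>

/* Convert a delay in microseconds to CPU cycles, rounding up so the gap is
   never shorter than requested. Saturates at UINT32_MAX. */
uint32_t delay_us_to_cycles(uint32_t core_hz, uint32_t microseconds)
{
    uint64_t cycles = ((uint64_t)core_hz * microseconds + 999999u) / 1000000u;
    return cycles > UINT32_MAX ? UINT32_MAX : (uint32_t)cycles;
}
```

At 168MHz a 20us gap is 3360 cycles, while the same code on a 16MHz part needs only 320, which is why delays that "just worked" on a slow AVR fall over on a faster chip.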


« Last Edit: November 06, 2022, 09:17:40 pm by peter-h »
 

Online newbrain

  • Super Contributor
  • ***
  • Posts: 1714
  • Country: se
Re: GCC ARM32 compiler too clever, or not clever enough?
« Reply #157 on: November 10, 2022, 08:51:53 pm »
Quote
As my project stands right now, the only -O0 bits are specific functions which are in a "boot block" and have no access to stdlib, and have loops which were getting replaced with memcpy()

Instead of renouncing optimization, did you evaluate using -fno-builtin instead of -O0 for the specific function?
Or even extracting the loops as a static inline function, with appropriate attributes.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

