I'm trying to get both meaningful numbers and retain the values. The final conversion to human-readable numbers does not need to be too accurate. What is more important is being able to write a define in mV and use that as a threshold. Once I know the scaling factor I can scale it to match. In fact I don't even need to do the last division until I want human-readable numbers, but that does leave every result as a 32-bit variable, and bit shifting is still probably faster than division.
BTW, XC8 now includes avr-gcc for the AVR MCUs, so this is 'gcc' we are talking about.
Edit:
Microchip use gcc for XC32, XC16 and XC8-AVR
Basic optimisations are free but they charge for higher level optimisation.
All source code is available, so they comply with the GPL licence, but the build environment is obfuscated to hinder re-compilation.
For XC16 XC32 there is this medicine: https://github.com/cv007/XC3216
Not sure about XC8-AVR.
XC8-PIC uses LLVM I believe, so they can lock that down.
-O2 is not enough?...
For most people I am sure -O2 is enough optimisation. But that's not my main point.
Microchip take GPL software, modify 0.01% of it, then charge money to unlock features.
They have the right to do this as long as they publish source code to meet GPL requirements, which they do.
The GPL also gives me the right to unlock these features (IMO, IANAL) either by binary modification or other means, should I choose to do so.
I tried to set avr-gcc up but it did not work.
If you are using AVR and you are using XC8, then you are already using avr-gcc. It's built into "XC8", which is now a combination of two different compilers: (1) the same old XC8 for PIC, based on LLVM, and (2) avr-gcc.
Have a look here and see what you find: C:\Program Files\Microchip\xc8\
If you aren't using that, what compiler are you using?
Well I am using XC8, but I want to know, even if it's running GCC, that it is optimised and not crippled then wrapped up.
Optimisation settings available are here:
Project Properties > XC8 Global Options > XC8 Compiler > Option : Optimisations > Level [0,1,2 are free] [3,s require the paid compiler]
If you are happy with level 0, 1, 2, then all is well. TBH it's probably fine.
If it makes you gnash your teeth (like it does me somewhat) then this might be a good starting point for 'fixing' it:
https://github.com/cv007/XC3216/blob/master/xc8-avr-info.txt
From what you said above, sounds like just downloading avr-gcc and asking MPLABX to use that doesn't work (no surprise there, thanks Microchip).
At least with GCC there is not much use for -O3. It sometimes produces very long code through loop unrolling, and that is about the main extra thing it does.
The normal choice is -Os, i.e. optimisation for code size.
For performance it really helps to use fixed-point math with just shifts for the divider, and do the conversion to normal units only at the end. Much of the math on the results should use just the raw result from the ADC with no scaling at all. Only the user may later care about units like mV.
If you need to compare to a fixed limit (e.g. check if voltage is < 0.5 V), then do the scaling on the limit side - chances are the compiler or preprocessor would do the math and not the µC.
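To illustrate scaling the limit side rather than the reading, here is a minimal sketch. All figures and names are hypothetical (a 12-bit ADC with a 2500 mV reference is assumed):

```c
#include <stdint.h>

/* Hypothetical figures: 12-bit ADC (4096 counts), 2500 mV reference. */
#define ADC_TOP_COUNT 4096UL
#define ADC_REF_MV    2500UL

/* Scale a 500 mV limit into ADC counts at compile time; the
   multiplication comes first so the integer division does not
   truncate the intermediate result. */
#define LIMIT_MV      500UL
#define LIMIT_COUNTS  ((uint16_t)((LIMIT_MV * ADC_TOP_COUNT) / ADC_REF_MV))

/* At run time the comparison is a plain integer compare; no division
   or scaling of the ADC reading is needed. */
static int below_limit(uint16_t adc_result)
{
    return adc_result < LIMIT_COUNTS;
}
```

Here LIMIT_COUNTS folds to the constant 819 at compile time, so the µC never does the math.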
From what you said above, sounds like just downloading avr-gcc and asking MPLABX to use that doesn't work (no surprise there, thanks Microchip).
Correct, but if Atmel Studio has been installed then it works, so it's something to do with the setup, and the MPLABX/NetBeans settings do not tell all.
I wouldn't be surprised if MPLABX is somehow tied to using XC8-AVR, such that you can't configure it to use a standard download of avr-gcc.
Such is the Microchip philosophy on compilers.
Edit:
I see that Atmel Studio (that Visual Studio based IDE) has now been rebranded as Microchip Studio.
The blurb says it now supports XC8...hmm, does that mean it doesn't support avr-gcc directly?
This might be a workaround to get a clean avr-gcc running in MPLABX (haven't tried it myself)
https://www.avrfreaks.net/forum/tutsoft-how-set-mplab-x-ide-use-different-compilers-windows-os
Microchip have a separate download for AVR-GCC; this is what I tried. I only use MPLABX as it will be available on Linux if I make the switch, and Atmel/Microchip Studio only works with admin rights, which would be a nightmare at work. I suspect that at some point it will vanish, as surely they have to pay Microsoft some sort of bulk licence.
Old versions of Atmel Studio are available, made before Microchip changed the compiler. You should be able to get full optimisations from them.
surely they have to pay microsoft some sort of bulk licence
I wouldn't put it past Microsoft to have given Microchip a backhander to get their IDE in place. Their (Microsoft's) plans would see any such license fee as not even rating 'dropped down the back of the sofa' compared to what they will reap once everyone is using VSCode and paying cloud fees to access their projects. And ongoing fees for access to user data.
They've already started down that route with Python.
There is really nothing of VS available in AT/MS. At one point it was a way of getting a free licence, until they locked it down in later versions, so I think Microsoft do rather care, or they would not have stopped giving away the whole thing.
OK so I am trying to use floats. Now the trouble starts, apparently. So I try sending 3.14 in float format to a serial port and read it in RealTerm.
I get 3. Looking at the Microchip developer help:
https://microchipdeveloper.com/c:understanding-floating-point-representations
it turns out they had to do their own thing just to spice things up. Floats are 24-bit in XC8. RealTerm only has one option for float visualisation, and I think it's 32-bit as it's called float4.
So how do I make XC8 use 32 bits? Is there a more user-friendly serial terminal out there?
What, did they not implement float printing for their custom data type? So it's just implicitly casting to int or whatever?
Tim
Likely it needs linking a different library which has support for floating point.
The PICs have the option of 24 or 32 bit float, but AFAIK the AVRs only use 32bit.
The XC8 AVR manual is here.
4.3.4 Floating-Point Data Types
The MPLAB XC8 compiler supports 32-bit floating-point types. Floating point is implemented using an IEEE 754 32-bit
format. Tabulated below are the floating-point data types and their size.
Well, I am confused then. If I make a
float test1 = 3.14;
and send it out of the serial port, I get in hex 0x00 0x00 0x00 0x03.
Sounds like it's being cast to an integer. Post your transmit code.
I don't have it to hand now but uh, yeah, you are right.... I wrote a function that goes something like
void USART_32bit_send(uint32_t data_dataword){
    usart_transmitt_buffer = data_dataword >> 24; /* each write pushes one byte out */
    usart_transmitt_buffer = data_dataword >> 16;
    usart_transmitt_buffer = data_dataword >> 8;
    usart_transmitt_buffer = data_dataword;
}
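For what it's worth, the observed 0x00 0x00 0x00 0x03 is consistent with the float being implicitly converted to uint32_t at the call site, before the function body ever runs. A small host-side sketch (the recording stand-in for the transmit routine is made up for illustration):

```c
#include <stdint.h>

/* Stand-in for the transmit routine: records the value it receives
   instead of shifting bytes out of a UART. */
static uint32_t last_sent;

static void USART_32bit_send(uint32_t data_dataword)
{
    last_sent = data_dataword;
}

/* Calling USART_32bit_send(3.14f) converts the float to uint32_t at
   the call site, so the integer 3 is what gets sent - i.e. the bytes
   0x00 0x00 0x00 0x03, not the IEEE 754 bit pattern of 3.14. */
```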
I've actually given up on the idea of floats here. Given that I am pushed into 32-bit variables anyway, and 32-bit multiplications seem to happen quickly enough, I am keeping everything in a 32-bit space. I'm not doing my own fixed-point representation either. Given that 32 bits have to hold the full calculation of:
real_value = (ADC_result * ADC_ref * input_voltage_scaling) / ADC_top_count ;
and I am working with thresholds, the actual value is not too important so long as I can write my inputs "in English".
So:
scaled_value = (ADC_result * ADC_ref_mV * input_voltage_scaling) ;
and
scaled_threshold = Threshold_mV * ADC_top_count ;
So I swap the division for multiplications that are going to be optimised out anyway.
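That multiply-both-sides idea might look like the following sketch. All figures and names are hypothetical (a 10-bit ADC, a 5000 mV reference, and an input divider factor of 1 are assumed):

```c
#include <stdint.h>

/* Hypothetical figures: 10-bit ADC (1024 counts), 5000 mV reference. */
#define ADC_TOP_COUNT 1024UL
#define ADC_REF_MV    5000UL

/* Instead of dividing the reading down to millivolts, the threshold is
   scaled up once, so both sides of the comparison stay integers and no
   run-time division is needed. */
#define THRESHOLD_MV      1200UL
#define THRESHOLD_SCALED  (THRESHOLD_MV * ADC_TOP_COUNT)

static int above_threshold(uint16_t adc_result)
{
    /* (adc * ref_mV) / top >= threshold_mV, rearranged as
       adc * ref_mV >= threshold_mV * top */
    return (uint32_t)adc_result * ADC_REF_MV >= THRESHOLD_SCALED;
}
```

The threshold is still written "in English" (1200 mV), and the compiler folds THRESHOLD_SCALED to a constant.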
Just remember float is not accurate.
You are getting integers from the ADC, so you had better keep it integer math.
I don't have it to hand now but uh, yea, you are right.... I wrote a function that goes something like
USART_32bit_send(uint32_t data_dataword){
}
That gives you the cast to a uint32 in the function call. To get the binary data one would need to use something like a pointer, memcpy, or a union.
Using integers is likely easier, especially if the µC has no FPU.
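A sketch of the memcpy route for getting at the raw bytes (this assumes float is 32 bits, as it is on the AVR; the function name is made up):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's IEEE 754 bit pattern as a uint32_t via memcpy
   (a union of float and uint32_t works the same way). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* sizes match on 32-bit float targets */
    return bits;
}
```

For example, float_bits(3.14f) gives 0x4048F5C3, so feeding that result to the 32-bit send routine would put 0x40 0x48 0xF5 0xC3 on the wire instead of 0x00 0x00 0x00 0x03.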
The usual way is to avoid magic numbers in the code itself and define the thresholds as constants at the top. The constant would then be in ADC counts, or maybe multiples (like a factor of 256) if something like averaging is used for the ADC. The math would be in the definition of the constant, like
#define adc_scale_factor (max_count / ref_mV) // e.g. 4096 counts / 2500 mV
#define limit (1200 * adc_scale_factor) // for a 1200 mV limit
Essentially all the math would be done by the compiler / preprocessor.
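The "work in multiples" variant with averaging could be sketched as follows. All figures and names are hypothetical (12-bit ADC, 2500 mV reference, 16-sample accumulation), and the multiplication is done before the division so the preprocessor's integer math does not truncate:

```c
#include <stdint.h>

/* Hypothetical figures: 12-bit ADC, 2500 mV reference, 16 samples
   accumulated (the "factor of 16" analogue of the factor-of-256
   idea). The threshold is scaled by the same factor, so the raw
   accumulator can be compared directly: no division, no shift. */
#define MAX_COUNT  4096UL
#define REF_MV     2500UL
#define N_SAMPLES  16UL

/* Multiply first, divide last, so integer truncation costs at most
   one count instead of wiping out the scale factor. */
#define LIMIT_ACC  ((1200UL * MAX_COUNT * N_SAMPLES) / REF_MV)

static uint32_t accumulate(const uint16_t *samples)
{
    uint32_t acc = 0;
    for (uint8_t i = 0; i < N_SAMPLES; i++)
        acc += samples[i];
    return acc; /* compare directly against LIMIT_ACC */
}
```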
Second the integer operations-only approach, of course. Especially on such a limited device as what the OP uses. Just including software FP support will take a significant fraction of the available code memory anyway. In particular if you not only use FP operations, but also FP support in the xxprintf() series of functions.
FYI, last time I used float on AVR (GCC standard 32-bit floats) it only added about 2kB, surprisingly slim. I forget which operations that used; multiply, add, and convert from/to int probably, maybe divide?
Tim
Let's recap. (I promise, this will be useful!)
A linear mapping from X0..X1 to Y0..Y1 is
y = Y0 + (x - X0) × (Y1 - Y0) / (X1 - X0)
and its inverse is
x = X0 + (y - Y0) × (X1 - X0) / (Y1 - Y0)
Let's assume we limit x to X0..X1, inclusive, and y to Y0..Y1, inclusive; and that X0 < X1 and Y0 < Y1. Then, we need one unsigned integer multiplication and one unsigned integer division with an unsigned integer type capable of describing values from zero to (Y1-Y0)×(X1-X0), inclusive. (The final or initial addition is in the type of x or y.)
Let's consider a situation where x represents a 12-bit ADC reading (unsigned integer counts between 0 and 4095) and y represents voltage in tens of microvolts (unsigned integer count between 0 and 499877, inclusive, with 500000 = 5.0V). In other words, X0=0, X1=4096, Y0=0, Y1=500000. The product (Y1-Y0)×(X1-X0) = 2048000000 < 2³¹, so we can use 32-bit integer math here.
static inline uint32_t adc_to_internal_voltage_units(const uint16_t adc)
{
return ((uint32_t)adc * UINT32_C(500000)) / 4096;
}
static inline uint32_t internal_voltage_units_to_adc(const uint32_t units)
{
return units * 4096 / UINT32_C(500000);
}
Because the next power of ten of 500000 is one million, which is less than 2²⁰, we only need 20-bit unsigned integer multiplication by ten, and a 20-bit unsigned integer comparison to and subtraction by 100000, to convert the internal voltage units to a string. We can do the same thing by converting the internal units as an unsigned integer, then put a decimal point between the fifth and the sixth digit, too. An example implementation:
static inline char *internal_voltage_units_to_string(uint32_t units)
{
static char buffer[7]; /* V.vvvv plus end-of-string terminator. */
for (char i = 1; i < 6; i++) {
char digit = '0';
while (units >= 100000) {
units -= 100000;
digit++;
}
buffer[i] = digit;
if (i < 5)
units *= 10;
}
buffer[0] = buffer[1]; /* Move units digit left of decimal point */
buffer[1] = '.'; /* Decimal point */
buffer[6] = '\0'; /* End-of-string mark */
return (char *)buffer;
}
Note that 100000 internal units is 1.0 V.
Compiling the above functions (removing static inline) on AVR-GCC 5.4.0 with -O2 for ATmega32U4 generates 30, 32, and 150-byte functions, respectively; using -Os shrinks the last one to 90 bytes. The first function uses __muluhisi3 for 32-bit unsigned integer multiplication, the second uses __udivmodsi4 for 32-bit unsigned integer division, and the third is self-contained using -O2. The third uses __muluhisi3 if using -Os; the difference between -O2 and -Os here is whether the multiply-by-ten is implemented via a call to __muluhisi3 or inlined as bit shifts and additions (the difference being about 60 bytes of code).
This same logic can be expanded into any internal units. If the string form is needed, it is useful to have the unsigned integer units have the same digits (i.e., some power of ten, positive or negative, of the human-useful value), because that makes the string conversion easy and cheap. Otherwise, a second linear mapping is needed (to convert the internal units to a decimal representation); which itself is not that costly, code-wise, as you can see from above.
In particular, the fully variable (run-time calibratable) form,
#include <stdint.h>
extern const int16_t y_0;
extern const uint16_t y_delta; /* ydelta = y1 - y0, ydelta > 0 */
extern const int32_t x_0;
extern const uint32_t x_delta; /* xdelta = x1 - x0, xdelta > 0 */
typedef uint64_t xy_type;
int32_t x(const int16_t y)
{
return x_0 + ((xy_type)(y - y_0) * x_delta) / y_delta;
}
int16_t y(const int32_t x)
{
return y_0 + ((xy_type)(x - x_0) * y_delta) / x_delta;
}
generates 140-byte and 160-byte functions using the same settings as above (for both -O2 and -Os). x() contains one call to __muldi3 and one call to __udivdi3, and y() one call to __mulsidi3 and one call to __udivdi3, so I would not consider them "slow". Note that xy_type is an unsigned integer type that can describe values from 0 to x_delta×y_delta, inclusive; here, a 48-bit unsigned type would suffice. With x_delta=500000 and y_delta=4096, uint32_t would suffice; that would shrink the functions a bit and use faster calls.
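The uint32_t variant mentioned above might look like this sketch; the extern calibration constants are replaced with fixed values (500000 and 4096, from the earlier example) only to keep it self-contained:

```c
#include <stdint.h>

/* The calibratable mapping with xy_type shrunk to uint32_t, valid
   here because x_delta * y_delta = 2 048 000 000 < 2^31. */
static const int16_t  y_0     = 0;
static const uint16_t y_delta = 4096;   /* ADC counts */
static const int32_t  x_0     = 0;
static const uint32_t x_delta = 500000; /* tens of microvolts */

typedef uint32_t xy_type;

int32_t x(const int16_t y)  /* ADC counts -> internal voltage units */
{
    return x_0 + ((xy_type)(y - y_0) * x_delta) / y_delta;
}

int16_t y(const int32_t x)  /* internal voltage units -> ADC counts */
{
    return y_0 + ((xy_type)(x - x_0) * y_delta) / x_delta;
}
```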