Author Topic: STM32, ghetto style (Read 149464 times)

dannyf · « **Reply #200 on:** September 28, 2014, 11:00:13 am »

Quote

these libraries bloat the resulting binary way too much.

It can be a problem, more so on the lower-end.

But it doesn't have to: that's where a middle layer offers a great compromise - if it turns out that something is taking up too much space / time due to library uses, you can hand-code it without changing the user code.

I have a filter routine that takes an input and generates a filtered output. Depending on the libraries I linked in, the underlying algorithm could be a simple LPF, or PID process, or a set of code generated by Matlab.
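
A minimal sketch of that kind of middle layer, assuming a hypothetical `filter_init()` / `filter_step()` interface (these names are made up for illustration, they are not from any library mentioned here). User code only ever calls `filter_step()`; the implementation behind it here is a simple first-order low-pass filter in integer math, and it could be swapped for a PID or generated code without touching the callers.

```c
#include <stdint.h>

/* Hypothetical middle-layer interface: user code only sees these two calls. */
typedef struct {
    int32_t acc;    /* filter state, scaled by 2^shift */
    uint8_t shift;  /* smoothing factor: larger means slower response */
} filter_t;

void filter_init(filter_t *f, uint8_t shift, int32_t initial)
{
    f->shift = shift;
    f->acc = initial * ((int32_t)1 << shift);
}

/* One possible implementation: first-order low-pass (exponential average).
 * The state is kept in scaled form so small steps are not lost to rounding. */
int32_t filter_step(filter_t *f, int32_t input)
{
    int32_t scale = (int32_t)1 << f->shift;

    f->acc += input - f->acc / scale;
    return f->acc / scale;
}
```

With `shift = 2` and a constant input of 100, the first call returns 25 and the output then settles at 100.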

I tend to think that unless you are in a cost conscious situation (ie. your value-add is more in hardware and less in software), you worry about other factors, like speed to market, or maintainability, etc.

gnif · « **Reply #201 on:** September 28, 2014, 11:44:15 am »

Hi,

Have not fully read this post through, lots and lots of pages... so if this has already been mentioned then ignore this post.

The standard peripheral library is a load of crap! It is extremely bloated and does things in an extremely backwards way. I have been programming on the STM32 since they were originally released, and I am also the original author of the open source stm32flash project for serial flashing.

There are three alternatives to using the std peripheral lib:

1) Read the docs and write to the registers manually (fastest and smallest code results). Once you get familiar with the registers this usually ends up being the best option.

2) Use libopencm3 (formerly libopenstm32), very well written, very easy to use with well written examples and a lot of support.

3) If you've got space for C++ in your project, use libstm32pp... although this project seems to be rather dated with little support.

Personally I opt for option 2 when hacking something together, but when I need raw power, or smaller code, I will configure the registers myself.

Example blinky program

Code: [Select]

#include <libopencm3/stm32/rcc.h>
#include <libopencm3/stm32/gpio.h>

int main(void)
{
  rcc_clock_setup_hsi(&hsi_8mhz[CLOCK_44MHZ]);
  rcc_periph_clock_enable(RCC_GPIOE);
  gpio_mode_setup(GPIOE, GPIO_MODE_OUTPUT, GPIO_PUPD_NONE, GPIO12);
  while(1)
  {
    gpio_toggle(GPIOE, GPIO12);
    for (int i = 0; i < 2000000; i++)  // crude busy-wait delay
      __asm__("nop");
  }
}

A bit shorter, no?

Koepi · « **Reply #202 on:** September 28, 2014, 12:47:24 pm »

gnif, your second option sounds great, will take a look at it for sure! Have to see which µCs are supported by it. When the resulting code is faster and slimmer than SPL code, it's the way to go, as reading through badly documented "pure register work" code is a nightmare.

Oh, and thank you for stm32flash, using it daily:

Code: [Select]

MacBook-Air:Release koepi$ ./flash.sh
stm32flash 0.3

http://stm32flash.googlecode.com/

Using Parser : Intel HEX
Serial Config: 57600 8E1
Version      : 0x31
Option 1     : 0x00
Option 2     : 0x00
Device ID    : 0x0444 (STM32F030/F031)
- RAM        : 8KiB  (4096b reserved by bootloader)
- Flash      : 64KiB (sector size: 4x1024)
- Option RAM : 12b
- System RAM : 3KiB

Wrote and verified address 0x08001570 (100.00%) Done.

Just another F030 test, a µC controlled LiIon charger. Real life useable. Based upon a TP4056 chip and a switchable resistor and an added power regulator for the µC power supply with 3.3Volts. Due to lack of motivation of soldering up a 4-digit-7-segment with shift registers, output is via UART for now.

Code: [Select]

Charging with 610 mA. (Vprog 586mV, 36%.)
Charging with 612 mA. (Vprog 587mV, 36%.)
Charging with 610 mA. (Vprog 586mV, 36%.)
Charging with 609 mA. (Vprog 585mV, 36%.)
Charging with 608 mA. (Vprog 584mV, 36%.)
Charging with 609 mA. (Vprog 585mV, 36%.)

The code is very simplistic (and SPL based):

Code: [Select]

#include "stm32f0xx.h"
#include <stdbool.h>
#include <stdio.h>
#include "main.h"

static char text[255];
volatile uint32_t VREFINT_CAL = 0;
RCC_ClocksTypeDef Clocks;
volatile uint8_t charge_rate=0;
static const int resistance[3] = { 1150, 2300, 4400 };
static const bool transistor1[3] = { 0, 1, 0 };
static const bool transistor2[3] = { 1 ,0 ,0 };


void init(void) {
	SystemInit();  // Initialize the System

	// ---- Setup PLL for 48 MHz :) ----
	RCC_DeInit();
	RCC_PLLCmd(DISABLE);
	RCC_PLLConfig(RCC_PLLSource_HSI, RCC_PLLMul_12);
	// Flash: 1 WaitState for 24MHz < SysCLK < 48 MHz
	FLASH_SetLatency(FLASH_Latency_1);
	FLASH_PrefetchBufferCmd(ENABLE);
	// Clock the ADC from the dedicated 14 MHz HSI14 oscillator
	RCC_ADCCLKConfig(RCC_ADCCLK_HSI14);
	// and turn the PLL back on again
	RCC_PLLCmd(ENABLE);
	// set PLL as system clock source
	RCC_SYSCLKConfig(RCC_SYSCLKSource_PLLCLK);
	// ---- End of Setup PLL for 48 MHz :) ----


	GPIO_InitTypeDef GPIO_InitStructure;
	USART_InitTypeDef USART_InitStruct;
	ADC_InitTypeDef  ADC_InitStructure;

	RCC_APB2PeriphClockCmd(RCC_APB2Periph_USART1, ENABLE);
	RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOA, ENABLE);
	RCC_APB2PeriphClockCmd(RCC_APB2Periph_ADC1, ENABLE);

	// ADC-init for Temperature probe
	ADC_DeInit(ADC1);
	ADC_InitStructure.ADC_Resolution = ADC_Resolution_12b;
	ADC_InitStructure.ADC_ContinuousConvMode = DISABLE;  // on demand
	ADC_InitStructure.ADC_ExternalTrigConv = ADC_ExternalTrigConv_T1_TRGO;
	ADC_InitStructure.ADC_ExternalTrigConvEdge = ADC_ExternalTrigConvEdge_None;
	ADC_InitStructure.ADC_DataAlign = ADC_DataAlign_Right;  // 12 bit right aligned
	ADC_InitStructure.ADC_ScanDirection = ADC_ScanDirection_Upward;
	ADC_Init(ADC1, &ADC_InitStructure);

	// ADC calibration; but not used as returned value ends nowhere now...
	VREFINT_CAL = ADC_GetCalibrationFactor(ADC1);
	ADC_Cmd(ADC1, ENABLE);
	ADC_DiscModeCmd(ADC1, ENABLE);

	// Resistor switches (PNP FET) on PA0+1
	GPIO_InitStructure.GPIO_Pin = GPIO_Pin_0 | GPIO_Pin_1;
	GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
	GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
	GPIO_Init(GPIOA, &GPIO_InitStructure);

	// R-Probe on PA2
	GPIO_InitStructure.GPIO_Pin = GPIO_Pin_2;
	GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AN;
	GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_DOWN;
	GPIO_Init(GPIOA, &GPIO_InitStructure);

	// UART1: Configure PA9 and PA10
	GPIO_InitStructure.GPIO_Pin = GPIO_Pin_9 | GPIO_Pin_10;
	GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF;
	GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
	GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
	GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
	GPIO_Init(GPIOA, &GPIO_InitStructure);

	GPIO_PinAFConfig(GPIOA, GPIO_PinSource9, GPIO_AF_1);
	GPIO_PinAFConfig(GPIOA, GPIO_PinSource10, GPIO_AF_1);

	USART_InitStruct.USART_BaudRate = 57600;
	USART_InitStruct.USART_WordLength = USART_WordLength_8b;
	USART_InitStruct.USART_StopBits = USART_StopBits_1;
	USART_InitStruct.USART_Parity = USART_Parity_No;
	USART_InitStruct.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
	USART_InitStruct.USART_Mode = USART_Mode_Tx | USART_Mode_Rx;
	USART_Init(USART1, &USART_InitStruct);
	USART_Cmd(USART1, ENABLE);

	RCC_GetClocksFreq(&Clocks);
}

int main (void) {
	init();

	sprintf(text, "STM32F030-Charger startup.\r\nHCLK is %ld MHz, PCLK %ld MHz, System %ld MHz, ADC %ld MHz.\r\n", (Clocks.HCLK_Frequency/1000000), (Clocks.PCLK_Frequency/1000000), (Clocks.SYSCLK_Frequency/1000000), (Clocks.ADCCLK_Frequency/1000000));
	uart_tx(text);

	charge_rate = 0;
	if (transistor1[charge_rate]) GPIOA->BSRR = (GPIO_Pin_0);  // Enable FET 1 (2.300 Ohm)
	if (transistor2[charge_rate]) GPIOA->BSRR = (GPIO_Pin_1);  // Enable FET 2 (1.150 Ohm)

	while (1) {
		int stage = read_adc();
		int Vmax = (uint32_t)(2000*3280/(1<<12));
		int Vprog = (uint32_t)(stage*3280/(1<<12));
		int Ibat = (uint32_t) ((1000*Vprog/resistance[charge_rate]*1200) / 1000);
		int percent = (uint32_t)(Vprog * 100/Vmax);
		sprintf(text, "Charging with %d mA. (Vprog %dmV, %d%%.)\r\n", Ibat, Vprog, percent);
		uart_tx(text);
		Delay(10000000);
	}
}

int read_adc(void) {
	uint32_t temp = 0;
	ADC_ChannelConfig(ADC1, ADC_Channel_2, ADC_SampleTime_13_5Cycles);

	// throw away first result like on AVR? first value seems wrong.
	ADC_StartOfConversion(ADC1);
	while(ADC_GetFlagStatus(ADC1, ADC_FLAG_EOC) == RESET);

	for (uint8_t i = 0; i<16; i++) {  // resampling
		ADC_StartOfConversion(ADC1);
		while(ADC_GetFlagStatus(ADC1, ADC_FLAG_EOC) == RESET);
		temp += ADC_GetConversionValue(ADC1);
	}
	return ( (temp>>4) );
}

void uart_tx(char *out) {
	while (*out) {
		while(!(USART1->ISR & USART_FLAG_TXE)) {
			// Nothing, really.;
		}
		USART1->TDR = *out++;
	}
}


void Delay(__IO uint32_t nCount) {
  while(nCount--) {
  }
}
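
The arithmetic in the main loop can be checked on a PC. A small sketch, assuming the same constants as the listing (3280 mV reference, 12-bit ADC, the factor 1200 applied to Vprog/Rprog, 2000 counts as full scale). The helper names are hypothetical, and the reading of 732 counts is an assumed value picked to reproduce the 586 mV line of the log above.

```c
#include <stdint.h>

/* Convert a 12-bit ADC reading to millivolts, assuming a 3280 mV reference
 * as in the listing. */
uint32_t adc_to_mv(uint32_t counts)
{
    return counts * 3280u / (1u << 12);
}

/* Charge current in mA from the PROG pin voltage (mV) and the PROG resistor
 * (ohms), using the same integer steps as the listing. */
uint32_t charge_current_ma(uint32_t vprog_mv, uint32_t r_prog_ohm)
{
    return (1000u * vprog_mv / r_prog_ohm * 1200u) / 1000u;
}

/* Vprog as a percentage of the full-scale value (2000 counts in the listing). */
uint32_t charge_percent(uint32_t vprog_mv)
{
    uint32_t vmax_mv = adc_to_mv(2000u);

    return vprog_mv * 100u / vmax_mv;
}
```

With 732 counts and the 1150 ohm resistor this gives 586 mV, 610 mA and 36%, which matches the first line of the log.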

dannyf · « **Reply #203 on:** September 28, 2014, 01:11:14 pm »

Quote

The standard peripheral library is a load of crap! It is extremely bloated and does things in an extremely backwards way.

Quote

2) Use libopencm3 (formerly libopenstm32), very well written, very easy to use with well written examples and a lot of support.

What's crap and what's not is quite subjective - not right or wrong.

Fundamentally, the use of libraries is a trade-off between many factors: speed / space, ease of use, documentation, reliability, .... There is no right or wrong answer here. Part of a designer's job is to assess the situation and figure out which way is better in a given application.

Quote

but when I need raw power, or smaller code, I will configure the registers myself.

With that, why don't you implement the following:

1) boot up the chip in hsixpll, construct a random number by reading the lsb of an adc pin and transmit that via usart interrupt.
2) implement that using your direct register access + libopencm3.

We will see how much faster your code runs, how much smaller it is vs. the St library implementation. Then we can see if it is worth it.
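
For reference, the "random number from the lsb of an adc pin" part of that test is small whichever library sits underneath. A hedged sketch of just that step, with the ADC hidden behind a function pointer so it runs on a PC; `adc_read_fn`, `random_byte_from_lsb()` and the fake sample source are hypothetical names, and the USART interrupt side is not covered.

```c
#include <stdint.h>

/* Hypothetical sample source: on real hardware this would start a conversion
 * and return the result, via SPL, libopencm3 or direct register access. */
typedef uint16_t (*adc_read_fn)(void);

/* Build one byte from the least significant bit of eight successive samples,
 * first sample in the most significant position. */
uint8_t random_byte_from_lsb(adc_read_fn read_sample)
{
    uint8_t value = 0;

    for (int i = 0; i < 8; i++) {
        value = (uint8_t)((value << 1) | (read_sample() & 1u));
    }
    return value;
}

/* Stand-in sample source for trying the routine on a PC: replays a fixed
 * table instead of reading hardware. */
static const uint16_t fake_samples[8] = {
    2049, 2048, 2051, 2050, 2047, 2047, 2052, 2049
};
static unsigned fake_index;

uint16_t fake_adc_read(void)
{
    uint16_t sample = fake_samples[fake_index % 8];

    fake_index++;
    return sample;
}
```

The fake samples have LSBs 1,0,1,0,1,1,0,1, so `random_byte_from_lsb(fake_adc_read)` returns 0xAD.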

Quote

A bit shorter no?

If you allow the use of a library, I can make it into one line.

gnif · « **Reply #204 on:** September 28, 2014, 04:32:52 pm »

Quote from: dannyf on September 28, 2014, 01:11:14 pm

Quote
The standard peripheral library is a load of crap! It is extremely bloated and does things in an extremely backwards way.

Quote
2) Use libopencm3 (formerly libopenstm32), very well written, very easy to use with well written examples and a lot of support.

What's crap and what's not is quite subjective - not right or wrong.

You are correct it is subjective, but obviously I am not alone in my opinion as libopencm3 has quite a lot of developers backing and using it, many people, including myself, moved from the std lib because we found it sucked... I started to write my own library before I found libopencm3.

Quote from: dannyf on September 28, 2014, 01:11:14 pm

Fundamentally, the use of libraries is a trade-off between many factors: speed / space, ease of use, documentation, reliability, .... There is no right or wrong answer here. Part of a designer's job is to assess the situation and figure out which way is better in a given application.

Quote
but when I need raw power, or smaller code, I will configure the registers myself.

With that, why don't you implement the following:

1) boot up the chip in hsixpll, construct a random number by reading the lsb of an adc pin and transmit that via usart interrupt.
2) implement that using your direct register access + libopencm3.

We will see how much faster your code runs, how much smaller it is vs. the St library implementation. Then we can see if it is worth it.

Well, obviously it is only if I am trying to squeeze every ounce of performance out of it... in most cases I don't bother; as I stated, it is rare. If you want to go that far you could always go to writing assembler; sometimes it is warranted, but rarely.

Quote from: dannyf on September 28, 2014, 01:11:14 pm

Quote
A bit shorter no?

If you allow the use of a library, I can make it into one line.

Well that is obvious... I was allowing the use of a library, what do you think the 'standard peripheral library' is?

dannyf · « **Reply #205 on:** September 28, 2014, 04:45:57 pm »

Quote

libopencm3 has quite a lot of developers backing and using it,

If you use popularity as a judgment of quality, I wouldn't be surprised if far more people use st's library than libopencm3.

Quote

many people, including myself, moved from the std lib because we found it sucked...

Then take the test I mentioned earlier so you can prove conclusively that libopencm3 produces faster and smaller pieces of code.

Quote

as I stated, it is rare.

So the st libraries are a significant problem that you rarely run into?

I wish I had those problems.

dannyf · « **Reply #206 on:** September 28, 2014, 04:48:22 pm »

On sucky libraries that take up too much space, maybe you should check out those luminary / TI chips with in-rom libraries.

Zero (flash) space taken.

Koepi · « **Reply #207 on:** September 29, 2014, 06:04:23 am »

I got bored yesterday evening, so I built a 4-digit-7-segment and wrote PoC code for the tiny F030.
It's incredibly smooth to control the 7-segment with a fast STM32F030 completely in software, in direct comparison to ATtiny/ATmega. The multiplexing works nicely, which is satisfying.

The most amazing thing is that the power supplied via the F030 suffices!

Code: [Select]

#include "stm32f0xx.h"
#include <stdbool.h>
#include <stdio.h>
#include "main.h"

static char text[255];
RCC_ClocksTypeDef Clocks;
volatile uint8_t digit[4];

// 'database' of digits
static uint8_t lightup[10]={
		0b00000011,  // 0
		0b10011111,  // 1
		0b00100101,  // 2
		0b00001101,  // 3
		0b10011001,  // 4
		0b01001001,  // 5
		0b01000001,  // 6
		0b00011111,  // 7
		0b00000001,  // 8
		0b00001001 };  // 9

void init(void) {
	SystemInit();  // Initialize the System

	// ---- Setup PLL for 48 MHz :) ----
	RCC_DeInit();
	RCC_PLLCmd(DISABLE);
	RCC_PLLConfig(RCC_PLLSource_HSI, RCC_PLLMul_12);
	// Flash: 1 WaitState for 24MHz < SysCLK < 48 MHz
	FLASH_SetLatency(FLASH_Latency_1);
	FLASH_PrefetchBufferCmd(ENABLE);
	// Clock the ADC from the dedicated 14 MHz HSI14 oscillator
	RCC_ADCCLKConfig(RCC_ADCCLK_HSI14);
	// and turn the PLL back on again
	RCC_PLLCmd(ENABLE);
	// set PLL as system clock source
	RCC_SYSCLKConfig(RCC_SYSCLKSource_PLLCLK);
	// ---- End of Setup PLL for 48 MHz :) ----


	GPIO_InitTypeDef GPIO_InitStructure;
	USART_InitTypeDef USART_InitStruct;

	RCC_APB2PeriphClockCmd(RCC_APB2Periph_USART1, ENABLE);
	RCC_AHBPeriphClockCmd(RCC_AHBPeriph_GPIOA, ENABLE);

	// 7-seg on PA0-PA4
	GPIO_InitStructure.GPIO_Pin = GPIO_Pin_0 | GPIO_Pin_1 | GPIO_Pin_2 | GPIO_Pin_3 | GPIO_Pin_4;
	GPIO_InitStructure.GPIO_Mode = GPIO_Mode_OUT;
	GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
	GPIO_Init(GPIOA, &GPIO_InitStructure);


	// UART1: Configure PA9 and PA10
	GPIO_InitStructure.GPIO_Pin = GPIO_Pin_9 | GPIO_Pin_10;
	GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF;
	GPIO_InitStructure.GPIO_OType = GPIO_OType_PP;
	GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
	GPIO_InitStructure.GPIO_PuPd = GPIO_PuPd_NOPULL;
	GPIO_Init(GPIOA, &GPIO_InitStructure);

	GPIO_PinAFConfig(GPIOA, GPIO_PinSource9, GPIO_AF_1);
	GPIO_PinAFConfig(GPIOA, GPIO_PinSource10, GPIO_AF_1);

	USART_InitStruct.USART_BaudRate = 57600;
	USART_InitStruct.USART_WordLength = USART_WordLength_8b;
	USART_InitStruct.USART_StopBits = USART_StopBits_1;
	USART_InitStruct.USART_Parity = USART_Parity_No;
	USART_InitStruct.USART_HardwareFlowControl = USART_HardwareFlowControl_None;
	USART_InitStruct.USART_Mode = USART_Mode_Tx | USART_Mode_Rx;
	USART_Init(USART1, &USART_InitStruct);
	USART_Cmd(USART1, ENABLE);

	RCC_GetClocksFreq(&Clocks);

	// switch 7-seg on
	GPIOA->BSRR = GPIO_Pin_0;  // Vcc on PA0
	GPIOA->BRR = GPIO_Pin_1;  // GND on PA1
	// PA2: SHCP, PA3: STCP; PA4: DS
}

int main (void) {
	init();

	sprintf(text, "STM32F030-7Seg startup.\r\nHCLK is %ld MHz, PCLK %ld MHz, System %ld MHz, ADC %ld MHz.\r\n", (Clocks.HCLK_Frequency/1000000), (Clocks.PCLK_Frequency/1000000), (Clocks.SYSCLK_Frequency/1000000), (Clocks.ADCCLK_Frequency/1000000));
	uart_tx(text);


	while (1) {
		for (int i=0; i<10000; i++) {
			int temp = i;
			for (int j=3; j>=0; j--) {
				digit[3-j] = temp % 10;
				temp /= 10;
			}
			for (int k=0; k<2000; k++) {
				show_digit();
			}
		}
	}
}

void show_digit (void) {
	int temp = 0;
	for (int i=0; i<4; i++) {
		temp = (((lightup[digit[i]]))<<8) | (1<<(i+4));
		push_out(temp);
	}
}

void push_out(int output) {
	for (int i=0; i<16; i++) {
		if (output & (1<<i)) {
			GPIOA->BSRR = GPIO_Pin_4;
		} else {
			GPIOA->BRR = GPIO_Pin_4;
		}
		GPIOA->BSRR = GPIO_Pin_2;
		GPIOA->BRR = GPIO_Pin_2;
	}
	GPIOA->BSRR = GPIO_Pin_3;
	GPIOA->BRR = GPIO_Pin_3;
	Delay(600);
}

void uart_tx(char *out) {
	while (*out) {
		while(!(USART1->ISR & USART_FLAG_TXE)) {
			// Nothing, really.;
		}
		USART1->TDR = *out++;
	}
}


void Delay(__IO uint32_t nCount) {
  while(nCount--) {
  }
}
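
To see what push_out() leaves on the outputs, the bit-banging can be modelled on a PC. A sketch, assuming two cascaded 74HC595-style shift registers (suggested by the SHCP/STCP/DS pin names in the listing, but not stated outright), where each clock pulse moves the existing bits up by one stage and takes the new bit in at stage 0. The `shift_chain_t` type and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of two cascaded 8-bit shift registers with an output latch. */
typedef struct {
    uint16_t shift;  /* contents of the shift stages */
    uint16_t latch;  /* what the output pins show after a latch pulse */
} shift_chain_t;

/* One SHCP pulse: shift everything up, take the DS level into stage 0. */
void chain_clock(shift_chain_t *c, int ds_level)
{
    c->shift = (uint16_t)((c->shift << 1) | (ds_level ? 1u : 0u));
}

/* One STCP pulse: copy the shift stages to the outputs. */
void chain_latch(shift_chain_t *c)
{
    c->latch = c->shift;
}

/* Same bit order as push_out() in the listing: bit 0 of the word goes first. */
void chain_push(shift_chain_t *c, uint16_t word)
{
    for (int i = 0; i < 16; i++) {
        chain_clock(c, (word >> i) & 1);
    }
    chain_latch(c);
}
```

Because bit 0 is sent first, it ends up in the last stage: in this model, pushing 0x0001 leaves 0x8000 on the outputs, so the word appears bit-reversed. That has to be taken into account when wiring the segments and digit selects.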

dannyf · « **Reply #208 on:** September 29, 2014, 10:00:33 am »

Quote

I built a 4-digit-7-segment and wrote PoC code for the tiny F030.

Nice work.

A couple suggestions:

1) you may want to implement the display in a timer isr - it is periodically called and entirely in the background so you can do whatever in your main loop without the display flickering.
2) I would use a display buffer to hold the segment information - the display routine would read from that buffer which segments to turn on / off. With that, you can display alphanumeric information on the led.

I would also package the code in an easy-to-use module that's both CA and CC capable.
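
A rough sketch of what such a display buffer could look like, runnable on a PC. All names here are hypothetical, and the segment mapping (bit 0 = segment a ... bit 6 = segment g, active high) is an assumption, not the wiring used in the code above. The buffer holds raw segment patterns, so letters work as well as digits, and the common-anode / common-cathode difference comes down to one inversion.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_DIGITS 4

/* Active-high segment patterns for 0..9, bit 0 = segment a ... bit 6 = g
 * (an assumed mapping). */
static const uint8_t seg_font[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* Display buffer: one raw segment pattern per digit position. */
static uint8_t display_buf[NUM_DIGITS];

/* Store a decimal value (0..9999), most significant digit at position 0. */
void display_set_number(unsigned value)
{
    for (int i = NUM_DIGITS - 1; i >= 0; i--) {
        display_buf[i] = seg_font[value % 10];
        value /= 10;
    }
}

/* Store an arbitrary pattern, e.g. a letter, at one position. */
void display_set_raw(int pos, uint8_t pattern)
{
    if (pos >= 0 && pos < NUM_DIGITS) {
        display_buf[pos] = pattern;
    }
}

/* Levels to drive on the segment lines for one digit. A common-cathode
 * display lights a segment with a high level, a common-anode one with a
 * low level, so the only difference is an inversion. */
uint8_t display_segment_levels(int pos, bool common_anode)
{
    uint8_t pattern = display_buf[pos];

    return common_anode ? (uint8_t)~pattern : pattern;
}
```

The refresh routine (in a timer isr or elsewhere) would then only call `display_segment_levels()` for the digit it is currently lighting.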

Koepi · « **Reply #209 on:** September 29, 2014, 01:46:41 pm »

dannyf, thanks for the inspiration - yes, that's on the todo list, as well as starting with the built-in rtc. Let's see where it takes us with the internal 40kHz oscillator.

Just as a note, I called it PoC (proof-of-concept), because it's a fast prototype to make it work on the lowest possible level, as fast as possible. Adding stuff like timers and so on just adds possible errors at this stage. So that is just something to get the idea / working principle from and then throw that all away and do it right. Kind of rapid prototyping.

dannyf · « **Reply #210 on:** September 29, 2014, 04:54:01 pm »

It is true: more complexity means more chances that things can go wrong.

However, it can be managed. My timer routines, for example, involve two functions plus one isr: tmrx_init() sets up timerx to trigger periodically, and tmrx_isr() installs a user isr.

In your case, you can set up the timer and install the led display routine as the isr. And you are done - you can do anything in main and your display will be updated automatically.

Something like this:

Code: [Select]

#include "tmr14.h"  //use tmr14

void show_digit(void); //update display

int main(void) {
  ...
  //set up tmr14 to update led display periodically
  tmr14_init(TMR_PS_100x, TMR_1ms); //tmr14 initialized to 1ms interval, 100x prescaler => 1 overflow every 100ms
  tmr14_isr(show_digit); //install show_digit() as user isr
  //done

If it helps, I can provide a template to get you started.

edit: fixed some spelling and added an example.
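
For anyone wondering how the "install a user isr" part can work, here is a generic sketch of the pattern using a function pointer. It is not the tmr14 module itself: `tick_install()`, `tick_isr()` and the stub handler are hypothetical names, and the hardware side (timer setup, clearing the interrupt flag) is left out so the logic can be run on a PC.

```c
#include <stddef.h>

/* Type of the user routine to run on every timer overflow. */
typedef void (*tick_handler_t)(void);

/* Empty default, so the isr never calls through a null pointer. */
static void tick_default(void)
{
}

/* Written from main code and read from the interrupt, hence volatile. */
static tick_handler_t volatile tick_handler = tick_default;

/* Install a user routine; passing NULL restores the empty default. */
void tick_install(tick_handler_t handler)
{
    tick_handler = (handler != NULL) ? handler : tick_default;
}

/* Body of the timer interrupt. On real hardware the timer's interrupt
 * vector would clear the interrupt flag and then call this. */
void tick_isr(void)
{
    tick_handler();
}

/* Example user routine standing in for show_digit(): counts its calls. */
static volatile unsigned refresh_count;

void show_digit_stub(void)
{
    refresh_count++;
}
```

After `tick_install(show_digit_stub)`, every simulated interrupt (`tick_isr()`) runs the display routine once, and the main loop does not need to know about it.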

dannyf · « **Reply #211 on:** September 30, 2014, 12:35:59 am »

Quote

this is STM32L152

Not ghetto but in case anyone struggles with this, the particular board here is STM32L152RE, used on the Nucleo board.

The chip is actually very new, introduced this March / May. Many IDEs, including Keil uVision and CoIDE, don't have support for it. However, you can simply pick STM32L152RD (and other variety if you wish), and everything works out just fine.

westfw · « **Reply #212 on:** September 30, 2014, 08:03:14 am »

Quote from: gnif on September 28, 2014, 11:44:15 am

The standard peripheral library is a load of crap! It is extremely bloated and does things in an extremely backwards way.

You might enjoy "The other ST thread" where I've also been complaining about the STP, and positing some theories that the way it has been deployed is not the way that the original authors had in mind. In a simple example, I cut size of generated code by about 40% by rearranging the STP source to be "inline" rather than in a separate library. (starting about here: https://www.eevblog.com/forum/microcontrollers/one-dollar-one-minute-arm-development/msg516710/#msg516710 )

(and I'll also add a thankyou for stm32flash!)

Koepi · « **Reply #213 on:** September 30, 2014, 08:45:55 am »

Well, thanks for the advertisement for that thread, but I see it's from paulie and he's trolling around a lot there. So no joy in that. I dislike reading dumb provocation from a teenager to make others take over his work for his project. May have worked in kindergarten, but will be a show stopper in real life.

Thanks for the inlining hint. It seems like the actual "how do I do the inlining" is still missing, but anyhow. A buzzword more which may help when looking around for binary size reduction.

dannyf · « **Reply #214 on:** September 30, 2014, 11:02:49 am »

I would be careful with inlining.

All it does is trade the overhead associated with calling a function for flash space. So you use inlining mostly in two cases:

1) speed is important. This is more pronounced when the function to be inlined is small and frequently called. Inlining for example a fft or ieee1057 would make very little sense.

2) overhead is significant vs. the code's execution: like checking a flag, etc. The st library has quite a few such cases. The ghetto code posted earlier took this approach: rather than using a function that checks the flag, it went directly to the register and checked the flag there -> no overhead.

If you are concerned about space, you generally don't inline the code; If you are concerned about execution time, you generally inline the code.

Koepi · « **Reply #215 on:** September 30, 2014, 11:39:08 am »

That's how I know inlining, too. More like a #define or Macro, to insert a function completely at that space and reduce jumps, a bit like loop unrolling (which obviously is not reducing the code size).

Thus I'm interested to know how inlining is done to reduce codesize, not to bloat it. I miss that explanation on the other thread, I just read that functions are repeated in several classes of the SPL and thus some optimizations in that area would be possible.

mikerj · « **Reply #216 on:** September 30, 2014, 11:57:32 am »

Quote from: Koepi on September 30, 2014, 11:39:08 am

That's how I know inlining, too. More like a #define or Macro, to insert a function completely at that space and reduce jumps, a bit like loop unrolling (which obviously is not reducing the code size).

Thus I'm interested to know how inlining is done to reduce codesize, not to bloat it. I miss that explanation on the other thread, I just read that functions are repeated in several classes of the SPL and thus some optimizations in that area would be possible.

For very small functions (e.g. that can be implemented in one or two instructions) that are called from many different locations, the overhead from the call and return instructions can add significantly to the memory overhead. In this specific case inlining can reduce memory requirements.

dannyf · « **Reply #217 on:** September 30, 2014, 12:57:35 pm »

Quote

functions are repeated in several classes of the SPL

If so, inlining those functions would likely increase the code size.

I will look into it more later.

Koepi · « **Reply #218 on:** September 30, 2014, 01:05:12 pm »

Probably "inlining" is used wrong in that context and means to "streamline" multiple function-instances from different classes to collect them in an extra class - so they appear only once and can be reused everywhere. Just guessing.

(And so small functions which would result in 1-2 machine instructions would more likely be macros, wouldn't they? I can't off the top of my head think of having ever seen something like that as a function.)

mikerj · « **Reply #219 on:** September 30, 2014, 04:02:50 pm »

Quote from: Koepi on September 30, 2014, 01:05:12 pm

(And so small functions which would result in 1-2 machine instructions would more likely be macros, wouldn't they? I can't off the top of my head think of having ever seen something like that as a function.)

Small functions like this are used extensively in things like the CMSIS and STM32 libraries in order to provide a level of abstraction. e.g.

Code: [Select]

void GPIO_SetBits(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin)
{
  /* Check the parameters */
  assert_param(IS_GPIO_ALL_PERIPH(GPIOx));
  assert_param(IS_GPIO_PIN(GPIO_Pin));
  
  GPIOx->BSRR = GPIO_Pin;
}

The asserts don't create any code at all unless you define a symbol to enable them, so this function requires two argument values to be set up prior to calling (in registers r0 and r1 under the ARM calling convention, rather than on the stack), the call itself, the write of the pin value to the GPIO set register, and then the return back to the calling function. Inlining a function like this would very likely save some code space.
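
To make the comparison concrete, here is a hedged sketch of the two variants side by side. It uses a cut-down stand-in for the GPIO register block (`fake_gpio_t`, a made-up type) so that it compiles on a PC; the function names are hypothetical too, and the actual size difference depends on the compiler and optimisation level.

```c
#include <stdint.h>

/* Cut-down stand-in for the library's GPIO register block. Only the
 * register used here is modelled. */
typedef struct {
    volatile uint32_t BSRR;  /* bit set/reset register */
} fake_gpio_t;

/* Out-of-line version: every use costs argument setup, a call and a return,
 * on top of the single store that does the real work. */
void gpio_set_bits(fake_gpio_t *port, uint16_t pins)
{
    port->BSRR = pins;
}

/* Inline version: with optimisation enabled the compiler can reduce each
 * use to the store itself, and no separate copy of the function is needed. */
static inline void gpio_set_bits_inline(fake_gpio_t *port, uint16_t pins)
{
    port->BSRR = pins;
}
```

Both variants behave identically; only the generated code around the store differs.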

dannyf · « **Reply #220 on:** September 30, 2014, 05:41:02 pm »

I went back to that thread where inlining cuts the code by 40%. I didn't see anything that supports it - it seems that it was talking about a convoluted way of calculating baud rates.

Maybe my browser was acting up but that's what I got.

westfw · « **Reply #221 on:** September 30, 2014, 06:41:01 pm »

Quote

I went back to that thread where inlining cuts the code by 40%. I didn't see anything that supports it

The numbers are buried in "code" tags in this post:
https://www.eevblog.com/forum/microcontrollers/one-dollar-one-minute-arm-development/msg518377/#msg518377
"USART_DeInit(); USART_Init()" pair: 596 bytes without inlining, 356 bytes with inlining. 40.3% savings. (67% bigger without inlining!)

The detailed explanation, and the sample code that you could download and see for yourself, is in this post: https://www.eevblog.com/forum/microcontrollers/one-dollar-one-minute-arm-development/msg518579/#msg518579

The complaint about the baud rate calculation was elsewhere.

westfw · « **Reply #222 on:** September 30, 2014, 08:06:37 pm »

You haven't actually inlined anything if you're only including the .h in your main.c...

westfw · « **Reply #223 on:** October 01, 2014, 05:51:06 am »

It turns out that there is a simpler way to get partly optimized (inlined) STP code. Simply include the .c file from the STP modules as well as the .h files:

Code: [Select]

#include "stm32f0xx_rcc.h"		//clock routing used
#include "stm32f0xx_gpio.h"		//gpio modules used
#include "stm32f0xx_adc.h"		//adc module used
#include "stm32f0xx_usart.h"		//usart used
#include "stm32f0xx_misc.h"		//interrupts used

#ifdef DOINLINE
#include "stm32f0xx_rcc.c"		//clock routing used
#include "stm32f0xx_gpio.c"		//gpio modules used
#include "stm32f0xx_adc.c"		//adc module used
#include "stm32f0xx_usart.c"		//usart used
#endif

Doing this for Dannyf's init/adc/uart blinky program shaved off a bit more than 10%, which isn't bad for an essentially zero-effort change. It didn't improve any using modified STP files, either. (However, the simple uart deinit/init example, which was crafted to explicitly capitalize on the inline definitions, DID do much better with modified STP source - it only went down about 5% in size with the double-includes. I'm not sure exactly why...)

Code: [Select]

BillW-MacOSX-2<10307> make dannyn
/usr/local/armgcc/bin/arm-none-eabi-gcc -DSTM32F10X_MD=1 -mcpu=cortex-m3 -mthumb -O3 -ffunction-sections -fdata-sections -Wl,--gc-sections -gdwarf-2 -include assert.h -o danny_normal -DSTM32F030=1 \
           -I normal danny.c \
           normal/stm32f0xx_misc.c normal/system_stm32f0xx.c \
           normal/stm32f0xx_usart.c normal/stm32f0xx_rcc.c \
           normal/stm32f0xx_adc.c normal/stm32f0xx_gpio.c
/usr/local/armgcc/bin/arm-none-eabi-size danny_normal
   text    data     bss     dec     hex filename
   3288    1120      40    4448    1160 danny_normal
BillW-MacOSX-2<10308> make dannyi2
/usr/local/armgcc/bin/arm-none-eabi-gcc -DSTM32F10X_MD=1 -mcpu=cortex-m3 -mthumb -O3 -ffunction-sections -fdata-sections -Wl,--gc-sections -gdwarf-2 -include assert.h -o danny_i2 -DSTM32F030=1 \
           -I normal -DDOINLINE danny.c \
           normal/stm32f0xx_misc.c normal/system_stm32f0xx.c
/usr/local/armgcc/bin/arm-none-eabi-size danny_i2
   text    data     bss     dec     hex filename
   2900    1120      36    4056     fd8 danny_i2
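
The savings can be worked out from the two `size` outputs above. A tiny helper for that, in integer math; the name `saving_permille()` is made up for this example, and it assumes the "after" size is not larger than the "before" size.

```c
#include <stdint.h>

/* Saving in tenths of a percent (rounded down) when a size goes from
 * before_bytes to after_bytes. Assumes after_bytes <= before_bytes. */
uint32_t saving_permille(uint32_t before_bytes, uint32_t after_bytes)
{
    return (before_bytes - after_bytes) * 1000u / before_bytes;
}
```

For the text section, 3288 to 2900 bytes gives 118, i.e. 11.8%; for the total (dec column), 4448 to 4056 bytes gives 88, i.e. 8.8%.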

dannyf · « **Reply #224 on:** October 01, 2014, 10:17:03 am »

Both the inlined and non-inlined versions of USART_DeInit() start at ~1.4 KB with 1 invocation.

After that, the inlined version goes up 200 bytes per invocation; and the non-inlined version up minimally.

It is consistent with the conventional understanding on what inlining does.

I think the original 40% figure quoted for "inlining" might be for something else.

