Author Topic: STM32 TFT LCD library  (Read 12012 times)


Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
STM32 TFT LCD library
« on: August 13, 2020, 08:16:34 pm »
STM32F103, CubeIDE. Is there a better library, or how would I create one?
This library is rather slow: about 1.2 s per screen redraw on a 2.8" HX8347D, compared with the MCUfriend library on an Arduino Uno. For some reason the STM32 data port shows a slower data transfer rate than the Arduino Uno.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1842
  • Country: se
Re: STM32 TFT LCD library
« Reply #1 on: August 13, 2020, 11:17:37 pm »
What takes 1.2 s? Seems a lot of time.

Part of it is "justified" by the use of random pins instead of a full port, part by the fact that this looks like a port from an Arduino or AVR library (with support for a lot of different parts).
For once, the HAL is not the bad guy: his writes go directly to registers.

To give a comparison point with some code I've written:
  • LCD 480×320, ILI9481
  • 16 bit parallel interface (so wider than that display)
  • STM32F072 running at 48 MHz (so slower than the 72 MHz STM32F103)
  • HAL used for initializations, but direct register writes - same as in the video
Time needed for a single colour full screen fill: 45 ms.
Nandemo wa shiranai wa yo, shitteru koto dake.
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #2 on: August 14, 2020, 05:21:02 pm »
Before optimization (-O0): GPIO port set time 240 ns, display data bus 5 µs, fill screen 625 ms.
With -O3: GPIO port set time 125 ns, display data bus 1 µs, fill screen 175 ms.
Why does setting the display data take 1 µs rather than 125 ns?


#define write_8(d) { \
          GPIOA->BRR = 0x700 ; \
          GPIOB->BRR = 0x438 ; \
          GPIOC->BRR = 0x80 ; \
          GPIOA->BSRR = (((d) & (0x01)) << 9) \
                       | (((d) & (0x04)) << 8) \
                       | (((d) & (0x80)) << 1); \
          GPIOB->BSRR = (((d) & (0x08)) << 0) \
                       | (((d) & (0x10)) << 1) \
                       | (((d) & (0x20)) >> 1) \
                       | (((d) & (0x40)) << 4); \
          GPIOC->BSRR = (((d) & (0x02)) << 6);\
    }
 

Offline dave j

  • Regular Contributor
  • *
  • Posts: 145
  • Country: gb
Re: STM32 TFT LCD library
« Reply #3 on: August 14, 2020, 06:18:56 pm »
When I had a similar problem, I defined 3 lookup tables, one for each port, and initialised them at start using:
Code: [Select]
static uint32_t BitMapA[256];
static uint32_t BitMapB[256];
static uint32_t BitMapC[256];


for (int i=0; i<256; i++)
{
    uint32_t valueA = 0;
    uint32_t valueB = 0;
    uint32_t valueC = 0;

    if (i & 0x01)
        valueA |= 1<<9;
    else
        valueA |= 1<<(9+16);
    if (i & 0x02)
        valueC |= 1<<7;
    else
        valueC |= 1<<(7+16);
    if (i & 0x04)
        valueA |= 1<<10;
    else
        valueA |= 1<<(10+16);
    if (i & 0x08)
        valueB |= 1<<3;
    else
        valueB |= 1<<(3+16);
    if (i & 0x10)
        valueB |= 1<<5;
    else
        valueB |= 1<<(5+16);
    if (i & 0x20)
        valueB |= 1<<4;
    else
        valueB |= 1<<(4+16);
    if (i & 0x40)
        valueB |= 1<<10;
    else
        valueB |= 1<<(10+16);
    if (i & 0x80)
        valueA |= 1<<8;
    else
        valueA |= 1<<(8+16);
       
    BitMapA[i] = valueA;
    BitMapB[i] = valueB;
    BitMapC[i] = valueC;
}

Writing the value to the LCD then became simply:
Code: [Select]
GPIOA->BSRR = BitMapA[cmd];
GPIOB->BSRR = BitMapB[cmd];
GPIOC->BSRR = BitMapC[cmd];

If you can spare the RAM you could give that a try.
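
The same tables can also be generated from a small pin map instead of a chain of if/else branches, which makes it harder to get a single branch wrong. This is only a sketch: `PinMap`, `build_bsrr_tables`, `bsrr_table` and the `PORT_A`/`PORT_B`/`PORT_C` indices are names made up for illustration, and the pin assignment is copied from the write_8() macro posted earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types; the pin assignment mirrors the write_8() macro. */
enum { PORT_A, PORT_B, PORT_C, PORT_COUNT };

typedef struct {
    uint8_t port; /* PORT_A, PORT_B or PORT_C */
    uint8_t pin;  /* pin number within that port, 0..15 */
} PinMap;

/* Index = LCD data bit D0..D7. */
static const PinMap lcd_pins[8] = {
    {PORT_A, 9}, {PORT_C, 7}, {PORT_A, 10}, {PORT_B, 3},
    {PORT_B, 5}, {PORT_B, 4}, {PORT_B, 10}, {PORT_A, 8},
};

static uint32_t bsrr_table[PORT_COUNT][256];

/* Precompute one BSRR word per port for every possible data byte.
   The low half-word sets a pin, the high half-word resets it. */
static void build_bsrr_tables(void)
{
    for (unsigned value = 0; value < 256; value++) {
        uint32_t word[PORT_COUNT] = {0, 0, 0};
        for (unsigned bit = 0; bit < 8; bit++) {
            const PinMap *p = &lcd_pins[bit];
            assert(p->pin < 16);
            if (value & (1u << bit))
                word[p->port] |= 1u << p->pin;        /* drive pin high */
            else
                word[p->port] |= 1u << (p->pin + 16); /* drive pin low */
        }
        for (unsigned port = 0; port < PORT_COUNT; port++)
            bsrr_table[port][value] = word[port];
    }
}
```

On the target, a write would then be `GPIOA->BSRR = bsrr_table[PORT_A][cmd];` and likewise for the other two ports. Three tables of 256 32-bit words cost 3 KiB of RAM, which is the trade-off mentioned above.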
I'm not David L Jones. Apparently I actually do have to point this out.
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1842
  • Country: se
Re: STM32 TFT LCD library
« Reply #4 on: August 14, 2020, 08:57:17 pm »
But, really, do you need this awful, unreadable, messy, sparse random bit access?

If you have 8 contiguous bits available on any port, just use them!
It will save a lot of shifts and ORs (as clever as the compiler can be, it can't work miracles).
It will also make the code clearer.

Moreover, the macro as defined does not follow the usual do {...} while(0) convention for multi-statement macros. My preference in any case goes to static inline (I know it's only a hint, but gcc almost always heeds it at -Og or higher; that's not the point of discussion here).

For proof, go to https://godbolt.org/z/jjWzE5 and see what happens when you change write_8() to write_8cont()!
Optimization is at -Og; very little changes with -O3. Have fun trying different things, it's usually instructive.
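
As a rough illustration of the static inline alternative, here is a sketch that can be compiled on a PC. The name write_8cont is borrowed from the link above, but the body is a guess and not the linked code; `GpioRegs` is a made-up stand-in for the CMSIS register struct, and the wiring (eight data pins in a row on one port) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the CMSIS GPIO_TypeDef so the sketch builds on a PC;
   on the target you would pass GPIOA, GPIOB, ... instead. */
typedef struct {
    volatile uint32_t BSRR;
} GpioRegs;

/* Assumes D0..D7 are wired to pins shift..shift+7 of a single port. */
static inline void write_8cont(GpioRegs *port, unsigned shift, uint8_t d)
{
    assert(shift <= 8);                   /* eight data pins must fit in 0..15 */
    uint32_t mask = 0xFFu << shift;       /* the eight data pins */
    uint32_t set  = (uint32_t)d << shift; /* pins that must go high */
    /* One store: request a reset of all eight pins, set the 1 bits. */
    port->BSRR = (mask << 16) | set;
}
```

Being a function, it also behaves as a single statement, so `if (x) write_8cont(p, 0, d); else ...` compiles as expected. A brace-only macro followed by a semicolon breaks in that position, which is the reason for the do {...} while(0) convention.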
 

Offline Mecanix

  • Frequent Contributor
  • **
  • Posts: 269
  • Country: cc
Re: STM32 TFT LCD library
« Reply #5 on: August 15, 2020, 08:06:27 am »
All that bit shifting across random GPIO pins will kill performance. Use masks if you are limited and forced to use two ports:

Code: [Select]
#define DATA(x) { \
GPIOA->BSRR = (maskPa << 16) | (maskPa & (x)); \
GPIOB->BSRR = (maskPb << 16) | (maskPb & (x)); \
}
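
Note that this form only works without shifts when each data bit already sits at its pin position. A small host-side sketch of the idea follows; the wiring (D0..D3 on PA0..PA3, D4..D7 on PB4..PB7), the `MASK_PA`/`MASK_PB` values and the helper name `bsrr_word` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed wiring, for illustration only: D0..D3 on PA0..PA3 and
   D4..D7 on PB4..PB7, so no data bit needs to be moved. */
#define MASK_PA 0x000Fu
#define MASK_PB 0x00F0u

/* Build the BSRR word for one port: request a reset of every data pin
   in `mask`, and a set for the ones that are 1 in `x`. */
static inline uint32_t bsrr_word(uint32_t mask, uint32_t x)
{
    assert((mask & 0xFFFF0000u) == 0); /* mask covers pins 0..15 only */
    return (mask << 16) | (mask & x);
}
```

On the target this would be used as `GPIOA->BSRR = bsrr_word(MASK_PA, x); GPIOB->BSRR = bsrr_word(MASK_PB, x);`, i.e. two stores per byte and no shifting.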
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #6 on: August 15, 2020, 08:37:19 am »
GPIO port set time 125 ns, display data bus 450 ns, fill screen 90 ms.

Replaced PIN_LOW / PIN_HIGH with GPIOA->BRR = 0x02 / GPIOA->BSRR |= GPIO_BSRR_BS

 

Offline Mecanix

  • Frequent Contributor
  • **
  • Posts: 269
  • Country: cc
Re: STM32 TFT LCD library
« Reply #7 on: August 15, 2020, 09:06:25 am »
Yup. Much faster! Faster than the old GPIOB->ODR too.
That's what I use to set pins also. e.g.:

Code: [Select]
//Set
#define LCD_WR_SET (GPIOB->BSRR = 1u<<4) //PB4
//Reset
#define LCD_WR_CLR (GPIOB->BRR = 1u<<4) //PB4
 

Offline Mecanix

  • Frequent Contributor
  • **
  • Posts: 269
  • Country: cc
Re: STM32 TFT LCD library
« Reply #8 on: August 15, 2020, 09:14:56 am »
With proper optimization (-O3) I can watch videos on a 100 MHz chip (STM32F4 series). It's literally fluid, and a 168 MHz part (e.g. STM32F407) can do anything. (I'm on a 16-bit bus though, which I recommend.)
They're great, those ST MCUs... have fun!
 

Offline newbrain

  • Super Contributor
  • ***
  • Posts: 1842
  • Country: se
Re: STM32 TFT LCD library
« Reply #9 on: August 15, 2020, 02:18:57 pm »
Code: [Select]
GPIOA->BSRR = (maskPa << 16) | (maskPa & x);
:-+
Nice trick for an atomic port update, even when writing a full uint8_t or uint16_t:
  • Probably faster than the obvious GPIOx->ODR |= value (one less read)
  • Interrupt safe: if other parts of the port are modified in an ISR, it will still work
The behaviour of "setting bits wins over resetting them" is, AFAICS, specified for all STM32xxx.
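
To make the "set wins" rule concrete, here is a tiny host-side model of what one BSRR write does to the output register. The function names `apply_bsrr`, `write_field` and `bsrr_self_check` are invented for this sketch; only the register semantics come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a single BSRR write: bits 16..31 reset pins, bits 0..15 set
   pins, and set wins when both are requested for the same pin. */
static uint16_t apply_bsrr(uint16_t odr, uint32_t bsrr)
{
    uint16_t reset = (uint16_t)(bsrr >> 16);
    uint16_t set   = (uint16_t)(bsrr & 0xFFFFu);
    return (uint16_t)((odr & (uint16_t)~reset) | set);
}

/* Update only the pins in `mask` to `value`; the rest of the port is
   untouched because their set and reset bits are both 0. */
static uint16_t write_field(uint16_t odr, uint16_t mask, uint16_t value)
{
    uint32_t bsrr = ((uint32_t)mask << 16) | (uint32_t)(mask & value);
    return apply_bsrr(odr, bsrr);
}

/* Small self-check of the two properties mentioned above. */
static void bsrr_self_check(void)
{
    /* Pins outside the mask keep their state. */
    assert(write_field(0xF00F, 0x00FF, 0x00A5) == 0xF0A5);
    /* Set and reset requested for pin 0 at once: the pin ends up high. */
    assert(apply_bsrr(0x0000, 0x00010001u) == 0x0001);
}
```

Because the whole update is one store, there is no read-modify-write window in which an ISR could change the port between the read and the write.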
 

Offline dave j

  • Regular Contributor
  • *
  • Posts: 145
  • Country: gb
Re: STM32 TFT LCD library
« Reply #10 on: August 15, 2020, 03:15:26 pm »
But, really, do you need this awful, unreadable, messy and sparse random bit access?

If you have 8 contiguous bits available on any port,  just use them!
Definitely this. Before watching the video I thought they might be using a Nucleo board, which has Arduino-compatible connectors so you can plug an Arduino-pinout display directly into it. That's why I have the code I posted earlier. As they seem to be using a Blue Pill board, that doesn't apply, and if at all possible they should use any 8 contiguous bits.
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #11 on: August 15, 2020, 06:41:16 pm »
How are text fonts and/or GIFs drawn on a TFT LCD?
The STM32F407 has an 8080/6800 controller.
« Last Edit: August 15, 2020, 07:38:45 pm by strawberry »
 

Offline Mecanix

  • Frequent Contributor
  • **
  • Posts: 269
  • Country: cc
Re: STM32 TFT LCD library
« Reply #12 on: August 15, 2020, 07:35:25 pm »
I'm using lcd-image-converter (https://lcd-image-converter.riuson.com/en/about/). It supports both fonts and bitmaps.
The examples perform incredibly well as-is and are straightforward implementations: https://lcd-image-converter.riuson.com/en/docs/examples/sources/
Kudos to the developer.
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #13 on: August 15, 2020, 08:05:25 pm »
In font.c, what is in the hex tables, and how do I draw a custom bitmap on the LCD?
 

Offline Mecanix

  • Frequent Contributor
  • **
  • Posts: 269
  • Country: cc
Re: STM32 TFT LCD library
« Reply #14 on: August 15, 2020, 11:00:24 pm »
Draw bitmap pixel by pixel, left to right/top to bottom. Same for fonts.
Dunno what font.c is, sorry. gl
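
As a sketch of what "pixel by pixel" can look like for a font table, the example below assumes an 8x8 glyph stored as one byte per row, one bit per pixel, with the most significant bit as the leftmost pixel. That layout, the glyph data, and the names `glyph_T`, `put_pixel` and `draw_glyph` are all assumptions for illustration; the real layout depends on the settings used in the converter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8x8 glyph: 1 bit per pixel, one byte per row,
   MSB = leftmost pixel. */
static const uint8_t glyph_T[8] = {
    0xFF, 0x18, 0x18, 0x18, 0x18, 0x18, 0x18, 0x00
};

#define FB_W 16
#define FB_H 8
static uint16_t framebuffer[FB_W * FB_H]; /* stands in for the LCD memory */

/* On real hardware this would send the pixel to the display; here it
   just stores an RGB565 value in an array. */
static void put_pixel(unsigned x, unsigned y, uint16_t rgb565)
{
    assert(x < FB_W && y < FB_H);
    framebuffer[y * FB_W + x] = rgb565;
}

/* Walk the glyph left to right, top to bottom, one pixel per bit. */
static void draw_glyph(const uint8_t rows[8], unsigned x0, unsigned y0,
                       uint16_t fg, uint16_t bg)
{
    for (unsigned row = 0; row < 8; row++) {
        for (unsigned col = 0; col < 8; col++) {
            int on = rows[row] & (0x80u >> col);
            put_pixel(x0 + col, y0 + row, on ? fg : bg);
        }
    }
}
```

A colour bitmap works the same way, except that each table entry is already a pixel value, so the inner bit test disappears.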
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #15 on: September 05, 2020, 08:46:10 am »
GPIO port set time 25 ns, display data bus 120 ns, fill screen 18 ms.
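
The fill times reported in this thread can be cross-checked against the per-write bus times with a little arithmetic. The sketch below assumes a 240x320 panel in 16-bit colour driven over an 8-bit bus, i.e. two bus writes per pixel; those figures and the helper name `ns_per_write` are assumptions, not something stated in the posts.

```c
#include <assert.h>
#include <stdint.h>

/* Average time per bus write, in nanoseconds, for a full-screen fill. */
static uint32_t ns_per_write(uint32_t fill_ms, uint32_t width,
                             uint32_t height, uint32_t writes_per_pixel)
{
    uint32_t writes = width * height * writes_per_pixel;
    assert(writes > 0);
    /* 64-bit intermediate so large fill times cannot overflow. */
    return (uint32_t)(((uint64_t)fill_ms * 1000000u) / writes);
}
```

Under those assumptions the four fill times of 625 ms, 175 ms, 90 ms and 18 ms work out to roughly 4.1 µs, 1.1 µs, 0.59 µs and 0.12 µs per write, which is in the same range as the measured bus times of 5 µs, 1 µs, 450 ns and 120 ns.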

 

Offline julian1

  • Frequent Contributor
  • **
  • Posts: 781
  • Country: au
Re: STM32 TFT LCD library
« Reply #16 on: September 05, 2020, 08:46:52 pm »
While bit-bashing the parallel port, how are you interleaving the clock transitions? By using nop loops for timing? Or do you just assert the register and toggle the clock?
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #17 on: September 07, 2020, 07:16:08 pm »
No nops: set the data and toggle the clock.
The speed is near the fastest the controller can accept, but there is still some wiggling when updating a number on the display.
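
A minimal sketch of "set data and toggle clock" for an 8080-style bus follows. The wiring (D0..D7 on pins 0..7 of one port, WR on pin 4 of another), the `GpioRegs` stand-in type and the name `lcd_write8` are assumptions made so the example compiles on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the CMSIS GPIO_TypeDef so the sketch builds on a PC. */
typedef struct {
    volatile uint32_t BSRR;
} GpioRegs;

/* Assumed wiring: D0..D7 on pins 0..7 of the data port,
   WR on pin 4 of the control port. */
#define DATA_MASK 0x00FFu
#define WR_PIN    (1u << 4)

/* One write cycle: put the byte on the bus, pull WR low, then raise it
   again; 8080-style controllers latch the data on the rising edge. */
static inline void lcd_write8(GpioRegs *data, GpioRegs *ctrl, uint8_t d)
{
    assert(data && ctrl);
    data->BSRR = ((uint32_t)DATA_MASK << 16) | d; /* data valid  */
    ctrl->BSRR = (uint32_t)WR_PIN << 16;          /* WR low      */
    ctrl->BSRR = WR_PIN;                          /* WR high: latch */
}
```

With no delay between the two WR stores, the low pulse is only as long as the core and bus make it, so whether it meets the controller's minimum write pulse width has to be checked against that controller's datasheet.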
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #18 on: March 14, 2021, 10:03:14 pm »
320x480 ILI9488 16bit parallel
« Last Edit: March 18, 2021, 06:12:14 am by strawberry »
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6542
  • Country: es
Re: STM32 TFT LCD library
« Reply #19 on: March 14, 2021, 10:48:28 pm »
That is incredibly slow! :-DD
Doesn't that STM32 have a parallel port? Use DMA to write to it.
Anyway, something is badly optimized there. It's too slow!
Putting the LCD I/O routines into RAM can help:
Code: [Select]
//Ex. :
__attribute__((section(".ramfunc"))) void LCD_send(uint16_t data, bool isCMD){
...
For filling, you can set up the DMA pointing at a single variable (with the source pointer not incrementing), so the path is color -> DMA -> LCD. You tell the DMA to send that value 320x240 times and forget about it.
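
One practical detail: the DMA transfer counter on these parts is 16 bits wide, so 320x240 = 76800 items do not fit in a single transfer and the fill has to be split. The sketch below shows only the chunking logic; `dma_fill_fn`, `fill_in_chunks` and `count_items` are made-up names, and the callback stands in for whatever actually starts a transfer from a fixed source address.

```c
#include <assert.h>
#include <stdint.h>

/* The transfer counter is 16 bits wide: at most 65535 items per transfer. */
#define DMA_MAX_ITEMS 65535u

/* Hypothetical callback that starts one DMA transfer of `count` items
   from a fixed (non-incrementing) source and waits for it to finish. */
typedef void (*dma_fill_fn)(uint16_t color, uint32_t count, void *ctx);

/* Split a fill of `pixels` items into chunks the counter can hold.
   Returns the number of transfers issued. */
static uint32_t fill_in_chunks(uint16_t color, uint32_t pixels,
                               dma_fill_fn start, void *ctx)
{
    uint32_t transfers = 0;
    assert(start != 0);
    while (pixels > 0) {
        uint32_t n = pixels > DMA_MAX_ITEMS ? DMA_MAX_ITEMS : pixels;
        start(color, n, ctx);
        pixels -= n;
        transfers++;
    }
    return transfers;
}

/* Host-side stub used for testing: just adds up the requested items. */
static void count_items(uint16_t color, uint32_t count, void *ctx)
{
    (void)color;
    *(uint32_t *)ctx += count;
}
```

For a 320x240 fill this issues two transfers, one of 65535 items and one of 11265.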

I tested recently with a 135x240 ST7789V SPI display, 16-bit color mode. Overclocked the SPI to 32MHz and got 62FPS in filling.
The original library didn't use SPI DMA and the IO handling was terrible, and it was faster than the video!

So 16-bit parallel port should be a LOT faster!!


« Last Edit: March 14, 2021, 10:58:20 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #20 on: March 15, 2021, 04:22:13 am »
How to put in RAM?
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6542
  • Country: es
Re: STM32 TFT LCD library
« Reply #21 on: March 15, 2021, 07:46:29 am »
By adding that attribute line before the function
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #22 on: March 15, 2021, 10:33:58 am »
Code: [Select]
__attribute__((section(".ramfunc"))) void pushBlock(uint16_t color, uint32_t len){
Can't see any difference in RAM usage or speed.
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6542
  • Country: es
Re: STM32 TFT LCD library
« Reply #23 on: March 15, 2021, 11:03:05 am »
Are you using -O2 optimization? The speed difference over no optimization at all is dramatic.
You're right. I used this method in the past on an STM32F103 that had an interrupt for a fast signal.
Without the ramfunc attribute it would miss some events, but with it, it worked perfectly.

I have now tested on an F411 Black Pill board with IDE 1.6.0.
RAM starts at 0x20000000 and flash at 0x08000000. Clearly, it's not being executed from RAM. I don't know why; that requires further research.
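
One likely explanation is that the section attribute only names an input section; the code ends up in RAM only if the linker script places that section in a RAM region and the startup code copies it there. The sketch below is a guess at how to set this up and check it: the section name `.RamFunc` is an assumption to verify against your own .ld file, and `RAMFUNC`, `address_is_in_sram`, `check_runs_from_ram` and `pushBlock_stub` are names invented here.

```c
#include <assert.h>
#include <stdint.h>

/* On an ARM target, put the function in an input section named .RamFunc.
   The name is an assumption: it must match a section that your linker
   script places in RAM. On a PC the macro expands to nothing. */
#if defined(__arm__)
#define RAMFUNC __attribute__((section(".RamFunc"), noinline))
#else
#define RAMFUNC
#endif

/* Cortex-M maps on-chip SRAM into the 0x20000000..0x3FFFFFFF region. */
static int address_is_in_sram(uintptr_t addr)
{
    return addr >= 0x20000000u && addr <= 0x3FFFFFFFu;
}

/* Debug check meant to be called once at startup on the target. */
static void check_runs_from_ram(void (*fn)(uint16_t, uint32_t))
{
    assert(address_is_in_sram((uintptr_t)fn));
}

RAMFUNC void pushBlock_stub(uint16_t color, uint32_t len)
{
    (void)color;
    (void)len; /* body omitted: stands in for the real pushBlock() */
}
```

A function address such as 0x08000710 in the disassembly or the .map file means the code is still in flash; an address at or above 0x20000000 means the placement worked.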



Anyway, the code can be optimized a LOT. It uses a ton of calls.
Calls are not good for fast code, as they are likely to cause cache misses, forcing new instructions to be fetched from flash (slow).

First, the parallel port is not aligned:
Code: [Select]
#define RD_PORT GPIOA
#define RD_PIN  GPIO_PIN_4
#define WR_PORT GPIOA
#define WR_PIN  GPIO_PIN_3
#define CD_PORT GPIOA          // RS PORT
#define CD_PIN  GPIO_PIN_2     // RS PIN
#define CS_PORT GPIOA
#define CS_PIN  GPIO_PIN_1
#define RESET_PORT GPIOA
#define RESET_PIN  GPIO_PIN_0

#define D0_PORT GPIOB
#define D0_PIN GPIO_PIN_0
#define D1_PORT GPIOB
#define D1_PIN GPIO_PIN_1
#define D2_PORT GPIOA
#define D2_PIN GPIO_PIN_15
#define D3_PORT GPIOB
#define D3_PIN GPIO_PIN_3
#define D4_PORT GPIOB
#define D4_PIN GPIO_PIN_4
#define D5_PORT GPIOB
#define D5_PIN GPIO_PIN_5
#define D6_PORT GPIOB
#define D6_PIN GPIO_PIN_6
#define D7_PORT GPIOA
#define D7_PIN GPIO_PIN_5

Thus, the data bits have to be rearranged on every write, using atomic BSRR operations:
Code: [Select]
// Doing this deserves going to coder's hell.

// 1. Reset GPIOA bits 5 (D7) and 15 (D2).
//    (D7_PIN | D2_PIN) << 16 is the same value, but at least you can easily see wtf it's doing.
// 2. Reset GPIOB bits 0 (D0), 1 (D1), 3 (D3), 4 (D4), 5 (D5), 6 (D6).
// 3. Load the D7 and D2 bits into GPIOA.
// 4. Load D0, D1, D3, D4, D5, D6 into GPIOB.
#define write_8(d) { \
   GPIOA->BSRR = 0b1000000000100000u << 16; \
   GPIOB->BSRR = 0b0000000001111011u << 16; \
   GPIOA->BSRR = (((d) & (1<<2)) << 13) \
               | (((d) & (1<<7)) >> 2); \
   GPIOB->BSRR = (((d) & (1<<0)) << 0) \
               | (((d) & (1<<1)) << 0) \
               | (((d) & (1<<3)) << 0) \
               | (((d) & (1<<4)) << 0) \
               | (((d) & (1<<5)) << 0) \
               | (((d) & (1<<6)) << 0); \
}


This is the assembly result with -O2 optimization: 17 instructions just to load the 8-bit value into the port!
Code: [Select]
push    {r4, r5}
lsls    r3, r0, #13
asrs    r2, r0, #2
ldr     r4, [pc, #36]   ; (0x8000710 <write+48>)
ldr     r1, [pc, #40]   ; (0x8000714 <write+52>)
ldr     r5, [pc, #40]   ; (0x8000718 <write+56>)
str     r5, [r4, #24]
and.w   r2, r2, #32
and.w   r3, r3, #32768  ; 0x8000
orrs    r3, r2
and.w   r0, r0, #123    ; 0x7b
mov.w   r2, #8060928    ; 0x7b0000
str     r2, [r1, #24]
str     r3, [r4, #24]
str     r0, [r1, #24]
pop     {r4, r5}
bx      lr

Doing the same, but setting and resetting in the same write, takes 14 instructions:
Code: [Select]
   GPIOA->BSRR = (0b1000000000100000u << 16)
               | (((d) & (1<<2)) << 13)
               | (((d) & (1<<7)) >> 2);

   GPIOB->BSRR = (0b0000000001111011u << 16)
               | (((d) & (1<<0)) << 0)
               | (((d) & (1<<1)) << 0)
               | (((d) & (1<<3)) << 0)
               | (((d) & (1<<4)) << 0)
               | (((d) & (1<<5)) << 0)
               | (((d) & (1<<6)) << 0);
Code: [Select]
lsls    r3, r0, #13
asrs    r2, r0, #2
and.w   r2, r2, #32
and.w   r3, r3, #32768  ; 0x8000
orrs    r3, r2
ldr     r1, [pc, #24]   ; (0x800070c <write+44>)
ldr     r2, [pc, #28]   ; (0x8000710 <write+48>)
orr.w   r3, r3, #2147483648     ; 0x80000000
and.w   r0, r0, #123    ; 0x7b
orr.w   r3, r3, #2097152        ; 0x200000
orr.w   r0, r0, #8060928        ; 0x7b0000
str     r3, [r1, #24]
str     r0, [r2, #24]
bx      lr


Aligning the port would make it much faster.
For example, using PA0-PA7 for data takes only 5 instructions:
Code: [Select]
GPIOA->BSRR = 0xFF << 16; // Reset Data pins
GPIOA->BSRR = d; // Load data
Code: [Select]
ldr     r3, [pc, #12]   ; (0x80006f0 <write+16>)
mov.w   r2, #16711680   ; 0xff0000
str     r2, [r3, #24]
str     r0, [r3, #24]
bx      lr


Going further, resetting and setting the pins in the same write takes only 4 instructions:
Code: [Select]
GPIOA->BSRR = (0xFF << 16) | d; // Reset and load the data pins in one write
Code: [Select]
ldr     r3, [pc, #12]   ; (0x80006f0 <write+16>)
orr.w   r0, r0, #16711680       ; 0xff0000
str     r0, [r3, #24]
bx      lr


Avoiding the call would be even faster: inlining removes both the call instruction at the call site and the return instruction (bx) from the write routine.

According to the reference manual's description of GPIOx_BSRR, setting and resetting in the same write is OK, as the SET bits have priority over the RESET bits: if both BSx and BRx are set, BSx has priority.


And that's only a small analysis...  :-DD
« Last Edit: March 15, 2021, 01:00:12 pm by DavidAlfa »
 

Offline strawberryTopic starter

  • Super Contributor
  • ***
  • !
  • Posts: 1199
  • Country: lv
Re: STM32 TFT LCD library
« Reply #24 on: March 15, 2021, 03:50:24 pm »
How do I see the disassembly in CubeIDE?
Code: [Select]
GPIOB->ODR = (d); GPIOA->BSRR = 0x00000002; GPIOA->BSRR = 0x00020000;
This is a little faster.
« Last Edit: March 15, 2021, 03:55:10 pm by strawberry »
 

