Topic: STM32H743ZI -> GPIO drive performance issue, with MCUFriend_kbv


MasterT (topic starter)

  • Frequent Contributor
  • Posts: 785
  • Country: ca
Summary:

Hardware: two NUCLEO-144 boards, STM32F767ZI and STM32H743ZI (both Cortex-M7, pin-compatible).
          TFT LCD shield, ILI9486, 320x480.

Software: Arduino IDE 1.8.9 + MCUFRIEND_kbv library, sketch: graphictest_kbv.
https://github.com/prenticedavid/MCUFRIEND_kbv

Since the library doesn't support the H743 yet, I did a simple "copy-paste" port using the F767 code as a reference. Here is the essential part:
Code:
#elif IS_NUCLEO144 // Uno Shield on NUCLEO-144
#warning Uno Shield on NUCLEO-144 >STM32H743xx<
#define RD_PORT GPIOA    //PA3
#define RD_PIN  3
#define WR_PORT GPIOC    //PC0
#define WR_PIN  0
#define CD_PORT GPIOC    //PC3
#define CD_PIN  3
#define CS_PORT GPIOF    //PF3
#define CS_PIN  3
#define RESET_PORT GPIOF //PF5
#define RESET_PIN  5

// configure macros for the data pins
#define DMASK ((1<<15))                         //#1
#define EMASK ((1<<13)|(1<<11)|(1<<9))          //#3, #5, #6
#define FMASK ((1<<12)|(1<<15)|(1<<14)|(1<<13)) //#0, #2, #4, #7

#define write_8(d) { \
        GPIOD->REGS(BSRR) = DMASK << 16; \
        GPIOE->REGS(BSRR) = EMASK << 16; \
        GPIOF->REGS(BSRR) = FMASK << 16; \
        GPIOD->REGS(BSRR) = (  ((d) & (1<<1)) << 14); \
        GPIOE->REGS(BSRR) = (  ((d) & (1<<3)) << 10) \
                            | (((d) & (1<<5)) << 6) \
                            | (((d) & (1<<6)) << 3); \
        GPIOF->REGS(BSRR) = (  ((d) & (1<<0)) << 12) \
                            | (((d) & (1<<2)) << 13) \
                            | (((d) & (1<<4)) << 10) \
                            | (((d) & (1<<7)) << 6); \
    }

#define read_8() (       (  (  (GPIOF->REGS(IDR) & (1<<12)) >> 12) \
                            | ((GPIOD->REGS(IDR) & (1<<15)) >> 14) \
                            | ((GPIOF->REGS(IDR) & (1<<15)) >> 13) \
                            | ((GPIOE->REGS(IDR) & (1<<13)) >> 10) \
                            | ((GPIOF->REGS(IDR) & (1<<14)) >> 10) \
                            | ((GPIOE->REGS(IDR) & (1<<11)) >> 6) \
                            | ((GPIOE->REGS(IDR) & (1<<9))  >> 3) \
                            | ((GPIOF->REGS(IDR) & (1<<13)) >> 6)))


//                                             PD15                PE13,PE11,PE9          PF15,PF14,PF13,PF12
#define setWriteDir() { setReadDir(); \
                        GPIOD->MODER |=  0x40000000; GPIOE->MODER |=  0x04440000; GPIOF->MODER |=  0x55000000; }
#define setReadDir()  { GPIOD->MODER &= ~0xC0000000; GPIOE->MODER &= ~0x0CCC0000; GPIOF->MODER &= ~0xFF000000; }
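Since the port was done by copy-paste, one cheap sanity check is to run the same shift/mask expressions on a PC and confirm that read_8() returns whatever write_8() wrote, and that no pin outside DMASK/EMASK/FMASK is touched. Below is a minimal host-side sketch in plain C. The port_model struct, bsrr_write() and the model_* functions are hypothetical helpers; the model assumes the usual STM32 BSRR behaviour (low half sets pins, high half resets them, set wins if both are written) and treats the output latch as what IDR would read back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one GPIO port: only the output latch is kept,
   and it stands in for IDR on read-back (assumption). */
typedef struct { uint32_t odr; } port_model;

static port_model portD, portE, portF;

#define DMASK ((1u<<15))                            /* #1 */
#define EMASK ((1u<<13)|(1u<<11)|(1u<<9))           /* #3, #5, #6 */
#define FMASK ((1u<<12)|(1u<<15)|(1u<<14)|(1u<<13)) /* #0, #2, #4, #7 */

/* BSRR semantics: bits 0..15 set pins, bits 16..31 reset them; set wins. */
static void bsrr_write(port_model *p, uint32_t v)
{
    uint32_t set = v & 0xFFFFu;
    uint32_t reset = v >> 16;
    p->odr = (p->odr & ~reset) | set;
}

/* Same shifts as the write_8() macro in the post. */
static void model_write_8(uint32_t d)
{
    bsrr_write(&portD, DMASK << 16);
    bsrr_write(&portE, EMASK << 16);
    bsrr_write(&portF, FMASK << 16);
    bsrr_write(&portD, (d & (1u<<1)) << 14);
    bsrr_write(&portE, ((d & (1u<<3)) << 10) | ((d & (1u<<5)) << 6)
                     | ((d & (1u<<6)) << 3));
    bsrr_write(&portF, ((d & (1u<<0)) << 12) | ((d & (1u<<2)) << 13)
                     | ((d & (1u<<4)) << 10) | ((d & (1u<<7)) << 6));
}

/* Same shifts as the read_8() macro, applied to the modelled pin levels. */
static uint8_t model_read_8(void)
{
    return (uint8_t)(((portF.odr & (1u<<12)) >> 12)
                   | ((portD.odr & (1u<<15)) >> 14)
                   | ((portF.odr & (1u<<15)) >> 13)
                   | ((portE.odr & (1u<<13)) >> 10)
                   | ((portF.odr & (1u<<14)) >> 10)
                   | ((portE.odr & (1u<<11)) >> 6)
                   | ((portE.odr & (1u<<9))  >> 3)
                   | ((portF.odr & (1u<<13)) >> 6));
}

/* 1 if every byte survives write_8 -> read_8 unchanged. */
static int mapping_roundtrips(void)
{
    for (uint32_t d = 0; d < 256; d++) {
        model_write_8(d);
        if (model_read_8() != d)
            return 0;
    }
    return 1;
}

/* 1 if write_8 never disturbs pins outside the data masks. */
static int other_pins_untouched(void)
{
    const uint32_t pattern = 0x5A5Au;
    for (uint32_t d = 0; d < 256; d++) {
        portD.odr = pattern; portE.odr = pattern; portF.odr = pattern;
        model_write_8(d);
        if ((portD.odr & ~DMASK) != (pattern & ~DMASK)) return 0;
        if ((portE.odr & ~EMASK) != (pattern & ~EMASK)) return 0;
        if ((portF.odr & ~FMASK) != (pattern & ~FMASK)) return 0;
    }
    return 1;
}
```

With the shifts exactly as posted, both mapping_roundtrips() and other_pins_untouched() return 1, which points away from the copy-paste mapping itself as the source of the slowdown.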


Because both boards are pin-compatible, the basic GPIO bit-banging is the same.
  The warning message "#warning Uno Shield on NUCLEO-144 >STM32H743xx<" confirms that this part of the code is compiled.
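The MODER constants in setWriteDir()/setReadDir() can be cross-checked against the pin list in the comment above them: pin n owns the 2-bit field at bits 2n+1:2n, where 00 is input and 01 is general-purpose output. A small C sketch; moder_field_mask() and moder_output_bits() are illustrative names, not library calls.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the 2-bit MODER field of every listed pin. */
static uint32_t moder_field_mask(const int *pins, int count)
{
    uint32_t m = 0;
    for (int i = 0; i < count; i++)
        m |= 3u << (2 * pins[i]);
    return m;
}

/* Value that puts every listed pin into output mode (01). */
static uint32_t moder_output_bits(const int *pins, int count)
{
    uint32_t m = 0;
    for (int i = 0; i < count; i++)
        m |= 1u << (2 * pins[i]);
    return m;
}

/* Pin lists taken from the comment above setWriteDir(). */
static const int pinsD[] = {15};
static const int pinsE[] = {13, 11, 9};
static const int pinsF[] = {15, 14, 13, 12};
```

All six constants match (0x40000000/0xC0000000, 0x04440000/0x0CCC0000, 0x55000000/0xFF000000). Clearing the fields before OR-ing in 01 matters on the H7: as far as I know from the RM0433 reference manual, most H7 GPIO ports reset to analog mode (11), whereas on the F7 most ports reset to input (00), so setWriteDir() calling setReadDir() first is what lets the same macro work on both parts.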

Results:

F767 (times in µs):
Serial took 0ms to start
ID = 0x9486
MCUFRIEND 2.99 UNO
FillScreen               156615
Text                     8420
Lines                    156954
Horiz/Vert Lines         13015
Rectangles (outline)     7559
Rectangles (filled)      382163
Circles (filled)         69338
Circles (outline)        67380
Triangles (outline)      43409
Triangles (filled)       132859
Rounded rects (outline)  23750
Rounded rects (filled)   420259
Total:1.48sec
ID: 0x9486
F_CPU:216.00MHz


H743 (times in µs):
Serial took 0ms to start
ID = 0x9486
MCUFRIEND 2.99 UNO
FillScreen               875563
Text                     25855
Lines                    515536
Horiz/Vert Lines         71397
Rectangles (outline)     40012
Rectangles (filled)      2134143
Circles (filled)         276067
Circles (outline)        221123
Triangles (outline)      143537
Triangles (filled)       690744
Rounded rects (outline)  89524
Rounded rects (filled)   2318388
Total:7.40sec
ID: 0x9486
F_CPU:400.00MHz

  It's about 5 times (!!!) slower, and I'm running out of ideas about what the cause may be. The hardware is configured correctly:
Clock (MHz)   F767   H743
CPU           216    400
AHB1          216
AHB4                 200

What's confusing is that the H743 series moves the GPIO ports from the usual AHB1 bus to AHB4, which may explain the enormous latency.
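To put numbers on "about 5 times", the posted results can be divided test by test. The values appear to be microseconds, since each column sums to the printed total (1481721 gives 1.48 sec, 7401889 gives 7.40 sec). The sketch below also rescales by the CPU clocks in the table above, assuming each test does the same amount of work on both boards; total_us(), slowdown(), total_slowdown() and cycle_ratio() are illustrative helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Times in microseconds, copied from the two graphictest_kbv runs above.
   Order: FillScreen, Text, Lines, Horiz/Vert Lines, Rectangles (outline),
   Rectangles (filled), Circles (filled), Circles (outline),
   Triangles (outline), Triangles (filled), Rounded rects (outline),
   Rounded rects (filled). */
static const double f767_us[] = { 156615, 8420, 156954, 13015, 7559, 382163,
                                  69338, 67380, 43409, 132859, 23750, 420259 };
static const double h743_us[] = { 875563, 25855, 515536, 71397, 40012, 2134143,
                                  276067, 221123, 143537, 690744, 89524, 2318388 };
#define N_TESTS (sizeof f767_us / sizeof f767_us[0])

/* CPU clocks from the F_CPU lines of the two runs. */
#define F767_CPU_MHZ 216.0
#define H743_CPU_MHZ 400.0

/* Sum of all test times for one board, in microseconds. */
static double total_us(const double *t)
{
    double sum = 0.0;
    for (size_t i = 0; i < N_TESTS; i++)
        sum += t[i];
    return sum;
}

/* Wall-clock slowdown of the H743 relative to the F767 for test i. */
static double slowdown(size_t i)
{
    return h743_us[i] / f767_us[i];
}

/* Overall slowdown, based on the summed times. */
static double total_slowdown(void)
{
    return total_us(h743_us) / total_us(f767_us);
}

/* The same ratio in CPU cycles: how many more cycles the H743 spends. */
static double cycle_ratio(double time_ratio)
{
    return time_ratio * H743_CPU_MHZ / F767_CPU_MHZ;
}
```

Overall the H743 takes about 5.0x the time, i.e. roughly 9.25x as many CPU cycles at 400 vs 216 MHz. The per-test ratios fall into two groups: FillScreen, Horiz/Vert Lines, both rectangle tests, filled triangles and filled rounded rects come out at about 5.2x to 5.6x, while Text, Lines, both circle tests, outline triangles and outline rounded rects sit at about 3.1x to 4.0x. That split would fit the extra cost being in the GPIO writes rather than in the computation.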

« Last Edit: May 10, 2019, 06:43:56 pm by MasterT »
 

ajb

  • Super Contributor
  • Posts: 2599
  • Country: us
Re: STM32H743ZI -> GPIO drive performance issue, with MCUFriend_kbv
« Reply #1 on: May 10, 2019, 06:16:30 pm »
Are you SURE that the clocks are configured correctly in the two parts? The F7/H7 have fairly complex clock trees, and they likely aren't the same between the two parts. Use one of the MCO pins and measure the output frequency to be sure.

Are those bit-banging functions how you're talking to the LCD? Not that it's your first concern, but putting the LCD on the FMC interface would be far more efficient: writing to the LCD could then be done via DMA. If you're using SPI, make sure the SPI interfaces are running at the correct speed (see the point about the clock tree above). Also, some SPI peripherals may be on different clock domains, so different SPIs on the same part might give different results*.

* I don't think ST actually does this, but it's always important to be cognizant of where each peripheral is getting its clock from.
 

MasterT (topic starter)

  • Frequent Contributor
  • Posts: 785
  • Country: ca
Re: STM32H743ZI -> GPIO drive performance issue, with MCUFriend_kbv
« Reply #2 on: May 10, 2019, 06:48:21 pm »
Yes, I have MCO2 configured with a /10 prescaler, and the scope shows exactly 40 MHz. HSE is the primary source, derived from the ST-LINK, jitter-free.
SPI and FMC are not related to the topic; the question is not how to drive an LCD fast, but why the H7 is 5 times slower than the F7 at driving GPIO.
I haven't seen other performance issues so far; FFT runs quite fast, and I will compare the numbers against the F7.
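The MCO check is simple arithmetic: assuming MCO2 is sourced from SYSCLK, the measured pin frequency times the prescaler should equal the core clock. A hypothetical pair of helpers in C (the names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Core clock implied by an MCO measurement (assumes MCO2 source = SYSCLK). */
static uint32_t sysclk_from_mco(uint32_t mco_hz, uint32_t divider)
{
    return mco_hz * divider;
}

/* Deviation of an inferred clock from the expected one, in ppm. */
static int32_t clock_error_ppm(uint32_t measured_hz, uint32_t expected_hz)
{
    return (int32_t)(((int64_t)measured_hz - (int64_t)expected_hz) * 1000000
                     / (int64_t)expected_hz);
}
```

40 MHz × 10 gives 400 MHz, matching the F_CPU:400.00MHz line in the H743 results.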

Updates:

 I verified the issue by compiling only the bare-metal GPIO driving macros.
Arduino sketch:
Code:
#define REGS(x) x

#if defined(STM32F767xx)
#define GPIO_INIT()   { RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOCEN | RCC_AHB1ENR_GPIODEN | RCC_AHB1ENR_GPIOEEN | RCC_AHB1ENR_GPIOFEN; }
#elif defined(STM32H743xx)
#define GPIO_INIT()   { RCC->AHB4ENR |= RCC_AHB4ENR_GPIOAEN | RCC_AHB4ENR_GPIOCEN | RCC_AHB4ENR_GPIODEN | RCC_AHB4ENR_GPIOEEN | RCC_AHB4ENR_GPIOFEN; }
#endif


// configure macros for the data pins
#define DMASK ((1<<15))                         //#1
#define EMASK ((1<<13)|(1<<11)|(1<<9))          //#3, #5, #6
#define FMASK ((1<<12)|(1<<15)|(1<<14)|(1<<13)) //#0, #2, #4, #7

#define write_8(d) { \
        GPIOD->REGS(BSRR) = DMASK << 16; \
        GPIOE->REGS(BSRR) = EMASK << 16; \
        GPIOF->REGS(BSRR) = FMASK << 16; \
        GPIOD->REGS(BSRR) = (  ((d) & (1<<1)) << 14); \
        GPIOE->REGS(BSRR) = (  ((d) & (1<<3)) << 10) \
                            | (((d) & (1<<5)) << 6) \
                            | (((d) & (1<<6)) << 3); \
        GPIOF->REGS(BSRR) = (  ((d) & (1<<0)) << 12) \
                            | (((d) & (1<<2)) << 13) \
                            | (((d) & (1<<4)) << 10) \
                            | (((d) & (1<<7)) << 6); \
    }


//                                             PD15                PE13,PE11,PE9          PF15,PF14,PF13,PF12
#define setWriteDir() { setReadDir(); \
                        GPIOD->MODER |=  0x40000000; GPIOE->MODER |=  0x04440000; GPIOF->MODER |=  0x55000000; }
#define setReadDir()  { GPIOD->MODER &= ~0xC0000000; GPIOE->MODER &= ~0x0CCC0000; GPIOF->MODER &= ~0xFF000000; }



  volatile  uint32_t temp1, temp2;


void setup()
{

  GPIO_INIT();
  setWriteDir();
/*
  __HAL_RCC_CSI_ENABLE() ;
  __HAL_RCC_SYSCFG_CLK_ENABLE() ;
    HAL_EnableCompensationCell(); 
*/
}

void loop()
{
  // Measured data-pin frequency on the scope:
  // H743 = 2.777 MHz, F767 = 4.908 MHz

    temp1 = 0xAAAAAAAA;
    temp2 = 0x55555555;

    for (uint32_t i = 0; i < 0x1000000UL; i++) {
      write_8(temp1);
      write_8(temp2);
    }
}

 I'm sure it's a hardware issue: the MCU that is supposed to be almost twice as fast is in reality 5 times slower. The scope shows 2.7 MHz vs 4.9 MHz with only the write_8() macro left in the loop for simplicity; the complete MCUFRIEND_kbv benchmark shows about a 5x difference.
 Moving the GPIO ports to AHB4 seems like a dumb idea to me.
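The scope numbers can be turned into a per-store cost. Each loop iteration calls write_8() twice and each write_8() issues 6 BSRR stores, so one iteration is 12 stores; because 0xAA and 0x55 alternate on every call, each data pin completes exactly one period per iteration, so the pin frequency equals the iteration rate. The result is an upper bound: the period also includes the loop overhead and the volatile reads of temp1/temp2 (the macro expands (d) eight times, so each write_8() re-reads the volatile variable). A sketch in C with illustrative helper names:

```c
#include <assert.h>

/* 2 calls of write_8() per iteration x 6 BSRR stores per call. */
#define STORES_PER_ITERATION 12.0

/* Average time per BSRR store in ns (upper bound: includes loop overhead). */
static double ns_per_store(double pin_freq_hz)
{
    return 1e9 / (pin_freq_hz * STORES_PER_ITERATION);
}

/* The same figure expressed in cycles of a given clock. */
static double cycles_per_store(double pin_freq_hz, double clock_hz)
{
    return clock_hz / (pin_freq_hz * STORES_PER_ITERATION);
}
```

That works out to about 30 ns per store on the H743 (about 12 core cycles at 400 MHz, or 6 cycles of the 200 MHz AHB4 clock) versus about 17 ns (about 3.7 core cycles at 216 MHz) on the F767.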
« Last Edit: May 10, 2019, 11:50:57 pm by MasterT »
 

