Forgot, the other reason for STM32 is the built-in support for peripherals - I need SPI/I2C/CAN/ETH across the various sensors within the sensor.
The ESP32 has all that (although ETH would require an external chip like the Wiznet W5500, but then you'll need an Ethernet PHY for an STM32 anyway), together with a software ecosystem which is easier to use than ST's HAL. The value for money an ESP32-S3 offers is mind-boggling.
The first ESP32 series had an Ethernet MAC built in. PHYs are much more generic and cheaper to get than those Wiznet chips, although those have their advantages too (for example, not having to manage a TCP/IP stack in your own code).
Yes the value for money of ESP32 silicon is great, but my mental health cannot cope with ESP-IDF.
The ESP32 itself has some power saving features, but they are nothing in comparison to any STM32 (or actually any other MCU, even NXP's industrial parts do much better). In particular, series like the SiLabs chips or the STM32U and STM32L parts can be interesting, as they feature low power regulators and more internal oscillator modes that allow a balance between 'race to sleep' and more "autonomous" operation (some MSP430s also feature this).
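To make the 'race to sleep' vs "autonomous" trade-off concrete, here is a rough back-of-the-envelope sketch. All currents and durations below are made-up illustrative numbers, not datasheet values for any of the parts mentioned, and `charge_per_cycle_uC` is just a helper name picked for this example.

```python
def charge_per_cycle_uC(active_uA, active_s, sleep_uA, period_s):
    """Charge drawn over one wake/sleep period, in microcoulombs (uA * s)."""
    sleep_s = period_s - active_s
    if sleep_s < 0:
        raise ValueError("active time exceeds the period")
    return active_uA * active_s + sleep_uA * sleep_s


# Hypothetical numbers, purely for illustration (not from any datasheet).
PERIOD_S = 1.0

# Race to sleep: fast clock, high current, finish quickly, then sleep.
race = charge_per_cycle_uC(active_uA=5000.0, active_s=0.002,
                           sleep_uA=1.0, period_s=PERIOD_S)

# "Autonomous": slow clock on a low power regulator, lower current, runs longer.
slow = charge_per_cycle_uC(active_uA=300.0, active_s=0.030,
                           sleep_uA=1.0, period_s=PERIOD_S)

print(f"race to sleep: {race:.3f} uC per cycle")
print(f"slow and steady: {slow:.3f} uC per cycle")
```

Which side wins depends entirely on the numbers you plug in, which is the point: having more regulator and oscillator modes gives you more points on that curve to choose from.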
I think most MCUs will resume code execution right after the sleep instruction, retaining all RAM state at just a few uA and with much, much shorter wakeup latencies than the ESP32. For some reason, ESP32s coming out of deep sleep always perform a soft reset, run through the bootloader (which can be painfully slow) and only keep RTC SRAM state.
If OP is choosing an energy guzzling wireless standard such as WiFi, then trying to optimize sleep currents is penny pinching in comparison. For example, you can store the WiFi AP's channel to speed up the connection process (getting it down to 300 ms or less is great). However, if the AP (like most modern consumer ones by default) periodically hops channels depending on environmental conditions, then the sensor node will need to perform an SSID scan, which often takes several (2-5) seconds to complete.
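A rough sketch of how the connection time feeds into the average current. The 0.3 s fast connect and 3 s scan come from the figures above; the 100 mA radio-on current, 10 uA sleep current, 10 minute wake interval and 90% cached-channel hit rate are assumptions picked only for illustration, as is the simplification that a failed cached attempt wastes its full time before the scan starts.

```python
def expected_connect_s(p_cached_ok, fast_s, scan_s):
    """Expected connect time when the cached channel is tried first.

    Assumption: a failed cached attempt costs fast_s, then a full scan follows.
    """
    if not 0.0 <= p_cached_ok <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    return p_cached_ok * fast_s + (1.0 - p_cached_ok) * (fast_s + scan_s)


def average_current_uA(active_s, active_mA, sleep_uA, period_s):
    """Time-weighted average current over one wake/sleep period, in uA."""
    sleep_s = period_s - active_s
    if sleep_s < 0:
        raise ValueError("active time exceeds the period")
    return (active_mA * 1000.0 * active_s + sleep_uA * sleep_s) / period_s


# Assumed values for illustration only (not measured).
ACTIVE_MA = 100.0   # radio-on current
SLEEP_UA = 10.0     # sleep current
PERIOD_S = 600.0    # wake every 10 minutes

connect_s = expected_connect_s(p_cached_ok=0.9, fast_s=0.3, scan_s=3.0)
avg = average_current_uA(connect_s, ACTIVE_MA, SLEEP_UA, PERIOD_S)
sleep_share = SLEEP_UA * (PERIOD_S - connect_s) / (avg * PERIOD_S)

print(f"expected connect time: {connect_s:.2f} s")
print(f"average current: {avg:.1f} uA, sleep share: {sleep_share:.0%}")
```

With these assumed numbers the radio-on time dominates the budget, so shaving the connect time matters more than shaving a few uA off the sleep current. Longer wake intervals shift the balance back towards sleep current.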
BLE is probably a much better option though if more frequent wake ups are needed. But personally I haven't used it much yet. I was planning to with some EFR32MG22 chips (of which there is also an Arduino variant with an EFR32MG24 SoC), but that project has been collecting dust for 2 years already.
Oh, and regarding peripherals: CAN and Ethernet are often not part of these ultra low power parts. The PHYs for those communication standards are energy guzzlers, so it doesn't make much sense to include them. Even for many sensor nodes, the radio is a major energy consumer, which is a highly complex subject in itself, as you can optimize various parameters for various scenarios (think latency vs throughput, but also more factors).