That is why I suggested those tests on the GPIO pin itself. You need to confirm it is fast enough both with no load on it and with the MOSFET's gate connected, so you can see whether the MOSFET itself is the problem or the drive is. Do a single-shot capture on the rising edge and look at the transition time between ground level and Vcc; see whether it is much shorter than you need or, for some reason, it isn't. If, let's say, the rise time is fast enough with nothing connected to the pin (it should be) but becomes much longer with the gate connected, then you do need a driver (you can run this test with any MOSFET you decide to use, just to be sure).
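Before you even get the scope out, you can get a rough idea of what to expect with a back-of-the-envelope estimate. The sketch below is only that: it assumes the pin behaves like a constant current source charging the total gate charge (t ≈ Qg / I), which real GPIO pins only approximate. The function names and all the numbers (10 nC gate charge, 8 mA pin current, 100 ns target) are made up for illustration; take the real ones from your MOSFET and MCU datasheets.

```python
def gate_rise_time(q_gate_c, i_drive_a):
    """Approximate time (s) to deliver the total gate charge at a constant drive current."""
    if i_drive_a <= 0:
        raise ValueError("drive current must be positive")
    return q_gate_c / i_drive_a


def needs_driver(t_rise_loaded_s, t_rise_allowed_s):
    """True if the rise time with the gate connected is slower than what you can tolerate."""
    return t_rise_loaded_s > t_rise_allowed_s


# Hypothetical values, not taken from any particular part.
q_g = 10e-9        # total gate charge: 10 nC (assumed)
i_gpio = 8e-3      # GPIO drive current: 8 mA (assumed)
t_allowed = 100e-9 # acceptable rise time: 100 ns (assumed)

t_est = gate_rise_time(q_g, i_gpio)
print(f"estimated rise time: {t_est * 1e6:.2f} us")
print("driver needed" if needs_driver(t_est, t_allowed) else "GPIO alone looks OK")
```

The scope measurement is still the real test; you can feed the rise time you actually measure with the gate connected into `needs_driver` instead of the estimate.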
There are SOT-23 gate drivers that would do the job perfectly: very fast, with logic-level compatible inputs and a very low component count.
What's more, you can power them from a higher voltage (let's say 5 V) and still control them from 3.3 V, which gives you more drive voltage at the gate and lower losses (both conduction and switching).
You can even improvise a two-transistor totem pole, but I would choose a SOT-23 driver any time I needed one.
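To put rough numbers on the point about losses, here is a small sketch using the usual first-order approximations: conduction loss as I² · Rds(on), and hard-switching loss as 0.5 · V · I · (t_rise + t_fall) · f. Every value in it is hypothetical (12 V bus, 2 A load, 20 kHz, and the Rds(on) and edge times for each case), chosen only to show the trend; your datasheet and scope readings will give different figures.

```python
def conduction_loss(i_load_a, rds_on_ohm):
    """Conduction loss (W) while the MOSFET is fully on: I^2 * Rds(on)."""
    return i_load_a ** 2 * rds_on_ohm


def switching_loss(v_bus, i_load_a, t_rise_s, t_fall_s, f_sw_hz):
    """First-order hard-switching loss (W): 0.5 * V * I * (t_rise + t_fall) * f."""
    return 0.5 * v_bus * i_load_a * (t_rise_s + t_fall_s) * f_sw_hz


# Hypothetical operating point.
v_bus, i_load, f_sw = 12.0, 2.0, 20e3

# Case A: gate driven straight from a 3.3 V GPIO (assumed: slow edges, higher Rds(on)).
gpio_total = conduction_loss(i_load, 0.050) + switching_loss(v_bus, i_load, 1e-6, 1e-6, f_sw)

# Case B: gate driven by a driver powered from 5 V (assumed: fast edges, lower Rds(on)).
driver_total = conduction_loss(i_load, 0.030) + switching_loss(v_bus, i_load, 50e-9, 50e-9, f_sw)

print(f"GPIO drive:   {gpio_total:.3f} W")
print(f"Driver at 5V: {driver_total:.3f} W")
```

With these made-up numbers most of the difference comes from the switching term, which is why the edge times you measure on the scope matter so much.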