For anyone with a similar problem: I did end up making good progress on this and ultimately solved it with hardware.
Basically, the RPi software implementation of transmit enable is almost useless. If you have the RTS/CTS pins enabled on your UART device, pyserial can open the port in an RS485 mode which uses the RTS pin as a transmit enable pin (configurable high or low on transmit).
Unfortunately it is very slow. I have to double check the exact numbers, but while the transmit enable pin goes high around 30us before the first start bit, it stays high for around 30 milliseconds after the last stop bit. I'm using the MODBUS protocol on the bus and the server devices respond in about 130us, so the RPi completely blocks any response message. I did read a blog post by a programmer who was trying to solve the same issue and reduced the turn-off delay to around 2-3ms with his own software implementation, but that's still 20-30x too slow.
I ended up using a 555 in a monostable configuration, with a MOSFET on the timing cap to reset the charging cycle every time there is a falling edge on the TX line, giving a maximum turn-off time of ~90us after the last rising edge on the TX line. This is designed around a baud rate of 115200 with 10 bits per frame. It ensures that a message can be arbitrarily long, and even if the message data bits are all 1, the transmit enable signal will stay high for the entirety of the message.
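To sanity check those numbers, here is a rough timing model in Python. It assumes 8N1 framing (start bit, 8 data bits LSB first, stop bit) and that the ~90us hold is counted from the last rising edge on TX. The 8.2k / 10nF values are purely hypothetical parts that happen to land near 90us, not the ones on my board.

```python
import math

BAUD = 115_200
BITS_PER_FRAME = 10    # start + 8 data + stop (8N1 assumed)
HOLD_US = 90.0         # monostable hold after the last rising edge
RESPONSE_US = 130.0    # how quickly the MODBUS servers answer

BIT_US = 1e6 / BAUD                  # about 8.68 us per bit
FRAME_US = BITS_PER_FRAME * BIT_US   # about 86.8 us per frame


def monostable_hold_us(r_ohms, c_farads):
    """Time for the timing cap to charge from 0 to 2/3 Vcc: R*C*ln(3)."""
    return r_ohms * c_farads * math.log(3) * 1e6


def last_rising_edge_us(byte):
    """Time from the start-bit falling edge to the last low-to-high edge."""
    levels = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    last = max(i for i in range(1, len(levels))
               if levels[i] == 1 and levels[i - 1] == 0)
    return last * BIT_US


def release_after_frame_us(byte, hold_us=HOLD_US):
    """How long transmit enable stays high after the end of the stop bit."""
    return last_rising_edge_us(byte) + hold_us - FRAME_US


if __name__ == "__main__":
    for value in (0xFF, 0x00):
        print(f"0x{value:02X}: enable drops "
              f"{release_after_frame_us(value):.1f} us after the frame")
    print(f"8.2k + 10nF hold: {monostable_hold_us(8200, 10e-9):.1f} us")
```

With these assumptions the enable drops roughly 12us after an all-ones byte and roughly 81us after an all-zeros byte, so it covers the whole frame in both cases and is released before the 130us response.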
The last problem is the propagation delay between the TX line start bit going low, and the output of the 555 going high. The solution is a little crude, but I ended up using a string of LM393 comparators to introduce a delay of around 1us (4x ~240ns) to the TX signal being applied to the MAX485 transceiver. The comparators have a symmetrical delay on rising and falling edges so the signal keeps all the transition timings and just delays the signal. From the MAX485 datasheet, if I'm reading it correctly, it only needs around 70ns between DE/RE going high and data being applied to the DI pin.
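The delay budget works out like this, using only the numbers above (4 stages at ~240ns each and ~70ns of enable-to-data time, if I'm reading the datasheet right). The function names are just for illustration.

```python
COMPARATOR_DELAY_NS = 240   # approximate delay per LM393 stage
STAGES = 4
ENABLE_TO_DATA_NS = 70      # my reading of the MAX485 datasheet


def tx_chain_delay_ns(stages=STAGES, per_stage_ns=COMPARATOR_DELAY_NS):
    """Total delay added to the TX signal by the comparator chain."""
    return stages * per_stage_ns


def max_enable_delay_ns():
    """Longest the enable path may take and still beat the delayed TX data."""
    return tx_chain_delay_ns() - ENABLE_TO_DATA_NS


if __name__ == "__main__":
    print(tx_chain_delay_ns(), "ns of TX delay")
    print(max_enable_delay_ns(), "ns budget for the enable path")
```

So as long as the 555 output goes high within about 890ns of the start bit's falling edge, the transceiver is enabled before the delayed data reaches DI.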
This has all worked fine so far at 115200 baud on crappy 555 timers of unknown origin. The final design will incorporate an LMC555 and an optoisolated MAX485 transceiver, and the transmit enable pin should still go high at least 900ns before the serial message.
By the way, the internet is full of really dodgy designs for auto direction control. Most seem to only enable the RE/DE pin when the TX line is low, leaving the bus only passively pulled up on high states. On top of that, there's a delay between the data being applied to the transceiver and the RE/DE pin going high, which then affects the timing of the signal on the receiving device.