OUTDR writes to the whole port directly; BSHR sets or resets a single pin when you write a 1 into the corresponding bit position.
BSHR is better for single-pin changes, as you don't have to read the port value first, modify (mask) it to preserve the other pins, and write it back.
Just write a 1 to the right bit and pin x of Port y will be set or reset.
If you want to load an entire byte like 0x35, overwriting the entire port, OUTDR is the way to go. This is what you would use with PortC to send a whole byte to the bus.
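To make the difference concrete, here is a rough host-side sketch that models the two registers in plain C. It is not vendor code: `port_model_t`, `bshr_write` and the other helper names are made up for illustration, and I'm assuming the usual layout where the low half of BSHR sets pins and the high half (bit 16 upward) resets them, so check the reference manual for your part.

```c
#include <stdint.h>

/* Host-side model of one GPIO port (hypothetical names, not a vendor API).
   On real hardware outdr would be the memory-mapped OUTDR register. */
typedef struct {
    uint32_t outdr;   /* output data register, one bit per pin */
} port_model_t;

/* Model a write to BSHR. ASSUMPTION: low 16 bits set pins,
   high 16 bits reset pins; bits written as 0 change nothing. */
static inline void bshr_write(port_model_t *p, uint32_t value)
{
    uint32_t set_mask   = value & 0xFFFFu;
    uint32_t reset_mask = (value >> 16) & 0xFFFFu;
    p->outdr = (p->outdr & ~reset_mask) | set_mask;
}

/* Single-pin changes through BSHR: one write, no read-modify-write. */
static inline void pin_set(port_model_t *p, unsigned pin)
{
    bshr_write(p, 1u << pin);
}

static inline void pin_reset(port_model_t *p, unsigned pin)
{
    bshr_write(p, 1u << (pin + 16u));
}

/* Whole-port write through OUTDR: every pin takes the new value. */
static inline void port_write_byte(port_model_t *p, uint8_t value)
{
    p->outdr = value;
}

/* The same single-pin set done through OUTDR needs read, mask, write back. */
static inline void pin_set_rmw(port_model_t *p, unsigned pin)
{
    uint32_t v = p->outdr;   /* read   */
    v |= 1u << pin;          /* modify */
    p->outdr = v;            /* write  */
}
```

On the real chip the BSHR version is a single store, which is why it is the better choice for toggling one control line while the rest of the port stays untouched.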
It's going to hell due to the printf calls you use; they're stalling your code flow.
Your code is misleading:
uint8_t D5_is_pressed = !GPIO_digitalRead(GPIO_port_D, 3); // but PD3 is A3 ?
uint8_t RW_low = !GPIO_digitalRead(GPIO_port_D, 4); // But PD4 is CS0 ?
Not sure, but it seems to me you need better timing accuracy.
Instead of delay_us(0.2), try inserting a few NOPs. Each one wastes one CPU clock, so 20.8ns, so try:
// 10 NOPs, 208ns
asm("nop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop\nnop");
20.8ns is the smallest delay you can get; anything longer is a multiple of it.
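If you want to avoid counting NOPs by hand, something like the sketch below might help. The macro and function names are hypothetical, and it assumes a 48MHz core clock (which is what 20.8ns per cycle works out to) and that each NOP really costs exactly one cycle.

```c
#include <stdint.h>

/* One NOP; volatile keeps the compiler from dropping it. */
#define NOP()   __asm__ volatile ("nop")

/* Unrolled groups, so there is no loop overhead to account for. */
#define NOP5()  do { NOP(); NOP(); NOP(); NOP(); NOP(); } while (0)
#define NOP10() do { NOP5(); NOP5(); } while (0)

/* Whole CPU cycles needed to cover at least `ns` nanoseconds,
   rounded up. ASSUMPTION: cpu_mhz is the core clock in MHz (48 here). */
static inline uint32_t cycles_for_ns(uint32_t ns, uint32_t cpu_mhz)
{
    return (ns * cpu_mhz + 999u) / 1000u;
}
```

With these, `cycles_for_ns(200, 48)` gives 10, so `NOP10();` covers the 200ns you were asking delay_us for.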
Make sure to check the 6502 bus timings properly. 200ns seems too short for a bus running at 1.77MHz (half a clock cycle is about 282ns, which seems more like it?), but I have no idea about it so it could be correct.
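For reference, this is the arithmetic behind the 282ns figure, as a small sketch. `period_ns` is just a hypothetical helper, and whether the half-cycle is the number that actually matters depends on the 6502 datasheet timings, which I haven't checked.

```c
#include <stdint.h>

/* Clock period in whole nanoseconds (truncated) for a frequency in Hz. */
static inline uint32_t period_ns(uint32_t freq_hz)
{
    return 1000000000u / freq_hz;
}

/* Half of one clock period, in whole nanoseconds (truncated). */
static inline uint32_t half_period_ns(uint32_t freq_hz)
{
    return period_ns(freq_hz) / 2u;
}
```

For a 1.77MHz bus, `period_ns(1770000)` is 564 and `half_period_ns(1770000)` is 282, so a 200ns wait ends before the half-cycle is over.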