Why does my TDO shift out 10 bits instead of 9, when my total IR scan chain length is only 9 bits?
I'm not sure I understood the question, but you need to shift one more time because you have 2 devices in the JTAG chain. When several JTAG devices are chained, the data has to be shifted through all of them before it arrives at your JTDO.
Between each TDI and TDO there is a flip-flop (FF). When several TDI-TDO pairs are chained, the FFs form a shift register.
You have 2 devices in your JTAG chain, so you have 2 FFs, like this:
JTDI ->
TDI(boundary) -> FF(boundary) -> TDO(boundary) ->
TDI(M4) -> FF(M4) -> TDO(M4) ->
JTDO
You'll need one more clock edge to push the last bit of the first IR (boundary TAP) through FF(M4).
Consider only the boundary TAP: on the 5th clock of reading its 5-bit IR, the 5th IR bit is shifted out of the boundary TAP, but because FF(M4) is in the way, that bit is not yet visible at JTDO. One more clock is needed to push it through FF(M4).
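To make the timing concrete, here is a minimal Python sketch of that one-clock delay. It follows only the simplified picture used in this answer (one FF sitting between the boundary TAP's TDO and JTDO) and is not a model of a real TAP controller. The function name `trace_jtdo` and the bit labels `b1`..`b5` are made up for illustration.

```python
def trace_jtdo(boundary_ir_bits):
    """Return (clock, value seen at JTDO) pairs for a bit stream that
    leaves the boundary TAP and then passes through FF(M4).

    Simplified model (an assumption, not a real TAP): after clock k the
    boundary TAP presents its k-th IR bit at TDO(boundary), and FF(M4)
    latches whatever TDO(boundary) was presenting before that clock.
    """
    ff_m4 = None          # content of FF(M4) is unknown before the first clock
    boundary_tdo = None   # nothing has been shifted out of the boundary TAP yet
    seen = []
    # One clock more than the IR length, to flush the last bit through FF(M4)
    for clock in range(1, len(boundary_ir_bits) + 2):
        ff_m4 = boundary_tdo  # FF(M4) takes the previous boundary output
        if clock <= len(boundary_ir_bits):
            boundary_tdo = boundary_ir_bits[clock - 1]
        else:
            boundary_tdo = None  # boundary IR is exhausted
        seen.append((clock, ff_m4))
    return seen


trace = trace_jtdo(["b1", "b2", "b3", "b4", "b5"])
for clock, value in trace:
    print(f"clock {clock}: JTDO shows {value}")
```

In this sketch, JTDO still shows `b4` on clock 5, and `b5` only appears on clock 6, which is the extra clock described above.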
If yet another device were inserted in the JTAG chain, say an external flash memory with JTAG (or anything else with JTAG), in addition to the 2 devices you already have (boundary TAP and M4 TAP), like this:
JTDI ->
TDI(boundary) -> FF(boundary) -> TDO(boundary) ->
TDI(M4) -> FF(M4) -> TDO(M4) ->
TDI(flash) -> FF(flash) -> TDO(flash) ->
JTDO
then you would see yet another extra clock, needed to shift the data through the FF of the 3rd device, FF(flash).
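The "one extra clock per device" rule can be checked with a small shift-register simulation. This is only a sketch of the simplified model used above (exactly one FF per device between its TDI and TDO), and the function name `clocks_to_reach_jtdo` is hypothetical.

```python
def clocks_to_reach_jtdo(devices):
    """Count the clock edges needed before a bit presented at JTDI
    becomes visible at JTDO, assuming one flip-flop per device."""
    if not devices:
        raise ValueError("the chain needs at least one device")
    ffs = [0] * len(devices)  # one FF per device, all cleared
    jtdi = 1                  # marker bit presented at JTDI
    clocks = 0
    while ffs[-1] != 1:       # last FF drives JTDO
        # On each clock edge every FF takes the value of the stage before it
        ffs = [jtdi] + ffs[:-1]
        jtdi = 0              # only one marker bit is sent
        clocks += 1
    return clocks


two = clocks_to_reach_jtdo(["boundary", "M4"])
three = clocks_to_reach_jtdo(["boundary", "M4", "flash"])
print(two, three)  # prints "2 3"
```

Adding the flash device to the list raises the count by exactly one, matching the extra clock needed for FF(flash).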