Optimisation is possible because the zeroes are constant values. Feeding in zeroes will not affect the XOR operations (the result is unchanged when you XOR with 0), but it does affect the INPUTS of the XORs.
If you feed in zeroes up to the CRC width, the first box will wrap around to the first, which is why the middle XOR's input is changed to the output of the first XOR logic gate.
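Here is a minimal sketch of both forms. It assumes plain long division, meaning the register starts at 0, bits go in MSB first, and there is no reflection or final XOR. The full USB CRC5 also presets and inverts the register. The names `crc_long_division` and `crc_optimised` are made up for illustration. The first function appends WIDTH zeroes and lets the last box wrap around into the gates. The second skips the zeroes and feeds the first gate's output to every tap instead.

```python
from itertools import product

# Assumption: plain long division (init 0, MSB first, no reflection/final XOR).
WIDTH = 5
POLY = 0b00101                 # USB CRC5 x^5 + x^2 + 1, implicit x^5 term dropped
MASK = (1 << WIDTH) - 1
TOP = 1 << (WIDTH - 1)         # the last box

def crc_long_division(bits):
    """Shift every bit, plus WIDTH appended zeroes, into the first box."""
    reg = 0
    for d in list(bits) + [0] * WIDTH:
        out = 1 if reg & TOP else 0        # value leaving the last box
        reg = ((reg << 1) | d) & MASK      # data bit enters the first box
        if out:
            reg ^= POLY                    # wrap around into both XOR gates
    return reg

def crc_optimised(bits):
    """No appended zeroes: the data bit is XORed with the last box first."""
    reg = 0
    for d in bits:
        fb = d ^ (1 if reg & TOP else 0)   # output of the first XOR gate
        reg = (reg << 1) & MASK
        if fb:
            reg ^= POLY                    # middle gate now sees the first gate's output
    return reg

# Usage: both forms agree on every message of 1 to 8 bits.
for n in range(1, 9):
    for msg in product((0, 1), repeat=n):
        assert crc_long_division(msg) == crc_optimised(msg)
print(crc_optimised([1]))                  # 5, i.e. 0b00101
```

The optimised loop runs once per message bit instead of once per message bit plus WIDTH, and it still returns the same remainder.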
What exactly do you mean by "the first box will wrap around to the first"?
If you feed in as many zeroes as the CRC width, the value in the first box shifts all the way to the last box and then wraps around into the XOR gates, including the middle one.
Let's look at the long division LFSR stages with a very simple example.
We are using USB CRC5 (generator polynomial x^5 + x^2 + 1, so the LFSR has five boxes and two XOR gates) with a data bit (message bit) width of 1.
Suppose our message is 1:
data = 1
Note that the result of each stage is calculated and shown in the next stage, so both the previous and current values are visible.
About the stages:
First we feed in message bit 1 at stage 0.
Next we append zeroes, feeding in a zero for each of the following five stages (the CRC width).
Notice the 1 coloured in red: it shows how this bit moves through the LFSR. This 1 is the result of the XOR0 gate at stage 0.
Look at the last stage, stage 5: the red 1 has reached the last box, and now wraps around to become an input to both XOR gates.
You can see that the inputs to those XOR gates are:
XOR0 (first gate) = 1 ^ 0   (last box ^ appended zero)
XOR1 (middle gate) = 1 ^ 0   (last box ^ previous box)
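The stages above can be traced with a small simulation. This is only a sketch. It assumes the register starts at all zeroes and that `boxes[0]` is the first box and `boxes[4]` the last. It also assumes the middle gate sits between boxes 1 and 2, which is where the x^2 tap puts it. The helper name `step` is hypothetical. Each printed line shows the boxes before that stage is computed, matching the "shown in the next stage" convention, along with the two gate inputs.

```python
# Assumption: register starts at 0; boxes[0] = first box, boxes[4] = last box.

def step(boxes, d):
    """One clock of the long-division LFSR for x^5 + x^2 + 1."""
    last = boxes[4]                    # value wrapping around from the last box
    xor0_in = (last, d)                # first gate: last box ^ data bit
    xor1_in = (last, boxes[1])         # middle gate: last box ^ previous box
    new = [last ^ d, boxes[0], last ^ boxes[1], boxes[2], boxes[3]]
    return new, xor0_in, xor1_in

boxes = [0, 0, 0, 0, 0]
for stage, d in enumerate([1, 0, 0, 0, 0, 0]):   # message 1, then five zeroes
    print(f"stage {stage}: d={d} boxes={boxes}", end="")
    boxes, x0, x1 = step(boxes, d)
    print(f"  XOR0 = {x0[0]} ^ {x0[1]}  XOR1 = {x1[0]} ^ {x1[1]}")
print("remainder boxes:", boxes)
```

At stage 5 the 1 sits in `boxes[4]`, and both gates see `1 ^ 0`. The final remainder sets boxes 0 and 2.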
For the optimised version we only care about the last stage; all intermediate stage values can be thrown away. At the last stage, XOR1 is equivalent to taking the XOR0 gate result and the previous box as its inputs, which we could have done from the start without going through all those stages.
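A minimal sketch of the optimised LFSR for the same example follows, again assuming an all-zero start and the same box numbering. The helper names `optimised_step` and `gf2_mod` are hypothetical. XOR1 takes the XOR0 result and the previous box as inputs, so one stage replaces the six long-division stages. As a cross-check, message 1 followed by five zeroes is the polynomial x^5. Dividing it by x^5 + x^2 + 1 over GF(2) should leave the same remainder.

```python
# Assumption: register starts at 0; boxes[0] = first box, boxes[4] = last box.

def optimised_step(boxes, d):
    xor0 = d ^ boxes[4]          # first gate: data bit ^ last box
    xor1 = xor0 ^ boxes[1]       # middle gate: XOR0 result ^ previous box
    return [xor0, boxes[0], xor1, boxes[2], boxes[3]]

def gf2_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division, with bit i meaning x^i."""
    while dividend.bit_length() >= divisor.bit_length():
        dividend ^= divisor << (dividend.bit_length() - divisor.bit_length())
    return dividend

boxes = optimised_step([0, 0, 0, 0, 0], 1)   # one stage instead of six
print(boxes)                                  # [1, 0, 1, 0, 0]

# x^5 mod (x^5 + x^2 + 1) = x^2 + 1
remainder = gf2_mod(1 << 5, 0b100101)
print(bin(remainder))                         # 0b101, i.e. boxes 0 and 2 set
```

For longer messages, `optimised_step` is simply applied once per data bit, with no zeroes appended.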