You don't need any delay. Even at a 2 V supply, the worst-case setup time of the HC165 (how long the data must be stable before the shift clock edge) is only 60 ns. The shortest delay an AVR can produce is one clock cycle, which is 62.5 ns at 16 MHz, so the setup time is already met without adding anything.
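To make the arithmetic explicit, here is a minimal sketch in plain C that can be run on a PC. The function names `cycle_ns` and `one_cycle_meets_setup` are made up for this illustration, and the 60 ns figure is simply the value quoted above, so check it against the datasheet of your specific part.

```c
#include <assert.h>
#include <stdbool.h>

/* Duration of one CPU clock cycle in nanoseconds for a given clock frequency. */
static double cycle_ns(double f_cpu_hz) {
    assert(f_cpu_hz > 0.0);
    return 1e9 / f_cpu_hz;
}

/* True if a single CPU cycle is at least as long as the required setup time. */
static bool one_cycle_meets_setup(double f_cpu_hz, double setup_ns) {
    return cycle_ns(f_cpu_hz) >= setup_ns;
}
```

With a 16 MHz clock, `cycle_ns(16e6)` gives 62.5, and `one_cycle_meets_setup(16e6, 60.0)` is true. At a faster clock such as 20 MHz one cycle is only 50 ns, so the same check against 60 ns would fail.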
However, if you have long wires between the AVR and the shift register with no proper termination, bad grounding, etc., adding some delays can improve the reliability of the circuit. The delays lengthen the setup and hold times and lower the clock speed, which gives the ringing and overshoot caused by a transition on one wire time to die down before the next transition is sent.
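If you do want such delays, one way to place them is sketched below. This is only an illustration under assumptions: the `shift_io` hooks, `shift_in_byte`, and the `fake_165` simulation are hypothetical names invented here so the logic can be tried on a PC. On a real AVR the hooks would wrap your port reads and writes, and the delay hook could call something like `_delay_us()` from avr-libc's `<util/delay.h>`.

```c
#include <stdint.h>

/* Hypothetical pin-access hooks; on an AVR these would wrap port register access. */
typedef struct {
    void (*set_clock)(void *ctx, int level); /* drive the shift clock line */
    int  (*read_data)(void *ctx);            /* sample the serial output   */
    void (*delay)(void *ctx);                /* settle time; may be a no-op */
    void *ctx;
} shift_io;

/* Read 8 bits, MSB first, waiting after every edge so ringing can die down. */
static uint8_t shift_in_byte(const shift_io *io) {
    uint8_t value = 0;
    for (int i = 0; i < 8; i++) {
        io->delay(io->ctx);            /* let the data line settle before sampling */
        value = (uint8_t)((value << 1) | (io->read_data(io->ctx) & 1));
        io->set_clock(io->ctx, 1);     /* rising edge shifts out the next bit */
        io->delay(io->ctx);            /* keep the clock high for a while */
        io->set_clock(io->ctx, 0);
    }
    return value;
}

/* Simplified stand-in for an already-loaded shift register (host testing only). */
typedef struct {
    uint8_t  reg;         /* bits still to be shifted out, MSB first */
    int      clock;       /* current level of the clock line */
    unsigned delay_calls; /* how often the delay hook was invoked */
} fake_165;

static void fake_set_clock(void *ctx, int level) {
    fake_165 *s = (fake_165 *)ctx;
    if (level && !s->clock) {          /* shift on the rising edge only */
        s->reg = (uint8_t)(s->reg << 1);
    }
    s->clock = level;
}

static int fake_read_data(void *ctx) {
    return (((fake_165 *)ctx)->reg >> 7) & 1;
}

static void fake_delay(void *ctx) {
    ((fake_165 *)ctx)->delay_calls++;  /* real code would wait here */
}
```

Keeping the delay behind a hook means you can start with a no-op and only add waiting time if the wiring turns out to need it.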