When you want to connect a lot of chips in series, step one is the series connection between the chips.
The TPIC6C595 can have up to 5 signals in the series connection: SER IN, SRCK, RCK, G and SRCLR.
SER IN comes from the SER OUT of the previous chip.
The remaining 4 are connected in parallel to every chip. A parallel connection like this can have fan-out problems. I will get back to this later.
In figure 14 the SRCLR pin is pulled high. Read 8.3.2, and note that as wired in figure 14, using the clear input would turn on the LEDs. With no clear input, you have random data on the outputs at power-up until the MCU has initialized proper levels by serially shifting in all bits and then pulsing RCK. But note that with control of G you can turn the outputs off, so random data in the latch should not be a problem.
If this is OK, then you have 4 signals to daisy chain between chips.
You add buffers to reduce fan-out load but also to clean up the signals.
Step two is the interface with the MPU to drive these signals properly.
A quick look suggests some signals work when connected to SPI and some might not.
Serial in could connect to MOSI.
Serial clock could then connect to the SPI clock.
RCK does not match up to an SPI signal.
As shown on page 9, "Voltage Waveforms", this pin wants a pulse after all bits have been shifted out.
G also does not match up.
So use two more MCU pins for RCK and G.
If you have something else connected to the SPI port, then CS needs to prevent clocks to RCK when this chain is not selected.
This gives MPU to buffer to TPIC6C595, with the SPI CS controlling the OE of the buffer that drives RCK.
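To make the output sequence concrete, here is a rough Python sketch of the chain as a toy model. It is not MCU code and not taken from the datasheet: the class name, method names and the convention that 0 means "output driver off" are my own assumptions for illustration. It only shows the order of operations: hold G high, shift every bit in with SRCK, pulse RCK once, then drop G.

```python
class OutputChain:
    """Toy model of daisy-chained 8-bit shift registers with output latches.

    Hypothetical helper for illustration only; 0 is taken to mean
    "output driver off" in this model.
    """

    def __init__(self, chips):
        self.nbits = 8 * chips
        self.shift = [0] * self.nbits  # index 0 is the stage nearest SER IN
        self.latch = [0] * self.nbits  # storage register behind the outputs
        self.g = 1                     # G is active low: 1 forces outputs off

    def srck_pulse(self, ser_in):
        # One shift clock: every bit moves one stage down the chain.
        self.shift = [ser_in] + self.shift[:-1]

    def rck_pulse(self):
        # One latch pulse: copy the whole shift register to the outputs.
        self.latch = list(self.shift)

    def outputs(self):
        return [0] * self.nbits if self.g else list(self.latch)


def write_frame(chain, bits):
    """Shift all bits in, then latch them with a single RCK pulse."""
    for bit in bits:
        chain.srck_pulse(bit)
    chain.rck_pulse()


# Power-up: the latch holds random data, so keep G high until a frame is loaded.
chain = OutputChain(chips=2)
chain.latch = [1, 0, 1, 1, 0, 0, 1, 0] * 2   # stand-in for random power-up data
chain.g = 1
frame = [1] + [0] * 15
write_frame(chain, frame)
chain.g = 0                                   # outputs now show the frame
```

Note that the first bit shifted in ends up at the far end of the chain, so the frame appears on the outputs in reverse order.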
I'm thinking of designing these boards with 64 I/O on the one board - that way a single 74HC125 would buffer the 8 x IC's (either TPIC6C595's or 74HC165's).
This puts a lot of connections on one board and, in the process, adds a lot of wire to get from the board to the LEDs.
If you are going to have PC boards made, a larger number of smaller boards could be cheaper.
Timing
"INPUT SETUP AND HOLD WAVEFORMS" page 9 SRCK & SER IN
If you read up on SPI, tsu = th, with both = 25% of the clock period, so
tsu + th = 50% of the clock period.
Now suppose SRCK has a period of 110 ns.
The signals are delayed 11 ns per buffer. After 10 buffers (10 x 11 ns = 110 ns) the signals align again, delayed by one bit time.
Now think of what would happen in the above if one signal went through 5 more buffers than the other. The skew would be 5 x 11 ns = 55 ns, half a clock period, so you would have data changing at the same time as the clock.
If SRCK is slower it would take more buffers to get one bit time delay.
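The arithmetic above is easy to play with for other clock rates. A small Python sketch, using the 110 ns period and 11 ns per buffer from this post (the function names are just made up for the example):

```python
def buffers_for_one_bit_delay(period_ns, delay_per_buffer_ns=11):
    """Number of buffers in series that add up to one full bit time."""
    return period_ns / delay_per_buffer_ns


def skew_fraction(extra_buffers, period_ns, delay_per_buffer_ns=11):
    """Skew between two signals as a fraction of the clock period,
    when one signal passes through extra_buffers more buffers."""
    return extra_buffers * delay_per_buffer_ns / period_ns


print(buffers_for_one_bit_delay(110))   # 10.0 buffers for a 110 ns clock
print(skew_fraction(5, 110))            # 0.5, data moves onto the clock edge
print(buffers_for_one_bit_delay(220))   # 20.0, a slower clock needs more buffers
```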
For Input chain
If you think it through, you could get a huge fan-out with two levels of buffers.
The problem with this is the long signal runs when the boards are spread around the layout.
With buffers in series, you can use a slower clock to collect input data. With a slower clock, the 11 ns per buffer becomes a smaller percentage of the clock period.
The MPU drives the clock input and reads the data output of the buffer chain.
In the above drawing, tsu moves away from 25% with each added buffer.
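As a rough illustration of why the slower clock helps, here is a Python sketch that assumes each buffer in the data path eats 11 ns of setup time, starting from a nominal 25% of the clock period. The direction of the shift and the function names are my assumptions; check them against your own drawing and the real buffer delays.

```python
import math


def setup_fraction(n_buffers, period_ns, delay_ns=11, nominal=0.25):
    """Remaining setup time as a fraction of the clock period,
    assuming every buffer delays the data by delay_ns."""
    return nominal - n_buffers * delay_ns / period_ns


def max_buffers(period_ns, delay_ns=11, nominal=0.25):
    """Most buffers you can add before the assumed setup time reaches zero."""
    return math.floor(nominal * period_ns / delay_ns)


print(max_buffers(110))    # 2 buffers at a 110 ns clock
print(max_buffers(1100))   # 25 buffers at a clock ten times slower
```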
So to put it all together
Using MOSI and the SPI clock lets you output data faster. You still need two more pins: one for RCK to latch the data and one for G.
For input, nothing stops you from using the signal that goes to RCK to latch the input data many times without reading the inputs each time.
Then after X pulses of RCK you could use an additional pin to clock the 74HC165 shift registers at a slower rate, and one more pin to read the data in bit-bang mode.
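The bit-bang read comes down to: latch the inputs, then repeat "read the data pin, pulse the clock" once per bit. A Python toy model of that loop follows. The class and its method names are invented for the example, and it glosses over the polarity of the real load pin, so treat it as a sketch of the sequence only.

```python
class InputChain:
    """Toy model of chained parallel-in, serial-out shift registers."""

    def __init__(self, nbits):
        self.pins = [0] * nbits    # levels on the parallel inputs
        self.shift = [0] * nbits   # last element feeds the serial data output

    def load(self):
        # Latch pulse: copy the parallel inputs into the shift register.
        self.shift = list(self.pins)

    def data_out(self):
        return self.shift[-1]

    def clock_pulse(self):
        # One slow clock pulse: every bit moves one stage toward the output.
        self.shift = [0] + self.shift[:-1]


def read_inputs(chain):
    """Latch once, then read one bit per clock pulse."""
    chain.load()
    bits = []
    for _ in range(len(chain.shift)):
        bits.append(chain.data_out())  # first bit is valid before any clock
        chain.clock_pulse()
    return bits


chain = InputChain(8)
chain.pins = [1, 0, 0, 1, 1, 0, 0, 0]
print(read_inputs(chain))   # bits arrive last stage first
```

Because the stage nearest the output is read first, the bits come back in the reverse of the pin order used here.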
For input, 10 to 20 reads of the inputs is probably fine. If you are using switches to supply the inputs to the 74HC165, you could have some contact bounce.
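One common way to deal with contact bounce in software is to accept a change only after it has been seen on several reads in a row. A minimal Python sketch of that idea, assuming each read gives you a list of bits and that 3 matching reads is enough (both the class name and the threshold are just picked for illustration):

```python
class Debouncer:
    """Accept a new input level only after stable_reads consecutive reads agree."""

    def __init__(self, nbits, stable_reads=3):
        self.state = [0] * nbits   # debounced level of each input
        self.count = [0] * nbits   # consecutive reads that disagree with state
        self.stable_reads = stable_reads

    def update(self, sample):
        for i, bit in enumerate(sample):
            if bit == self.state[i]:
                self.count[i] = 0          # bounce back: restart the count
            else:
                self.count[i] += 1
                if self.count[i] >= self.stable_reads:
                    self.state[i] = bit    # change held long enough, accept it
                    self.count[i] = 0
        return list(self.state)


d = Debouncer(nbits=2)
for sample in ([1, 0], [0, 0], [1, 0], [1, 0], [1, 0]):
    state = d.update(sample)
print(state)   # input 0 is accepted as 1 only after three matching reads
```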
This totals 6 pins on the MPU.
The higher output rate would let you dim the LEDs if you wanted to.