Which way is better depends on layout and design considerations. If you have a lot of shift registers in series, you can eventually exceed the fanout of whatever drives the clock, chip select, and latch signals. You may want to read separate shift register strings in parallel for extra speed. Or you may want a separate set of shift registers for galvanic isolation, low noise, or time-critical I/O.
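To put rough numbers on the speed and fanout trade-off, here is a minimal sketch. It assumes 8-bit registers, an even split across strings, and that each string gets its own buffered clock and latch lines. The function names and example figures are made up for illustration and do not come from any particular part's datasheet.

```python
import math

BITS_PER_REGISTER = 8  # assumption: 8-bit shift registers


def longest_string(n_registers, n_strings):
    """Registers in the longest string when split as evenly as possible."""
    return math.ceil(n_registers / n_strings)


def scan_time_s(n_registers, n_strings, clock_hz):
    """Time to shift every bit once, with all strings clocked in parallel."""
    return longest_string(n_registers, n_strings) * BITS_PER_REGISTER / clock_hz


def loads_per_driver(n_registers, n_strings):
    """Inputs hanging on each clock/latch driver.

    Assumes every string has its own buffered control lines. If all strings
    share one unbuffered clock, the load is n_registers regardless of the split.
    """
    return longest_string(n_registers, n_strings)


if __name__ == "__main__":
    # Hypothetical case: 16 registers, 1 MHz shift clock.
    for strings in (1, 4):
        t_us = scan_time_s(16, strings, 1_000_000) * 1e6
        print(f"{strings} string(s): about {t_us:.0f} us per scan, "
              f"{loads_per_driver(16, strings)} loads per control driver")
```

With these made-up numbers, one string of 16 registers takes about 128 µs per scan and puts 16 loads on each control line, while four strings of 4 take about 32 µs with 4 loads per buffered driver. The cost is extra data pins (one per string) and extra buffers, so whether it is worth it comes back to your layout and timing needs.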