Work with your software serial implementation. At 4800 bps, 200 bytes (about 2000 bits on the wire once you count start and stop bits) take roughly 420 ms to transmit, so obviously you can't do it every 100 ms. Even if you raise the speed by 10x, you would be running the CPUs at full power for over 40% of the time, which is terrible for power.
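To make that arithmetic easy to redo for other speeds, here is a small sketch. It assumes 8N1 framing (10 bits on the wire per byte); the function names `tx_time_ms` and `duty_cycle_percent` are made up for illustration and are not from your code.

```c
#include <stdint.h>

/* Assumed framing: 8N1 = 1 start bit + 8 data bits + 1 stop bit. */
#define BITS_PER_BYTE_8N1 10u

/* Time on the wire for n_bytes at the given bit rate, in milliseconds. */
static double tx_time_ms(uint32_t n_bytes, uint32_t bits_per_second)
{
    return (double)n_bytes * BITS_PER_BYTE_8N1 * 1000.0
           / (double)bits_per_second;
}

/* Share of each period spent transferring, in percent.
 * A result above 100 means the message does not fit in the period. */
static double duty_cycle_percent(uint32_t n_bytes, uint32_t bits_per_second,
                                 double period_ms)
{
    return 100.0 * tx_time_ms(n_bytes, bits_per_second) / period_ms;
}
```

With these, `tx_time_ms(200, 4800)` comes out at about 417 ms, and `duty_cycle_percent(200, 48000, 100.0)` at about 42%. That only counts time on the wire; whatever processing you do per message comes on top.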
So do it as quickly as possible: optimize the shit out of the code so you can go back to sleep as early as possible. I don't see why you couldn't get somewhere close to 1 Mbaud even with a bit-banged serial. For a proper, less power-hungry solution, though, pick another AVR MCU. There are parts with two UARTs, so you can leverage your existing knowledge and code. With an on-chip UART peripheral instead of a software implementation, you can sleep the core and wake up on an interrupt whenever a full byte has been received for processing.
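A rough sketch of the sleep-and-wake-on-byte pattern with a hardware UART follows. The register and vector names (`UCSR0B`, `UDR0`, `USART_RX_vect`) are the ATmega328P ones and are an assumption here; on a part with two UARTs they are typically spelled differently (for example `USART0_RX_vect`), so check your datasheet. The ring buffer helpers `rx_push` and `rx_pop` are hypothetical names, and baud rate setup is left out. The hardware part only compiles with avr-gcc; the buffer logic is plain C.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_BUF_SIZE 64u   /* must be a power of two, at most 256 */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;  /* written only by the ISR */
static volatile uint8_t rx_tail;  /* written only by the main loop */

/* Called from the receive ISR: store one byte, drop it if the buffer is full. */
static bool rx_push(uint8_t byte)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next == rx_tail)
        return false;
    rx_buf[rx_head] = byte;
    rx_head = next;
    return true;
}

/* Called from the main loop: fetch one byte if there is one. */
static bool rx_pop(uint8_t *out)
{
    if (rx_tail == rx_head)
        return false;
    *out = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
    return true;
}

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/sleep.h>

/* Fires once per received byte and wakes the core from idle sleep. */
ISR(USART_RX_vect)
{
    (void)rx_push(UDR0);
}

int main(void)
{
    /* Baud rate and frame format setup (UBRR0, UCSR0C) omitted. */
    UCSR0B = (1 << RXEN0) | (1 << RXCIE0);  /* receiver + RX interrupt on */
    set_sleep_mode(SLEEP_MODE_IDLE);        /* USART keeps running in idle */
    sei();

    for (;;) {
        uint8_t b;
        while (rx_pop(&b)) {
            /* process b here, as briefly as possible */
        }
        /* Re-check with interrupts off so a byte arriving right now
         * cannot slip in between the check and the sleep instruction. */
        cli();
        if (rx_head == rx_tail) {
            sleep_enable();
            sei();
            sleep_cpu();
            sleep_disable();
        }
        sei();
    }
}
#endif
```

The `cli()` / `sei()` / `sleep_cpu()` sequence is the idiom from the avr-libc sleep documentation: the instruction following `sei()` runs before any pending interrupt is serviced, so the core cannot go to sleep with an unprocessed byte already sitting in the buffer.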
More generally, your problem seems to be that you are trying out example code from others but don't have enough experience to write your own, test it, and solve the problems that inevitably arise. The good news is, you gain experience by doing. Pick any of the proposed solutions (SPI slave despite your code currently being "unstable", or the software bit-banged serial despite your code currently being "slow"), then do some serious, concentrated work on it until it works. Look at every issue and ask why it is happening. Add instrumentation and measure. Try a solution. Rinse and repeat.