As already said, using multiple micros can add a lot more complexity than a superficial look would suggest; in particular you have to consider the multiplication of failure modes arising from the independent operation of the various subsystems, and all the code required to handle them. Of course, in trivial applications it may not be an issue - if something goes wrong with your waste basket automatic lid opener you may well be amused by the random opening and closing of the lid. When you get bored you remove and replace the batteries to restore order.
With a home automation system, failures, both software and hardware, may not be a direct safety concern - i.e. a gas boiler won't operate unsafely even if the heating controls try to tell it to do so - but they could be economically damaging by operating the boiler at full output (controlled only by its internal overheat sensors) for extended periods. Worse, at the same time the air conditioning may be set to full. Or the burglar alarm may get silently disabled.
With distributed processing you have to deal with the fact that one or more subsystems might not respond at all to query or command messages, or perhaps worse, respond either earlier or later than expected. On a single processor, in general, if it is running then all functions will be running, and all at the same speed. If the processor fails to run for some reason then all functions stop.
In the distributed case, inter-device communication protocols have to allow for getting no response to a message, meaning timeouts and retry mechanisms are required. But how long should the timeout be? Typically you'll estimate the time required, allow for a reasonable number of retries, and add a bit of margin. Mostly it's not critical and a large margin can be used, but not always without seriously compromising performance or functionality - fast polling of remote devices may be required for a responsive UI, but if a non-critical remote sensor is unplugged a 10 second timeout could seriously slow down the whole system - unless a more complex scheme of parallel polling is implemented, with the increased risk of software errors.
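To make that concrete, here's a minimal sketch of the kind of request/timeout/retry loop every query or command ends up wrapped in. The transport functions (send_message, poll_response, millis) and the numbers are placeholders for whatever your bus and tick source actually provide, not a real API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Implemented elsewhere by the (hypothetical) transport layer. */
bool send_message(uint8_t dest, const uint8_t *buf, uint8_t len);
bool poll_response(uint8_t dest, uint8_t *buf, uint8_t *len);  /* non-blocking */
uint32_t millis(void);                                         /* free-running ms tick */

#define TIMEOUT_MS   100u   /* estimated worst case plus a bit of margin */
#define MAX_RETRIES  3u

/* Send a request and wait for the reply, retrying on timeout.
   Returns true and fills resp/resp_len on success, false once all retries fail. */
bool request(uint8_t dest, const uint8_t *req, uint8_t req_len,
             uint8_t *resp, uint8_t *resp_len)
{
    for (uint8_t attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        if (!send_message(dest, req, req_len))
            continue;                        /* bus busy: counts as a failed attempt */

        uint32_t start = millis();
        while ((uint32_t)(millis() - start) < TIMEOUT_MS) {
            if (poll_response(dest, resp, resp_len))
                return true;                 /* reply arrived within the timeout */
        }
        /* timed out - fall through and send the request again */
    }
    return false;                            /* give up; the caller must handle this */
}
```

Note that while this loop is waiting the caller is stalled, which is exactly how one unplugged sensor with a generous timeout ends up dragging the whole system down - unless you take on the extra complexity of polling several devices in parallel.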
So you implement it and get it all running, and because you are conscientious you test that all your timeouts are appropriate by measuring the response time for every type of message that can be sent - all work that is generally unnecessary in a single processor solution. It all works fine - until it doesn't, because you hadn't measured the worst case response time. It may be that processor A requests some data from processor B, and usually B has the data and responds immediately; very occasionally it has to get the data from processor C because something changed - e.g. a limit switch being activated, indicating that a motion sensor had lost sync and needed recalibration. The response from B to A is now much longer than usual and the timeout may have expired.
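Just to put invented numbers on it: if B's own query to C has, say, a 100 ms timeout and three retries, the worst case A can see is nothing like the few milliseconds measured when B already has the data:

```c
/* Illustrative worst-case budget - all figures are invented for the example. */
#define B_NORMAL_REPLY_MS      5u     /* B already has the data cached              */
#define B_PROCESSING_MS       10u     /* B's own handling overhead                  */
#define B_TO_C_TIMEOUT_MS    100u     /* B's timeout per attempt when it must ask C */
#define B_TO_C_RETRIES         3u

/* Worst case seen by A: B retries C to exhaustion before it can reply at all. */
#define A_TIMEOUT_MIN_MS  (B_PROCESSING_MS + (B_TO_C_RETRIES + 1u) * B_TO_C_TIMEOUT_MS)
/* = 10 + 4 * 100 = 410 ms - around 80 times the 5 ms 'measured' in normal operation. */
```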
Processor A decides that the message has failed for some reason so repeats its request. Shortly afterwards the delayed response to the first request arrives from B, but because the messaging protocol didn't include sequencing data A has no way of knowing that it carries the data for the first request and not for the later, possibly slightly different, second request. To make matters worse the response to the second request then arrives from B - if you are lucky A rejects it because it wasn't expecting one, but worse, it might sit in an input queue which only gets read when A expects the response to its next request. A hasn't implemented a 'too fast response' error check, so it now gets permanently out of sync with B.
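The usual fix is to put a sequence number in every request and have the responder echo it back, so a late reply to an earlier, timed-out request can be recognised and dropped rather than paired with the current one - but that's yet more protocol and yet more code to test. A rough sketch, with an invented message layout:

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented wire format: each request carries a sequence number which the
   responder copies unchanged into its reply. */
typedef struct {
    uint8_t seq;              /* sequence number echoed from the request */
    uint8_t payload[8];
} reply_t;

static uint8_t current_seq;   /* seq of the request currently outstanding */

void start_new_request(void)
{
    current_seq++;            /* every newly issued (or re-issued) request gets a fresh number */
}

bool accept_reply(const reply_t *r)
{
    /* A reply tagged with an old sequence number answers a request we have
       already given up on: discard it instead of treating it as the answer
       to whatever we asked most recently. */
    return r->seq == current_seq;
}
```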
Then there is the issue of different startup times for each device. It's usually simple to deal with by waiting long enough before messaging other devices to be certain they are all running, but it gets harder as the allowed time is cut down to the minimum needed for a "responsive" system - especially if the devices are programmed by different people or teams. It all works brilliantly until one team decides to add a comprehensive hardware self-test on startup, which means its device is almost always ready to receive the first message from a master device, but very occasionally isn't.
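One way to take the guesswork out of a fixed startup delay is for the master to poll each device for an explicit 'ready' reply before it starts issuing real commands - more messages and more code again, but it copes with a slave whose self-test occasionally runs long. A sketch, reusing the hypothetical request() helper from the earlier example and an invented CMD_PING:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers - the same invented transport as the earlier sketch. */
bool request(uint8_t dest, const uint8_t *req, uint8_t req_len,
             uint8_t *resp, uint8_t *resp_len);
void delay_ms(uint32_t ms);

#define CMD_PING       0x01u
#define PING_ATTEMPTS  50u      /* give up on a device after roughly 5 seconds */

/* Block until the device answers a PING, i.e. has finished its own startup
   (however long its self-test happens to take this time), or report failure. */
bool wait_until_ready(uint8_t dest)
{
    uint8_t ping = CMD_PING;
    uint8_t resp[8];
    uint8_t resp_len;

    for (uint32_t attempt = 0; attempt < PING_ATTEMPTS; attempt++) {
        if (request(dest, &ping, 1, resp, &resp_len))
            return true;        /* device is up and answering */
        delay_ms(100);          /* not yet - try again shortly */
    }
    return false;               /* flag the missing device instead of hanging forever */
}
```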
And so on, and on. Many scenarios have to be considered which arise purely from the decision to distribute the functionality. Maintenance is made more difficult because of the timing dependencies, and a lot more documentation may need to be generated and maintained.