Second fault. Caught this one fairly quickly. A software bug! Shame on me.
The power bar publishes its state on a message topic. However, to get a snapshot of its state you need to send it a message on another topic. I had added this, of course. So when my controller starts, it sends a status request for each of the sockets, which brings it into a consistent state with the device, and I don't need to spam "ON", "ON", "ON", "ON" every time the temperature changes by 0.01 degree. If I "know" it's already ON, just leave it.
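To make the "if I know it's already ON, just leave it" idea concrete, here is a minimal sketch. It is not my actual controller code: the class name, the publish callable and the topic layout ("powerbar/<id>/set") are all made up for illustration. The one deliberate choice is that a socket we have never heard from counts as unknown rather than "OFF", so the first command always goes out.

```python
class SocketCache:
    """Remembers the last state the device reported for each socket."""

    def __init__(self, publish):
        # publish is a hypothetical callable: publish(topic, payload)
        self._publish = publish
        self._known = {}  # socket_id -> "ON" / "OFF"; missing means unknown

    def on_status(self, socket_id, state):
        """Call this from the status-topic handler."""
        self._known[socket_id] = state

    def set_state(self, socket_id, wanted):
        """Publish a command only if the device is not known to be in that state.

        Returns True if a command was sent, False if it was skipped.
        """
        if self._known.get(socket_id) == wanted:
            return False
        # Hypothetical topic layout, not the real device's.
        self._publish(f"powerbar/{socket_id}/set", wanted)
        return True
```

Note that the cache is only updated from status messages, never from the commands I send, so a command keeps going out until the device itself confirms the new state.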
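The first issue is that I sent the status requests before the client had connected, so they got dropped. My view of the socket state therefore showed them all as "OFF", and when the controller wanted the heater off it "left the heater off", since as far as it knew it already was. The real heater, evidently, was still on.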
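One way to avoid losing messages sent before the connection is up is to hold them until the client reports it has connected. The sketch below assumes the MQTT library offers some kind of connect and disconnect callbacks (most do); the wrapper class, its method names and the topic in the usage note are hypothetical.

```python
class DeferredPublisher:
    """Queues messages until the underlying client is connected."""

    def __init__(self, send):
        # send is a hypothetical callable: send(topic, payload),
        # standing in for the real client's publish function.
        self._send = send
        self._connected = False
        self._pending = []

    def publish(self, topic, payload):
        if self._connected:
            self._send(topic, payload)
        else:
            # Not connected yet: keep the message instead of dropping it.
            self._pending.append((topic, payload))

    def on_connect(self):
        """Wire this to the client's connect callback."""
        self._connected = True
        pending, self._pending = self._pending, []
        for topic, payload in pending:
            self._send(topic, payload)

    def on_disconnect(self):
        """Wire this to the client's disconnect callback."""
        self._connected = False
```

With this in place, a status request such as publish("powerbar/1/status/get", "") issued at startup would sit in the queue and go out as soon as on_connect fires.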
The temperature soared to 30 °C.
The second issue is MQTT. It's great, but like most asynchronous message buses it has its downsides. Message loss is an issue. Not "over the wire", as TCP should correct packet loss, but in many corner cases messages get dropped: overwritten in caches, dropped on disconnect, etc.
I have been trying to avoid having to wrap my own layer of transactional context on top of MQTT, but this project makes me reconsider that.
I am thinking of using the Promise-style pattern. Instead of chucking the command message over the fence with just a "publish( topic, message )" and hoping I get an appropriate action, I should send a request() and get a Promise in return. A Promise is a contract that the component you requested something from WILL give you a response. Not now, but later. It might be a success or an error, but you will get called back. (For old-school folks, it's an async callback.)
The code behind it would publish the command message, then wait a short timeout for the status topic to report the requested state, maybe retry a few times, and finally report success or failure through the callback attached to the promise.
The end-device does not implement request IDs, so there is no way to match requests to responses one-to-one. I'll just need to wing it.
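Roughly what I have in mind, sketched in Python with asyncio, where an awaitable coroutine plays the role of the Promise. Nothing here is real device or library API: the class, the publish callable, the topic layout and the default timeout and retry count are all assumptions. Because there are no request IDs, the only "confirmation" is that the status topic reports the state I asked for, so a stale status that happens to match cannot be told apart from a fresh one.

```python
import asyncio


class ConfirmedSwitch:
    """Publish a command, then wait for the status topic to echo that state."""

    def __init__(self, publish, timeout=2.0, retries=3):
        # publish is a hypothetical callable: publish(topic, payload)
        self._publish = publish
        self._timeout = timeout    # seconds to wait per attempt (assumed value)
        self._retries = retries    # number of attempts (assumed value)
        self._state = {}           # socket_id -> last reported state
        self._changed = asyncio.Condition()

    async def on_status(self, socket_id, state):
        """Call this whenever a status message arrives for a socket."""
        async with self._changed:
            self._state[socket_id] = state
            self._changed.notify_all()

    async def request(self, socket_id, wanted):
        """Resolve to True once the device reports `wanted`, False if it never does."""
        def confirmed():
            return self._state.get(socket_id) == wanted

        for _ in range(self._retries):
            # Hypothetical topic layout, not the real device's.
            self._publish(f"powerbar/{socket_id}/set", wanted)
            try:
                async with self._changed:
                    await asyncio.wait_for(
                        self._changed.wait_for(confirmed), self._timeout
                    )
                return True
            except asyncio.TimeoutError:
                continue  # no matching status in time: publish again
        return False
```

The caller would then write something like ok = await switch.request(1, "ON") and treat False as "the heater did not confirm, raise an alarm" instead of silently assuming the command landed.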
In the meantime, I added a timer to republish status requests every minute, just in case.
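The timer amounts to something like the sketch below. The class name and the request_status callable are stand-ins for whatever actually sends the status request; only the one-minute default comes from my setup.

```python
import threading


class PeriodicResync:
    """Re-sends a status request for every socket at a fixed interval."""

    def __init__(self, request_status, socket_ids, interval=60.0):
        # request_status is a hypothetical callable: request_status(socket_id)
        self._request_status = request_status
        self._socket_ids = list(socket_ids)
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None

    def _run(self):
        # Event.wait() returns False on timeout and True once stop() is called,
        # so the loop ticks every interval until it is asked to stop.
        while not self._stop.wait(self._interval):
            for socket_id in self._socket_ids:
                self._request_status(socket_id)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

It is a belt-and-braces measure: if a status reply is lost, the cached view is wrong for at most one interval.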