Indeed, the "worst case" really depends on the conditions, and it's hard to predict exactly. You can end up making an optimistic assumption while thinking you are testing the worst case! With enough experience and knowledge you can analyze the worst-case conditions and simulate only those, but that defeats the main purpose of testing: to verify that the knowledge and analysis were correct in the first place!
The "state" of the system affects the worst case (for example, the phase at which the power was removed, which affects residual magnetism), and so does the delay between tests.
Instead of trying to find the single "worst" condition and sync to it exactly, I'd come up with an automated way to generate connections/disconnections and log the results. Then you can drive it with a truly random test sequence, which will find you the actual worst-case operating condition in around 100 cycles.
The test would deliberately not be synchronized to the phase, and would include long delays as well as very short ones.
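As a rough sketch of what such a test driver could look like in Python: the delay ranges, the 50/50 mix of short and long delays, and the `set_power` / `read_peak_current` callables are all my own placeholders for whatever relay and current probe you actually have, not part of any particular tool. Because the delays are drawn from a continuous range and are unrelated to the line period, the switch-on instant lands at an effectively random phase.

```python
import random
import time


def _pick_delay(rng, short_range, long_range, p_short):
    # Choose a very short or a long delay, then a random value inside it.
    lo, hi = short_range if rng.random() < p_short else long_range
    return rng.uniform(lo, hi)


def make_schedule(n_cycles=100, short_range=(0.01, 0.5),
                  long_range=(5.0, 120.0), p_short=0.5, seed=None):
    """Return a list of (on_seconds, off_seconds) pairs.

    All ranges are assumptions; tune them to your design's time constants.
    """
    rng = random.Random(seed)  # seed it so a bad sequence can be replayed
    return [(_pick_delay(rng, short_range, long_range, p_short),
             _pick_delay(rng, short_range, long_range, p_short))
            for _ in range(n_cycles)]


def run_test(schedule, set_power, read_peak_current, sleep=time.sleep):
    """Toggle power per the schedule and log the peak current of each turn-on.

    set_power(bool) and read_peak_current() -> float are hypothetical
    hooks for your own relay driver and measurement setup.
    """
    log = []
    off_before = None  # how long power was off before this turn-on
    for cycle, (on_s, off_s) in enumerate(schedule):
        set_power(True)
        peak = read_peak_current()
        log.append({"cycle": cycle, "off_before_s": off_before,
                    "on_s": on_s, "peak_a": peak})
        sleep(on_s)
        set_power(False)
        sleep(off_s)
        off_before = off_s
    return log


def worst_case(log):
    # The log entry with the highest recorded peak current.
    return max(log, key=lambda rec: rec["peak_a"])
```

Keeping the full log (not just the maximum) lets you look at which off-time preceded the worst peak, and the seed lets you reproduce the exact sequence that triggered it.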
In some designs, such as those using NTCs to provide inrush limiting, short-delay toggling or input power bouncing requires very careful analysis, because the NTC is still warm (and so still low-resistance) when power comes back. Or the designers just ignore the issue.