OK then, that's unfortunate. What's the chip you're using?
You can try optimizing the logic path to reduce the propagation delay, although I suppose you have already done what you could there. If you're unsure, check how many logic levels the path goes through and try to shorten it.
If that's not possible, then yeah, changing the data on the preceding rising edge should work, but in that case the design will explicitly rely on the path delay being fairly long: the same code on a faster FPGA could fail.
So, if you do this, it would be a good idea to document that dependency in a comment in your code.
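
To make the "faster FPGA could fail" point concrete, here is a rough back-of-the-envelope sketch in Python. It is only an illustration: I'm assuming the receiving side needs the old value to stay stable for some hold time after the edge where you change the data, and needs the new value stable some setup time before the next edge. The function name and all the numbers are made up, not taken from your design or any datasheet.

```python
def launch_is_safe(path_delay_ns, period_ns, setup_ns, hold_ns):
    """Check a simplified setup/hold window (hypothetical model).

    Data changes on rising edge 0 and should be captured on rising edge 1.
    - hold:  the old value must not change until hold_ns after edge 0,
             so the path delay has to be at least hold_ns
    - setup: the new value must be stable setup_ns before edge 1,
             so the path delay has to be at most period_ns - setup_ns
    """
    hold_ok = path_delay_ns >= hold_ns
    setup_ok = path_delay_ns <= period_ns - setup_ns
    return hold_ok and setup_ok


# Made-up example values: 100 MHz clock, 1 ns setup, 2 ns hold.
PERIOD_NS, SETUP_NS, HOLD_NS = 10.0, 1.0, 2.0

# Slow part: the long path delay is what keeps the hold requirement satisfied.
slow_part_ok = launch_is_safe(6.0, PERIOD_NS, SETUP_NS, HOLD_NS)

# Faster part, same code: the data now changes too soon after the edge.
fast_part_ok = launch_is_safe(1.5, PERIOD_NS, SETUP_NS, HOLD_NS)

print(slow_part_ok, fast_part_ok)
```

With these made-up numbers the slow part passes and the fast part fails the hold check, which is exactly the kind of hidden dependency worth spelling out in that comment.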