On an FPGA, the mux adds one LUT to the data path, and that adds delay. The LUT itself is not what's slow; the slow part is routing the signal in from unrelated logic that may be far away on the chip, so you will want to register that signal first, before doing anything else with it.
Ideally, all outgoing and incoming signals should be registered for best timing performance.
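
To show what registering the signal changes in behavior, here is a minimal cycle-level sketch in Python (a software model, not HDL). It assumes a 2:1 mux whose select comes from the far-away logic, and a flip-flop that resets to 0. The names `simulate`, `sel_reg`, and `register_sel` are made up for illustration.

```python
def simulate(sel_stream, a_stream, b_stream, register_sel):
    """Cycle-level model of a 2:1 mux; returns the mux output per clock cycle."""
    sel_reg = 0  # flip-flop holding the registered select (reset value 0 is an assumption)
    out = []
    for sel, a, b in zip(sel_stream, a_stream, b_stream):
        # With register_sel, the mux sees the select captured on the previous clock edge.
        sel_used = sel_reg if register_sel else sel
        out.append(b if sel_used else a)
        sel_reg = sel  # clock edge: capture the incoming select
    return out


sel = [0, 1, 1, 0]
a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

direct = simulate(sel, a, b, register_sel=False)
registered = simulate(sel, a, b, register_sel=True)

print(direct)      # [10, 21, 22, 13]
print(registered)  # [10, 11, 22, 23]
```

With the register in place the select takes effect one clock cycle later. That extra cycle of latency is the price for keeping the long route out of the mux's timing path, and the rest of the design has to account for it.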