But I wonder how the clock recovery actually works. Do you have any idea? I know it's internal to the core and I shouldn't need to worry about it, but I'm curious how they did it.
There is some explanation in UG482.
Also, about scrambling and descrambling: is there an algorithm or any idea behind that too? Do we have some Verilog code that does just that, so I can understand it better?
Scrambling is protocol-specific. Typically it's some sort of LFSR, so refer to the particular protocol's specification for details. For example, you can check out a DisplayPort scrambler implementation here:
https://github.com/hamsternz/DisplayPort_Verilog
The DisplayPort version 1.2 specification is freely available from the VESA website. The spec includes an example implementation in C, which might be easier for you to read and understand.
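To make the LFSR idea concrete, here is a minimal Python sketch of an additive (XOR) scrambler. The 16-bit register width, the taps (16, 5, 4, 3), the 0xFFFF seed and the bit ordering are assumptions picked for illustration; this is not claimed to be bit-exact with DisplayPort or any other protocol, so check the spec for the real polynomial, seed and reset rules.

```python
def lfsr_keystream(n_bytes, seed=0xFFFF, taps=(16, 5, 4, 3)):
    """Return n_bytes of pseudo-random bytes from a 16-bit Fibonacci LFSR.

    seed and taps are illustrative assumptions, not taken from a spec.
    """
    state = seed & 0xFFFF
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for bit in range(8):
            # The keystream bit is the MSB of the shift register.
            byte |= ((state >> 15) & 1) << bit
            # Feedback is the XOR of the tapped bits (tap k is bit k-1).
            fb = 0
            for t in taps:
                fb ^= (state >> (t - 1)) & 1
            # Shift left by one and insert the feedback bit at the LSB.
            state = ((state << 1) | fb) & 0xFFFF
        out.append(byte)
    return bytes(out)


def scramble(data, seed=0xFFFF):
    """XOR data with the LFSR keystream. Applying it twice restores data."""
    key = lfsr_keystream(len(data), seed)
    return bytes(d ^ k for d, k in zip(data, key))
```

Because the scrambler is just an XOR with a keystream, descrambling is the same function: scramble(scramble(data)) gives back data, as long as both ends start from the same seed and stay in step. How the two ends get back in step (for example a periodic reset) is something each protocol defines for itself.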
Is there any open-source code showing how to do what MGTs do over a low-speed wire?
I don't know of any, but you can use the SERDES inside 7 series devices to get a first-order approximation. One major difference is that the SERDES does not have clock recovery circuits, so you will have to pass the clock along (or use some sort of oversampling to recover the data; there is an appnote from Xilinx on how to do that), but you can emulate pretty much everything else.
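A rough Python model of the oversampling idea, to show the principle only: sample each bit several times, look at where the transitions fall, and read the data half a bit period away from them. The 4x factor, the function names and the single static phase choice are all assumptions for this sketch. It ignores noise, jitter and clock drift, which a real design has to track continuously, and it is not the method from the Xilinx appnote.

```python
from collections import Counter


def oversample(bits, factor, offset):
    """Hypothetical test helper: repeat each bit `factor` times and shift
    the bit boundaries by `offset` samples to mimic an unknown phase."""
    stream = [b for b in bits for _ in range(factor)]
    return [bits[0]] * offset + stream


def recover(samples, factor):
    """Recover one value per bit period from an oversampled stream."""
    # Histogram of where transitions occur, modulo the oversampling factor.
    edges = Counter(
        i % factor
        for i in range(1, len(samples))
        if samples[i] != samples[i - 1]
    )
    if not edges:
        raise ValueError("no transitions, cannot find the bit boundaries")
    edge_phase = edges.most_common(1)[0][0]
    # Sample half a bit period after the edges, in the middle of the eye.
    sample_phase = (edge_phase + factor // 2) % factor
    return [samples[i] for i in range(sample_phase, len(samples), factor)]
```

For example, recover(oversample(bits, 4, 3), 4) returns the original bits, possibly preceded by one extra sample taken before the first real bit. That is why real links also need a known pattern to find the bit and word alignment. It also shows why the data has to contain enough transitions, which is one of the reasons protocols scramble or encode it.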
The thing about MGTs is that they have not only a maximum line rate but also a minimum one, and the minimum is typically specced at around 500 Mbps, which is way too fast for affordable scopes. That is why I recommend rolling your own lower-speed emulation. If you mimic the MGT's fabric interface in your emulation, porting it over to MGTs later will be much easier. Or you can do it all in simulation with MGTs right from the get-go, since line speed does not matter for sims.