Edge or level triggered doesn't matter, as long as exactly one step is made any time after the step input goes high. (Holding the step input high still counts as a single step.)
What is important is the direction and that there are 16 steps for a full rotation.
When my DDR3 controller applies a step, it waits ~1us for the PLL output to adapt, so I basically ignore the 'phase_done' signal and just wait far longer than the PLL should ever need.
As for the clock outputs, once the phases are set they had better stay where they belong, even in simulation. Otherwise, the sim will fail over time.
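A rough, simulation-only sketch of that fixed-wait stepping, in case it helps when writing a PLL model. The names phase_step, phase_updn and STEP_WAIT_PS are made up for illustration and are not taken from the actual controller or from any vendor primitive, and the real controller would use a clocked counter rather than # delays:

```systemverilog
`timescale 1 ps / 1 ps
// Hypothetical names throughout: phase_step, phase_updn and STEP_WAIT_PS
// are illustration only, not the controller's real ports.
module pll_step_sketch;
    localparam int STEP_WAIT_PS = 1_000_000; // ~1 us settle time (assumed)

    logic phase_step = 1'b0; // step request to the PLL model
    logic phase_updn = 1'b0; // step direction

    // Issue one phase step, then wait a fixed time instead of polling phase_done.
    task automatic step_phase(input logic dir);
        phase_updn = dir;
        phase_step = 1'b1;   // the PLL model should take exactly one step from here
        #(STEP_WAIT_PS);     // held high the whole time: must still be one step
        phase_step = 1'b0;
        #(STEP_WAIT_PS);     // idle gap before the next request
    endtask

    initial begin
        // 16 steps in one direction should bring the phase back to where it started.
        repeat (16) step_phase(1'b1);
        $display("16 steps issued, sim time = %0d ps", $time);
        $finish;
    end
endmodule
```

A PLL model driven like this can be either edge or level sensitive on phase_step, provided it moves one step per request and 16 requests make a full rotation.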
I am not sure why your rPLL clock output would not be exactly 400MHz if you set your reference clock divider and multiplier correctly. For example, when I use the requested settings:
parameter int CLK_KHZ_IN = 50000, // PLL source input clock frequency in KHz.
parameter int CLK_IN_MULT = 32, // Multiply factor to generate the DDR MTPS speed divided by 2.
parameter int CLK_IN_DIV = 4, // Divide factor. When CLK_KHZ_IN is 25000,50000,75000,100000,125000,150000, use 2,4,6,8,10,12.
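and generate the source clock in the testbench:
localparam period = 500000000/CLK_KHZ_IN ; // Actually the half-period in ps (assumes a 1ps time unit), 10000 when CLK_KHZ_IN is 50000.
always #period CLK_IN = !CLK_IN; // create source clock oscillator, toggling every half-period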
All the factors in the equations and delays hit dead on whole numbers.
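The arithmetic can be checked with a small standalone module like the one below. This is only a sketch: CLK_KHZ_OUT, HALF_IN_PS and PERIOD_OUT_PS are names invented here, and it assumes the output frequency is simply CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV with a 1ps time unit.

```systemverilog
`timescale 1 ps / 1 ps
module pll_math_check;
    parameter int CLK_KHZ_IN  = 50000; // PLL source input clock frequency in KHz
    parameter int CLK_IN_MULT = 32;    // multiply factor
    parameter int CLK_IN_DIV  = 4;     // divide factor

    // Hypothetical helper names, for illustration only.
    localparam int CLK_KHZ_OUT   = CLK_KHZ_IN * CLK_IN_MULT / CLK_IN_DIV; // 400000 KHz
    localparam int HALF_IN_PS    = 500000000  / CLK_KHZ_IN;               // 10000 ps
    localparam int PERIOD_OUT_PS = 1000000000 / CLK_KHZ_OUT;              // 2500 ps

    initial begin
        $display("CLK_KHZ_OUT   = %0d", CLK_KHZ_OUT);
        $display("HALF_IN_PS    = %0d", HALF_IN_PS);
        $display("PERIOD_OUT_PS = %0d", PERIOD_OUT_PS);
        // Flag any setting where integer division would truncate a period.
        if ((500000000 % CLK_KHZ_IN) != 0 || (1000000000 % CLK_KHZ_OUT) != 0)
            $display("WARNING: a period is not a whole number of ps");
    end
endmodule
```

With the settings above, 50000 * 32 / 4 = 400000 KHz, so the output period is 2500ps and the source half-period is 10000ps, with no remainder in either division.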
In ModelSim, under menu 'wave/wave preferences / Grid & Timescale', if I set the grid period to a manual 2500ps, then zoom in and scroll through the waveform output, I can see that the 400MHz stays locked to the 50MHz source.
Do not worry about the initial PLL setup time. I wait plenty of time for the PLL and other stuff to synchronize before running the system. Also verify that Gowin provides a PLL locked signal out. My DDR3 is held in reset during power-up until the locked signal is ready; after that, there are a ton of other delays to accommodate the DDR3 startup sequence.
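For illustration, holding the controller in reset until the PLL reports lock could look something like the sketch below. It is not the actual reset logic from my controller: the module name, port names and LOCK_WAIT_CYCLES value are all assumptions, and the lock output of the Gowin rPLL should be checked against its documentation.

```systemverilog
// Hypothetical names: pll_locked, ddr3_reset and LOCK_WAIT_CYCLES are
// illustration only.
module locked_reset_sketch #(
    parameter int LOCK_WAIT_CYCLES = 1024 // extra settle time after lock (assumed)
)(
    input  logic clk,        // a clock that keeps running while the PLL locks
    input  logic pll_locked, // PLL lock indicator, asynchronous to clk
    output logic ddr3_reset  // active-high reset to the DDR3 controller
);
    logic [1:0] lock_sync = 2'b00; // two-flop synchronizer for pll_locked
    logic [$clog2(LOCK_WAIT_CYCLES + 1) - 1:0] count = '0;
    logic reset_r = 1'b1;          // start in reset at power-up

    assign ddr3_reset = reset_r;

    always_ff @(posedge clk) begin
        lock_sync <= {lock_sync[0], pll_locked};
        if (!lock_sync[1]) begin
            count   <= '0;   // lock lost or not yet reached: restart the wait
            reset_r <= 1'b1;
        end else if (count != LOCK_WAIT_CYCLES) begin
            count   <= count + 1'b1; // locked, still counting out the settle time
            reset_r <= 1'b1;
        end else begin
            reset_r <= 1'b0; // locked and settled: release reset
        end
    end
endmodule
```

The DDR3 startup sequence delays would then run after ddr3_reset is released.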