Perhaps the problem is that the MCU port is an inout, so it has an inherent tristate associated with it. (Each bit of the bus can be 1, 0, or Z. In hardware this is fundamentally implemented as an 'output' bit, a 'tristate' (output-enable) bit and an 'input' bit.)
That tristate may or may not be accessible, depending on the hardware.
I'm not that familiar with the Gowin architecture, but you can check whether the MCU module exposes separate ports, e.g. gpio_in, gpio_out and gpio_tri, instead of the 'easier' combined gpio port, which does the tristating for you. If it does, it might be possible to get your signal into the port that way.
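To make the split-port idea concrete, here is a minimal sketch of what the wiring could look like if such ports were available. The names gpio_out, gpio_oe, gpio_in and led_signal are hypothetical and not taken from the Gowin_EMPU documentation, and the active-high output-enable polarity is an assumption too, so check what your generated core actually exposes.

```verilog
// Hypothetical sketch: port names and enable polarity are assumptions,
// not the actual Gowin_EMPU interface.
module gpio_split_example (
    inout  wire [15:0] gpio_pad,   // physical pins
    input  wire [15:0] gpio_out,   // value the MCU wants to drive
    input  wire [15:0] gpio_oe,    // 1 = drive the pin, 0 = release it (Z)
    output wire [15:0] gpio_in,    // value read back from the pins
    input  wire        led_signal  // fabric signal to place on bit 6
);
    // Replace bit 6 of the output data with the fabric signal
    wire [15:0] out_mux = {gpio_out[15:7], led_signal, gpio_out[5:0]};
    // Force bit 6 to always drive; other bits follow the MCU's enables
    wire [15:0] oe_mux  = {gpio_oe[15:7], 1'b1, gpio_oe[5:0]};

    // One tristate buffer per pin
    genvar i;
    generate
        for (i = 0; i < 16; i = i + 1) begin : g_tri
            assign gpio_pad[i] = oe_mux[i] ? out_mux[i] : 1'bz;
        end
    endgenerate

    // The MCU still sees whatever is on the pins
    assign gpio_in = gpio_pad;
endmodule
```

With the output, enable and input paths separated like this, the override is an ordinary mux in the fabric, and the only tristate left is the explicit one at the pad.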
Another way would be to use a range select to splice your signal into the bus, e.g.:
Gowin_EMPU_Top your_instance_name(
.sys_clk (m3_clk), //input sys_clk
.gpio (gpio_io), //inout [15:0] gpio
.reset_n (reset_n) //input reset_n
);
wire LED_gpio = <some toggling signal>;
wire [15:0] my_gpio = {gpio_io[15:7], LED_gpio, gpio_io[5:0]}; // bit 6 replaced by LED_gpio
This may not synthesise if the FPGA resources needed to connect into that port do not exist. For instance, on Xilinx Zynq most of the MIO pins are bonded directly to pads. There is no way to redirect these even if they exist as ports on the Verilog top level for the ARM, because there is no FPGA resource there. If you have done something wrong here, you will usually fail at the place-and-route stage.