#1: Yes, the PLL needs to generate the DDR_CK frequency. I don't know about Xilinx, but with Altera, running a DDR buffer at 303MHz means the data comes in/out at 606MT/s (one transfer on each clock edge). On the internal bus side, however, the data runs synchronous to 303MHz and is 2x as wide.
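A small Python sketch of that width/rate trade, assuming a hypothetical 16-bit DQ bus and assuming the rising-edge sample lands in the high half of the internal word (check your DDR I/O block for its actual convention). It shows that the bits per second at the pins equal the bits per second on the 2x wide internal bus:

```python
DDR_CK_MHZ = 303.0  # DDR_CK frequency from the post
PIN_WIDTH = 16      # hypothetical DQ width in bits

def transfer_rate_mts(ck_mhz: float) -> float:
    """Double data rate: one transfer on each clock edge."""
    return 2 * ck_mhz

def pack_edges(rise: int, fall: int, width: int) -> int:
    """Combine rising- and falling-edge samples into one 2x wide internal word.
    Assumption: the rising-edge sample goes in the high half."""
    mask = (1 << width) - 1
    return ((rise & mask) << width) | (fall & mask)

def unpack_edges(word: int, width: int) -> tuple[int, int]:
    """Split a 2x wide internal word back into (rising, falling) samples."""
    mask = (1 << width) - 1
    return (word >> width) & mask, word & mask

rate = transfer_rate_mts(DDR_CK_MHZ)             # 606.0 MT/s at the pins
pin_bw = rate * PIN_WIDTH                        # Mbit/s through the pins
internal_bw = DDR_CK_MHZ * (2 * PIN_WIDTH)       # Mbit/s on the 2x wide bus
print(rate, pin_bw == internal_bw)
```

Halving the clock and doubling the width keeps the throughput identical, which is why the fabric side only has to close timing at 303MHz.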
#2: A multicycle constraint in the .sdc lets you state how many clocks a path may take from source to destination before the data must be valid. A false path, by contrast, means you don't care about that path's timing at all. Either one lets the compiler's fitter optimize logic placement, focusing on the parts of the design that must be fast while relaxing the paths you choose, i.e. a slow global reset control, which you can specify may take 2, 3 or more clocks to reach its destination.
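A rough Python model of what the timing analyzer checks, assuming a 303MHz clock and a hypothetical 5.0ns reset path delay. In the .sdc itself this corresponds to `set_multicycle_path -setup N` (usually paired with a matching `-hold N-1`) or `set_false_path`; the model below only covers the setup side:

```python
PERIOD_NS = 1000.0 / 303.0  # one 303MHz clock period, about 3.30 ns

def setup_slack_ns(path_delay_ns, multicycle=1, setup_ns=0.0, false_path=False):
    """Setup slack for a path allowed to take `multicycle` clocks.

    setup_ns is a hypothetical register setup time (0 by default).
    Returns None for a false path, since the analyzer does not time it.
    """
    if false_path:
        return None
    required = multicycle * PERIOD_NS - setup_ns
    return required - path_delay_ns

reset_delay = 5.0  # hypothetical routing delay of a slow global reset net
print(setup_slack_ns(reset_delay))                # negative: fails as a 1-clock path
print(setup_slack_ns(reset_delay, multicycle=2))  # positive: passes with 2 clocks
```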
I use a '_toggle' signal that flips each time my output data is ready, which lets me run my system clock at 150.15MHz (half rate mode). The system side monitors that line for a toggle, and when it sees one, it knows a read word has come in and latches it. If I used a single 'data_ready' pulse, i.e. high for only one 303MHz clock cycle, the 150MHz side, or even a 75MHz (quarter rate) side, could miss it entirely, because the pulse is shorter than one of its clock periods. On the read clock side, delaying the '_toggle' by a clock or 2 lets me set such a multicycle in the .sdc, further relaxing the timing between the DDR read clock and the rest of the system, since RDATA changes to its new contents 1 clock before the '_toggle' toggles and holds steady for 3 clocks after.
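A cycle-level Python sketch comparing a 1-cycle pulse with a toggle, assuming the slow clock is an aligned divide of the 303MHz clock (same PLL, so no synchronizer is modeled) and that read words arrive at least one slow-clock period apart. Otherwise two toggles could cancel between samples. The arrival cycles and word names are hypothetical:

```python
def simulate(ready_cycles, values, ratio, n_cycles):
    """Model in units of the fast (303MHz) clock.

    ratio: fast cycles per slow cycle (2 = half rate, 4 = quarter rate).
    Returns (words caught via a 1-cycle pulse, words caught via the toggle).
    """
    events = dict(zip(ready_cycles, values))
    pulse, toggle, rdata = [0] * n_cycles, [0] * n_cycles, [None] * n_cycles
    level, word = 0, None
    for t in range(n_cycles):
        if t in events:
            level ^= 1          # '_toggle' flips once per read word
            word = events[t]    # RDATA changes, then holds steady
            pulse[t] = 1        # 'data_ready' is high for this one cycle only
        toggle[t] = level
        rdata[t] = word

    via_pulse, via_toggle = [], []
    last = 0
    for t in range(0, n_cycles, ratio):  # slow-clock sampling edges
        if pulse[t]:
            via_pulse.append(rdata[t])
        if toggle[t] != last:            # any change means a new word arrived
            via_toggle.append(rdata[t])
            last = toggle[t]
    return via_pulse, via_toggle

p, tg = simulate([1, 5, 10], ["w0", "w1", "w2"], ratio=2, n_cycles=14)
print(p)   # ['w2']: the pulse is only seen when it lines up with a slow edge
print(tg)  # ['w0', 'w1', 'w2']: the toggle catches every word
```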
#3: A BC4 read occupies 2 clocks, plus a 1 clock break before the next BC4 or BL8 is permitted. So BC4 after BC4 runs on a 3 CK cycle. BL8s are permitted every 4 clocks, i.e. a 4 CK cycle, so back-to-back BC4s have 1 clock tighter timing. If you are not building a smart RAM controller with consecutive bursting capability, your command spacing will never have these adjacent bursts, so you never need to worry about such tight cycles.
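A tiny Python sketch that turns those spacing rules into earliest issue clocks for a run of back-to-back bursts. The spacing values are the ones stated above, not taken from a datasheet; confirm the real minimum command-to-command spacing (tCCD) for your part before relying on them:

```python
# Minimum clocks from one read command to the next, as stated in the post.
SPACING_CK = {"BC4": 2 + 1, "BL8": 4}

def issue_cycles(bursts, start=0):
    """Earliest clock on which each burst of a back-to-back sequence can issue."""
    t, out = start, []
    for b in bursts:
        out.append(t)
        t += SPACING_CK[b]
    return out

print(issue_cycles(["BC4", "BC4", "BC4"]))  # [0, 3, 6]
print(issue_cycles(["BL8", "BL8", "BL8"]))  # [0, 4, 8]
print(issue_cycles(["BC4", "BL8", "BC4"]))  # [0, 3, 7]
```

A controller that never issues adjacent bursts always spaces commands further apart than these minimums, so it never hits the tight case.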