The trick with doubling the size of the multiplier to do signed multiplication is only interesting if the hardware multipliers in the FPGA can only do unsigned operations. In that case it is often the best solution for latency. As for size, it uses a lot less logic around the hardware multipliers than other algorithms, but of course it requires bigger multipliers, or combining several multipliers.
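To make the arithmetic concrete, here is a rough Python model of what I understand by that trick: sign-extend both n-bit operands to 2n bits, multiply them as unsigned numbers, and keep the low 2n bits. The function names and the bit-pattern interface are made up for illustration, and this is only a behavioural sketch, not HDL.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret a `bits`-wide bit pattern as a two's-complement value."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern


def signed_mul_via_unsigned(a_bits: int, b_bits: int, n: int) -> int:
    """Signed n x n multiply using only an unsigned 2n x 2n multiply.

    a_bits and b_bits are n-bit two's-complement bit patterns.
    Returns the 2n-bit two's-complement pattern of the product.
    (Hypothetical helper, assumed for this sketch.)
    """
    mask_n = (1 << n) - 1
    mask_2n = (1 << (2 * n)) - 1

    def sign_extend(x: int) -> int:
        x &= mask_n
        if (x >> (n - 1)) & 1:
            x |= mask_2n & ~mask_n  # copy the sign bit into the upper n bits
        return x

    # Plain unsigned multiply; everything above bit 2n-1 is discarded.
    return (sign_extend(a_bits) * sign_extend(b_bits)) & mask_2n


# Example: -3 * 5 with 4-bit operands
result = signed_mul_via_unsigned(-3 & 0xF, 5 & 0xF, 4)
print(to_signed(result, 8))
```

The only extra logic is the sign extension, which is just wiring, and that is why the latency stays low. The cost is that the multiplier itself is twice as wide.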
Converting a signed vector to its absolute value before multiplying, and then negating the result again when needed, usually adds a lot of latency and will create a less optimal module. Its only advantage is that it uses smaller multipliers. Only use it if you are really limited in how many multipliers you can use in the FPGA.
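For comparison, the absolute-value approach looks roughly like this as a Python model (again a sketch with a made-up function name, working on plain integers rather than bit vectors):

```python
def signed_mul_via_abs(a: int, b: int) -> int:
    """Signed multiply built from an unsigned multiply of the magnitudes.

    (Hypothetical helper, assumed for this sketch.)
    """
    # Step 1: remember the sign of the result and take the magnitudes.
    negate = (a < 0) != (b < 0)
    mag_a, mag_b = abs(a), abs(b)

    # Step 2: unsigned n x n multiply.
    magnitude = mag_a * mag_b

    # Step 3: conditionally negate the product.
    return -magnitude if negate else magnitude


print(signed_mul_via_abs(-3, 5))
```

In hardware, steps 1 and 3 each need a conditional two's-complement negation (invert and add one) in front of and behind the multiplier, and that is where the extra latency comes from.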
But AFAIK the hardware multipliers in FPGAs nowadays can do both signed and unsigned operations, so those kinds of tricks aren't really needed. Those in Altera FPGAs need to be configured in one mode or the other at compile time, contrary to the FPGAs used by the OP. If you need a module that can be dynamically switched between unsigned and signed, as for example in an ALU, the n+1 trick can be useful. That said, if you have enough logic resources and are looking for the lowest latency, instantiating both an unsigned and a signed multiplier and muxing between the two could generate faster logic.
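Here is how I read the n+1 trick, as a Python model: extend each n-bit operand by one bit (a copy of the sign bit in signed mode, a zero in unsigned mode) and feed the result to a single signed (n+1) x (n+1) multiplier. The function name and the `signed_mode` flag are assumptions for this sketch.

```python
def configurable_mul(a_bits: int, b_bits: int, n: int, signed_mode: bool) -> int:
    """n x n multiply, switchable between signed and unsigned at run time.

    a_bits and b_bits are n-bit patterns; returns the 2n-bit product pattern.
    (Hypothetical helper, assumed for this sketch.)
    """
    mask_n = (1 << n) - 1

    def extend(x: int) -> int:
        x &= mask_n
        # Extra top bit: the sign bit in signed mode, 0 in unsigned mode.
        top = ((x >> (n - 1)) & 1) if signed_mode else 0
        # Value of the resulting (n+1)-bit two's-complement number.
        return x - (top << n)

    # One signed (n+1) x (n+1) multiply covers both modes.
    product = extend(a_bits) * extend(b_bits)
    return product & ((1 << (2 * n)) - 1)


# The pattern 0xD is 13 when read as unsigned and -3 when read as signed (4 bits).
print(configurable_mul(0xD, 0x5, 4, signed_mode=False))  # 13 * 5
print(configurable_mul(0xD, 0x5, 4, signed_mode=True))   # low 8 bits of -3 * 5
```

The mode select only drives the one extra bit on each operand, so there is no mux on the product path.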
I used "usually" a lot here, as your mileage may vary depending on the synthesis tool and on how close your signal sizes are to the limits of the hardware multiplier blocks. If one of those algorithms makes the size go one bit over the maximum size of the hardware multiplier, it can force the synthesizer to add lots of logic, changing the results.
So if you want to make your design optimal, test different solutions and see what works best. And of course start by deciding if you want to optimize for logic resource usage or for speed, as it may lead to different solutions.