An FPGA's building blocks are not gates but LUTs (look-up tables). Your gate-level primitives therefore do not map well onto real FPGA hardware, hence the failure.
FPGAs usually have something special to implement addition. For example, Xilinx's logic cells have built-in carry-chain logic with dedicated carry-in and carry-out wires. If you use addition in Verilog, that's what you are going to get (in most cases anyway). You can also instantiate the carry chain directly, but in most cases the result will be the same as if you had used Verilog addition. So there's no reason not to use Verilog addition, unless you want to make things faster.
If you want to make things faster, you ought to know the underlying hardware and work with it manually. In an FPGA there's no access to individual gates, so gate-level design isn't something you want to use. Instead, you need to study the structure of the FPGA and find hardware-specific methods. It might be hard to beat the FPGA's built-in addition logic, but usually you can gain something at the expense of using much more logic. For example, you can split your number into halves and compute each half independently (the classic carry-select adder). Say, for a 32-bit adder you use two 16-bit halves. Of course, the carry from the lower half has to feed the upper half. Therefore, to run the two halves in parallel, you'd need two adders for the upper half: one for a carry-in of '0' and another for a carry-in of '1'. By the time the real carry is ready, both upper adders are done, and you can use the real carry to mux out the correct result. This can be faster, but it is more work and uses considerably more logic than a simple adder: three 16-bit adders plus a mux instead of one 32-bit adder.
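To make the split-and-select idea concrete, here is a minimal behavioral sketch in Python. It is only a model of the data flow, not synthesizable HDL, and the names `add16` and `carry_select_add32` are hypothetical, chosen for illustration. It assumes unsigned 32-bit operands split into two 16-bit halves, as in the example above.

```python
MASK16 = 0xFFFF  # mask for one 16-bit half


def add16(a, b, carry_in):
    """Model a 16-bit adder: return (16-bit sum, carry-out)."""
    total = (a & MASK16) + (b & MASK16) + (carry_in & 1)
    return total & MASK16, total >> 16


def carry_select_add32(a, b, carry_in=0):
    """Model a 32-bit carry-select adder built from 16-bit halves.

    Returns (32-bit sum, carry-out).
    """
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # In hardware these three adders would all work at the same time.
    sum_lo, carry_lo = add16(a_lo, b_lo, carry_in)  # lower half, real carry-in
    sum_hi0, carry_hi0 = add16(a_hi, b_hi, 0)       # upper half, assuming carry-in 0
    sum_hi1, carry_hi1 = add16(a_hi, b_hi, 1)       # upper half, assuming carry-in 1

    # The real carry out of the lower half acts as the mux select.
    if carry_lo:
        sum_hi, carry_out = sum_hi1, carry_hi1
    else:
        sum_hi, carry_out = sum_hi0, carry_hi0

    return (sum_hi << 16) | sum_lo, carry_out
```

For instance, `carry_select_add32(0x0000FFFF, 1)` selects the carry-in-1 result for the upper half and returns `(0x00010000, 0)`. The model only shows that the selection yields the same result as a plain addition; whether the real circuit beats the dedicated carry chain on a given FPGA is something you would have to check with the vendor's timing analysis.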
I would be surprised if I found out that the adders in the ALU of modern Intel processors are written as a Verilog addition, but who knows.