Xilinx doesn't say how they build up the 18x18 multiplier in the Spartan 3 but it may very well be a Wallace tree because it is unclocked. The Wallace tree is just a big fur ball of logic.
http://www.xilinx.com/support/documentation/application_notes/xapp467.pdfIn my case, I was only interested in 16 bit 2's complement arithmetic because the computer I was emulating used only that format.
As to division, there are 3 broad approaches: Shift and Add/Subtract, High Radix and Convergence. Of the Shift and Add/Subtract type, non-restoring is probably a bit faster but if it is, it will only be by a cycle of two. Both obtain k bits in k cycles plus something for overhead and this is where things will vary.
High Radix is trying to do multi-bit division that I haven't thought much about.
Look for SRT algorithms because they try to do multi-bit division using a lookup table.
Convergence uses an iterative Newton-Raphson scheme (or similar) to converge on a solution. It will use a lot less cycles on a large word size. One application is when instead of dividing as n/d, they first find 1/d and then multiply by n. Clearly, that gets into some kind of fractional notation.
There are MANY discussions on the Internet and a couple of books seem to be useful:
"Computer Arithmetic Algorithms and Hardware Design" - Behrooz Parhami
http://www.alibris.com/Computer-Arithmetic-Algorithms-and-Hardware-Designs-Behrooz-Parhami/book/8234663?matches=42I don't think I would buy the $600+ offering. They like that book more than I do!
"Synthesis or Arithmetic Circuits" - Deschamps, Bioul, Sutter
http://www.alibris.com/Synthesis-of-Arithmetic-Circuits-FPGA-ASIC-and-Embedded-Systems-Jean-Pierre-DesChamps/book/28466467?matches=16A little more reasonable! This book has a lot of pseudo code that makes the algorithm more clear.
There are many other books on the subject.
The thing about division is that it is seldom used. It is worth spending a little time researching solutions but, in the end, non-restoring 2's complement gets it done. Especially for short word lengths. For a 64 bit machine, maybe the Newton-Raphson approach is a lot better.