for larger ones, I think the size of the required look-up table would quickly get impractical. Ideas?
I experimented with some approaches, but for larger ones, a binary shift-and-add ("barrel multiplier"?) wins. (It is difficult to keep
N×
N-bit product with 2
N-bit result under 2
N ops.) So, this only makes sense if you have
Si precalculated -- actually,
Si = floor(abs(
i)
2/4) for
i = 1-2
N .. 2
N+1-1, so that for 0≤
a,
b≤2
N-1,
a×
b becomes
Sa+b -
Sa-b. That makes the lookup table size just under 6
N 2
N bits, turning the multiplication into two lookups, two subtractions, and one addition. I do not know how expensive such ROM tables are in general.
For example, if one has only 8x8 to 16-bit multiplication, the naïve/direct method is to compute 32×32 to 64-bit product using (
a0 + 2
8a1 + 2
16a2 + 2
24a3)(
b0 + 2
8b1 + 2
16b2 + 2
24b3) = 2
0 a0 b0 + 2
8 a0 b1 + 2
16 a0 b2 + 2
24 a0 b3 + 2
8 a1 b0 + 2
16 a1 b1 + 2
24 a1 b2 + 2
32 a1 b3 + 2
16 a2 b0 + 2
24 a2 b1 + 2
32 a2 b2 + 2
40 a2 b3 + 2
24 a3 b0 + 2
32 a3 b1 + 2
40 a3 b2 + 2
48 a3 b3.
That is, with
K words to construct each
N-bit unsigned integer value, you have
K2 terms. However, a barrel multiplier can do it in O(
KN), using just one-bit shifts (through carry), and additions; in hardware, several of them can be done in parallel. (For
K=4, a 32×32=64-bit multiplication can be done in seven 64-bit additions, where each summand is constructed of at most three multiplications affecting non-overlapping bits.)
Anyway, knowing such mathematical tricks exist, may be useful for some odd purposes. Neat, in my opinion.