Would you be kind enough and post the table for JAMCRC?
Like oPossum wrote, that table is in their first LUT post, as well as in mine.
The poly 0x04C11DB7 (0b00000100110000010001110110110111) in inverse bit order yields mask 0xedb88320 (0b11101101101110001000001100100000).
Like the comment says, the five mask/poly lines just invert the bit order.
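For readers without the earlier post at hand, here is a minimal sketch of a five-step 32-bit bit reversal. The function name `reverse_bits32` is hypothetical, and this is not necessarily the exact code from that post; it only shows one common way to swap progressively larger groups of bits.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit word by swapping progressively
   larger groups: single bits, pairs, nibbles, bytes, then half-words.
   reverse_bits32 is an illustrative name, not taken from the thread. */
static uint32_t reverse_bits32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    x = (x >> 16) | (x << 16);
    return x;
}
```

Feeding it the poly 0x04C11DB7 should give the mask 0xEDB88320, and applying it twice returns the original value.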
The function takes a pointer to the data, the number of bytes in the data to be checksummed, and the initial checksum, and returns the updated checksum.
The per-byte lookup is faster than the per-32-bit version, because the per-32-bit version iterates over bits, not bytes: it needs 32 loop iterations per word, whereas the lookup needs only four.
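As a rough sketch of the per-byte lookup with that kind of interface (data pointer, byte count, initial checksum in, updated checksum out): the names `crc_table`, `crc_table_init`, and `crc32_update` are my own placeholders, and the table is generated at run time here instead of being pasted in as a constant array, so treat it as an illustration and not as the code from the LUT posts.

```c
#include <stddef.h>
#include <stdint.h>

/* 256-entry lookup table for the reflected polynomial 0xEDB88320.
   (Assumption: built at startup; a const table in flash works the same.) */
static uint32_t crc_table[256];

/* Entry i is the result of pushing byte i through eight bitwise steps. */
static void crc_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c >> 1) ^ ((c & 1u) ? 0xEDB88320u : 0u);
        crc_table[i] = c;
    }
}

/* Process len bytes, one table lookup per byte, starting from crc.
   No final inversion is applied, so with an initial value of
   0xFFFFFFFF the result is JAMCRC. */
static uint32_t crc32_update(const void *data, size_t len, uint32_t crc)
{
    const unsigned char *p = data;
    while (len--)
        crc = (crc >> 8) ^ crc_table[(crc ^ *p++) & 0xFFu];
    return crc;
}
```

Because the checksum is passed in and returned, data can be fed in chunks: pass 0xFFFFFFFF for the first chunk and the previous return value for each following one. The usual check string "123456789" should come out as 0x340BC6D9 for JAMCRC.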
To get any faster, one would have to write an even tighter inner loop, which I'm not sure is possible (since GCC -O2 on Cortex-M4 gets it down to four instructions plus branch); or switch to an HMAC (hash-based message authentication code) using a hash function that processes a 32-bit word faster than four byte iterations of CRC32.
In this case, I suspect that MurmurHash, specifically Murmur3_32, would perform well. This is because Cortex-M4 has a binary rotate right assembly instruction (rors) to implement ROL (rotating left by n bits is the same as rotating right by 32−n bits), and (hopefully!) a single cycle unsigned integer multiplication (that yields the low 32 bits of 32×32 multiplication).
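To show what the inner loop would look like, here is a minimal sketch of Murmur3_32 using the published constants. The function names `rotl32` and `murmur3_32` are placeholders of mine, the words are assembled byte by byte in little-endian order so that unaligned buffers are safe, and I have not measured it on a Cortex-M4, so the speed claim remains a guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate left; compilers typically turn this into a single rotate
   instruction where one is available (r must be 1..31 here). */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));
}

/* Murmur3_32 over len bytes with the given seed. */
static uint32_t murmur3_32(const void *data, size_t len, uint32_t seed)
{
    const unsigned char *p = data;
    uint32_t h = seed;
    uint32_t k;

    /* Body: one multiply-rotate-multiply mix per 32-bit word. */
    for (size_t i = len / 4; i > 0; i--, p += 4) {
        k = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
          | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k *= 0xCC9E2D51u;
        k = rotl32(k, 15);
        k *= 0x1B873593u;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5u + 0xE6546B64u;
    }

    /* Tail: the remaining 1 to 3 bytes, if any. */
    k = 0;
    switch (len & 3u) {
    case 3: k ^= (uint32_t)p[2] << 16; /* fall through */
    case 2: k ^= (uint32_t)p[1] << 8;  /* fall through */
    case 1: k ^= (uint32_t)p[0];
            k *= 0xCC9E2D51u;
            k = rotl32(k, 15);
            k *= 0x1B873593u;
            h ^= k;
    }

    /* Finalization: mix in the length and avalanche the bits. */
    h ^= (uint32_t)len;
    h ^= h >> 16;
    h *= 0x85EBCA6Bu;
    h ^= h >> 13;
    h *= 0xC2B2AE35u;
    h ^= h >> 16;
    return h;
}
```

Per 32-bit word, the body is two multiplications, two rotates, two XORs (one of them folded into the word load on some compilers), and a multiply-by-five plus add, with no table lookups. Whether that beats four table-lookup iterations depends on the multiplier latency and on how fast the table memory is, so it would need to be benchmarked on the actual part.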