EEVblog Electronics Community Forum
Products => Computers => Embedded Computing => Topic started by: peter-h on July 22, 2021, 09:32:01 pm
-
I am programming a 32F417 which has a hardware CRC calculator, but the STM HAL function for it is too convoluted for me to understand. But also I need a version to run "on the other end" which doesn't have a CRC calculator.
So I am looking for a software algorithm. It does not need to be identical to the STM hardware one, which may be this one
https://stm32f4-discovery.net/2015/07/hal-library-10-crc-for-stm32fxxx/ (https://stm32f4-discovery.net/2015/07/hal-library-10-crc-for-stm32fxxx/) or this one
https://www.st.com/resource/en/application_note/dm00068118-using-the-crc-peripheral-in-the-stm32-family-stmicroelectronics.pdf (https://www.st.com/resource/en/application_note/dm00068118-using-the-crc-peripheral-in-the-stm32-family-stmicroelectronics.pdf)
It doesn't need to be fast. I would like > 1MB/sec but even one which doesn't use a table should do that (168MHz CPU with a barrel shifter). It needs to take in 1 byte at a time, not a pointer to a buffer and a length (because I will be calculating it over varying amounts of data, and too much to fit into RAM in one go).
But I am finding various algorithms which seem to be different... I've been told that the 32 bit CRC has been standardised.
For example I found this one:
https://stackoverflow.com/questions/21001659/crc32-algorithm-implementation-in-c-without-a-look-up-table-and-with-a-public-li (https://stackoverflow.com/questions/21001659/crc32-algorithm-implementation-in-c-without-a-look-up-table-and-with-a-public-li)
which is trivially modified for a byte at a time:
static uint32_t crc = 0xFFFFFFFF; // this value gets the accumulated CRC as you go along
void crc32b(uint8_t input_byte) {
uint32_t mask, j;
crc = crc ^ input_byte;
for (j = 7; j >= 0; j--)
{
mask = -(crc & 1);
crc = (crc >> 1) ^ (0xEDB88320 & mask);
}
}
Thank you in advance for any ideas. I have tons of CRC-16 code and have used these many times, some with big tables and pretty fast, but not CRC-32.
-
There are numerous standards that use a CRC. Some share the same polynomial, initial value and bit order.
List of some common ones here: https://en.wikipedia.org/wiki/Cyclic_redundancy_check#Polynomial_representations_of_cyclic_redundancy_checks
The results for a given CRC (such as those listed) will be the same no matter how it is calculated.
-
OK; thanks. The CRC-32 in that list is probably the one in my code in the 1st post:
(https://peter-ftp.co.uk/screenshots/202107220712524322.jpg)
-
CRC32 is a CRC on 32 bits. There are many possible polynomials. Refer to the Wikipedia article for the "standard" ones. There are several.
Two things to note: apart from the polynomial, of course you can select the start value. One thing that can vary as well is that the resulting CRC can be negated or not (logical NOT). If you're using a CRC that you control on both sides, picking all 1's for the start value and not negating the result is fine.
The second point is that you can use a pre-built array of 256 32-bit values that can replace the inner loop of 8 iterations. Usually faster (but depending on architecture.)
You can have a look at GCC's implementation to get some ideas: https://github.com/gcc-mirror/gcc/blob/master/libiberty/crc32.c
-
That's a nice algorithm. It looks exactly the same CRC32 as my original code, however.
Clearly there are far fewer variations in the 32 bit CRC spectrum. Almost zero, actually :) This seems to be the "Ethernet" CRC, polynomial 0x04c11db7. And it is the same as the STC 32F4 hardware version
(https://peter-ftp.co.uk/screenshots/20210723443500314.jpg)
One thing which may vary is the byte order stored in the file, or sent in the serial data stream. For example Modbus swaps the two bytes in the CRC-16.
Interestingly this site https://www.lammertbies.nl/comm/info/crc-calculation (https://www.lammertbies.nl/comm/info/crc-calculation) lists just one CRC-32.
-
Calculating CRC isn't computationally that heavy, even without optimizations. Software implementation is so simple that you can remember it by heart, or just copy paste from an earlier design, or from online. Avoid premature optimization. I have never used the hardware CRC block. It is questionable how much it helps, if any, having to transfer all that data through the bus to the peripheral for a few simple operations.
Use some of the widely used polynomials, and don't forget you need a specific reset value to be compatible with others. If you have control over both sides, resetting to all ones, and using any of the popular polynomials is fine.
-
In their document ST claim a huge speedup, but that involves feeding their CRC generator with DMA.
In my case I am reading the data (in most cases) from a 21mbps SPI FLASH which limits it to about 2MB/sec anyway. One could do the CRC in parallel with waiting for the SPI but the ST SPI libs are blocking AFAICT (they are incredibly bloated).
-
This is less trivial than I thought.
Start here https://www.lammertbies.nl/comm/info/crc-calculation (https://www.lammertbies.nl/comm/info/crc-calculation)
(https://peter-ftp.co.uk/screenshots/202107254112553316.jpg)
The CBF43926 appears the right result, looking in other places e.g. here https://crccalc.com/ (https://crccalc.com/)
(https://peter-ftp.co.uk/screenshots/202107255912563416.jpg)
But... what code produces this result?
Not this one
// CRC-32 - Ethernet version. Polynomial is 0x04c11db7 and it holds it backwards (0xEDB88320).
// This accepts one byte at a time, and returns the CRC in 'g_crc32value'.
// This makes it suitable for calculating a CRC across a joined-up combination of data blocks.
// this value holds the accumulated CRC as you go along - init'd to 0xffffffff by caller
uint32_t g_crc32value;
void crc32(uint8_t input_byte)
{
g_crc32value = g_crc32value ^ input_byte;
for (uint32_t j=0; j<8; j++)
{
uint32_t mask = -(g_crc32value & 1);
g_crc32value = (g_crc32value >> 1) ^ (0xEDB88320 & mask);
}
}
which would be the simplest, if not the fastest. All the fast ones use a table, but every one I tried produces a different value.
Edit: I found it in the end. Well, I never found one for CRC-32 which produces CBF43926. I found some which produce the bit inverse of CBF43926 :) Here is my final one:
/*
*
* CRC-32/JAMCRC.
* Invert the *final* result and you get the more standard CRC-32/ISO-HDLC.
* Returns 0x340BC6D9 or (if final result inverted) 0xcbf43926 from "123456789" (9 bytes).
* Polynomial is 0x04c11db7 and it holds it backwards (0xEDB88320).
* This version accepts one byte at a time, and maintains the CRC in crcvalue. This makes it suitable
* for calculating a CRC across a number of data blocks.
* crcvalue must be initialised to 0xffffffff by caller, and holds the accumulated CRC as you go along
* See e.g. [url]https://crccalc.com/[/url] [url]https://www.lammertbies.nl/comm/info/crc-calculation[/url]
* [url]https://reveng.sourceforge.io/crc-catalogue/17plus.htm#crc.cat-bits.32[/url]
*
*/
void crc32(uint8_t input_byte, uint32_t *crcvalue)
{
for (uint32_t j=0; j<8; j++)
{
uint32_t mask = (input_byte^*crcvalue) & 1;
*crcvalue>>=1;
if(mask)
*crcvalue=*crcvalue^0xEDB88320;
input_byte>>=1;
}
}
-
static uint32_t crc;
void update_crc(unsigned const x)
{
crc ^= (x & 0xFF);
unsigned bits = 8;
do {
if (crc & 1) {
crc >>= 1; crc ^= 0xEDB88320;
} else {
crc >>= 1;
}
} while (--bits);
}
int main(void)
{
crc = ~0;
char const* test = "123456789";
while (char const c = *test++) update_crc(unsigned(c));
printf("%08X\n", crc ^ ~0);
}
This will usually be a little faster...
void update_crc(unsigned x)
{
x ^= crc;
crc >>= 8;
if (x & 0x01) crc ^= 0x77073096;
if (x & 0x02) crc ^= 0xEE0E612C;
if (x & 0x04) crc ^= 0x076DC419;
if (x & 0x08) crc ^= 0x0EDB8832;
if (x & 0x10) crc ^= 0x1DB71064;
if (x & 0x20) crc ^= 0x3B6E20C8;
if (x & 0x40) crc ^= 0x76DC4190;
if (x & 0x80) crc ^= 0xEDB88320;
}
-
Interesting that you are inverting the result too
printf("%08X\n", crc ^ ~0);
I never found any code which delivered 0xCBF43926 directly, FWIW.
And I always learn something here, like real coders write
crc = ~0;
instead of
crc = 0xffffffff;
:)
Incidentally I got no output with GCC (Cube IDE) unless I used %08lx.
-
If you control the code on both ends and don’t necessarily need a true CRC but just some kind of error checking, you may wish to consider Adler-32 (https://en.m.wikipedia.org/wiki/Adler-32) rather than CRC-32. It was developed by Mark Adler in 1995 as a replacement for CRCs and is the algorithm that the zlib compression library and rsync use. It is much faster and simpler than computing a CRC, with a small tradeoff in reliability.
You could simply copy the zlib implementation of Adler-32 (https://github.com/madler/zlib/blob/master/adler32.c) and be done, otherwise the Wikipedia article includes a naïve implementation that’s a bit slower, but very simple:
const uint32_t MOD_ADLER = 65521;
uint32_t adler32(unsigned char *data, size_t len)
/*
where data is the location of the data in physical memory and
len is the length of the data in bytes
*/
{
uint32_t a = 1, b = 0;
size_t index;
// Process each byte of the data in order
for (index = 0; index < len; ++index)
{
a = (a + data[index]) % MOD_ADLER;
b = (b + a) % MOD_ADLER;
}
return (b << 16) | a;
}
For a rolling implementation, see this answer from StackOverflow (https://stackoverflow.com/a/8649139).
-
I never found any code which delivered 0xCBF43926 directly, FWIW.
That specific CRC requires the result to be inverted (XorOut column in the table). There is no other way.
-
LUT version...
void update_crc(unsigned x)
{
// 0x04C11DB7 reflected
static uint32_t const crc_tab[1 << 8] = {
0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419, 0x706AF48F, 0xE963A535, 0x9E6495A3,
0x0EDB8832, 0x79DCB8A4, 0xE0D5E91E, 0x97D2D988, 0x09B64C2B, 0x7EB17CBD, 0xE7B82D07, 0x90BF1D91,
0x1DB71064, 0x6AB020F2, 0xF3B97148, 0x84BE41DE, 0x1ADAD47D, 0x6DDDE4EB, 0xF4D4B551, 0x83D385C7,
0x136C9856, 0x646BA8C0, 0xFD62F97A, 0x8A65C9EC, 0x14015C4F, 0x63066CD9, 0xFA0F3D63, 0x8D080DF5,
0x3B6E20C8, 0x4C69105E, 0xD56041E4, 0xA2677172, 0x3C03E4D1, 0x4B04D447, 0xD20D85FD, 0xA50AB56B,
0x35B5A8FA, 0x42B2986C, 0xDBBBC9D6, 0xACBCF940, 0x32D86CE3, 0x45DF5C75, 0xDCD60DCF, 0xABD13D59,
0x26D930AC, 0x51DE003A, 0xC8D75180, 0xBFD06116, 0x21B4F4B5, 0x56B3C423, 0xCFBA9599, 0xB8BDA50F,
0x2802B89E, 0x5F058808, 0xC60CD9B2, 0xB10BE924, 0x2F6F7C87, 0x58684C11, 0xC1611DAB, 0xB6662D3D,
0x76DC4190, 0x01DB7106, 0x98D220BC, 0xEFD5102A, 0x71B18589, 0x06B6B51F, 0x9FBFE4A5, 0xE8B8D433,
0x7807C9A2, 0x0F00F934, 0x9609A88E, 0xE10E9818, 0x7F6A0DBB, 0x086D3D2D, 0x91646C97, 0xE6635C01,
0x6B6B51F4, 0x1C6C6162, 0x856530D8, 0xF262004E, 0x6C0695ED, 0x1B01A57B, 0x8208F4C1, 0xF50FC457,
0x65B0D9C6, 0x12B7E950, 0x8BBEB8EA, 0xFCB9887C, 0x62DD1DDF, 0x15DA2D49, 0x8CD37CF3, 0xFBD44C65,
0x4DB26158, 0x3AB551CE, 0xA3BC0074, 0xD4BB30E2, 0x4ADFA541, 0x3DD895D7, 0xA4D1C46D, 0xD3D6F4FB,
0x4369E96A, 0x346ED9FC, 0xAD678846, 0xDA60B8D0, 0x44042D73, 0x33031DE5, 0xAA0A4C5F, 0xDD0D7CC9,
0x5005713C, 0x270241AA, 0xBE0B1010, 0xC90C2086, 0x5768B525, 0x206F85B3, 0xB966D409, 0xCE61E49F,
0x5EDEF90E, 0x29D9C998, 0xB0D09822, 0xC7D7A8B4, 0x59B33D17, 0x2EB40D81, 0xB7BD5C3B, 0xC0BA6CAD,
0xEDB88320, 0x9ABFB3B6, 0x03B6E20C, 0x74B1D29A, 0xEAD54739, 0x9DD277AF, 0x04DB2615, 0x73DC1683,
0xE3630B12, 0x94643B84, 0x0D6D6A3E, 0x7A6A5AA8, 0xE40ECF0B, 0x9309FF9D, 0x0A00AE27, 0x7D079EB1,
0xF00F9344, 0x8708A3D2, 0x1E01F268, 0x6906C2FE, 0xF762575D, 0x806567CB, 0x196C3671, 0x6E6B06E7,
0xFED41B76, 0x89D32BE0, 0x10DA7A5A, 0x67DD4ACC, 0xF9B9DF6F, 0x8EBEEFF9, 0x17B7BE43, 0x60B08ED5,
0xD6D6A3E8, 0xA1D1937E, 0x38D8C2C4, 0x4FDFF252, 0xD1BB67F1, 0xA6BC5767, 0x3FB506DD, 0x48B2364B,
0xD80D2BDA, 0xAF0A1B4C, 0x36034AF6, 0x41047A60, 0xDF60EFC3, 0xA867DF55, 0x316E8EEF, 0x4669BE79,
0xCB61B38C, 0xBC66831A, 0x256FD2A0, 0x5268E236, 0xCC0C7795, 0xBB0B4703, 0x220216B9, 0x5505262F,
0xC5BA3BBE, 0xB2BD0B28, 0x2BB45A92, 0x5CB36A04, 0xC2D7FFA7, 0xB5D0CF31, 0x2CD99E8B, 0x5BDEAE1D,
0x9B64C2B0, 0xEC63F226, 0x756AA39C, 0x026D930A, 0x9C0906A9, 0xEB0E363F, 0x72076785, 0x05005713,
0x95BF4A82, 0xE2B87A14, 0x7BB12BAE, 0x0CB61B38, 0x92D28E9B, 0xE5D5BE0D, 0x7CDCEFB7, 0x0BDBDF21,
0x86D3D2D4, 0xF1D4E242, 0x68DDB3F8, 0x1FDA836E, 0x81BE16CD, 0xF6B9265B, 0x6FB077E1, 0x18B74777,
0x88085AE6, 0xFF0F6A70, 0x66063BCA, 0x11010B5C, 0x8F659EFF, 0xF862AE69, 0x616BFFD3, 0x166CCF45,
0xA00AE278, 0xD70DD2EE, 0x4E048354, 0x3903B3C2, 0xA7672661, 0xD06016F7, 0x4969474D, 0x3E6E77DB,
0xAED16A4A, 0xD9D65ADC, 0x40DF0B66, 0x37D83BF0, 0xA9BCAE53, 0xDEBB9EC5, 0x47B2CF7F, 0x30B5FFE9,
0xBDBDF21C, 0xCABAC28A, 0x53B39330, 0x24B4A3A6, 0xBAD03605, 0xCDD70693, 0x54DE5729, 0x23D967BF,
0xB3667A2E, 0xC4614AB8, 0x5D681B02, 0x2A6F2B94, 0xB40BBE37, 0xC30C8EA1, 0x5A05DF1B, 0x2D02EF8D
};
crc = (crc >> 8) ^ crc_tab[(x ^ crc) & 0xFF];
}
-
But... what code produces this result?
Not this one
// CRC-32 - Ethernet version. Polynomial is 0x04c11db7 and it holds it backwards (0xEDB88320).
// This accepts one byte at a time, and returns the CRC in 'g_crc32value'.
// This makes it suitable for calculating a CRC across a joined-up combination of data blocks.
// this value holds the accumulated CRC as you go along - init'd to 0xffffffff by caller
uint32_t g_crc32value;
void crc32(uint8_t input_byte)
{
g_crc32value = g_crc32value ^ input_byte;
for (uint32_t j=0; j<8; j++)
{
uint32_t mask = -(g_crc32value & 1);
g_crc32value = (g_crc32value >> 1) ^ (0xEDB88320 & mask);
}
}
That is a clever way to eliminate a conditional jump in the loop. It can be simplified to this...
void update_crc(unsigned const x) { crc ^= (x & 0xFF); unsigned b = 8; do crc = ((0 - (crc & 1)) & 0xEDB88320) ^ (crc >> 1); while (--b); }
-
That CRC did not however produce any "standard" CRC result that I could find.
I went around this loop recently with 16 bit CRCs. Many years ago I developed some products which used CRCs for comms, and this time I tried to replicate them in C. None of the CRC algorithms I found matched the result of the old ones (which were written in assembler; Z180 and H8/300) and I never bothered to convert the obscure asm code (which I got from some book; I didn't write it) into C.
I suspect this goes on a lot i.e. people are controlling both ends of the link so they never find out.
But also a lot of stuff posted online is garbage :) One algorithm I found was claimed to deliver the 0xcbf43926 (if you google on that value you find loads of hits) but it definitely does not. I do not understand why anybody would do that, but they do...
This is my eventual version of a "rolling" CRC32 which gives you "CRC32/JAMCRC" and if you invert the final result then you get "ISO/HDLC" which I believe is used with Ethernet.
I never found a usable C source for the Ethernet one, despite loads of google hits and lots of people looking for it, although to be fair I was looking for a non-table version because I am using this in a boot loader, don't want to just waste 1k, and don't need the max speed. For all I know it may be this one.
/*
*
* CRC-32/JAMCRC.
* Invert the *final* result and you get the more standard CRC-32/ISO-HDLC.
* Returns 0x340BC6D9 or (if final result inverted) 0xcbf43926 from "123456789" (9 bytes).
* Polynomial is 0x04c11db7 and it holds it backwards (0xEDB88320).
* This version accepts one byte at a time, and maintains the CRC in crcvalue. This makes it suitable
* for calculating a CRC across a number of data blocks.
* crcvalue must be initialised to 0xffffffff by caller, and holds the accumulated CRC as you go along
* See e.g. [url]https://crccalc.com/[/url] [url]https://www.lammertbies.nl/comm/info/crc-calculation[/url]
* [url]https://reveng.sourceforge.io/crc-catalogue/17plus.htm#crc.cat-bits.32[/url]
*
*/
void crc32(uint8_t input_byte, uint32_t *crcvalue)
{
for (uint32_t j=0; j<8; j++)
{
uint32_t mask = (input_byte^*crcvalue) & 1;
*crcvalue>>=1;
if(mask)
*crcvalue=*crcvalue^0xEDB88320;
input_byte>>=1;
}
}
I am using this site, among others: https://www.lammertbies.nl/comm/info/crc-calculation (https://www.lammertbies.nl/comm/info/crc-calculation)
// TEST CODE FOR CRC-32
uint8_t crctest[] = { "123456789" };
uint32_t crcvalue = 0xffffffff;
for (int i=0; i<9; i++)
{
crc32(crctest[i], &crcvalue);
}
printf("crc32 = %08lx, %08lx", crcvalue, ~crcvalue);
-
Getting the wrong results is usually just a case of wrong bit order (reflection), initial value or final XOR.
Here is a test case for both websites.
static uint32_t crc;
void update_crc(uint32_t const x, unsigned bits, uint32_t const poly, unsigned const reflect)
{
if (reflect) {
// Reflected
crc ^= (x & (0xFFFFFFFF >> (32 - bits)));
do {
#if 0
if (crc & 1) {
crc >>= 1; crc ^= poly;
} else {
crc >>= 1;
}
#else
crc = ((0 - (crc & 1)) & poly) ^ (crc >> 1);
#endif
} while (--bits);
} else {
// Not reflected
crc ^= (x << (32 - bits));
do {
if (crc & 0x80000000) {
crc <<= 1; crc ^= poly;
} else {
crc <<= 1;
}
} while (--bits);
}
}
void test_text(char const * const text, uint32_t const poly, uint32_t const init, uint32_t const out, unsigned const reflect)
{
crc = init;
char const* s = text;
while (char const c = *s++) {
update_crc(unsigned(c), 8, poly, reflect);
}
printf("0x%08X\n", crc ^ out);
}
void test_text_all(char const* const text)
{
#if 1
// 16 bit
printf("www.lammertbies.nl/comm/info/crc-calculation\n");
test_text(text, 0x0000A001, 0, 0, 1); // 0x8005 reflected
test_text(text, 0x0000A001, 0x0000FFFF, 0, 1); // 0x8005 reflected
test_text(text, 0x10210000, 0xFFFF0000, 0, 0); // lammertbies.nl is incorrect [url]https://cdn.sick.com/media/docs/6/46/846/technical_information_beam_data_for_diagnostics_and_automation_en_im0060846.pdf[/url]
test_text(text, 0x10210000, 0, 0, 0);
test_text(text, 0x10210000, 0xFFFF0000, 0, 0);
test_text(text, 0x10210000, 0x1D0F0000, 0, 0);
test_text(text, 0x00008408, 0, 0, 1); // 0x1021 reflected
test_text(text, 0x0000A6BC, 0, 0x0000FFFF, 1); // 0x3D65 reflected
test_text(text, 0xEDB88320, ~0, ~0, 1); // 0x04C11DB7 reflected
#endif
#if 1
// 8 bit
printf("crccalc.com 8 bit\n");
test_text(text, 0x07000000, 0, 0, 0);
test_text(text, 0x9B000000, 0xFF000000, 0, 0);
test_text(text, 0x0000009C, 0, 0, 1); // 0x39 reflected
test_text(text, 0xD5000000, 0, 0, 0);
test_text(text, 0x000000B8, 0x000000FF, 0, 1); // 0x1D reflected
test_text(text, 0x1D000000, 0xFD000000, 0, 0);
test_text(text, 0x07000000, 0, 0x55000000, 0);
test_text(text, 0x0000008C, 0, 0, 1); // 0x31 reflected
test_text(text, 0x000000E0, 0x000000FF, 0, 1); // 0x07 reflected
test_text(text, 0x000000D9, 0, 0, 1); // 0x9B reflected
#endif
#if 1
// 16 bit
printf("crccalc.com 16 bit\n");
test_text(text, 0x10210000, 0xFFFF0000, 0, 0);
test_text(text, 0x0000A001, 0, 0, 1); // 0x8005 reflected
test_text(text, 0x10210000, 0x1D0F0000, 0, 0);
test_text(text, 0x80050000, 0, 0, 0);
test_text(text, 0xC8670000, 0xFFFF0000, 0, 0);
test_text(text, 0x80050000, 0x800D0000, 0, 0);
test_text(text, 0x05890000, 0, 0x00010000, 0);
test_text(text, 0x05890000, 0, 0, 0);
test_text(text, 0x0000A6BC, 0, 0x0000FFFF, 1); // 0x3D65 reflected
test_text(text, 0x3D650000, 0, 0xFFFF0000, 0);
test_text(text, 0x10210000, 0xFFFF0000, 0xFFFF0000, 0);
test_text(text, 0x0000A001, 0, 0x0000FFFF, 1); // 0x8005 reflected
test_text(text, 0x00008408, 0x0000FFFF, 0, 1); // 0x1021 reflected
test_text(text, 0x00008408, 0x0000554D, 0, 1); // 0x1021 reflected
test_text(text, 0x8BB70000, 0, 0, 0);
test_text(text, 0xA0970000, 0, 0, 0);
test_text(text, 0x00008408, 0x00003791, 0, 1); // 0x1021 reflected
test_text(text, 0x0000A001, 0x0000FFFF, 0x0000FFFF, 1); // 0x8005 reflected
test_text(text, 0x00008408, 0x00006363, 0, 1); // 0x1021 reflected
test_text(text, 0x00008408, 0, 0, 1); // 0x1021 reflected
test_text(text, 0x0000A001, 0x0000FFFF, 0, 1); // 0x8005 reflected
test_text(text, 0x00008408, 0x0000FFFF, 0x0000FFFF, 1); // 0x1021 reflected
test_text(text, 0x10210000, 0, 0, 0);
#endif
#if 1
// 32 bit
printf("crccalc.com 32 bit\n");
test_text(text, 0xEDB88320, ~0, ~0, 1); // 0x04C11DB7 reflected
test_text(text, 0x04C11DB7, ~0, ~0, 0);
test_text(text, 0x82F63B78, ~0, ~0, 1); // 0x1EDC6F41 reflected
test_text(text, 0xD419CC15, ~0, ~0, 1); // 0xA833982B reflected
test_text(text, 0x04C11DB7, ~0, 0, 0);
test_text(text, 0x04C11DB7, 0, ~0, 0);
test_text(text, 0x814141AB, 0, 0, 0);
test_text(text, 0xEDB88320, ~0, 0, 1); // 0x04C11DB7 reflected
test_text(text, 0x000000AF, 0, 0, 0);
#endif
}
int main()
{
test_text_all("123456789");
}
-
Nice; thanks.
I've been timing mine just now, to see if it needs the table method, and it does the entire 32F417 FLASH (1MB) in 2 seconds. 168MHz.
Might try speeding it up a little because this test will be done at every power-up.
-
Using 32 bit words instead of 8 bit bytes will probably be faster.
void update_crc(uint32_t const x)
{
crc ^= x;
unsigned b = 32; do crc = ((0 - (crc & 1)) & 0xEDB88320) ^ (crc >> 1); while (--b);
}
void update_crc(uint32_t x)
{
x ^= crc
crc = 0
if(x & 0x00000001) crc ^= 0xB8BC6765
if(x & 0x00000002) crc ^= 0xAA09C88B
if(x & 0x00000004) crc ^= 0x8F629757
if(x & 0x00000008) crc ^= 0xC5B428EF
if(x & 0x00000010) crc ^= 0x5019579F
if(x & 0x00000020) crc ^= 0xA032AF3E
if(x & 0x00000040) crc ^= 0x9B14583D
if(x & 0x00000080) crc ^= 0xED59B63B
if(x & 0x00000100) crc ^= 0x01C26A37
if(x & 0x00000200) crc ^= 0x0384D46E
if(x & 0x00000400) crc ^= 0x0709A8DC
if(x & 0x00000800) crc ^= 0x0E1351B8
if(x & 0x00001000) crc ^= 0x1C26A370
if(x & 0x00002000) crc ^= 0x384D46E0
if(x & 0x00004000) crc ^= 0x709A8DC0
if(x & 0x00008000) crc ^= 0xE1351B80
if(x & 0x00010000) crc ^= 0x191B3141
if(x & 0x00020000) crc ^= 0x32366282
if(x & 0x00040000) crc ^= 0x646CC504
if(x & 0x00080000) crc ^= 0xC8D98A08
if(x & 0x00100000) crc ^= 0x4AC21251
if(x & 0x00200000) crc ^= 0x958424A2
if(x & 0x00400000) crc ^= 0xF0794F05
if(x & 0x00800000) crc ^= 0x3B83984B
if(x & 0x01000000) crc ^= 0x77073096
if(x & 0x02000000) crc ^= 0xEE0E612C
if(x & 0x04000000) crc ^= 0x076DC419
if(x & 0x08000000) crc ^= 0x0EDB8832
if(x & 0x10000000) crc ^= 0x1DB71064
if(x & 0x20000000) crc ^= 0x3B6E20C8
if(x & 0x40000000) crc ^= 0x76DC4190
if(x & 0x80000000) crc ^= 0xEDB88320
}
-
On Cortex-M4, I would recommend a variant of oPossum's code above,
static const uint32_t crc32_lookup[32] = {
0xB8BC6765, 0xAA09C88B, 0x8F629757, 0xC5B428EF,
0x5019579F, 0xA032AF3E, 0x9B14583D, 0xED59B63B,
0x01C26A37, 0x0384D46E, 0x0709A8DC, 0x0E1351B8,
0x1C26A370, 0x384D46E0, 0x709A8DC0, 0xE1351B80,
0x191B3141, 0x32366282, 0x646CC504, 0xC8D98A08,
0x4AC21251, 0x958424A2, 0xF0794F05, 0x3B83984B,
0x77073096, 0xEE0E612C, 0x076DC419, 0x0EDB8832,
0x1DB71064, 0x3B6E20C8, 0x76DC4190, 0xEDB88320,
};
static inline uint32_t crc32(uint32_t crc, uint32_t data)
{
unsigned int i = 0;
data ^= crc;
crc = 0;
while (data) {
crc ^= crc32_lookup[i++] & (~((data & 1) - 1));
data >>= 1;
}
return crc;
}
The crc32() takes the initial (or current) CRC and the 32-bit data word as parameters, and returns the updated CRC.
The expression (~((data & 1) - 1)) relies on data being an unsigned integer type, here uint32_t. When its least significant bit is zero, the expression is zero; when the least significant bit is set, the expression is ~(uint32_t)0 (or equivalently (uint32_t)(-1)).
On Cortex-M4, this should compile to an inner loop with five instructions and a branch, with very little setup overhead.
-
Won't that while(data) loop go around up to 32 times, and 16 times on average for random data?
There may also be cases where I need to do byte data.
Probably need to look at doing it with a table after all... but they are all 256 x uint32_t i.e. 1kbyte tables.
-
Won't that while(data) loop go around up to 32 times, and 16 times on average for random data?
31 times for uniform random data, yes. One inner loop iteration per bit. (Since each bit is set at 50% probability, the average loop length is just under 31 iterations.)
Using GCC -O2, it uses 34 32-bit words, or 136 bytes of ROM/Flash for the table. In comparison, the full byte table takes 256 32-bit words, or 1024 bytes of ROM/Flash:
static const uint32_t crc32_lookup[256] = {
0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, 0xe963a535, 0x9e6495a3,
0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91,
0x1db71064, 0x6ab020f2, 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 0xfa0f3d63, 0x8d080df5,
0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b,
0x35b5a8fa, 0x42b2986c, 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 0xcfba9599, 0xb8bda50f,
0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d,
0x76dc4190, 0x01db7106, 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 0x91646c97, 0xe6635c01,
0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457,
0x65b0d9c6, 0x12b7e950, 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, 0xa4d1c46d, 0xd3d6f4fb,
0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9,
0x5005713c, 0x270241aa, 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, 0xb7bd5c3b, 0xc0ba6cad,
0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683,
0xe3630b12, 0x94643b84, 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb, 0x196c3671, 0x6e6b06e7,
0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5,
0xd6d6a3e8, 0xa1d1937e, 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 0x316e8eef, 0x4669be79,
0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f,
0xc5ba3bbe, 0xb2bd0b28, 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, 0x72076785, 0x05005713,
0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21,
0x86d3d2d4, 0xf1d4e242, 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, 0x616bffd3, 0x166ccf45,
0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db,
0xaed16a4a, 0xd9d65adc, 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, 0x54de5729, 0x23d967bf,
0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d,
};
static inline uint32_t crc32(const void *src, const size_t len, uint32_t crc)
{
const unsigned char *const end = (const unsigned char *)src + len;
const unsigned char *ptr = src;
while (ptr < end)
crc = (crc >> 8) ^ crc32_lookup[(*(ptr++) ^ crc) & 255];
return crc;
}
Using GCC-6.3.1, gcc -Wall -O2 -mcpu=cortex-m4 -S generates essentially
.text
.thumb
.thumb_func
crc32:
add r1, r1, r0
cmp r0, r1
bcs .L1
push {r4}
ldr r4, .arrayptr
mov r3, r0
mov r0, r2
.L2:
ldrb r2, [r3], #1
ldr r2, [r4, r2, lsl #2]
cmp r1, r3
eor r0, r0, r2
bne .L2
pop {r4}
bx lr
.L1:
mov r0, r2
bx lr
.align 2
.arrayptr:
.word crc32_lookup
.section .rodata
crc32_lookup:
.word 0
; omitted the 255 .word lines
which is extremely efficient (four instructions plus branch per byte); but again, takes more ROM/Flash due to the 1024-byte table.
Edited: There were a couple of bugs, now fixed. This yields the same CRCs as jamcrc, and is basically identical to oPossum's LUT version above.
To compute the table at runtime, one can use
for (int i = 0; i < 256; i++) {
uint8_t byte = i;
uint32_t val = 0;
for (uint_fast8_t bit = 0; bit < 8; bit++) {
val = (val >> 1) ^ ((uint32_t)(-((byte ^ val) & 1)) & 0xedb88320);
byte >>= 1;
}
crc32_lookup[i] = val;
}
-
while (ptr < end)
crc ^= crc32_lookup[*(ptr++)];
Does that really work?
I think is has to be...
while (ptr < end)
crc = (crc >> 8) ^ crc32_lookup[(*(ptr++) ^ crc) & 0xFF];
-
I think is has to be...
You're right; I only noticed after doing some basic verification checks. My table was also off; the one in your LUT post above is the correct one. I added the snippet needed to precalculate it. Using GCC-6.3.1 and -O2, it only adds about 62 bytes of ROM/Flash code.
-
Here's a C program that can compute any 32-bit CRC lookup table, given the 32-bit integer representing the polynomial. You can use any C11 or C++11 or later compiler to compile it. It takes one or more command-line arguments; run without any arguments to see the usage. The polynomial at hand in this particular thread is x26 + x23 + x22 + x16 + x12 + x11 + x10 + x8 + x7 + x5 + x4 + x2 + x1 + x0, which is represented by 0x04C11DB7 = 79764919.
#include <stdlib.h>
#include <inttypes.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
static uint32_t crc32_lookup[256];
void crc32_init(uint32_t poly)
{
uint32_t mask;
/* mask is poly with inverse bit order. */
mask = ((poly & 0x55555555) << 1) | ((poly >> 1) & 0x55555555);
mask = ((mask & 0x33333333) << 2) | ((mask >> 2) & 0x33333333);
mask = ((mask & 0x0F0F0F0F) << 4) | ((mask >> 4) & 0x0F0F0F0F);
mask = ((mask & 0x00FF00FF) << 8) | ((mask >> 8) & 0x00FF00FF);
mask = ((mask & 0x0000FFFF) << 16) | ((mask >> 16) & 0x0000FFFF);
for (size_t i = 0; i < 256; i++) {
uint_fast8_t byte = i;
uint_fast32_t crc = 0;
for (uint_fast8_t bit = 0; bit < 8; bit++) {
crc = (crc >> 1) ^ ((uint32_t)(-((byte ^ crc) & 1)) & mask);
byte >>= 1;
}
crc32_lookup[i] = crc;
}
}
uint32_t crc32(const void *src, const size_t len, uint32_t checksum)
{
const unsigned char *ptr = (const unsigned char *)src;
const unsigned char *const end = (const unsigned char *)src + len;
while (ptr < end)
checksum = (checksum >> 8) ^ crc32_lookup[(*(ptr++) ^ checksum) & 255];
return checksum;
}
static int parse_u32(const char *src, uint32_t *dst)
{
if (!src || !*src)
return -1;
const char *end = src;
unsigned long val;
errno = 0;
val = strtoul(src, (char **)(&end), 0);
if (errno)
return -1;
if (end == src)
return -1;
while (*end == '\t' || *end == '\n' || *end == '\v' ||
*end == '\f' || *end == '\r' || *end == ' ')
end++;
if (*end)
return -1;
if ((unsigned long)(uint32_t)(val) != val)
return -1;
if (dst)
*dst = val;
return 0;
}
int main(int argc, char *argv[])
{
uint32_t poly;
if (argc < 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
const char *arg0 = (argc > 0 && argv && argv[0] && argv[0][0]) ? argv[0] : "(this)";
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s [ -h | --help]\n", arg0);
fprintf(stderr, " %s POLYNOM\n", arg0);
fprintf(stderr, " %s POLYNOM STRING [ STRING ... ]\n", arg0);
fprintf(stderr, "\n");
fprintf(stderr, "This program calculates the 32-bit CRC based on POLYNOM,\n");
fprintf(stderr, "a 32-bit unsigned integer representing the polynomial taps.\n");
fprintf(stderr, "When only POLYNOM is specified, the output is the C array\n");
fprintf(stderr, "of the 256 coefficients. When additional parameters are\n");
fprintf(stderr, "specified, the output is the checksum of those strings.\n");
fprintf(stderr, "\n");
return EXIT_SUCCESS;
}
if (parse_u32(argv[1], &poly) || !poly) {
fprintf(stderr, "%s: Not a nonzero 32-bit unsigned integer.\n", argv[1]);
return EXIT_FAILURE;
}
crc32_init(poly);
if (argc < 3) {
printf("/* CRC for ");
const char *sep = "";
for (int bit = 31; bit >= 0; bit--)
if (poly & (((uint32_t)1) << bit)) {
printf("%sx^%d", sep, bit);
sep = " + ";
}
printf(" */\n");
printf("const uint32_t crc32_lookup[256] = {\n");
for (int i = 0; i < 256; i++)
switch (i & 7) {
case 0: printf(" 0x%08" PRIx32 ", ", crc32_lookup[i]); break;
case 7: printf("0x%08" PRIx32 ",\n", crc32_lookup[i]); break;
default: printf("0x%08" PRIx32 ", ", crc32_lookup[i]); break;
}
printf("\n");
printf("uint32_t crc32(const void *src, const size_t len, uint32_t checksum)\n");
printf("{\n");
printf(" const unsigned char *ptr = (const unsigned char *)src;\n");
printf(" const unsigned char *const end = (const unsigned char *)src + len;\n");
printf(" while (ptr < end)\n");
printf(" checksum = (checksum >> 8) ^ crc32_lookup[(*(ptr++) ^ checksum) & 255];\n");
printf(" return checksum;\n");
printf("}\n");
printf("\n");
return EXIT_SUCCESS;
}
for (int arg = 2; arg < argc; arg++) {
const char *src = argv[arg];
const size_t len = (src) ? strlen(src) : 0;
const uint32_t crc = crc32(src, len, 0);
printf("0x%08" PRIx32 " \"%s\" (%zu chars)\n", crc, src, len);
}
return EXIT_SUCCESS;
}
When the polynomial key is the only command-line parameter, it will output the C implementation for crc32_lookup[256] table and a crc32(src, len, checksum) function returning the updated checksum.
-
Would you be kind enough and post the table for JAMCRC? I don't have a C compiler on my PC right now (yes I should have :) ). Thank you!
Even the code to generate it would be fine because in this application I have plenty of RAM. Presumably the code would be
uint32_t poly=0x04C11DB7;
uint32_t crc32_lookup[256];
void crc32_init(uint32_t poly)
{
uint32_t mask;
/* mask is poly with inverse bit order. */
mask = ((poly & 0x55555555) << 1) | ((poly >> 1) & 0x55555555);
mask = ((mask & 0x33333333) << 2) | ((mask >> 2) & 0x33333333);
mask = ((mask & 0x0F0F0F0F) << 4) | ((mask >> 4) & 0x0F0F0F0F);
mask = ((mask & 0x00FF00FF) << 8) | ((mask >> 8) & 0x00FF00FF);
mask = ((mask & 0x0000FFFF) << 16) | ((mask >> 16) & 0x0000FFFF);
for (size_t i = 0; i < 256; i++) {
uint_fast8_t byte = i;
uint_fast32_t crc = 0;
for (uint_fast8_t bit = 0; bit < 8; bit++) {
crc = (crc >> 1) ^ ((uint32_t)(-((byte ^ crc) & 1)) & mask);
byte >>= 1;
}
crc32_lookup[i] = crc;
}
}
The rolling CRC is just
uint32_t crc=0xffffffff; // initialisation
uint32_t data_value;
crc = (crc >> 8) ^ crc32_lookup[(data_value ^ crc) & 0xFF];
but I am not sure if data value is uint32_t or uint8_t. The statement
(data_value ^ crc) & 0xFF
is only going to work on the 8 least significant bits anyway.
-
Would you be kind enough and post the table for JAMCRC?
Everything I have posted in this thread is 0x04C11DB7 reflected.
The initial value and final XOR don't change the LUT. Only the poly and reflection do.
but I am not sure if data value is uint32_t or uint8_t. The statement
It is 8 bits because the LUT has 256 entries.
-
Yes that works great! Thank you.
The original func takes 2.3 secs to do 1MB and the table based one, 1 byte at a time, takes 0.4 sec.
The 4 byte one would probably be 0.1 sec :)
-
Would you be kind enough and post the table for JAMCRC?
Like oPossum wrote, that table is in their first LUT post, as well as in mine.
The poly 0x04C11DB7 (0b00000100110000010001110110110111) in inverse bit order yields mask 0xedb88320 (0b11101101101110001000001100100000).
Like the comment says, the five mask/poly lines just invert the bit order.
The function takes a pointer to the data, the number of bytes in the data to be checksummed, and the initial checksum; and returns the updated checksum.
The per-byte lookup is faster than the per-32bit, because the per-32bit iterates over bits, not bytes.
To get any faster, one would have to write an even tighter inner loop, which I'm not sure is possible (since GCC -O2 on Cortex-M4 gets it down to four instructions plus branch); or switch to a HMAC (hash-based message authentication code) using a hash function (https://en.wikipedia.org/wiki/List_of_hash_functions) that is faster than four iterations of CRC32.
In this case, I suspect that MurmurHash, specifically Murmur3_32 (https://en.wikipedia.org/wiki/MurmurHash#Algorithm), would perform well. This is because Cortex-M4 has a binary rotate right assembly instruction (rors) to implement ROL, and (hopefully!) a single cycle unsigned integer multiplication (that yields the low 32 bits of 32×32 multiplication).
-
Quite a few MCUs have a CRC32 peripheral, which will get you there faster. The OP might have considered it and couldn't figure out how to use it on their MCU, or something, IIRC? But really, they should have a look if they want any faster.
-
Right, SiliconWizard. STM32F417 programming manual (https://www.st.com/resource/en/reference_manual/rm0090-stm32f405415-stm32f407417-stm32f427437-and-stm32f429439-advanced-armbased-32bit-mcus-stmicroelectronics.pdf) does describe one. It essentially has two 32-bit registers: CRC_CR and CRC_DR. Writing 0x00000001 to CRC_CR resets the checksum to 0xFFFFFFFF, and writing 32-bit data values to CRC_DR updates the checksum. Reading CRC_DR yields the current checksum. So,
uint32_t stm32f417_crc32(const uint32_t *src, const uint32_t *end)
{
CRC_CR = 0x00000001; /* Resets CRC_DR to 0xFFFFFFFF */
while (src < end)
CRC_DR = *(src++);
return CRC_DR;
}
although you might need to enable the CRC peripheral clock (RCC_AHB1ENR.CRCEN = 1; or RCC_AHB1ENR |= 1<<12;) to enable the CRC unit first. It takes four AHB clock cycles (HCLK) per 32-bit word, and uses the same polynomial (0x04C11DB7) we've discussed here. The checksum is initialized to 0xFFFFFFFF. Therefore, the above should yield the same results as
uint32_t stm32f417_crc32(uint32_t *src, const uint32_t *end)
{
return crc32(src, (size_t)(end - src) * 4, 0xFFFFFFFF);
}
using the crc32() in e.g. my previous posts, assuming the memory region to verify is 32-bit aligned.
-
Best to use DMA to feed the peripheral.
If you end up just waiting for the DMA&peripheral in a busy loop, you could do part of the calculation in parallel with CPU, finding the optimal slice width, for example like processing 20% of the data in software while waiting for the DMA to do the trick for 80%, for most complexity for relatively little gain :).
I'm thinking, can you configure two DMA channels to trigger on the same DMA request signal, so that one of them moves the data to the CRC peripheral and another in memory? Probably not as reading out the SPI DR is the SPI FIFO pop operation.
-
ST offer a library (probably 1000 lines :) ) for using the 32F4 on-chip CRC generator but I decided to not pursue that because in my application I need to CRC check a data block produced externally, so needed a software version anyway.
In the context of what I am doing, running a CRC on even the entire CPU FLASH (1MB) in 400ms is good enough.
And in another part of the code I have to CRC up to 1MB coming off a 21mbps SPI FLASH chip, which is about 2MB/sec, so of the same order as doing it in software. One could fully hide the CRC calculation in the SPI transfer time in that case, if I wanted to unravel the SPI code which currently is a blocking function.
-
Best to use DMA to feed the peripheral.
Yes. Obviously the most efficient way is to use the CRC peripheral along with DMA. Then, it'll be significantly faster, and most of all, you can do something else while it's computing the CRC on the whole block.