Hi,
Do you guys have any experience with on-the-fly data compression? I am not talking about complex Huffman encoding; something simpler that still gives me 20-30% savings and can operate on fixed boundaries (i.e. 256 bytes in, x bytes out, but always 256 in)?
FYI, my data is fairly linear, so each 256-byte block will have a nice uniform pattern; it should be pretty easy to compress with the right algo.
Thx,
sgpee
You may be able to encode the change in data rather than the absolute value.
I did this with a data logger that read a 12-bit ADC and stored the samples as 8-bit deltas (-127 to +127). Each block of 1024 bytes held the absolute value of the first 12-bit sample, 999 8-bit deltas, and a timestamp.
delta = adc - prev_adc;            /* difference from the reconstructed previous sample */
if(delta < -127) {                 /* clamp into the signed 8-bit range */
    delta = -127;
} else if(delta > 127) {
    delta = 127;
}
prev_adc += delta;                 /* track the decoded value, not the raw ADC reading */
*buffer_ptr++ = (uint8_t) delta;   /* store the delta */
complete source code
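For reference, the decode side of a scheme like this is just a running sum over the stored deltas. A minimal sketch (function and variable names are mine, not from the logger source):

```c
#include <stdint.h>
#include <stddef.h>

/* Rebuild absolute samples from one block: an absolute start value
   followed by n signed 8-bit deltas. */
void delta_decode(int16_t start, const int8_t *deltas, size_t n, int16_t *out)
{
    int16_t value = start;
    for (size_t i = 0; i < n; i++) {
        value += deltas[i];   /* same running sum the encoder tracks */
        out[i] = value;
    }
}
```

Because the encoder updates prev_adc from the clamped delta, this decoder reconstructs exactly what the encoder tracked, including during clamped spikes.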
Have you thought about using SPI to write to an SD card? Then you can have virtually unlimited storage.
You rock.. this is a brilliant idea. It works because my data is linear, with no extreme changes. I can actually even reduce this to 6 or 4 bits. Thanks man, this is really cool.
I have. Size is an issue for me; I am trying to squeeze everything I've got onto a 20x40 board. Even uSD would blow the size budget.
Any ADC value spikes will ruin the rest of the data. Nice algo though, with that compromise, and it can be compressed even further to 7-4 bits per ADC sample. Careful field study is needed for such a simple algo.
For data which is amenable to difference coding but may contain occasional spikes, variable-length coding can work well, e.g. reserving some code values or bits to indicate special cases such as absolute values or larger deltas.
If using an ARM, the barrel-shifter can make variable-length coding very efficient.
Storing log(delta) can also provide greater dynamic range if necessary. Use a LUT for log/antilog. The curve doesn't have to be perfectly log, it could be linear for small to medium values and log for larger.
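As a concrete illustration of the LUT idea, here is a toy companding table in C: 4-bit codes, linear steps for small deltas and coarser (roughly logarithmic) steps for large ones. The table values and all names are made up for illustration; a real curve would be fitted to the data.

```c
#include <stdint.h>
#include <stdlib.h>

/* Antilog table: code magnitude -> delta magnitude. Linear for small
   values, coarser for large ones, as suggested above. Illustrative only. */
static const int16_t antilog_lut[8] = { 0, 1, 2, 3, 4, 8, 16, 32 };

/* Encode a delta as 3 bits of magnitude plus a sign bit (4-bit code),
   picking the nearest table entry. Lossy for off-table values. */
uint8_t delta_to_code(int16_t delta)
{
    uint8_t sign = (delta < 0) ? 8 : 0;
    int16_t mag = abs(delta);
    uint8_t best = 0;
    for (uint8_t i = 1; i < 8; i++)
        if (abs(mag - antilog_lut[i]) < abs(mag - antilog_lut[best]))
            best = i;
    return sign | best;
}

int16_t code_to_delta(uint8_t code)
{
    int16_t mag = antilog_lut[code & 7];
    return (code & 8) ? -mag : mag;
}
```

On an MCU the encoder-side search would normally be replaced by a direct lookup or threshold table; the decode side is always a single LUT read.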
any adc value spikes will ruin the rest of data.
No! There will be one or more wrong values, but the error is carried over, so it corrects itself once the data settles. It is essentially a low-pass filter.
Note this line in the code:
prev_adc += delta;
That is very important.
Doing it this way...
prev_adc = adc;
would cause all readings following an overflow to be incorrect.
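The self-correction is easy to demonstrate. A sketch of the clamp-and-accumulate encoder step (paraphrasing the code above; the step function is my wrapping):

```c
#include <stdint.h>

/* One encoder step: clamp the delta to the 8-bit range and update
   *prev to the DECODED value, exactly as the original code does. */
int8_t encode_step(int16_t adc, int16_t *prev)
{
    int16_t delta = adc - *prev;
    if (delta < -127)
        delta = -127;
    else if (delta > 127)
        delta = 127;
    *prev += delta;      /* prev_adc += delta, not prev_adc = adc */
    return (int8_t)delta;
}
```

Feed it a spike from 1000 to 1300: the first two deltas saturate at 127, and by the third sample the reconstructed value has caught up to 1300 exactly. With `prev = adc` instead, the decoder would stay offset forever.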
Correct! I just re-created it: a self-correcting algo, with a compromise only in the vicinity of spikes. Not a 100% data match, but brilliant!
PS: I abandoned it a long time ago due to its lossy nature.
It would not be hard to make it lossless with an RLE type scheme, if you really needed it.
packet length | start value | delta | delta | delta...
Yes, but more code is needed to make the decision. If there is still room on the MCU, it could also do variable-length coding if more compression is needed: try the smallest bit length first, then increase the bit length as necessary based on the behaviour of the incoming data. zad's link will be a good read.
Hmm. Inspiration for this concept...
marker | absolute | delta | delta | delta...
The marker would be a special value outside the delta range. For example, a signed 8-bit value can be -128 to 127, so use -127 to 127 for deltas and -128 for the marker.
No limit on run length with this method. And no streaming problems (no need to know packet length at head of data block).
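A minimal sketch of this marker scheme for 16-bit samples (names and the two-byte little-endian layout for the absolute value are my choices; both sides are assumed to start from 0):

```c
#include <stdint.h>
#include <stddef.h>

#define MARKER (-128)   /* the one reserved code value */

/* Encode n samples; returns bytes written. A delta in -127..127 costs
   one byte; anything larger costs MARKER plus the absolute 16-bit
   sample, so the stream stays fully lossless. */
size_t marker_encode(const int16_t *in, size_t n, int8_t *out)
{
    size_t w = 0;
    int16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t delta = (int32_t)in[i] - prev;
        if (delta >= -127 && delta <= 127) {
            out[w++] = (int8_t)delta;
        } else {
            out[w++] = MARKER;
            out[w++] = (int8_t)(in[i] & 0xFF);          /* low byte  */
            out[w++] = (int8_t)((in[i] >> 8) & 0xFF);   /* high byte */
        }
        prev = in[i];   /* no clamping error to track here */
    }
    return w;
}

/* Decode; returns number of samples produced. */
size_t marker_decode(const int8_t *in, size_t nbytes, int16_t *out)
{
    size_t r = 0, w = 0;
    int16_t prev = 0;
    while (r < nbytes) {
        if (in[r] == MARKER) {
            prev = (int16_t)((uint8_t)in[r + 1] | ((uint8_t)in[r + 2] << 8));
            r += 3;
        } else {
            prev += in[r++];
        }
        out[w++] = prev;
    }
    return w;
}
```

As noted above for highly oscillating data, every sample can hit the marker path and the "compressed" stream then expands to 3 bytes per sample.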
It is easy to make this algo quasi-lossless (one value will be lost).
My data is 16-bit, and if I am willing to give up one value, say 0xff, then that can act as the marker. In that case, if you go outside the boundary you insert the marker, reset the start value to the extreme, and continue from there. I need to check the values to understand how well this would work, but my hunch tells me this would easily save 20-30%.
sgpee
Mike,
I am not familiar with barrel-shifter, would you care to elaborate?
Thx,
sgpee
It is easy to make this algo quasi lossless. (one value will be gone)
It is just as easy to make it purely lossless, as previously discussed: a simple test for greater than 127 or less than -127 emits -128 (the marker) followed by the absolute ADC value. The only problem is that the data will expand instead of compress for a highly oscillating ADC.
Another way is table-based compression. I have here a "fin" table-prediction compression method, for PC files unfortunately, not MCU (not sure where I got the "fin" name; it's been years). It is a lot simpler than LZW, but it needs more code for setup and housekeeping, and it is only suitable for highly stable and repetitive data. It can compress down to 1 bit per sample for highly predictable data.
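The "fin" method isn't spelled out here, but one classic shape of table-prediction compression keeps a table guessing the next byte from the current one: a correct guess could be coded in a single bit, and a miss costs a flag bit plus the literal byte. A sketch that just counts hits to estimate the achievable ratio (my code, not the poster's method):

```c
#include <stdint.h>
#include <stddef.h>

/* table[last_byte] predicts the next byte. Each hit would cost 1 bit
   in the output stream; each miss would cost 1 flag bit + 8 literal
   bits, and the table learns the new value for next time. */
size_t count_prediction_hits(const uint8_t *data, size_t n)
{
    uint8_t table[256] = {0};
    uint8_t last = 0;
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (table[last] == data[i])
            hits++;                 /* would emit a single '1' bit */
        else
            table[last] = data[i];  /* learn: predict this next time */
        last = data[i];
    }
    return hits;
}
```

On a repetitive stream the table converges quickly, which is why this approaches 1 bit per sample only for highly predictable data, as the post says.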
Most ARM instructions can shift or rotate one of the operands by any number of bits with no extra execution time, so things like packing variable-length bitfields into 32-bit words can be done very efficiently. It can also shift by a number of bits specified in a register.
e.g. in C, the following lines can each be compiled to a single ARM instruction:
a = b << 5 | c;
b <<= c;
Assuming, of course, the compiler is suitably aware of the architecture.
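In portable C the packing might look like the sketch below; on ARM the compiler can fold each shift into the ORR's shifted operand via the barrel shifter at no extra cost. All names are mine:

```c
#include <stdint.h>

/* Accumulate variable-length fields into one 32-bit word, MSB-first. */
typedef struct {
    uint32_t word;   /* bits packed so far */
    unsigned used;   /* number of bits occupied */
} bitpack32;

/* Returns 0 on success, -1 if the field doesn't fit.
   Fields are limited to 1..31 bits to keep the mask shift defined. */
int pack_bits(bitpack32 *bp, uint32_t value, unsigned nbits)
{
    if (nbits == 0 || nbits > 31 || nbits > 32 - bp->used)
        return -1;
    bp->word |= (value & ((1u << nbits) - 1u)) << (32 - bp->used - nbits);
    bp->used += nbits;
    return 0;
}
```

Packing a 3-bit code followed by a 2-bit code fills the top five bits of the word, and each call is essentially one mask plus one shifted OR.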
PIC and AVR also have bit-shift instructions IIRC, but only one bit at a time per op, at the assembly level.