Author Topic: Compression library for Grid data logger  (Read 1636 times)


Offline ali_asadzadeh (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1906
  • Country: ca
Compression library for Grid data logger
« on: January 27, 2022, 08:22:14 am »
Hi,
The data logger will sample the grid at 16 ksps. It has 8 channels of 24-bit ADC data, 8 channels of 8-bit ADC data, and 32 bits of digital inputs and outputs. I want to use a Cortex-M7 processor for the job.
The ARM will handle lots of other things too, including a 720x720 px GUI and an Ethernet connection, so I need a lossless compression library that preferably has low CPU usage. The RAM and flash requirements are not that strict, because I will use external flash and SDRAM. I would like to reach a compression ratio better than 50%.


A single channel of ADC data is attached for your reference. (I have converted the data to ASCII, but I will use binary data in the final design.)

So which compression library would you recommend?

« Last Edit: January 27, 2022, 08:24:09 am by ali_asadzadeh »
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Offline ali_asadzadeh (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1906
  • Country: ca
Re: Compression library for Grid data logger
« Reply #1 on: January 29, 2022, 09:27:23 am »
Come on guys, can you recommend any open-source library?
 

Offline Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Re: Compression library for Grid data logger
« Reply #2 on: January 29, 2022, 12:18:04 pm »
If memory isn't a problem, then for the 8-bit channels I'd simply try Huffman-encoding the difference between each sample and the sample from 1/50th (or 1/60th) of a second earlier. For the 24-bit channels, do the same, except only compress differences within a range of 2^x around the previous value; if the difference is larger, use a flag bit to bypass compression.

You can use a more complex predictor if you want to take cycle-time variance into account. In that case you might use a predictor based on only a couple of previous samples, so you don't need as much memory.
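
A rough sketch of the one-cycle-back predictor with a bypass flag, to make the idea concrete. The names and the constants are assumptions for illustration only: PERIOD = 320 assumes 16 ksps on a 50 Hz grid, and RES_BITS = 8 is an arbitrary choice for the "2^x" window. It only counts bits before any Huffman stage; the short residuals are what you would then Huffman-encode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters, not from the thread: 16 ksps / 50 Hz = 320
   samples per mains cycle; residuals in [-2^(RES_BITS-1), 2^(RES_BITS-1))
   count as "small". */
#define PERIOD   320
#define RES_BITS 8

/* Residual against the sample one mains cycle earlier. The first PERIOD
   samples have no history, so they are predicted as 0. */
static int32_t cycle_residual(const int32_t *x, size_t n) {
    int32_t pred = (n >= PERIOD) ? x[n - PERIOD] : 0;
    return x[n] - pred;
}

/* 1 if the residual fits the short form, 0 if the sample must bypass
   compression and be stored raw. */
static int fits_short(int32_t r) {
    const int32_t lim = (int32_t)1 << (RES_BITS - 1);
    return r >= -lim && r < lim;
}

/* Bits needed for len samples: one flag bit per sample, followed by
   either a RES_BITS residual or the raw 24-bit sample. */
size_t cycle_delta_bits(const int32_t *x, size_t len) {
    assert(x != NULL);
    size_t bits = 0;
    for (size_t n = 0; n < len; n++)
        bits += 1 + (fits_short(cycle_residual(x, n)) ? RES_BITS : 24);
    return bits;
}
```

Running cycle_delta_bits() over a block of real samples and comparing against 24 bits per sample gives a quick feel for whether the window size is sensible before writing the actual bit packer.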
« Last Edit: January 29, 2022, 12:21:09 pm by Marco »
 

Online mariush

  • Super Contributor
  • ***
  • Posts: 5029
  • Country: ro
  • .
Re: Compression library for Grid data logger
« Reply #3 on: January 29, 2022, 01:46:01 pm »
Use something standard and well known ... deflate, lzo, lz4, whatever

The choice of compressor is less important; it's how you arrange the data that matters.
Just grouping the bytes would help the compressor significantly (e.g. take 16K readings and write 16K bytes holding the first byte of every reading, then 16K bytes holding the second byte of every reading, and so on).
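
A minimal sketch of that byte grouping in C, assuming the readings are packed as 3 bytes each. The function names are made up for illustration; this is not the PHP script attached to the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split n packed 24-bit readings (3 bytes each) into three planes:
   all first bytes, then all second bytes, then all third bytes.
   dst must hold 3*n bytes and must not overlap src. */
void split_byte_planes(const uint8_t *src, uint8_t *dst, size_t n) {
    assert(src != dst);                /* not an in-place transform */
    for (size_t i = 0; i < n; i++) {
        dst[i]         = src[3 * i];
        dst[n + i]     = src[3 * i + 1];
        dst[2 * n + i] = src[3 * i + 2];
    }
}

/* Inverse: rebuild the packed 3-byte readings from the three planes. */
void merge_byte_planes(const uint8_t *src, uint8_t *dst, size_t n) {
    assert(src != dst);
    for (size_t i = 0; i < n; i++) {
        dst[3 * i]     = src[i];
        dst[3 * i + 1] = src[n + i];
        dst[3 * i + 2] = src[2 * n + i];
    }
}
```

The split buffer is what gets handed to the compressor; the merge runs after decompression.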

I just did an experiment with your data: with the first 16K records, the original 49152-byte file compressed with 7z LZMA to 44266 bytes, and the rearranged one to 32260 bytes (about 66% of the original size). Deflate gets to around 33500 bytes.

You can see for yourself with the attached files, which contain your data and the script I used to create them (in PHP).

You can go further. If you know you have long runs of negative numbers followed by long runs of positive numbers, you could put a header on every chunk of records that says: N total samples, 100 negative, 500 positive, 1 negative, 1000 positive, and so on. You may use 100 bytes or as many as needed, but you save 16K records x 1 sign bit = 2 KB.

In the attached example, there are 128 changes of sign in the first 16K samples (if my code is correct), and most runs are up to 129 samples of the same sign. So if you use 2 bytes for each group, that's a 256-byte header, but you're saving 16 Kbit or 2048 bytes, for an overall reduction of 1792 bytes.
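
A sketch of how such a sign-run header could be built, under a few assumptions of mine: run lengths are stored as 16-bit counts, the first run is always the negative one (so it may be zero-length), and zero counts as non-negative. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill runs[] with alternating run lengths: runs[0] = leading negatives,
   runs[1] = following non-negatives, runs[2] = negatives, and so on.
   Returns the number of runs written, or 0 if runs[] is too small. */
size_t sign_runs(const int32_t *x, size_t len, uint16_t *runs, size_t max_runs) {
    assert(len <= UINT16_MAX);      /* every run must fit a 16-bit count */
    size_t count = 0;
    int cur_neg = 1;                /* by convention the first run is negative */
    uint16_t run = 0;
    for (size_t i = 0; i < len; i++) {
        int neg = x[i] < 0;
        if (neg != cur_neg) {       /* sign flipped: close the current run */
            if (count == max_runs) return 0;
            runs[count++] = run;
            run = 0;
            cur_neg = neg;
        }
        run++;
    }
    if (count == max_runs) return 0;
    runs[count++] = run;            /* close the final run */
    return count;
}
```

With the header in place, each sample can be stored as a magnitude only; the decoder walks the runs to put the signs back.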

You could use other tricks, like storing only the difference between samples: have a "keyframe sample" stored in 3 bytes, then 16 samples (or as many as you want) stored as the difference from the previous sample, using 2 or 3 bytes each (first bit 0 = 2 bytes, first bit 1 = 3 bytes). Or use a dynamic size for each value: for example, each number could be a multiple of 4 bits long (minimum 12 bits, maximum 28 bits), with 2 bits to signify the length, one bit for the sign, and the rest of the bits for the difference.





 
The following users thanked this post: ali_asadzadeh

Offline fcb

  • Super Contributor
  • ***
  • Posts: 2117
  • Country: gb
  • Test instrument designer/G1YWC
    • Electron Plus
Re: Compression library for Grid data logger
« Reply #4 on: January 29, 2022, 02:09:31 pm »
Why bother for just 50% compression? Just double the size of the memory.

When we've stored mains waveforms, we've either stored them unmolested or we've only stored the cycles that exceed a specified window (and during "normal" data just stored V/I/PF/etc.. per cycle).
https://electron.plus Power Analysers, VI Signature Testers, Voltage References, Picoammeters, Curve Tracers.
 

Offline Jeroen3

  • Super Contributor
  • ***
  • Posts: 4078
  • Country: nl
  • Embedded Engineer
    • jeroen3.nl
Re: Compression library for Grid data logger
« Reply #5 on: January 29, 2022, 03:13:14 pm »
Can you use an audio compression library?
FLAC is free, takes about 30% of the size away.
 

Offline Harjit

  • Regular Contributor
  • *
  • Posts: 141
  • Country: us
Re: Compression library for Grid data logger
« Reply #6 on: January 29, 2022, 05:46:56 pm »
List I've saved off - in no particular order i.e. I think they will work for my embedded application but I haven't benchmarked them.

I had posted this list in another thread and the person went with heatshrink.

https://github.com/lz4/lz4
https://github.com/atomicobject/heatshrink
https://github.com/richgel999/miniz
https://www.segger.com/products/compression/emcompress/emcompress-togo/
https://github.com/dblalock/sprintz
http://www.oberhumer.com/opensource/lzo/
 
The following users thanked this post: ali_asadzadeh

Offline Marco

  • Super Contributor
  • ***
  • Posts: 6723
  • Country: nl
Re: Compression library for Grid data logger
« Reply #7 on: January 29, 2022, 06:00:54 pm »
https://github.com/dblalock/sprintz

Forgot about that one, apart from FLAC it's the only ready made predictive coder mentioned.
« Last Edit: January 29, 2022, 06:31:43 pm by Marco »
 

Offline ali_asadzadeh (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1906
  • Country: ca
Re: Compression library for Grid data logger
« Reply #8 on: January 30, 2022, 09:13:30 am »
Thanks guys for the feedback. I will try some of the libraries you mentioned. I have taken a look at heatshrink, and it seems very easy to use.
 

Online mfro

  • Regular Contributor
  • *
  • Posts: 210
  • Country: de
Re: Compression library for Grid data logger
« Reply #9 on: January 30, 2022, 09:32:36 am »
Your data appears scrambled (a trailing 1 sometimes)?

I would fix that first before attempting to compress.
Beethoven wrote his first symphony in C.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Compression library for Grid data logger
« Reply #10 on: January 30, 2022, 11:30:25 am »
I had a go at your data file (after deleting any weird 2nd value on a line).

I tried various combinations of a pre-processing stage, then an off the shelf compressor.

Code:
text file
752792 raw
633331 lz4
446806 lz4 -9
342929 gzip
341785 gzip -9
127860 bzip2
 86559 zstd
 54196 xz

24 bit binary
307179 raw
307198 lz4
305236 lz4 -9
284483 gzip
284483 gzip -9
100742 bzip2
 71421 zstd
 53912 xz

LEB128
306495 raw
290823 gzip -9
100382 bzip2
 73228 zstd
 57348 xz

LEB128 1st diffs
257751 raw
 65511 lz4
 64077 lz4 -9
232925 gzip
232925 gzip -9
 76949 bzip2
 58545 zstd
 54508 xz

LEB128 2nd diffs
201742 raw
 51276 lz4
 51261 lz4 -9
191888 gzip
 64178 bzip2
 48041 zstd
 43144 xz

LEB128 3rd diffs
201426 raw
 51196 lz4
 51182 lz4 -9
 63699 bzip2
 47324 zstd
 42284 xz

The patterns:

- the heavyweight compressors zstd and especially xz  got pretty much the same results no matter what form the data started off in.

- gzip for some unknown reason *really* doesn't like LEB128 encoding.

- lz4 for some unknown reason likes LEB128 encoding a lot.

- 1st diffs gets 75%-80% reduction in size.

- 2nd diffs gets another 20% off.

- 3rd diffs probably not worth it. It also risks blowing up on data that is less smooth.


A combination of 2nd diffs and the very lightweight and simple lz4 compressor knocks 83% off the file size (51276 bytes, against 307179 bytes for the raw 24-bit binary).

Source code:

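Code:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Write n to stdout as a signed LEB128 value: 7 bits per byte, low bits
// first, with the top bit of a byte set when more bytes follow.
void LEB128(int32_t n){
  int more = 1, isNeg = n < 0;
  while (more) {
    int byte = n & 127;
    n >>= 7;
    if (isNeg) n |= (~0U << (32-7)); // in case >> was not an arithmetic shift

    // Stop once the remaining bits are just the sign extension of bit 6.
    if ((n == 0 && !(byte & 0x40)) || (n == -1 && (byte & 0x40))) {
      more = 0;
    } else {
      byte |= 0x80;
    }

    putchar(byte);
  }
}

// Read decimal samples from stdin; emit their 2nd differences as LEB128.
int main(void){
  int32_t d0 = 0, d1 = 0;
  int32_t new_d0, new_d1, new_d2;
  while (scanf("%" SCNd32, &new_d0) == 1){ // stops at EOF or on bad input
    new_d1 = new_d0 - d0; d0 = new_d0;
    new_d2 = new_d1 - d1; d1 = new_d1;
    LEB128(new_d2);
  }
  return 0;
}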
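
For completeness, a sketch of the reverse direction: decoding the signed LEB128 stream and undoing the 2nd differences. This is not part of the code above; the function names and the buffer-based interface are assumptions, chosen so it can run on a block of bytes after lz4 decompression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one signed LEB128 value starting at buf[*pos]. Returns 1 and
   advances *pos on success, 0 on truncated or over-long input. */
int leb128_decode(const uint8_t *buf, size_t len, size_t *pos, int32_t *out) {
    assert(buf != NULL && pos != NULL && out != NULL);
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        if (*pos >= len || shift >= 35) return 0;
        byte = buf[(*pos)++];
        result |= (uint32_t)(byte & 0x7F) << shift;
        shift += 7;
    } while (byte & 0x80);
    /* Sign-extend from bit 6 of the last byte if the value is short. */
    if (shift < 32 && (byte & 0x40)) result |= ~(uint32_t)0 << shift;
    *out = (int32_t)result;  /* assumes the usual two's-complement conversion */
    return 1;
}

/* Rebuild samples from a stream of LEB128-coded 2nd differences.
   Returns the number of samples written (at most max). */
size_t decode_second_diffs(const uint8_t *buf, size_t len,
                           int32_t *samples, size_t max) {
    size_t pos = 0, n = 0;
    int32_t d0 = 0, d1 = 0, d2;
    while (n < max && leb128_decode(buf, len, &pos, &d2)) {
        d1 += d2;            /* 1st difference */
        d0 += d1;            /* sample value   */
        samples[n++] = d0;
    }
    return n;
}
```

For example, the samples 100, 103, 105 have 2nd differences 100, -97, -1, which the encoder writes as the bytes E4 00 9F 7F 7F; feeding those five bytes to decode_second_diffs() gives back the three samples.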
 
The following users thanked this post: ali_asadzadeh

Offline ali_asadzadeh (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1906
  • Country: ca
Re: Compression library for Grid data logger
« Reply #11 on: February 01, 2022, 08:00:50 am »
Bruce, thanks for the tips. I had never heard of xz, and it seems very impressive.
Where is the GitHub page for it? Is it CPU-intensive?
Also, since you have a lot of experience with FPGAs: I have about 5K free LUTs in my Gowin FPGA. Can we do something in the FPGA too, before sending the uncompressed data to the MCU?
« Last Edit: February 01, 2022, 08:05:42 am by ali_asadzadeh »
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Compression library for Grid data logger
« Reply #12 on: February 01, 2022, 08:23:13 am »
Quote
Bruce, thanks for the tips. I had never heard of xz, and it seems very impressive.
Where is the GitHub page for it? Is it CPU-intensive?

xz is the current benchmark compressor but it is *extremely* resource-intensive for compression. It can be ok to decompress it in embedded, but I don't think you could usefully do compression.

xz is usually used where the same compressed file will be downloaded by millions of people and it's worth spending a lot of CPU time and RAM on compressing it to the maximum.

Several Linux distros have switched to using zstd because it compresses much faster than xz (and also decompresses much faster), while its compression is only a little worse. But it's still very resource-intensive by embedded standards.

THAT'S WHY my recommendation is to use the embedded-friendly lz4 compressor, with the custom preprocessing I showed.

Quote
Also, since you have a lot of experience with FPGAs: I have about 5K free LUTs in my Gowin FPGA. Can we do something in the FPGA too, before sending the uncompressed data to the MCU?

I'm not experienced in FPGAs.
 
The following users thanked this post: ali_asadzadeh

Offline ali_asadzadeh (Topic starter)

  • Super Contributor
  • ***
  • Posts: 1906
  • Country: ca
Re: Compression library for Grid data logger
« Reply #13 on: February 01, 2022, 09:51:27 am »
Thanks. Where do you check the current benchmark compressor list?
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4039
  • Country: nz
Re: Compression library for Grid data logger
« Reply #14 on: February 01, 2022, 11:20:31 am »
Quote
Thanks. Where do you check the current benchmark compressor list?

You run different compressors on your own data.

I suppose someone does this on some standard corpus from time to time. I don't know anything about that. Using your own data is better.

Once upon a time I guess comp.compression was the place. I haven't looked there for a couple of decades.
 
The following users thanked this post: ali_asadzadeh

