Counting in the dumbest possible way can actually be the smart way in a few cases.
To count like this:
0. 0 (zero)
1. 01
2. 011
3. 0111
4. 01111
...
A code length of n+1 bits for the value n might seem horribly inefficient, but if the probability distribution has the shape f(n) = 1/(n+1)² and the alternative is a fixed 8-bit encoding, this unary code averages roughly 3.7 bits per value, a saving of more than 4 bits per value. The value 255 would occur with a probability of only about 1/256², but be 256 bits long. The zero, on the other hand, would make up 1/1.641 of all stored numbers.
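A quick sanity check of those numbers (the distribution f and the 8-bit baseline are from the text; the normalization constant Z is just f summed over the 256 values an 8-bit code covers):

```python
# Average length of the unary code (value n -> n+1 bits) under
# f(n) = 1/(n+1)^2, restricted to the 256 values of an 8-bit code.
Z = sum(1 / (n + 1) ** 2 for n in range(256))  # ~1.641, so P(0) = 1/Z
avg = sum((n + 1) * (1 / (n + 1) ** 2) / Z for n in range(256))
print(Z)    # normalization constant, ~1.641
print(avg)  # average bits per value, ~3.7 -- well under the fixed 8
```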
Now if this seems like an extreme edge case to you, take it as an introductory example and consider counting more like usual:
0.0 (zero)
0.1
10.0
10.1
110.0
110.1
1110.0
1110.1
11110.0
11110.1
...
This gives unlimited dynamic range in a limited amount of data, provided that big changes almost never happen.
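Written as code, this scheme is a unary quotient followed by one remainder bit (essentially a Rice code with a one-bit remainder); the dot in the listing above only separates the prefix from the suffix bit. A minimal sketch, with function names of my own choosing:

```python
def encode(n: int) -> str:
    """Value n -> (n // 2) ones, a terminating zero, then the low bit of n."""
    return "1" * (n // 2) + "0" + str(n % 2)

def decode(bits: str) -> int:
    """Inverse of encode: count ones up to the first zero, read one more bit."""
    q = bits.index("0")  # number of leading ones = n // 2
    return 2 * q + int(bits[q + 1])

# Matches the listing: 0.0, 0.1, 10.0, 10.1, 110.0, 110.1, ...
assert [encode(n) for n in range(6)] == ["00", "01", "100", "101", "1100", "1101"]
assert all(decode(encode(n)) == n for n in range(100))
```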
Or if you do not wish to reserve the possibility of counting to infinity:
00.0 (zero)
00.1
01.00
01.01
01.10
01.11
10.000
10.001
10.010
10.011
10.100
10.101
...
11.0000
11.0001
11.0010
11.0011
11.0100
...
11.1111 (30th and last combination, only an example)
In the last case the maximum length is limited to m + 2^m − 1 + k bits. Here I chose m=2 and k=1, making the zero m+k=3 bits long. To replace an 8-bit encoding you might choose m=3, k=0, counting from 000. to 111.1111111 and yielding 1+2+4+8+16+32+64+128 combinations, just one short of 256.
You see, there are many ways to count, and you can fit one to your data.
For which signals is this a good and simple approach? For low-noise signals with little high-frequency content, such as lossless audio and high-resolution data logging.
For values that are symmetrically distributed around zero you would store only an unsigned differential plus a sign (+/−) bit, and probably choose an encoding that does not favor small numbers as heavily as the examples given.
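The text suggests storing magnitude plus a separate sign bit; an equivalent and common way to fold signed differentials into the unsigned codes above is the zigzag mapping (used, for example, in Protocol Buffers), which interleaves negative and positive values so small magnitudes stay small. A sketch:

```python
def zigzag(d: int) -> int:
    """Map signed differential to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * d if d >= 0 else -2 * d - 1

def unzigzag(u: int) -> int:
    """Inverse mapping back to the signed differential."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

assert [zigzag(d) for d in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
assert all(unzigzag(zigzag(d)) == d for d in range(-50, 50))
```

The result of zigzag() can then be fed directly into any of the unsigned variable-length codes described above.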