Author Topic: Arithmetic Decoding for Embedded  (Read 7182 times)


Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Arithmetic Decoding for Embedded
« on: October 28, 2020, 05:36:42 am »
Thought I'd start a thread on this, as it seems:
1. There isn't all that much hands-on info out there (e.g. just a few threads on StackOverflow).
2. I've seen one example, which... I didn't read in detail, and it looks overly verbose, concentrating more on the abstraction than the mechanism?  I'll still read it in detail.
Also relevant, the Langdon paper: https://www.cs.cmu.edu/~aarti/Class/10704/Intro_Arith_coding.pdf
3. The abstract concept is slick, and reasonably easy to understand. But I'm still absorbing it and thinking through how it works mechanically, in particular when it comes to multiples that aren't powers of 2 (i.e. when the rescaling (multiplication) step isn't a trivial bit shift).
4. I've written a Huffman-like encoder before.  Which, hmm, I don't have the code released anywhere, but I used it to store the small and bold fonts in this screenshot, which netted a modest 59% compression ratio.  Although it's rather less when the decoder is included, which is rather complex, and also probably in need of optimization.  So, I understand Huffman coding, and I also understand how it can be seen as (is equivalent to) a restricted case of A.C.  Which almost implies that it should just drop right in...

I'm not sure yet how a base-2^N encoding is dramatically different from a Huffman encoding; I can see that, for very extreme probabilities, a given state can be persistent between bits, i.e., a rate of <1 bit/symbol is possible, or equivalently, a bit can encode repeated symbols.  I'm not so sure how to en/decode that...  Probably this becomes obvious when limited precision is solved.
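(Sanity check on that, assuming I have the theory right: an ideal coder spends -log2(p) bits per symbol of probability p.  For the white pixels below, p ≈ 0.672, so each costs about -log2(0.672) ≈ 0.57 bits, and a run of 10 whites fits in under 6 bits.  Huffman can never assign less than 1 whole bit, so that's exactly the gap in play here.)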

FWIW, these are the statistics for Lucidia 8pt, code points 32-127, the graphics data I'm working with right now.  Antialiased, 16 color palette, 0 = black, 15 = white.  Understandably, white is massively more common than the rest, black has a modest showing, and the rest are pretty evenly distributed.  I've added the cumulative probability, because... that has something to do with arithmetic encoding but I haven't solved that yet either.

Code: [Select]
Index  Value  Count  Prob. (%)  Cum. Prob. (%)
    0      0    824      13.35       0
    1     23     92       1.49      13.35
    2     36    116       1.88      14.84
    3     54    110       1.78      16.72
    4     68     88       1.43      18.50
    5     85     73       1.18      19.93
    6    100     57       0.92      21.11
    7    119     53       0.86      22.03
    8    134     49       0.79      22.89
    9    151     76       1.23      23.68
   10    165     96       1.56      24.91
   11    183     73       1.18      26.47
   12    197    113       1.83      27.65
   13    213     93       1.51      29.48
   14    229    111       1.80      30.99
   15    255   4147      67.20      32.79
                          99.99  (rounding +/-0.01)
 1-14          1200      19.45  (avg. 1.39%/ea.)
Total          6171     100
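If I understand the cumulative column right, it's exactly what the encoder consumes: each symbol owns the slice [cum, cum + prob) of the current interval -- e.g. white (index 15) owns [0.328, 1.0) -- and encoding one symbol just narrows [low, high] to that slice.  In integer form, my guess at the standard formulation (a sketch, not anyone's actual implementation; the renormalization/bit output step is omitted):

Code: [Select]
/* One arithmetic-encoder step, integer version (sketch).
 * cdf[] holds cumulative counts: cdf[0] = 0 .. cdf[N] = total.
 * low/high hold 16-bit interval bounds in 32-bit variables;
 * the renormalization that shifts settled bits out is omitted. */
#include <stdint.h>

void encode_step(uint32_t *low, uint32_t *high,
                 const uint16_t *cdf, uint16_t total, int sym)
{
    uint32_t range = *high - *low + 1;
    *high = *low + (range * cdf[sym + 1]) / total - 1;  /* upper edge of slice */
    *low  = *low + (range * cdf[sym])     / total;      /* lower edge of slice */
}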

https://www.seventransistorlabs.com/Images/LucidiaGuides.png
Here are some sample graphics, including red dots marking the start of each character box (it's a variable width font after all).  Note the 90 degree rotation as mentioned, and it's also mirrored because... I never flipped that bit on the ST7735, so I just encoded the font flipped instead, whatever. :)

Part of my motivation behind this is, the original (Huffman) encoder I wrote, it's column oriented.  Which works fine on the ST7735 display I was playing with at the time, which has write modes in all cardinal directions.  But now I'm playing with something that only has row-major writes, so I can't really do that.  So, the half-assed simplest port of the Huffman source draws the text at a 90 degree angle. :P

I suspect that arithmetic encoding can not only provide better compression, but better compression even for very short sequences (like single rows of a character).  I think what I want is to encode the characters, one row at a time, so that the string can be drawn one row at a time, selecting each segment of that row from the respective characters, before moving to the next row and translating/decoding, etc.  (This saves the most display I/O -- setting the draw window is relatively expensive, and it can only be set in horizontal multiples of 8 pixels.  So, I could also draw whole block characters at a time, but it's slow.)  I can also add a higher level compression, like eliminating redundant (identical) rows of characters; which should help modestly for characters with lots of verticals (1, l, I, etc.), but I'll have to run some tests to see how much it does overall; it might not actually be worthwhile.


- Why not just use X?

I'll get to zlib and friends, sooner or later.  The compression rate is probably much better, too.  And the method is pretty much what I just hinted at, i.e. referencing a dictionary of various sized blocks.  But since I don't yet know arithmetic encoding, I'd like to address that first, and having a motivating application is best. ;D

- Why not buffer the graphics before writing?

AVR platform, teeny RAM.  Graphics are 24 bit.  I can't do worse than packed 4-bit bitmaps.  Which is what I'd be doing... if I hadn't already written a compression routine.  But I did, so obviously I want even more!

An aside, the memory allocation on this project is ludicrous... MCU has 16k SRAM and 256k Flash, video controller has dedicated (note, no bus read access) 16MB SDRAM, and a 4GB SD card (or more).  Just personal yak shaving, no need for sensible design. :)

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #1 on: October 28, 2020, 05:57:06 am »
Your post reminded me to post my implementation from a couple years ago. It is here https://github.com/ataradov/random [my first repo with a politically correct main branch name]

It is written to be used on embedded devices. I don't know about AVR, since I was mostly targeting ARM and other 32-bit platforms. There is probably some room for improvement for 8-bit devices.

The whole coder and decoder are just ~200 lines of code, but the code does require 32-bit multiplications and divisions. It uses a callback function when it has to read/write the data, so it does not need to see the whole buffer at the same time. This is useful for reading large images from external memory devices.
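The shape of the callback interface, roughly (a simplified sketch from memory, not the exact declarations in the repo):

Code: [Select]
/* Sketch of the callback idea: the coder pulls/pushes bytes through
 * user functions, so the stream can live in SPI flash, an SD card,
 * etc., and never has to be in RAM all at once. */
#include <stdint.h>

typedef uint8_t (*read_byte_cb)(void *ctx);              /* next input byte  */
typedef void    (*write_byte_cb)(void *ctx, uint8_t b);  /* next output byte */

typedef struct {
  read_byte_cb  read;
  write_byte_cb write;
  void          *ctx;   /* e.g. a flash address cursor */
  /* ...plus the coder state: low, high, value, cdf[], etc. */
} coder_t;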

I don't remember a lot of details of how it works exactly, since it is one of those things where I can keep it in my head just enough to write the code. But I can probably remember if I look at the code long enough.

This was originally created to store FPGA bit files in the MCU flash memory, and it was more or less tested to work.
« Last Edit: October 28, 2020, 06:05:43 am by ataradov »
Alex
 
The following users thanked this post: mikerj, 2N3055

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #2 on: October 28, 2020, 05:20:32 pm »
Ohhhh, so the bounds are divided, instead of the bitstream multiplied.  Of course.

Division, ack, I'll spend fewer cycles bit-banging the damn display controller. :-DD

Hmm.  For very short blocks, it might well be effective to do multiplication on the whole thing; with 8x8 hardware multiply and some number of bytes in the accumulator, it's not too terrible, still faster than division.  And blocks can always repeat, if more are needed to complete the required data.  A block could be 1-4 bytes, left adjusted (i.e. towards the radix point, the rest implicitly padded with zeroes).  I suppose I should see if 2 bytes has reasonable efficiency too.  And that would be with a predefined CDF, which is fine.
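And the divide goes away entirely if the predefined CDF is scaled so its total is a power of two -- then the interval update is multiply-and-shift only.  A sketch of what I have in mind, nothing tested:

Code: [Select]
/* Interval update with the CDF total fixed at 1 << CDF_BITS:
 * multiply + shift, no division.  The 32-bit products can be
 * built up from the AVR's 8x8 hardware multiplies. */
#include <stdint.h>

#define CDF_BITS 12   /* CDF total = 4096 */

void encode_step_pow2(uint16_t *low, uint16_t *high,
                      const uint16_t *cdf, int sym)
{
    uint32_t range = (uint32_t)(*high - *low) + 1;
    *high = (uint16_t)(*low + ((range * cdf[sym + 1]) >> CDF_BITS) - 1);
    *low  = (uint16_t)(*low + ((range * cdf[sym])     >> CDF_BITS));
}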

Your example uses an adaptive CDF, which, I take it, isn't efficient at first, but "quickly" approaches the actual CDF, given some definition of "quick"?  I suppose it's a hyperbolic approach (or trending towards it, if the data are "lumpy")?  Or hmm, maybe it's actually exponential, since the model_update is done incrementally and then the whole thing is halved, in which case the time constant is set by MAX_SCALE?  Which could then be set lower for a quicker approach but poorer best-case compression (in the event that very low-entropy data are given).

Do you remember what compression ratio you got for various sizes and types of data?  With those stonking great 8MB buffers you could easily demonstrate some pretty good sizes...

Tim
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #3 on: October 28, 2020, 05:48:34 pm »
I used adaptive CDF because I did not want to invent a file format that would store precomputed CDF along with the stream.  Right now it is just a plain bit-stream going in, bit-stream going out.

Also, for smaller images, storing a pre-computed CDF takes up a significant amount of space compared to the data.

I have not experimented with a pre-computed CDF, but I figure the adaptive one is good enough.

The code here is complete; all you need to do to build it is run "gcc arithmetic_coder.c arithmetic_coder_demo.c" if you have gcc installed.

Here are some numbers from random files (the ratio is the percentage of size reduction):
1. FPGA bit-stream: Original size: 577800 / Encoded size: 107206 (ratio = 81.446 %)
2. JPEG picture:  Original size: 1359562 / Encoded size: 1348164 (ratio = 0.838 %)
3. C code: Original size: 10537 Encoded size: 6947 (ratio = 34.070 %)
4. Its own executable: Original size: 160222 / Encoded size: 113156 (ratio = 29.375 %)
5. LucidiaGuides.png from the first post: Original size: 2416 / Encoded size: 1965 (ratio = 18.667 %)
6. LucidiaGuides.bmp (converted from the PNG using PaintXP): Original size: 4598 / Encoded size: 1756 (ratio = 61.809 %)

I can also experiment with your images if you have some specific examples you want me to test.
« Last Edit: October 28, 2020, 06:03:51 pm by ataradov »
Alex
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #4 on: October 28, 2020, 09:54:25 pm »
Huh, surprised the C code didn't compress better.  Wouldn't have thought the per-character entropy was that high?

A bit surprised PNG has that much to give up, though that's probably all palette, huh?

For comparison, my Huffman code did 1625 bytes on the Lucidia font (sans guide marks, 16 color palette), 1811 bytes including metadata (storing lengths/offsets, some bits wasted on per-character encoding, etc.).

Tim
« Last Edit: October 28, 2020, 09:58:11 pm by T3sl4co1l »
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #5 on: October 28, 2020, 10:08:37 pm »
That's about the compression you get for text. Here is a bigger example (combined from a bunch of embedded C code + headers).

Original size: 418807 / Encoded size: 283880 (ratio = 32.217 %)

Trying to compress the compressed result: Original size: 283880 / Encoded size: 284239 (ratio = -0.126 %), so the entropy was extracted.

The same file compressed with ZIP is ~72 Kbyte, but ZIP uses shared 32 Kbyte dictionary. 7Z with 256 Mbyte dictionary is ~57 Kbyte.

You can definitely get better compression with multi-pass algorithms.

Another example.

A book in English (ASCII): Original size: 837253 / Encoded size: 459893 (ratio = 45.071 %)
A book in Russian (UTF-8, most characters are 2 bytes, with one byte fixed, so repetitive pattern all over the text): Original size: 1385611 / Encoded size: 713256 (ratio = 48.524 %)

So code compresses worse than general text.
Alex
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #6 on: October 28, 2020, 10:11:21 pm »
Your PNG is just not optimized for size. There is a huge chunk of 00 at the beginning, plus text about the creation time.

The same file after compressing it with some random online tool is 1,471 bytes. And if you try to compress that, you get nothing: Original size: 1471 / Encoded size: 1492 (ratio = -1.428 %)

And ZIP expands it to 1611 bytes.
« Last Edit: October 28, 2020, 10:18:02 pm by ataradov »
Alex
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8627
Re: Arithmetic Decoding for Embedded
« Reply #7 on: October 29, 2020, 12:36:55 am »
If you want to compress graphics effectively, look at JBIG2's MQ-coder. No multiplications or divisions required, and memory usage (and compression) depends on the bit-length of the context; if you can use most of the RAM, you can probably use a 12 or 13 bit context.
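To make "context" concrete: for each pixel you form an index from a template of already-decoded neighbours and keep a separate adaptive probability state per index, so "surrounded by white" and "on an edge" get coded with different statistics. Something like this (generic 10-bit template for a bilevel image; JBIG2's actual templates differ):

Code: [Select]
/* Build a context index from previously decoded neighbours of (x, y).
 * Generic two-rows-above template, NOT JBIG2's actual one; img is
 * bilevel, one pixel per byte, row-major, width w.  Out-of-bounds
 * neighbours read as 0. */
#include <stdint.h>

static const int8_t tdx[10] = { -1,  0,  1, -2, -1,  0,  1,  2, -2, -1 };
static const int8_t tdy[10] = { -2, -2, -2, -1, -1, -1, -1, -1,  0,  0 };

uint16_t context(const uint8_t *img, int w, int x, int y)
{
    uint16_t ctx = 0;
    for (int i = 0; i < 10; i++) {
        int px = x + tdx[i], py = y + tdy[i];
        int bit = (px >= 0 && px < w && py >= 0) ? (img[py * w + px] & 1) : 0;
        ctx = (uint16_t)((ctx << 1) | bit);
    }
    return ctx;   /* indexes a table of 1024 adaptive probability states */
}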
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
Re: Arithmetic Decoding for Embedded
« Reply #8 on: October 29, 2020, 12:56:18 am »
If you want to compress graphics effectively, look at JBIG2's MQ-coder. No multiplications or divisions required, and memory usage (and compression) depends on the bit-length of the context; if you can use most of the RAM, you can probably use a 12 or 13 bit context.
Is MQ still patent encumbered, or have they expired now?
 

Offline ledtester

  • Super Contributor
  • ***
  • Posts: 3467
  • Country: us
Re: Arithmetic Decoding for Embedded
« Reply #9 on: October 29, 2020, 02:09:36 am »
Huh, surprised the C code didn't compress better.  Wouldn't have thought the per-character entropy was that high?

From what I can tell, arithmetic encoding is not context sensitive. For instance, if you know the preceding three characters are 'c', 'h' and 'a', it is extremely likely that the next character is 'r'. Compression schemes such as LZW take advantage of this kind of repetition.
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #10 on: October 29, 2020, 04:05:29 am »
Huh, surprised the C code didn't compress better.  Wouldn't have thought the per-character entropy was that high?

From what I can tell, arithmetic encoding is not context sensitive. For instance, if you know the preceding three characters are 'c', 'h' and 'a', it is extremely likely that the next character is 'r'. Compression schemes such as LZW take advantage of this kind of repetition.

Yeah, it doesn't match patterns (the closest you can get is an extremely common symbol encoded at < 1 bit/symbol, which almost looks like neighboring correlation; but, well, if a symbol is very probable, runs of it will be probable as well).  To be more specific, I'm surprised that the character set is utilized so broadly, and evenly -- mashing all printable key combinations (US ASCII) gives you only 95 characters to work with.  The compression rate implies the use of around 87 of them, at equal probability.  Most variables being mostly lowercase, and much longer than operators (punctuation), would seem to give much less entropy than that.

Tim
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16309
  • Country: fr
Re: Arithmetic Decoding for Embedded
« Reply #11 on: October 29, 2020, 02:44:15 pm »
Well,  as you yourself did, I can of course question your choice of MCU to begin with. Going for something more adapted would make more sense IMHO, so I can't personally "validate" your approach, but that's just my opinion. Your call obviously.

That said, this can be a fun challenge in itself. I don't have much experience with "arithmetic coding", but I do with Huffman compression and with image compression in general.

I wouldn't go for a Huffman-based compression. IME, it doesn't perform all that well on typical images, and some simpler approaches can perform as well, or in some cases, even better. Again just talking from MHE, I have implemented a simple (and fast) real-time compression/decompression scheme for bitmaps. It was just some kind of 2D-RLE. On very "random" images with a large number of bpp, this doesn't work well, but on well "structured" images, such as images containing text and typical "geometric" figures, with a modest bpp, it performed surprisingly well. I was encoding/decoding video streams at 800x600 @ 60 Hz, 12 bpp, in real time, frame by frame, on a modest FPGA. Implementing the decoding in software was also fast and straightforward (which I had done for testing purposes.) IIRC, I could get a typical average of 30% of the original size (and much less for very simple or very "sparse" images.) It was fine for me; I don't know if that would be enough compression for your needs.

So, I'd suggest considering 2D-RLE. Just my suggestion though.
« Last Edit: October 29, 2020, 02:47:05 pm by SiliconWizard »
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #12 on: October 29, 2020, 03:05:01 pm »
Already made one of those actually, sort of:
https://htmlpreview.github.io/?https://github.com/T3sl4co1l/st7735_gfx/blob/master/compr.html
ST7735 allows pixel addressed window writes, so a logical speedup is filling in large (2D) areas with solid colors.  It performs modestly well on line drawings.

That's actually what was used in that reference screen from before:
https://www.seventransistorlabs.com/Images/ActiveLoad_Status1.png
The background is written in this way, which gives mediocre results around the logo, but is effective elsewhere.  The button outlines are drawn this way too, which take up just a few commands in this image format.  The text is drawn using the Huffman decoder; the button text is actually a 3rd font, implementing just a few custom characters.

Obviously, that's impractical on this controller, which is limited to multiples of 8 pixels horizontally.  The above already does poorly on text, and character widths being comparable to that multiple, it's wholly impractical. 

Tim
« Last Edit: October 29, 2020, 03:11:01 pm by T3sl4co1l »
 

Online SiliconWizard

  • Super Contributor
  • ***
  • Posts: 16309
  • Country: fr
Re: Arithmetic Decoding for Embedded
« Reply #13 on: October 29, 2020, 04:04:04 pm »
Already made one of those actually, sort of:
https://htmlpreview.github.io/?https://github.com/T3sl4co1l/st7735_gfx/blob/master/compr.html

I took a quick look, so I may be missing things. But as I see it, it's indeed some form of 2D-RLE, but relying on predefined shapes (like rectangles) of solid color. It covers part of what a generic 2D-RLE can do, but not everything.

In my implementation, basically: on a selected direction (horizontal or vertical), classic RLE. Say rows for instance. But, for the other direction (here columns), you can copy whole fractions of the previous columns, not necessarily made of a single color, as long as said fractions are identical. Then you mix both and select what yields the shortest encoding when in conflict. Such repeating patterns are not that common in real-life photos, but for synthetic images, they are very common. I think this overall yields better compression than using predefined shapes of solid colors.
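A sketch of what the decode side of that looks like, with a made-up two-opcode stream format just to illustrate the idea (my real format was different):

Code: [Select]
/* Decoder sketch for a 2D-RLE of the kind described above.
 * Illustrative opcodes (not the real format):
 *   0x00, n, color : run of n pixels of 'color'
 *   0x01, n        : copy n pixels from the same positions on the
 *                    previous row (encoder must not emit this on row 0)
 * Runs are assumed not to cross row boundaries. */
#include <stdint.h>
#include <stddef.h>

void rle2d_decode(const uint8_t *in, size_t in_len,
                  uint8_t *out, int w, int h)
{
    size_t ip = 0;
    int x = 0, y = 0;
    while (ip < in_len && y < h) {
        uint8_t op = in[ip++];
        uint8_t n  = in[ip++];
        if (op == 0x00) {                           /* horizontal run */
            uint8_t c = in[ip++];
            for (int i = 0; i < n; i++) out[y * w + x + i] = c;
        } else {                                    /* copy from row above */
            for (int i = 0; i < n; i++) out[y * w + x + i] = out[(y - 1) * w + x + i];
        }
        x += n;
        if (x >= w) { x = 0; y++; }                 /* next row */
    }
}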

You could post examples of bitmaps (in a lossless format please, such as PNG, otherwise that would ruin it) you'd like to compress, so I can try with my compressor and see what I get, out of curiosity.
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8627
Re: Arithmetic Decoding for Embedded
« Reply #14 on: October 30, 2020, 12:16:51 am »
If you want to compress graphics effectively, look at JBIG2's MQ-coder. No multiplications or divisions required, and memory usage (and compression) depends on the bit-length of the context; if you can use most of the RAM, you can probably use a 12 or 13 bit context.
Is MQ still patent encumbered, or have they expired now?
MQ patents have long expired (they were published in the late '80s/early '90s). JBIG2 patents expired a few years ago too.
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #15 on: November 09, 2020, 02:01:46 am »
Your PNG is just not optimized for size. There is a huge chunk of 00 at the beginning, plus text about the creation time.

The same file after compressing it with some random online tool is 1,471 bytes. And if you try to compress that, you get nothing: Original size: 1471 / Encoded size: 1492 (ratio = -1.428 %)

And ZIP expands it to 1611 bytes.

So I think I got the adaptive stream encoder working:

Attach: 16-color 53x53 bitmap, encoded.  That's 2809 pixels, or 1405 bytes packed and otherwise unencoded; 554 bytes with my Huffman code; 483 bytes with your encoder.  Not bad.

It also doesn't do too badly on single characters (1770 bytes, vs. 1649 bytes Huffman), but does terribly on rows, as expected (3229 vs. 2003).

If for some perverse reason you wanted it ported to JS, it looks like so:
(state is an empty Object passed in, to which properties are added by the initializer.  It's initialized to a form equivalent to the original struct.  symbols is a list of symbols being encoded, and their probability; obviously this method doesn't use it other than to set length.  input is a byte to encode, and output is an array to push() bytes into.  The function grouping could use some work and the naming is verbose, but whatever.)

Code: [Select]
/** Arithmetic encoder, adaptive  */
/* See: https://github.com/ataradov/random  */
function encodeArithAdaptStart(state, symbols) {
  state.TOP_VALUE = 0xffff;
  state.MAX_SCALE = 0x3fff;
  state.FIRST_QTR = (Math.floor(state.TOP_VALUE / 4) + 1);
  state.HALF = (2 * state.FIRST_QTR);
  state.THIRD_QTR = (3 * state.FIRST_QTR);
  state.ACODER_N = symbols.length;
  state.low = 0;
  state.high = state.TOP_VALUE;
  state.pending = 0;
  state.byte = 0;
  state.bit = 0;
  state.cdf = [];
  for (var i = 0; i < state.ACODER_N + 1; i++) {
    state.cdf.push(i);
  }
}

function encodeArithAdaptStep(input, state, output) {

  var range = state.high - state.low + 1;

  // Math.floor() emulates the C integer division; without it the
  // bounds drift into fractions and the decoder falls out of sync
  state.high = Math.floor(state.low + (range * state.cdf[input + 1]) / state.cdf[state.ACODER_N] - 1);
  state.low  = Math.floor(state.low + (range * state.cdf[input])     / state.cdf[state.ACODER_N]);

  while (1) {
    if (state.high < state.HALF) {
      encodeArithAdapt_output_bit_and_pending(state, output, 0);
    } else if (state.low >= state.HALF) {
      encodeArithAdapt_output_bit_and_pending(state, output, 1);
      state.low  -= state.HALF;
      state.high -= state.HALF;
    } else if (state.low >= state.FIRST_QTR && state.high < state.THIRD_QTR) {
      state.pending++;
      state.low  -= state.FIRST_QTR;
      state.high -= state.FIRST_QTR;
    } else
      break;

    state.low  = state.low * 2;
    state.high = state.high * 2 + 1;
  }

  // model_update(input)
  //
  // function model_update(input) {
  var last = 0;
  var value = 0;

  if (state.cdf[state.ACODER_N] == state.MAX_SCALE) {
    for (var i = 0; i < state.ACODER_N; i++) {
      value = state.cdf[i + 1] - last;
      last = state.cdf[i + 1];
      // integer halving, as in the C original
      state.cdf[i + 1] = state.cdf[i] + Math.floor((value + 1) / 2);
    }
  }

  for (var i = input; i < state.ACODER_N; i++) {
    state.cdf[i + 1]++;
  }
  // }

}

function encodeArithAdapt_output_bit_and_pending(state, output, bit) {
  output_bit(bit);
  for (; state.pending; state.pending--) {
    output_bit(!bit);
  }

  function output_bit(value) {
    state.byte |= (value << state.bit);

    if (8 == ++state.bit) {
      output.push(state.byte);
      state.byte = 0;
      state.bit = 0;
    }
  }
}

function encodeArithAdaptEnd(state, output) {
  state.pending++;

  if (state.low < state.FIRST_QTR)
    encodeArithAdapt_output_bit_and_pending(state, output, 0);
  else
    encodeArithAdapt_output_bit_and_pending(state, output, 1);

  output.push(state.byte);
}

Tim
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Arithmetic Decoding for Embedded
« Reply #16 on: November 09, 2020, 06:48:12 am »
for a B/W image like your font data, you can do "static" bitrate encoding and achieve 1-3 bit code lengths if you want to include the antialiasing feature (87% - 62% compression rate); the decoder will take only a few lines of code. RLE can probably achieve much more, and the decoder is just as small. As a bonus, your code will become much more readable and maintainable. ymmv.
Nature: Evolution and the Illusion of Randomness (Stephen L. Talbott): Its now indisputable that... organisms “expertise” contextualizes its genome, and its nonsense to say that these powers are under the control of the genome being contextualized - Barbara McClintock
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 5086
  • Country: nz
Re: Arithmetic Decoding for Embedded
« Reply #17 on: November 09, 2020, 09:46:29 am »
Huh, surprised the C code didn't compress better.  Wouldn't have thought the per-character entropy was that high?

Huffman is good when the symbol probabilities are not vastly different -- and in particular when the most common symbol occurs much less than half the time -- and when also nearby symbols are uncorrelated.

Its fundamental problem is you can't ever use less than 1 bit per symbol.

Arithmetic coding can improve on that, but it still takes no advantage of serial correlation.

e.g. program code is full of "int", "char", "\n\tfor (int ", "\n\t\t} else {\n\t\t" etc.  In something like "\n\tfor (int " you've got 11 characters including 10 different characters. They are necessarily going to take on average at least 3-4 bits each character, so something like 40 bits compressed instead of 88 bits.

An LZ77 / LZW etc style compressor can use a single symbol for all 11 characters, and the symbol might only be 9 or 10 or 11 bits long. That offers much better compression if the input characters are correlated. If the input characters are uncorrelated (random), then the compression ratio on sufficiently long text approaches Huffman or arithmetic coding anyway.

lz4 does a pretty decent job of compression, while having an extremely simple and fast decoder, even on an 8 bit CPU.
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Arithmetic Decoding for Embedded
« Reply #18 on: November 09, 2020, 11:58:45 am »
Huh, surprised the C code didn't compress better.  Wouldn't have thought the per-character entropy was that high?
Huffman is good when the symbol probabilities are not vastly different -- and in particular when the most common symbol occurs much less than half the time -- and when also nearby symbols are uncorrelated.
no. as with any other data compression scheme, they all exploit probability differences; the more "vastly different", the better. Symbols that occur most are represented with shorter bit lengths: the higher a symbol's probability of occurrence, the shorter the code used, hence the better the compression rate, due to more "short bit" codes spat out of the encoder. Huffman is just an algorithm to find "optimum" code lengths for a given symbol probability distribution; if the probabilities are not vastly different, i.e. as they approach equal, the encoded code length reaches log2(N), i.e. 8-bit symbols will be represented with 8-bit codes (0% compression). btw, there are 2 types of code-length generation (encoded output symbols): static and adaptive. Static is used when we can expect data/symbols to have pretty much the same (not far off) probability as a precalculated average; adaptive is used when the probability distribution keeps changing as the data comes in. Adaptive is used for general purpose data such as ZIP/LZW. afaik the arithmetic scheme is much more computationally heavy for very little gain.
« Last Edit: November 09, 2020, 12:05:01 pm by Mechatrommer »
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #19 on: November 09, 2020, 12:02:50 pm »
Hmm, looking at typical code of my style, I see a "palette" of 91 values, 14.2kB encoded in 9.9kB (adaptive arithmetic encoder).

Entropy: 5.566 bits/symbol

Frequency, sorted by value (aside from some examples, I'll leave it to you to guess which ones are which ::) ):
Code: [Select]
184, 838, 493, 496, 1387, 3, 2, 44,
2, 249, 249, 109, 43, 89, 72, 119,
294, 303, 125, 81, 19, 34, 38, 64,
22, 82, 12, 6, 252, 24, 128, 44,
5, 12, 65, 35, 128, 122, 70, 33,
21, 48, 116, 10, 48, 27, 18, 94,
153, 186, 192, 167, 36, 7, 86, 7,
11, 19, 19, 239, 347, 116, 201, 352,
697, 172, 177, 155, 481, 7, 36, 183,
76, 465, 371, 214, 3, 427, 348, 721,
153, 27, 97, 242, 102, 14, 46, 26,
46, 16, 1

The first few symbols are 0 (an artifact of how I tested this, ignore them), 9*, 10, 13 and 32.  Also last symbol is 255 (also an artifact).  Hmm, really weird that 10 and 13 don't match, I wonder why.

*I use tabs so I must get paid less for coding.  Actually I don't get paid at all for it! :P

An even distribution would see 153 occurrences per symbol; more than half the symbols are less common than that, and the top 13 symbols account for about half the total content.

Heh, and this file is probably on the easier side, it happens to contain a lot of repeated function calls (setting up peripheral registers).  So it's kind of average C, kind of not.

Tim
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #20 on: November 09, 2020, 12:05:47 pm »
no. as with any other data compression schemes, they all will exploit probabilities difference, more "vastly different" is the better. symbols that occur most, will be represented with less bit length, more probability occurence, the lesser bit length is used, hence the more compression rate due to more "short bits" codes spitted out of encoder. Huffman is just an algorithm to find "optimum" code lengths based on given symbols' probability distribution, ...

Not very "optimum", it's a special case of arithmetic encoding, which amounts to a binary decision tree; which is why it can only encode fixed numbers of bits, and why it cannot be optimum in general.  But this has already been covered, either earlier in this thread or in the references I provided. :-+

Tim
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Arithmetic Decoding for Embedded
« Reply #21 on: November 09, 2020, 12:20:46 pm »
i was replying on the "probability distribution" matter, as you simply put "entropy". huffman as implemented on a PC will not be optimum, since we can't generate a code with a bit length of, say, 3.1416 bits ;) we know that. and yes, huffman is the "integer" or "fast" version of arithmetic encoding afaik. (much like an FFT to the real FT)
 

Offline tszaboo

  • Super Contributor
  • ***
  • Posts: 8509
  • Country: nl
  • Current job: ATEX product design
Re: Arithmetic Decoding for Embedded
« Reply #22 on: November 09, 2020, 12:33:28 pm »
For small micros, decoding images and showing them on a small screen, I've used RLE in the past. It was quite successful. Most images that I wanted to show were icons with very few colors, so the results were quite OK. As a plus, we didn't need to read more than a few bytes ahead, so the memory footprint was quite small. Also, we changed the graphics library with custom draw functions and draw windows, so it was super fast to draw to the screen. The screen was connected to the parallel bus, so instead of setting it up, we just toggled the write pin, so the draw times were really fast.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 5086
  • Country: nz
Re: Arithmetic Decoding for Embedded
« Reply #23 on: November 09, 2020, 01:12:34 pm »
Hmm, looking at typical code of my style, I see a "palette" of 91 values, 14.2kB encoded in 9.9kB (adaptive arithmetic encoder).

Entropy: 5.566 bits/symbol

Not your file (obviously) as I don't have it, but I just grabbed a random 14.3kB file "linux/fs/ecryptfs/miscdev.c"

14316 raw : 8.00 bits/char
 6436 lz4 : 3.60 bits/char
 5452 lz4 -9 : 3.05 bits/char
 4129 gzip : 2.31 bits/char
 4118 gzip -9 : 2.30 bits/char
 4040 xz : 2.26 bits/char
 4025 bzip2 : 2.25 bits/char

lz4 doesn't give as good compression as the others but it's very very simple and I think it's going to be much better and much faster than Huffman or arithmetic encoding.
« Last Edit: November 09, 2020, 01:18:01 pm by brucehoult »
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Arithmetic Decoding for Embedded
« Reply #24 on: November 09, 2020, 01:23:10 pm »
those that can perform better than symbol-entropy-based coders are guaranteed to have 1) adaptive probability distribution monitoring, 2) an encoding scheme such as huffman or arithmetic, 3) table (string/dictionary) based matching, and 4) fancy tree splicing, rotating etc., meaning... much more complex than huffman or arithmetic alone. the more they can squeeze data, the fancier they are guaranteed to be.
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 5086
  • Country: nz
Re: Arithmetic Decoding for Embedded
« Reply #25 on: November 09, 2020, 01:49:33 pm »
those that can perform better than symbol-entropy-based coders are guaranteed to have 1) adaptive probability distribution monitoring, 2) an encoding scheme such as huffman or arithmetic, 3) table (string/dictionary) based matching, and 4) fancy tree splicing, rotating etc., meaning... much more complex than huffman or arithmetic alone. the more they can squeeze data, the fancier they are guaranteed to be.

lz4 streams consist of two things, strictly alternating:

1) copy N literal bytes to the output
2) go back X bytes in the previous output and copy the following Y bytes to the output [1]

That's *it*. None of that other fancy stuff you mention. The decompression code is completely trivial.

Of course that means it needs to keep a circular buffer of the most recent bytes output somewhere it can have random access to it. You can decide when you do the compression exactly how large a buffer you can afford in the decoding device.

[1] note: Y can be greater than X. This reduces to RLE. e.g. output literal "x", go back 1 byte and copy 999 bytes => output 1000 x's.
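In code, the whole decoder is basically this -- token format made up for clarity (real lz4 packs both lengths into one token byte, with extension bytes for long runs):

Code: [Select]
/* Shape of an LZ77/lz4-style decoder: alternating literal runs and
 * back-references.  Format here is illustrative, not actual lz4:
 *   nlit, <nlit literal bytes>, dist_lo, dist_hi, match_len, ...
 * Assumes well-formed input (dist >= 1, buffers big enough). */
#include <stdint.h>
#include <stddef.h>

size_t lz_decode(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        uint8_t nlit = in[ip++];                 /* 1) copy literals */
        for (int i = 0; i < nlit; i++) out[op++] = in[ip++];
        if (ip >= in_len) break;
        uint16_t dist = (uint16_t)(in[ip] | (in[ip + 1] << 8));
        uint8_t  len  = in[ip + 2];              /* 2) back-reference */
        ip += 3;
        for (int i = 0; i < len; i++, op++)      /* byte-by-byte so the  */
            out[op] = out[op - dist];            /* copy may overlap: RLE */
    }
    return op;
}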
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Arithmetic Decoding for Embedded
« Reply #26 on: November 09, 2020, 03:58:31 pm »
a ring/sliding buffer is the simplest implementation of table/dictionary based compression, a sort of brute force or "wish me luck" technique. that move by Y or X is the simplest tree splicing, lookup, or whatever rotation/mirror it is. dictionary-based in its simplest form can significantly increase compression rate though.
 

Offline NorthGuy

  • Super Contributor
  • ***
  • Posts: 3325
  • Country: ca
Re: Arithmetic Decoding for Embedded
« Reply #27 on: November 10, 2020, 05:38:25 pm »
Look at the existing formats.

GIF uses LZW which is extremely easy to encode and decode and gives decent compression for graphics.

PNG uses DEFLATE (same as ZIP or gz), which is LZ77 encoded with Huffman. You already know Huffman, so this part will be easy. The algorithm is not deterministic - different encodings of the same source are possible. So the encoder may work hard to find the best compression, but it's still the same effort to decompress. This means that you can get a ZIP program, let it run in "highest compression/lowest speed" mode, and compress your characters to the greatest degree. 7z can do this well. You can then extract the compressed characters from the ZIP/gz file and use them in your program. Decompression is still easy. There's an RFC for DEFLATE (RFC 1951) with all the details.
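For the MCU side, zlib's source tree even ships a reference decoder meant for exactly this kind of use (contrib/puff): small, slow, no dynamic memory. Roughly, if I remember the signature right:

Code: [Select]
#include "puff.h"   /* from the zlib source tree, contrib/puff */

/* Inflate a raw DEFLATE stream (zip/gz container headers already
 * stripped).  Returns the unpacked length, or 0 on error. */
unsigned long unpack(unsigned char *dst, unsigned long dst_cap,
                     const unsigned char *src, unsigned long src_len)
{
    unsigned long dlen = dst_cap, slen = src_len;
    return puff(dst, &dlen, src, &slen) == 0 ? dlen : 0;
}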
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #28 on: November 19, 2020, 01:24:50 am »
Hmmm, continuing work on my encoder webpage, adding a decode verification step -- seem to be having issues.  Not entirely sure it's with the codec, getting weird results from my Huffman code too.  (The packed and unpacked "codecs" work just fine, but they're almost trivial, I'm not surprised.)

Here is the ported code:

- It's structured much the same as the original, one decoded byte in/out per call, encoded channel output/input on demand.  In this case the functions are packed into objects to easily swap them out; just imagine this was Java or C++ with an interface implemented by classes. :P
- Encoding seems to work, I didn't check if the arithmetic coding is bit-accurate to sample data (or to the original C, which I suppose I should just get off my ass and compile natively already..), but my Huffman code at least is.  So I seem to be having decode errors there, and it may be the same error for both?
- I'm particularly suspicious of the initializer step here (copied from the original; note "coder" easily ports to "state", and "->" to "."),
Code: [Select]
for (int i = 0; i < 16; i++)
      coder->value = (coder->value << 1) | input_bit(coder);

which unconditionally reads in 16 bits of input data, which seems dangerous when the stream is especially short.  And indeed, in the per-row per-character packing, I'm seeing encoded streams as short as two bytes!  Should this be considered a bug?

Code: [Select]
/**
 * Encoder/decoder function pointers.
 * Assign a set of start (initializer), step (input a byte, outputs
 * a byte whenever) and end (finalizer, flush, etc.) functions to
 * implement an encoding method.
 *
 * Properties:
 *
 * encodeStart(state, symbols)
 * decodeStart(state, symbols)
 * Initializes state.
 * state: the state vector.  An empty Object is passed in.
 * Suggest assigning properties to it.
 * symbols: an array of {val, freq} objects.
 * val: palette index / pixel value
 * freq: the number of times it occurs in the image
 * Dividing by the total gets PDF, incrementally summing
 * them gets CDF.  Use to calculate weights, entropy, etc.
 * Note that these data must be outputted with the encoded data,
 * for any encoder that uses them.
 *
 * encodeStep(input, state, output)
 * decodeStep(input, state)
 * Encoder: processes one symbol, or input byte, using and mutating
 * existing state, optionally outputting byte(s) to the output
 * array using a output.push() call.
 * Decoder: input is an iterator, to draw bytes from as needed.
 * Returns one output byte/symbol per call.
 *
 * encodeEnd(state, output)
 * Finalizes the encoder state, flushing any remaining data
 * to the output.
 * decodeEnd is not needed: exactly the dimensions of the
 * image are Step'd over, leaving no extra data.
 */
const codecArithAdapt = {
  encodeStart: function(state, symbols) {
    state.TOP_VALUE = 0xffff;
    state.MAX_SCALE = 0x3fff;
    state.FIRST_QTR = (Math.floor(state.TOP_VALUE / 4) + 1);
    state.HALF = (2 * state.FIRST_QTR);
    state.THIRD_QTR = (3 * state.FIRST_QTR);
    state.ACODER_N = symbols.length;
    state.low = 0;
    state.high = state.TOP_VALUE;
    state.pending = 0;
    state.byte = 0;
    state.bit = 0;
    state.cdf = [];
    for (var i = 0; i < state.ACODER_N + 1; i++) {
      state.cdf.push(i);
    }
  },

  encodeStep: function(input, state, output) {
    var range = state.high - state.low + 1;
    state.high = Math.floor(state.low + (range * state.cdf[input + 1]) / state.cdf[state.ACODER_N] - 1);
    state.low  = Math.floor(state.low + (range * state.cdf[input])     / state.cdf[state.ACODER_N]);
    while (1) {
      if (state.high < state.HALF) {
        codecArithAdapt_output_bit_and_pending(state, output, 0);
      } else if (state.low >= state.HALF) {
        codecArithAdapt_output_bit_and_pending(state, output, 1);
        state.low  -= state.HALF;
        state.high -= state.HALF;
      } else if (state.low >= state.FIRST_QTR && state.high < state.THIRD_QTR) {
        state.pending++;
        state.low  -= state.FIRST_QTR;
        state.high -= state.FIRST_QTR;
      } else
        break;
      state.low  = state.low * 2;
      state.high = state.high * 2 + 1;
    }
    codecArithAdapt_model_update(state, input);
  },

  encodeEnd: function(state, output) {
    state.pending++;
    if (state.low < state.FIRST_QTR)
      codecArithAdapt_output_bit_and_pending(state, output, 0);
    else
      codecArithAdapt_output_bit_and_pending(state, output, 1);
    output.push(state.byte);
  },

  decodeStart: function(state, symbols) {

    state.TOP_VALUE = 0xffff;
    state.MAX_SCALE = 0x3fff;
    state.FIRST_QTR = (Math.floor(state.TOP_VALUE / 4) + 1);
    state.HALF = (2 * state.FIRST_QTR);
    state.THIRD_QTR = (3 * state.FIRST_QTR);
    state.ACODER_N = symbols.length;
    state.low = 0;
    state.high = state.TOP_VALUE;
    state.value = 0;
    state.bit = 7;
    state.firstRun = true;

    state.cdf = [];
    for (var i = 0; i < state.ACODER_N + 1; i++) {
      state.cdf.push(i);
    }
  },

  decodeStep: function(input, state) {

    // Moved from decodeStart because it's not passed any input
    if (state.firstRun) {
      for (var i = 0; i < 16; i++)
        state.value = (state.value << 1) | codecArithAdapt_input_bit(state, input);
      state.firstRun = false;
    }

    var range = state.high - state.low + 1;
    var val = Math.floor(((state.value - state.low + 1) * state.cdf[state.ACODER_N] - 1) / range);
    var b;
    for (b = state.ACODER_N; val < state.cdf[b]; b--);
    state.high = Math.floor(state.low + (range * state.cdf[b + 1]) / state.cdf[state.ACODER_N] - 1);
    state.low  = Math.floor(state.low + (range * state.cdf[b])     / state.cdf[state.ACODER_N]);

    while (1) {
      if (state.high < state.HALF) {
        // Do nothing
      } else if (state.low >= state.HALF) {
        state.value -= state.HALF;
        state.low   -= state.HALF;
        state.high  -= state.HALF;
      } else if (state.low >= state.FIRST_QTR && state.high < state.THIRD_QTR) {
        state.value -= state.FIRST_QTR;
        state.low   -= state.FIRST_QTR;
        state.high  -= state.FIRST_QTR;
      } else {
        break;
      }
      state.low   = state.low * 2;
      state.high  = state.high * 2 + 1;
      state.value = (state.value << 1) | codecArithAdapt_input_bit(state, input);
    }
    codecArithAdapt_model_update(state, b);
    return b;

  }
};
// Helper functions
function codecArithAdapt_output_bit_and_pending(state, output, bit) {
  codecArithAdapt_output_bit(state, output, bit);
  for (; state.pending; state.pending--) {
    codecArithAdapt_output_bit(state, output, !bit);
  }
}

function codecArithAdapt_output_bit(state, output, value) {
  state.byte |= (value << state.bit);
  if (8 == ++state.bit) {
    output.push(state.byte);
    state.byte = 0;
    state.bit = 0;
  }
}

var consumedBytes = 0;  // debug: total bytes pulled from the input iterator

function codecArithAdapt_input_bit(state, input) {
  var res;

  state.bit++;
  if (8 == state.bit) {
    state.byte = input.next().value[1];
    consumedBytes++;
    state.bit = 0;
  }

  res = state.byte & 1;
  state.byte >>>= 1;

  return res;
}

function codecArithAdapt_model_update(state, input) {
  var last = 0;
  var value = 0;

  if (state.cdf[state.ACODER_N] == state.MAX_SCALE) {
    for (var i = 0; i < state.ACODER_N; i++) {
      value = state.cdf[i + 1] - last;
      last = state.cdf[i + 1];
      // Math.floor() emulates the C integer division; fractional
      // counts here knock the decoder out of sync with the encoder
      state.cdf[i + 1] = state.cdf[i] + Math.floor((value + 1) / 2);
    }
  }

  for (var i = input; i < state.ACODER_N; i++) {
    state.cdf[i + 1]++;
  }
}



Typical usage is like so; variables should be fairly self-explanatory?

Code: [Select]
...
var codec = codecArithAdapt;
var encodedChars = [];
...
// Blocks, column first
for (var i = 0; i < charStarts.length; i++) {
  encodedChars.push([]);
  var encoderState = new Object();
  codec.encodeStart(encoderState, palette);
  for (var x = 0; x < charWidths[i]; x++) {
    for (var y = 0; y < imgHeight; y++) {
      // encodeStep(input, state, output)
      codec.encodeStep(imgData[x + charStarts[i] + y * imgWidth], encoderState, encodedChars[i]);
    }
  }
  codec.encodeEnd(encoderState, encodedChars[i]);
}
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #29 on: November 19, 2020, 01:42:26 am »
- I'm particularly suspicious of the initializer step here (copied from the original; note "coder" easily ports to "state", and "->" to "."),
Code: [Select]
for (int i = 0; i < 16; i++)
      coder->value = (coder->value << 1) | input_bit(coder);
which unconditionally reads in 16 bits of input data, which seems dangerous when the stream is especially short.  And indeed, in the per-row per-character packing, I'm seeing encoded streams as short as two bytes!  Should this be considered a bug?

This should not be an issue as long as input_bit() returns 0 past the end of the stream and does not read past the actual input. It is hard to have encodings that are less than 2 bytes anyway. The only one I could find is the empty input, and that encodes into 2 bytes. Anything else is at least two bytes. This assumes you are rounding things to bytes, not bits.

I personally don't consider stuff like this a bug. Otherwise you would need to know the length of the input data, which must be present in the stream, further increasing the size and complicating parsing.
Alex
 

Offline amyk

  • Super Contributor
  • ***
  • Posts: 8627
Re: Arithmetic Decoding for Embedded
« Reply #30 on: November 19, 2020, 02:21:05 am »
Hint - look at MQ/QM-coder flowchart in JBIG/JBIG2/JPEG2000 standard - you will find simple and very effective arithmetic-like algorithm that does not use division nor floating-point (:o), and you can adapt it with a smaller state table to trade off compression for memory usage. Your code looks like an order-0 model.
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #31 on: November 19, 2020, 02:27:10 am »
Oh! So the fixed arrays in your example, are not merely convenience but a necessary feature?  Well that would matter...

(Insert the usual back-and-forth of, statically allocated arrays are always initialized to zero, and malloc is supposed to too, but what about stack allocations, or compiler differences, and other arcanum of C...)

Which also means the unchecked size is a trivial buffer exploit, but, it is anyway, with both input and output being completely unchecked; it's a demo.


I personally don't consider stuff like this a bug. Otherwise you would need to know the length of the input data, which must be present in the stream, further increasing the size and complicating parsing.

So, your example is fixed length, as is mine.  It should work all the same.

But that's also against your own recommendation, I guess?  So then... why is it implemented that way?

(Again, I need to know offset and length anyway, to look up all these little pieces I'm making.  A termination symbol wouldn't benefit me much.)

Tim
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #32 on: November 19, 2020, 02:37:04 am »
Oh! So the fixed arrays in your example, are not merely convenience but a necessary feature?  Well that would matter...
No, arrays are just a quick and dirty test.

If you want to add the size limit then it is on you to check that in decoder_callback(). If it detects that a read past end of input happens, it should return 0 instead.

There may not even be arrays, you can just read the stream sequentially from the SPI flash or download from the internet. The main code does not care.

(Insert the usual back-and-forth of, statically allocated arrays are always initialized to zero, and malloc is supposed to too, but what about stack allocations, or compiler differences, and other arcanum of C...)
Global variables are initialized to 0; it's in the standard. If your compiler does not follow standards - I don't care.

Which also means the unchecked size is a trivial buffer exploit, but, it is anyway, with both input and output being completely unchecked; it's a demo.
It is on you to check user inputs.

In all my libraries I follow the approach that library relies on correct checked inputs. This way there are not 100 checks for the same thing. Only the outermost layer responsible for the interaction with the outside world does the checks. All internal APIs expect correct data.

It is also on you to check that the output does not overflow. It is easy to corrupt the input in a way that results in an infinite output. If your application is sensitive to it - include the unpacked size in the file format.

This library by design expects a raw stream of data.

But that's also against your own recommendation, I guess?  So then, but, why is it implemented that way..?
The demo is just a quick demo of the APIs.

I don't release well-documented libraries, ever. I release raw code in a way I found it to be useful. If you like it - take it, if you don't like it - leave it alone. I'm not spending any more time on stuff than I already had to.
Alex
 

Offline mariush

  • Super Contributor
  • ***
  • Posts: 5185
  • Country: ro
  • .
Re: Arithmetic Decoding for Embedded
« Reply #33 on: November 19, 2020, 05:21:01 am »
imho arithmetic coding is overkill for this.

just playing a bit with the lucida picture, i changed it to 12 x 560 so that width and height are multiples of 2, which means you have tiles of 2x2, and then i simply built a "dictionary" from them ... out of 1680 tiles, around 16 distinct tiles account for around 950 occurrences.

So you could probably have some kind of hybrid encoding where you make a 16 or 32 tile dictionary, use the last palette entry as "tile not in dictionary", and then have 1680 x 4 bit, followed by the remaining tiles.

In the picture above, we have 543 unique tiles, and we have a 6 x 280 tile set = 1680 tiles x 2 bytes per tile = 3360 bytes.

If you prepare a 15 tile "dictionary" where the 15 most common tiles show up 950 times (not the actual number, but close enough), then the picture is already reduced to:
15 x 2 bytes = 30 bytes + 1680 x 4 bit = 840 bytes + (1680-950) x 2 bytes = 1460 bytes ... so total = 2330 bytes
or, if you add a bit map (1 bit per tile: in dictionary or not):

15 x 2 bytes = 30 bytes + 1680/8 = 210 bytes + 950 tiles x 4 bit = 475 bytes + (1680-950) x 2 bytes = 1460 bytes => total = 2175 bytes
This can be further compressed with some basic LZW, LZ77, whatever ...

some php code i made as i wrote this to illustrate

Code: [Select]
<?php

$dictionary = [];
$blocks = [];
$pic = imagecreatefrompng(__DIR__ . '/picture.png');

for ($y = 0; $y < 280; $y++) {
    for ($x = 0; $x < 6; $x++) {
        $a = imagecolorat($pic, $x, $y); // actually the index in the color palette
        $b = imagecolorat($pic, $x + 1, $y);
        $c = imagecolorat($pic, $x, $y + 1);
        $d = imagecolorat($pic, $x + 1, $y + 1);

        $block = $a * 4096 + $b * 256 + $c * 16 + $d;
        //echo $block.' ';
        if (isset($dictionary[$block]) === FALSE) $dictionary[$block] = [$a, $b, $c, $d, 0];
        $dictionary[$block][4]++;
    }
}

$dictionary_sorted = [];
$i = 0;
foreach ($dictionary as $block => $value) {
    $dictionary_sorted[$i] = $value;
    $i++;
}

// bubble sort by occurrence count, descending
$sorted = FALSE;
while ($sorted == FALSE) {
    $sorted = TRUE;
    for ($i = 0; $i < count($dictionary_sorted) - 1; $i++) {
        if ($dictionary_sorted[$i][4] < $dictionary_sorted[$i + 1][4]) {
            $temp = $dictionary_sorted[$i];
            $dictionary_sorted[$i] = $dictionary_sorted[$i + 1];
            $dictionary_sorted[$i + 1] = $temp;
            $sorted = FALSE;
        }
    }
}

foreach ($dictionary_sorted as $index => $pair) {
    echo $index . ',' . json_encode($pair) . "\n";
}

?>



the output :

Code: [Select]
0,[9,9,9,9,526]
1,[9,9,0,0,83]
2,[0,0,9,9,69]
3,[9,0,9,0,60]
4,[0,9,0,9,45]
5,[9,9,9,0,33]
6,[9,0,9,9,33]
7,[0,9,9,9,14]
8,[9,9,0,9,13]
9,[0,0,0,9,11]
10,[9,9,9,5,10]
11,[9,3,9,9,10]
12,[9,9,9,3,9]
13,[9,9,9,7,8]
14,[9,9,3,9,8]
15,[3,9,9,9,8]
16,[9,9,10,0,8]
17,[9,9,9,11,8]
18,[9,11,9,9,7]
19,[9,13,9,9,6]
20,[0,0,9,0,6]
21,[9,12,9,9,5]
22,[9,9,9,15,5]
23,[9,9,5,10,5]
24,[9,2,9,9,5]
25,[4,0,9,9,5]
26,[9,4,9,0,5]
27,[0,2,9,9,4]
28,[7,9,9,9,4]
29,[13,0,9,9,4]
30,[9,9,0,10,4]
31,[0,10,9,9,4] 
[...]
« Last Edit: November 19, 2020, 05:34:44 am by mariush »
 

Offline Mechatrommer

  • Super Contributor
  • ***
  • Posts: 11714
  • Country: my
  • reassessing directives...
Re: Arithmetic Decoding for Embedded
« Reply #34 on: November 19, 2020, 09:08:09 am »
save your time with this "classical" algorithm and go directly to the LZ** variants, the dictionary-based OSS algorithms mentioned earlier, if your mcu can take it. i thought i'd mentioned this? nevermind, i mention it again and not anymore. sorry for the annoyance ;D
 

Offline ataradov

  • Super Contributor
  • ***
  • Posts: 12012
  • Country: us
    • Personal site
Re: Arithmetic Decoding for Embedded
« Reply #35 on: November 19, 2020, 05:07:39 pm »
save your time from this "classical" algorithm and go directly to LZ** variants dictionary based OSS algorithm mentioned earlier if your mcu can take it. i thought i've mentioned this? nevermind, i mention it again and not anymore. sorry for the annoyance ;D
In my case any dictionary-based algorithm would not work, since I was very memory-constrained. And I was not compressing random files; I knew the general size and compression ratio of the files, and it was suitable.
Alex
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #36 on: November 20, 2020, 05:57:33 am »
For those still curious, here's a... the alpha version, let's say:
https://htmlpreview.github.io/?https://github.com/T3sl4co1l/fontenc/blob/master/font.html

Easiest way to demo, save the example image; or use the same thing from earlier in this thread, rotated; or the other example in the repo, https://github.com/T3sl4co1l/fontenc/blob/master/Lucidia_16px.png

Arithmetic DWORDs isn't implemented yet (it's the identity encoder) but all the others seem to be working. :-+

As I expected, the fixed-model arithmetic encoding does much better, especially on per-row grouping.  Adaptive encoding works great on large blocks -- yielding 1770 bytes for FontLucidia.png in column orientation.  But it does a pitiful 3232 bytes in per-rows.  Fixed does considerably better at 2137 bytes -- but my Huffman code still beats it at 2003 bytes.  On the larger Lucidia_16px.png, fixed does pull ahead (Huffman: 3819, Fixed: 3621).

So it's looking pretty clear that this brand of arithmetic encoding isn't very efficient on small blocks, let alone enough to justify the intensive math required.  And, despite having thrown together the Huffman code with less knowledge, it's proving to be quite formidable!

I still want to do the DWORDs method, and I have to decide if I want to do that with a continuation symbol (which will cost a lot of efficiency), or further research the bit- or byte-serial codes (with bit-stuffing or whatever).

Then I'll move on to dictionary methods, which should be easy enough to implement on top of this system.  Hmm, I'll have to add a second level of state, to maintain a dictionary between blocks -- that'll be a somewhat larger change but not a pain.

Tim
« Last Edit: November 20, 2020, 05:59:56 am by T3sl4co1l »
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #37 on: November 28, 2020, 06:58:09 am »
Anyway, finished the DWORD encoder, it's working, at least on test data.  Updated (052d8eb...).

As expected, it's awful, even on short data. :-BROKE

Tim
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #38 on: November 28, 2020, 11:45:01 am »
RLE went together fairly quickly... I've got it beating arithmetic (fixed model) and Huffman a lot of the time.  Neat.  I think there's some more to squeeze out of it, as I can potentially fudge the runs, including some white or black pixels in runs of intermediate values (which are currently implemented as runs of verbatim values).

With these short data (rows) there's so little to encode, it might as well be zero... so I added a hack to make it zero.  :-/O  Not sure how well this will do on the embedded side (the decoder is getting more complex?), but it's interesting at least that I'm sometimes encoding entire rows in zero bits.  It works by terminating early if, on closing the block, the remaining buffer is all white; thus, entire all-white rows can be stored with zero additional data.  (This of course still requires that information (data width), but again, I have to have that to facilitate array access, so this is a win.)
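In decoder terms the rule is trivial -- something like this (hypothetical names, not my actual code):

Code: [Select]
/* Early-termination sketch: decode_rle() is the existing run decoder
 * (hypothetical name) and returns how many pixels the encoded block
 * actually covered; the caller floods the rest with white. */
#include <stdint.h>
#include <stddef.h>

#define WHITE 15   /* palette index for white */

int decode_rle(const uint8_t *blk, size_t blk_len, uint8_t *row, int width);

void decode_row(const uint8_t *blk, size_t blk_len, uint8_t *row, int width)
{
    int x = decode_rle(blk, blk_len, row, width);
    while (x < width)
        row[x++] = WHITE;   /* implicit all-white tail: zero bits spent */
}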

Which arguably also acts as a dictionary for certain special cases (all white), which nicely segues into the final tests: dictionary lookup.

Tim
 

Offline coppice

  • Super Contributor
  • ***
  • Posts: 10289
  • Country: gb
Re: Arithmetic Decoding for Embedded
« Reply #39 on: November 28, 2020, 01:50:08 pm »
RLE went together fairly quickly.. I've got it beating arithmetic (fixed model) and Huffman a lot of the time.  Neat.  Think there's some left to squeeze out of it, as I can potentially fudge the runs, including some white or black pixels in runs of intermediate values (which are currently implemented as runs of verbatim values).

With these short data (rows) there's so little to encode, it might as well be zero... so I added a hack to make it zero.  :-/O  Not sure how well this will do on the embedded side (the decoder is getting more complex?), but it's interesting at least that I'm sometimes encoding entire rows in zero bits.  This works by, if on closing the block, the remaining buffer is all white, just terminate early.  Thus, entire rows can be stored in zero additional data.  (This of course still requires that information (data width), but again, I have to have that to facilitate array access, so this is a win.)

Which arguably also acts as a dictionary for certain special cases (all white), which nicely segues into the final tests: dictionary lookup.

Tim
If your images are very busy, arithmetic coding alone should do well.  However, if the image is sparse (e.g. like most bilevel text page scan images), squeezing out the sparsity first brings big improvements.  You'll see this in algorithms like JBIG, which analyses sparsity over a few pixel rows, then compresses the result with arithmetic coding.  RLE is another approach.
 

Offline T3sl4co1lTopic starter

  • Super Contributor
  • ***
  • Posts: 22435
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Arithmetic Decoding for Embedded
« Reply #40 on: November 28, 2020, 08:41:33 pm »
They are, in a sense -- the problem is the images are very small, so the edges are very busy, I guess you could say.

Unless there's a way to O(1) index into a sequence from various encoders, but I'm thinking that's not possible in general(?).

Tim
 

