Arithmetic Decoding for Embedded

#25 Reply
Posted by brucehoult on 09 Nov, 2020 13:49
Quote from: Mechatrommer on 09 Nov, 2020 13:23
those that can perform better than symbols' entropy based guaranteed to have 1) adaptive probability distribution monitoring 2) encoding scheme such as huffman or arithmetic 3) table (string) based and 4) fancy tree splicing, rotating etc, means... much more complex than any of huffman or arithmetic alone. the more they can squeeze data, the more fancy they are guaranteed.

lz4 streams consist of two things, strictly alternating:

1) copy N literal bytes to the output
2) go back X bytes in the previous output and copy the following Y bytes to the output [1]

That's *it*. None of that other fancy stuff you mention. The decompression code is completely trivial.

Of course that means it needs to keep a circular buffer of the most recent bytes output somewhere it can have random access to it. You can decide when you do the compression exactly how large a buffer you can afford in the decoding device.

[1] note: Y can be greater than X. This reduces to RLE. e.g. output literal "x", go back 1 byte and copy 999 bytes => output 1000 x's.

#26 Reply
Posted by Mechatrommer on 09 Nov, 2020 15:58
ring/sliding buffer is the simplest implementation of table/dictionary based compression, sort of brute force or "wish me luck" technique. that move by Y or X is the simplest tree splicing,look up, or whatever rotation/mirror it is. dictionary based in its simplest form can significantly increase compression rate though.

#27 Reply
Posted by NorthGuy on 10 Nov, 2020 17:38
Look at the existing formats.

GIF uses LZW which is extremely easy to encode and decode and gives decent compression for graphics.

PNG uses DEFLATE (same as ZIP or gz) which is LZ77 encoded with Huffman. You already know Huffman, so this part will be easy. The algorithm is not deterministic - different encodings of the same source are possible. So the encoder may work hard to find the best compression, but it's still the same effort to decompress. This means that you can get a ZIP program, let it run in "highest compression/lowest speed" mode and compress your characters to the greatest degree. 7z can do this well. You can then extract the compressed characters from ZIP/gz file and use them in your program. Decompression is still easy. There's an RFC for DEFLATE with all the details.

#28 Reply
Posted by T3sl4co1l on 19 Nov, 2020 01:24
Hmmm, continuing work on my encoder webpage, adding a decode verification step -- seem to be having issues. Not entirely sure it's with the codec, getting weird results from my Huffman code too. (The packed and unpacked "codecs" work just fine, but they're almost trivial, I'm not surprised.)

Here is the ported code:

- It's structured much the same as the original, one decoded byte in/out per call, encoded channel output/input on demand. In this case the functions are packed into objects to easily swap them out; just imagine this was Java or C++ with an interface implemented by classes.
- Encoding seems to work, I didn't check if the arithmetic coding is bit-accurate to sample data (or to the original C, which I suppose I should just get off my ass and compile natively already..), but my Huffman code at least is. So I seem to be having decode errors there, and it may be the same error for both?
- I'm particularly suspicious of the initializer step here (copied from the original; note "coder" easily ports to "state", and "->" to "."),
Code: [Select]
for (int i = 0; i < 16; i++) coder->value = (coder->value << 1) | input_bit(coder);
which unconditionally reads in 16 bits of input data, which seems dangerous when the stream is especially short. And indeed, in the per-row per-character packing, I'm seeing encoded streams as short as two bytes! Should this be considered a bug?

Code: [Select]
/** * Encoder/decoder function pointers. * Assign a set of start (initializer), step (input a byte, outputs * a byte whenever) and end (finalizer, flush, etc.) functions to * implement an encoding method. * * Properties: * * encodeStart(state, symbols) * decodeStart(state, symbols) * Initializes state. * state: the state vector. An empty Object is passed in. * Suggest assigning properties to it. * symbols: an array of {val, freq} objects. * val: palette index / pixel value * freq: the number of times it occurs in the image * Dividing by the total gets PDF, incrementally summing * them gets CDF. Use to calculate weights, entropy, etc. * Note that these data must be outputted with the encoded data, * for any encoder that uses them. * * encodeStep(input, state, output) * decodeStep(input, state) * Encoder: processes one symbol, or input byte, using and mutating * existing state, optionally outputting byte(s) to the output * array using a output.push() call. * Decoder: input is an iterator, to draw bytes from as needed. * Returns one output byte/symbol per call. * * encodeEnd(state, output) * Finalizes the encoder state, flushing any remaining data * to the output. * decodeEnd is not needed: exactly the dimensions of the * image are Step'd over, leaving no extra data. */ const codecArithAdapt = { encodeStart: function(state, symbols) { state.TOP_VALUE = 0xffff; state.MAX_SCALE = 0x3fff; state.FIRST_QTR = (Math.floor(state.TOP_VALUE / 4) + 1); state.HALF = (2 * state.FIRST_QTR); state.THIRD_QTR = (3 * state.FIRST_QTR); state.ACODER_N = symbols.length; state.low = 0; state.high = state.TOP_VALUE; state.pending = 0; state.byte = 0; state.bit = 0; state.cdf = []; for (var i = 0; i < state.ACODER_N + 1; i++) { state.cdf.push(i); } }, encodeStep: function(input, state, output) { var range = state.high - state.low + 1; state.high = Math.floor(state.low + (range * state.cdf[input + 1]) / state.cdf[state.ACODER_N] - 1); state.low = Math.floor(state.low + (range * state.cdf[input]) / state.cdf[state.ACODER_N]); while (1) { if (state.high < state.HALF) { codecArithAdapt_output_bit_and_pending(state, output, 0); } else if (state.low >= state.HALF) { codecArithAdapt_output_bit_and_pending(state, output, 1); state.low -= state.HALF; state.high -= state.HALF; } else if (state.low >= state.FIRST_QTR && state.high < state.THIRD_QTR) { state.pending++; state.low -= state.FIRST_QTR; state.high -= state.FIRST_QTR; } else break; state.low = state.low * 2; state.high = state.high * 2 + 1; } codecArithAdapt_model_update(state, input); }, encodeEnd: function(state, output) { state.pending++; if (state.low < state.FIRST_QTR) codecArithAdapt_output_bit_and_pending(state, output, 0); else codecArithAdapt_output_bit_and_pending(state, output, 1); output.push(state.byte); }, decodeStart: function(state, symbols) { state.TOP_VALUE = 0xffff; state.MAX_SCALE = 0x3fff; state.FIRST_QTR = (Math.floor(state.TOP_VALUE / 4) + 1); state.HALF = (2 * state.FIRST_QTR); state.THIRD_QTR = (3 * state.FIRST_QTR); state.ACODER_N = symbols.length; state.low = 0; state.high = state.TOP_VALUE; state.value = 0; state.bit = 7; state.firstRun = true; state.cdf = []; for (var i = 0; i < state.ACODER_N + 1; i++) { state.cdf.push(i); } }, decodeStep: function(input, state) { // Moved from decodeStart because it's not passed any input if (state.firstRun) { for (var i = 0; i < 16; i++) state.value = (state.value << 1) | codecArithAdapt_input_bit(state, input); state.firstRun = false; } var range = state.high - state.low + 1; var val = Math.floor(((state.value - state.low + 1) * state.cdf[state.ACODER_N] - 1) / range); var b; for (b = state.ACODER_N; val < state.cdf[b]; b--); state.high = Math.floor(state.low + (range * state.cdf[b + 1]) / state.cdf[state.ACODER_N] - 1); state.low = Math.floor(state.low + (range * state.cdf[b]) / state.cdf[state.ACODER_N]); while (1) { if (state.high < state.HALF) { // Do nothing } else if (state.low >= state.HALF) { state.value -= state.HALF; state.low -= state.HALF; state.high -= state.HALF; } else if (state.low >= state.FIRST_QTR && state.high < state.THIRD_QTR) { state.value -= state.FIRST_QTR; state.low -= state.FIRST_QTR; state.high -= state.FIRST_QTR; } else { break; } state.low = state.low * 2; state.high = state.high * 2 + 1; state.value = (state.value << 1) | codecArithAdapt_input_bit(state, input); } codecArithAdapt_model_update(state, b); return b; } }; // Helper functions function codecArithAdapt_output_bit_and_pending(state, output, bit) { codecArithAdapt_output_bit(state, output, bit); for (; state.pending; state.pending--) { codecArithAdapt_output_bit(state, output, !bit); } } function codecArithAdapt_output_bit(state, output, value) { state.byte |= (value << state.bit); if (8 == ++state.bit) { output.push(state.byte); state.byte = 0; state.bit = 0; } } var consumedBytes = 0; function codecArithAdapt_input_bit(state, input) { var res; state.bit++; if (8 == state.bit) { state.byte = input.next().value[1]; consumedBytes++; state.bit = 0; } res = state.byte & 1; state.byte >>>= 1; return res; } function codecArithAdapt_model_update(state, input) { var last = 0; var value = 0; if (state.cdf[state.ACODER_N] == state.MAX_SCALE) { for (var i = 0; i < state.ACODER_N; i++) { value = state.cdf[i + 1] - last; last = state.cdf[i + 1]; state.cdf[i + 1] = state.cdf[i] + (value + 1) / 2; } } for (var i = input; i < state.ACODER_N; i++) { state.cdf[i + 1]++; } }

Typical usage is like so; variables should be fairly self-explanatory?

Code: [Select]
... var codec = codecArithAdapt; var encodedChars = []; ... // Blocks, column first for (var i = 0; i < charStarts.length; i++) { encodedChars.push([]); var encoderState = new Object(); codec.encodeStart(encoderState, palette); for (var x = 0; x < charWidths[i]; x++) { for (var y = 0; y < imgHeight; y++) { // encodeStep(input, state, output) codec.encodeStep(imgData[x + charStarts[i] + y * imgWidth], encoderState, encodedChars[i]); } } codec.encodeEnd(encoderState, encodedChars[i]); }

#29 Reply
Posted by ataradov on 19 Nov, 2020 01:42
Quote from: T3sl4co1l on 19 Nov, 2020 01:24
- I'm particularly suspicious of the initializer step here (copied from the original; note "coder" easily ports to "state", and "->" to "."),
Code: [Select]
for (int i = 0; i < 16; i++) coder->value = (coder->value << 1) | input_bit(coder);which unconditionally reads in 16 bits of input data, which seems dangerous when the stream is especially short. And indeed, in the per-row per-character packing, I'm seeing encoded streams as short as two bytes! Should this be considered a bug?

This should not be an issue as long as input_bit() returns 0 past the end of stream and does not read past the actual input. It is hard to have encoding that are less than 2 bytes anyway. The only one I could fins is the empty input, and that encoded into 2 bytes. Anything else is at least two bytes. This assumes you are rounding things to bytes, not bits.

I personally don't consider stuff like this a bug. Otherwise you would need to know the length of the input data, which must be present in the stream, further increasing the size and complicating parsing.

#30 Reply
Posted by amyk on 19 Nov, 2020 02:21
Hint - look at MQ/QM-coder flowchart in JBIG/JBIG2/JPEG2000 standard - you will find simple and very effective arithmetic-like algorithm that does not use division nor floating-point (), and you can adapt it with a smaller state table to trade off compression for memory usage. Your code looks like an order-0 model.

#31 Reply
Posted by T3sl4co1l on 19 Nov, 2020 02:27
Oh! So the fixed arrays in your example, are not merely convenience but a necessary feature? Well that would matter...

(Insert the usual back-and-forth of, statically allocated arrays are always initialized to zero, and malloc is supposed to too, but what about stack allocations, or compiler differences, and other arcanum of C...)

Which also means the unchecked size is a trivial buffer exploit, but, it is anyway, with both input and output being completely unchecked; it's a demo.

Quote from: ataradov on 19 Nov, 2020 01:42
I personally don't consider stuff like this a bug. Otherwise you would need to know the length of the input data, which must be present in the stream, further increasing the size and complicating parsing.

So, your example is fixed length, as is mine. It should work all the same.

But that's also against your own recommendation, I guess? So then, but, why is it implemented that way..?

(Again, I need to know offset and length anyway, to look up all these little pieces I'm making. A termination symbol wouldn't benefit me much.)

Tim

#32 Reply
Posted by ataradov on 19 Nov, 2020 02:37
Quote from: T3sl4co1l on 19 Nov, 2020 02:27
Oh! So the fixed arrays in your example, are not merely convenience but a necessary feature? Well that would matter...
No, arrays are just a quick and dirty test.

If you want to add the size limit then it is on you to check that in decoder_callback(). If it detects that a read past end of input happens, it should return 0 instead.

There may not even be arrays, you can just read the stream sequentially from the SPI flash or download from the internet. The main code does not care.

Quote from: T3sl4co1l on 19 Nov, 2020 02:27
(Insert the usual back-and-forth of, statically allocated arrays are always initialized to zero, and malloc is supposed to too, but what about stack allocations, or compiler differences, and other arcanum of C...)
Global variables are initialized to 0, it is a standard. If your compiler does not follow standards - I don't care.

Quote from: T3sl4co1l on 19 Nov, 2020 02:27
Which also means the unchecked size is a trivial buffer exploit, but, it is anyway, with both input and output being completely unchecked; it's a demo.
It is on you to check user inputs.

In all my libraries I follow the approach that library relies on correct checked inputs. This way there are not 100 checks for the same thing. Only the outermost layer responsible for the interaction with the outside world does the checks. All internal APIs expect correct data.

It is also on you to check that the output does not overflow. It is easy to corrupt the input that will result in an infinite output. If your application is sensitive to it - include the unpacked size into the file format.

This library by design expects a raw stream of data.

Quote from: T3sl4co1l on 19 Nov, 2020 02:27
But that's also against your own recommendation, I guess? So then, but, why is it implemented that way..?
The demo is just a quick demo of the APIs.

I don't release well-documented libraries, ever. I release raw code in a way I found it to be useful. If you like it - take it, if you don't like it - leave it alone. I'm not spending any more time on stuff than I already had to.

#33 Reply
Posted by mariush on 19 Nov, 2020 05:21
imho arithmetic coding is overkill for this.

just playing a bit with the lucida picture, i changed it to 12 x 560 so that width and height are multiples of 2,which means you have tiles of 2x2 and then i simply built a "dictionary" from them ... out of 1680 tiles around 16 tiles total around 950 occurances

So you could probably have some kind of hybrid encoding where you make a 16 or 32 tile dictionary, and use the last palette entry as "tile not in dictionary" and then have 1680x4 bit , followed by the remaining tiles tiles.

In the picture above, we have 543 unique tiles and we have a 6 x 280 tile set = 1680 tiles x 2 bytes per tile = 3360 bytes

If you prepare a 15 tile "dictionary" where the 15 most common tiles show up 950 times (not the actual number, but close enough) then the picture is already reduced to :
15 x 2 bytes = 30 bytes + 1680 x 4 bit = 840 bytes + (1680-950) x 2 bytes = 1460 bytes ... so total = 2330 bytes
or if you add a bit map (1 bit if tile is in dictionary or not)

15x 2 bytes = 30 bytes + 1680/8 = 210 bytes + 950 tiles x 4 bit = 475 bytes + (1680-950) x 2 bytes = 1460 bytes => total = 2175 bytes
This can be further compressed with some basic LZW, LZ77, whatever ...

some php code i made as i wrote this to illustrate

Code: [Select]
<?php $dictionary = []; $blocks = []; $pic = imagecreatefrompng(__DIR__ .'/picture.png'); for ($y=0;$y<280;$y++) { for ($x=0;$x<6; $x++) { $a = imagecolorat($pic,$x,$y); // actually the index in color palette $b = imagecolorat($pic,$x+1,$y); $c = imagecolorat($pic,$x,$y+1); $d = imagecolorat($pic,$x+1,$y+1); $block = $a * 4096 + $b * 256 + $c * 16 +$d; //echo $block.' '; if (isset($dictionary[$block])===FALSE) $dictionary[$block]=[$a,$b,$c,$d,0]; $dictionary[$block][4]++; } } $dictionary_sorted = []; $i=0; foreach ($dictionary as $block => $value) { $dictionary_sorted[$i] = $value; $i++; } $sorted = FALSE; while ($sorted==FALSE) { $sorted=TRUE; for ($i=0;$i<count($dictionary_sorted)-1;$i++) { if ($dictionary_sorted[$i][4]<$dictionary_sorted[$i+1][4]) { $temp = $dictionary_sorted[$i]; $dictionary_sorted[$i] = $dictionary_sorted[$i+1]; $dictionary_sorted[$i+1] = $temp; $sorted = FALSE; } } } foreach ($dictionary_sorted as $index => $pair){ echo $index.','.json_encode($pair)."\n"; }; ?>
the output :

Code: [Select]
0,[9,9,9,9,526] 1,[9,9,0,0,83] 2,[0,0,9,9,69] 3,[9,0,9,0,60] 4,[0,9,0,9,45] 5,[9,9,9,0,33] 6,[9,0,9,9,33] 7,[0,9,9,9,14] 8,[9,9,0,9,13] 9,[0,0,0,9,11] 10,[9,9,9,5,10] 11,[9,3,9,9,10] 12,[9,9,9,3,9] 13,[9,9,9,7,8] 14,[9,9,3,9,8] 15,[3,9,9,9,8] 16,[9,9,10,0,8] 17,[9,9,9,11,8] 18,[9,11,9,9,7] 19,[9,13,9,9,6] 20,[0,0,9,0,6] 21,[9,12,9,9,5] 22,[9,9,9,15,5] 23,[9,9,5,10,5] 24,[9,2,9,9,5] 25,[4,0,9,9,5] 26,[9,4,9,0,5] 27,[0,2,9,9,4] 28,[7,9,9,9,4] 29,[13,0,9,9,4] 30,[9,9,0,10,4] 31,[0,10,9,9,4] [...]

#34 Reply
Posted by Mechatrommer on 19 Nov, 2020 09:08
save your time from this "classical" algorithm and go directly to LZ** variants dictionary based OSS algorithm mentioned earlier if your mcu can take it. i thought i've mentioned this? nevermind, i mention it again and not anymore. sorry for the annoyance

#35 Reply
Posted by ataradov on 19 Nov, 2020 17:07
Quote from: Mechatrommer on 19 Nov, 2020 09:08
save your time from this "classical" algorithm and go directly to LZ** variants dictionary based OSS algorithm mentioned earlier if your mcu can take it. i thought i've mentioned this? nevermind, i mention it again and not anymore. sorry for the annoyance
In my case any dictionary-based algorithms would not work, since I was very memory-constrained. And I was not compressing any random files, I knew general size and compression ratio of the files and it was suitable.

#36 Reply
Posted by T3sl4co1l on 20 Nov, 2020 05:57
For those still curious, here's a... the alpha version, let's say:
https://htmlpreview.github.io/?https://github.com/T3sl4co1l/fontenc/blob/master/font.html

Easiest way to demo, save the example image; or use the same thing from earlier in this thread, rotated; or the other example in the repo, https://github.com/T3sl4co1l/fontenc/blob/master/Lucidia_16px.png

Arithmetic DWORDs isn't implemented yet (it's the identity encoder) but all the others seem to be working.

As I expected, the fixed-model arithmetic encoding does much better, especially on per-row grouping. Adaptive encoding works great on large blocks -- yielding 1770 bytes for FontLucidia.png in column orientation. But it does a pitiful 3232 bytes in per-rows. Fixed does considerably better at 2137 bytes -- but my Huffman code still beats it at 2003 bytes. On the larger Lucidia_16px.png, fixed does pull ahead (Huffman: 3819, Fixed: 3621).

So it's looking pretty clear that this brand of arithmetic encoding isn't very efficient on small blocks, let alone enough to justify the intensive math required. And, despite having thrown together the Huffman code with less knowledge, it's proving to be quite formidable!

I still want to do the DWORDs method, and I have to decide if I want to do that with a continuation symbol (which will cost a lot of efficiency), or further research the bit- or byte-serial codes (with bit-stuffing or whatever).

Then I'll move on to dictionary methods, which should be easy enough to implement on top of this system. Hmm, I'll have to add a second level of state, to maintain a dictionary between blocks -- that'll be a somewhat larger change but not a pain.

Tim

#37 Reply
Posted by T3sl4co1l on 28 Nov, 2020 06:58
Anyway, finished the DWORD encoder, it's working, at least on test data. Updated (052d8eb...).

As expected, it's awful, even on short data.

Tim

#38 Reply
Posted by T3sl4co1l on 28 Nov, 2020 11:45
RLE went together fairly quickly.. I've got it beating arithmetic (fixed model) and Huffman a lot of the time. Neat. Think there's some left to squeeze out of it, as I can potentially fudge the runs, including some white or black pixels in runs of intermediate values (which are currently implemented as runs of verbatim values).

With these short data (rows) there's so little to encode, it might as well be zero... so I added a hack to make it zero. Not sure how well this will do on the embedded side (the decoder is getting more complex?), but it's interesting at least that I'm sometimes encoding entire rows in zero bits. This works by, if on closing the block, the remaining buffer is all white, just terminate early. Thus, entire rows can be stored in zero additional data. (This of course still requires that information (data width), but again, I have to have that to facilitate array access, so this is a win.)

Which arguably also acts as a dictionary for certain special cases (all white), which nicely segues into the final tests: dictionary lookup.

Tim

#39 Reply
Posted by coppice on 28 Nov, 2020 13:50
Quote from: T3sl4co1l on 28 Nov, 2020 11:45
RLE went together fairly quickly.. I've got it beating arithmetic (fixed model) and Huffman a lot of the time. Neat. Think there's some left to squeeze out of it, as I can potentially fudge the runs, including some white or black pixels in runs of intermediate values (which are currently implemented as runs of verbatim values).

With these short data (rows) there's so little to encode, it might as well be zero... so I added a hack to make it zero. Not sure how well this will do on the embedded side (the decoder is getting more complex?), but it's interesting at least that I'm sometimes encoding entire rows in zero bits. This works by, if on closing the block, the remaining buffer is all white, just terminate early. Thus, entire rows can be stored in zero additional data. (This of course still requires that information (data width), but again, I have to have that to facilitate array access, so this is a win.)

Which arguably also acts as a dictionary for certain special cases (all white), which nicely segues into the final tests: dictionary lookup.

Tim
If your images are very busy arithmetic coding alone should do well on its ownj. However, if the image is sparse (e.g. like most bilevel text page scan images) squeezing out the sparsity first brings big improvements. You'll see this in algorithms like JBIG. That analyses sparsity over a few pixel rows, then compresses the results with arithmetic coding. RLE is another approach.

#40 Reply
Posted by T3sl4co1l on 28 Nov, 2020 20:41
They are, in a sense -- the problem is the images are very small, so the edges are very busy, I guess you could say.

Unless there's a way to O(1) index into a sequence from various encoders, but I'm thinking that's not possible in general(?).

Tim

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

Are you sure?

There was an error while thanking

Thanking...

Go to page:

« 1 2 All

Full site Menu

Navigation

Powered by SMFPacks Advanced Attachments Uploader Mod