Author Topic: More compression codecs (Read 3800 times)

T3sl4co1l · « **on:** November 25, 2021, 02:45:31 am »

This looks quite promising:
https://phoboslab.org/log/2021/11/qoi-fast-lossless-image-compression

I may port it into my compression test demo thingy and see how it fares on embedded-relevant things I've been playing with.

The small buffer, and streamed architecture, suggests it should perform very well indeed on highly constrained embedded platforms. Even in compression, given the excellent performance in both directions.

An indexed version could be made, I suppose, by simply packing the palette alongside a grayscale image. Hmm, I wonder how the palette can be arranged to minimize large differences in the resulting grayscale...

Which, I also wonder if a dither extension would be worthwhile; specific to very color-constrained cases of course, and probably wouldn't perform all that well in general (having to encode the period and duty cycle of the pattern, in addition to the two alternating colors; and such runs only working along relatively rare flat-dithered rows, whereas real artwork contains gradients and curves?). Meh, pixel art enthusiasts either have their own stuff anyway (I recall reading LucasArts had some storage method for them (though not necessarily compression?) back in the day... which means it should be part of ScummVM?), I digress.

Or changing the window to be 2D-aware, when that should prove valuable. (Not so much on embedded, where you might not even have a frame buffer, but when one is available, sure.) Wouldn't cover much area for the same number of pixels (like 32 x 2 or 16 x 4), unless the window can be grown (and profitably so), while still packing everything into a byte-wise encoding.

http://www.gfwx.org/
Not so simple (though still under 1kLOC, comments included), and not clear what the memory requirements are, but very much open and available, and excellent performance!

Tim

SiliconWizard · « **Reply #1 on:** November 25, 2021, 08:23:47 pm »

Gave QOI a shot. Tried it on a few images. Here are my first remarks:

- It works. At least could not make it fail so far. Both encoding and decoding.
- In terms of compression, it doesn't look as good as the author's benchmark makes it appear. Compared to libpng on the images I used, the resulting files are about 2 times as large as PNG.
- It's fast. That's for sure.
- I got a number of warnings while compiling it - thought that might be worth fixing. The QOI_READ_16 macro is an example of what should not be done. Can you guess what the problem is?

Code:

#define QOI_READ_16(B, P) ((B[P++] & 0xff) << 8 | (B[P++] & 0xff))

Siwastaja · « **Reply #2 on:** November 27, 2021, 07:44:18 am »

Argh, that piece of code alone makes me not quite trust any of the code.

(| is not a sequence point, so the two P++ side effects are unsequenced relative to each other: the order in which they are evaluated is not defined, and in C modifying P twice like this is undefined behavior. Such code is unlikely to work except by luck. Why the author does not compile with warnings enabled, or overlooks the warnings, is beyond me!)
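One possible way to avoid the problem, as a minimal sketch rather than the author's actual fix: read each byte in its own statement, so every increment of the position is fully sequenced. The names `read_u16_be` and `read_u32_be` are hypothetical and do not come from the QOI source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical replacement for QOI_READ_16: big-endian 16-bit read.
   Each increment of *p sits in its own statement, so the order is defined. */
static uint16_t read_u16_be(const uint8_t *bytes, size_t *p) {
    uint16_t hi, lo;
    assert(bytes != NULL && p != NULL);
    hi = bytes[(*p)++];
    lo = bytes[(*p)++];
    return (uint16_t)((hi << 8) | lo);
}

/* Hypothetical replacement for QOI_READ_32, built on the 16-bit read.
   The two calls are separate statements, so the high half is read first. */
static uint32_t read_u32_be(const uint8_t *bytes, size_t *p) {
    uint32_t hi = read_u16_be(bytes, p);
    uint32_t lo = read_u16_be(bytes, p);
    return (hi << 16) | lo;
}
```

A function also evaluates its arguments exactly once, which a macro with `P++` in its body cannot promise.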

This being said, I quite like the idea of writing quick, simple, maybe even "unscientific" image compressor algorithms. I did something similar nearly 2 decades ago. My idea was to make a video player for the TI-83 calculator, but the TI-83 part was never finished, or even started... But I did the encoder/decoder parts on PC and was able to squeeze out some very small video files. I had a similar concept of pattern headers/tags. These simple strategies greatly benefit from reducing the number of colors, i.e., quantization. I used 8-level (3-bit) grayscale in my tests back then. In such special cases (low resolution, low number of colors), these algorithms can even perform better (look better given the same file size) than general-purpose advanced algorithms (like MPEG-1 at the time, which I compared against).

SiliconWizard · « **Reply #3 on:** November 28, 2021, 05:52:15 pm »

Quote from: Siwastaja on November 27, 2021, 07:44:18 am

Argh, that piece of code alone makes me not quite trust any of the code.
(| makes no sequence point, so the order in which P++ statements are evaluated, is undefined. Such code is unlikely to work except by luck. Why the author does not compile with warnings enabled, or overlooks the warnings, is beyond me!)

I took a look at the rest of the code (beyond the point above). It's not extremely good, but it's not bad either, apart from a couple of points that would be easy to fix.
It's unfortunate that people still don't bother looking at warnings these days, and keep using constructs that are undefined behavior.

I ran Cppcheck on the code, and it did find the issue with QOI_READ_16 (and QOI_READ_32 which uses QOI_READ_16):

Quote

error: Expression '(bytes[p++]&0xff)<<8|(bytes[p++]&0xff)' depends on order of evaluation of side effects

It also found another issue: a resource leak at line 500 (in qoi_read): if memory allocation fails, the open file will never get closed.

So I would suggest rewriting the QOI_READ_* and QOI_WRITE_* macros and fixing the above. Both are simple fixes.
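The resource leak can be avoided with a single exit path that always closes the file. This is a minimal sketch of that pattern, not the actual `qoi_read` code; `read_whole_file` is a hypothetical helper written only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: load a whole file into a malloc'd buffer.
   Every failure after fopen() jumps to "done", so the file is closed
   on all paths, including the one where malloc() fails. */
static void *read_whole_file(const char *path, size_t *out_len) {
    FILE *f;
    void *data = NULL;
    long size;

    assert(path != NULL && out_len != NULL);
    f = fopen(path, "rb");
    if (!f) return NULL;

    if (fseek(f, 0, SEEK_END) != 0) goto done;
    size = ftell(f);
    if (size <= 0) goto done;
    if (fseek(f, 0, SEEK_SET) != 0) goto done;

    data = malloc((size_t)size);
    if (!data) goto done;                 /* allocation failed: still closes f */

    if (fread(data, 1, (size_t)size, f) != (size_t)size) {
        free(data);                       /* short read: release the buffer */
        data = NULL;
        goto done;
    }
    *out_len = (size_t)size;

done:
    fclose(f);
    return data;                          /* NULL on any failure */
}
```

The caller owns the returned buffer and releases it with `free()`.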

A last point is that both encoding and decoding use dynamic memory allocation. If you're going to use this in embedded projects, that's something to keep in mind, depending on your requirements. Modifying the code to support static allocation (and possibly a smaller buffer size) may not be entirely trivial.

Quote from: Siwastaja on November 27, 2021, 07:44:18 am

This being said, I quite like the idea of writing quick, simple, maybe even "unscientific" image compressor algorithms.

Although I guess I know what you mean, I wouldn't put it this way. There's nothing "unscientific" about designing simple algorithms, as long as they are correct. Just because they don't use any fancy maths doesn't mean they are not "scientific".

Also implemented some kind of 2D-RLE (but only partial in the second direction) a few years ago, for a project that required real-time compression and decompression of video using a small FPGA. It wasn't close to PNG in terms of compression, but was good enough using the specific kind of images I had to deal with, and could run at about 200 MHz with relatively few resources on a Spartan-6 LX9.

The interesting point with QOI is that it looks reasonably general-purpose. Although not as good as PNG, it does a pretty decent job on various types of images.

I wanted a rude username · « **Reply #4 on:** December 06, 2021, 08:24:27 am »

Edit: Restored image from mirror.

Smokey · « **Reply #5 on:** March 07, 2022, 11:38:46 pm »

I'm also looking for something to reduce the storage requirements of images/short-animation-clips for an embedded system. Raw bytes takes up a ton of FLASH.

I haven't actually run QOI code yet, but from the quick pass I took I don't think I have enough RAM in my embedded system to do a frame decode. With RGB565 (16 bits/pixel), my display frame size is 320x240x2 = 153600 bytes, which I can't hold in RAM all at once. I'm going to need something that I can decode on the fly as I pull bytes out of FLASH. I was looking at LZ4 as an option, since that looks to support decoding a stream.

Am I wrong about the QOI decode needing to hold a full frame in RAM to decode?

SiliconWizard · « **Reply #6 on:** March 08, 2022, 12:17:13 am »

So, I took a look at the current state of QOI. Good news: the author has indeed fixed the "issues" I pointed out here. The code is cleaner.
https://github.com/phoboslab/qoi

Now looking at the encoding function, as it is, it requires the entire image to be loaded in memory and a full data block allocated for the output, but it should not be too difficult to modify it and turn it into a "streaming" compression tool with relatively small buffers. As far as I see, it's a rather simple and single-pass compression. I'm tempted to take the challenge. =)
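To illustrate what such a streaming encoder might look like, here is a minimal sketch written from the published format description, not taken from the reference code. It accepts one pixel at a time and writes at most 6 bytes per call, so only the 64-entry index and a few bytes of state stay in memory. All `qe_*` names are hypothetical, and writing the 14-byte header and the 8-byte end marker is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t r, g, b, a; } qe_px;

typedef struct {
    qe_px index[64];  /* running array of previously seen pixels */
    qe_px prev;       /* previous pixel */
    uint8_t run;      /* length of the pending run, 0..61 */
} qe_encoder;

static void qe_init(qe_encoder *e) {
    assert(e != NULL);
    memset(e, 0, sizeof *e);
    e->prev.a = 255;  /* the format starts from r=g=b=0, a=255 */
}

static int qe_same(qe_px x, qe_px y) {
    return x.r == y.r && x.g == y.g && x.b == y.b && x.a == y.a;
}

/* Channel difference a - b wrapped into -128..127. */
static int qe_delta(uint8_t a, uint8_t b) {
    int d = (uint8_t)(a - b);
    return d > 127 ? d - 256 : d;
}

/* Write the pending run, if any. Returns bytes written (0 or 1).
   Call once more after the last pixel. */
static size_t qe_flush(qe_encoder *e, uint8_t *out) {
    if (e->run == 0) return 0;
    out[0] = (uint8_t)(0xC0 | (e->run - 1));         /* QOI_OP_RUN */
    e->run = 0;
    return 1;
}

/* Encode one pixel; out needs room for 6 bytes. Returns bytes written. */
static size_t qe_push(qe_encoder *e, qe_px px, uint8_t *out) {
    size_t n;
    unsigned h;
    if (qe_same(px, e->prev)) {                      /* extend the run */
        if (++e->run == 62) return qe_flush(e, out);
        return 0;
    }
    n = qe_flush(e, out);
    h = (px.r * 3u + px.g * 5u + px.b * 7u + px.a * 11u) % 64u;
    if (qe_same(e->index[h], px)) {
        out[n++] = (uint8_t)h;                       /* QOI_OP_INDEX */
    } else {
        int vr = qe_delta(px.r, e->prev.r);
        int vg = qe_delta(px.g, e->prev.g);
        int vb = qe_delta(px.b, e->prev.b);
        e->index[h] = px;
        if (px.a != e->prev.a) {                     /* QOI_OP_RGBA */
            out[n++] = 0xFF;
            out[n++] = px.r; out[n++] = px.g; out[n++] = px.b; out[n++] = px.a;
        } else if (vr >= -2 && vr <= 1 && vg >= -2 && vg <= 1 &&
                   vb >= -2 && vb <= 1) {            /* QOI_OP_DIFF */
            out[n++] = (uint8_t)(0x40 | ((vr + 2) << 4) | ((vg + 2) << 2) | (vb + 2));
        } else if (vg >= -32 && vg <= 31 && vr - vg >= -8 && vr - vg <= 7 &&
                   vb - vg >= -8 && vb - vg <= 7) {  /* QOI_OP_LUMA */
            out[n++] = (uint8_t)(0x80 | (vg + 32));
            out[n++] = (uint8_t)(((vr - vg + 8) << 4) | (vb - vg + 8));
        } else {                                     /* QOI_OP_RGB */
            out[n++] = 0xFE;
            out[n++] = px.r; out[n++] = px.g; out[n++] = px.b;
        }
    }
    e->prev = px;
    return n;
}
```

The state struct is a little over 260 bytes, and the input can arrive one pixel or one row at a time.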

Smokey · « **Reply #7 on:** March 08, 2022, 12:52:09 am »

Cool. I would be doing the encoding on PC side, so I'm not so concerned about resource limits there.
But the decoding would be happening on a Cortex-M0 pulling bytes from external FLASH over SPI and I just don't have that much working RAM to hold a whole frame.

SiliconWizard · « **Reply #8 on:** March 08, 2022, 02:29:43 am »

Decompression is even simpler (as usual), so it should be straightforward. In both cases, it's mainly a matter of keeping the current state in some struct.
The author lists many implementations of QOI actually - but none for streaming compression/decompression, as far as I've seen. The documentation of the format is there: https://qoiformat.org/
and should be enough to actually implement it any way we see fit.
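As a rough illustration of "keeping the current state in some struct", here is a minimal sketch of a byte-at-a-time decoder written from the format description at qoiformat.org. It is not taken from any existing implementation, and all `qs_*` names are hypothetical. It assumes the caller has already consumed the 14-byte header and stops feeding bytes after width x height pixels, so the end marker never reaches it. The example sink packs pixels into RGB565, which is also an assumption about the target display.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t r, g, b, a; } qs_px;

/* Hypothetical pixel sink: called once per decoded pixel. */
typedef void (*qs_emit_fn)(qs_px px, void *user);

typedef struct {
    qs_px index[64];  /* running array of previously seen pixels */
    qs_px px;         /* most recently decoded pixel */
    uint8_t op;       /* tag byte of a multi-byte chunk in progress */
    uint8_t need;     /* payload bytes that chunk expects in total (0 = idle) */
    uint8_t got;      /* payload bytes collected so far */
    uint8_t buf[4];   /* collected payload bytes */
} qs_decoder;

static void qs_init(qs_decoder *d) {
    assert(d != NULL);
    memset(d, 0, sizeof *d);
    d->px.a = 255;    /* the format starts from r=g=b=0, a=255 */
}

/* Record the current pixel in the index and hand it to the sink. */
static void qs_emit(qs_decoder *d, qs_emit_fn emit, void *user) {
    qs_px p = d->px;
    d->index[(p.r * 3 + p.g * 5 + p.b * 7 + p.a * 11) % 64] = p;
    emit(p, user);
}

/* Feed one byte of chunk data; emits zero or more pixels. */
static void qs_push(qs_decoder *d, uint8_t byte, qs_emit_fn emit, void *user) {
    if (d->need) {                            /* inside RGB, RGBA or LUMA */
        d->buf[d->got++] = byte;
        if (d->got < d->need) return;
        if (d->op >= 0xFE) {                  /* QOI_OP_RGB / QOI_OP_RGBA */
            d->px.r = d->buf[0]; d->px.g = d->buf[1]; d->px.b = d->buf[2];
            if (d->op == 0xFF) d->px.a = d->buf[3];
        } else {                              /* QOI_OP_LUMA, second byte */
            int dg = (d->op & 0x3F) - 32;
            d->px.r = (uint8_t)(d->px.r + dg - 8 + (d->buf[0] >> 4));
            d->px.g = (uint8_t)(d->px.g + dg);
            d->px.b = (uint8_t)(d->px.b + dg - 8 + (d->buf[0] & 0x0F));
        }
        d->need = d->got = 0;
        qs_emit(d, emit, user);
    } else if (byte >= 0xFE) {                /* RGB/RGBA tag: wait for payload */
        d->op = byte;
        d->need = (byte == 0xFE) ? 3 : 4;
    } else if ((byte >> 6) == 0) {            /* QOI_OP_INDEX */
        d->px = d->index[byte];
        qs_emit(d, emit, user);
    } else if ((byte >> 6) == 1) {            /* QOI_OP_DIFF */
        d->px.r = (uint8_t)(d->px.r + ((byte >> 4) & 3) - 2);
        d->px.g = (uint8_t)(d->px.g + ((byte >> 2) & 3) - 2);
        d->px.b = (uint8_t)(d->px.b + (byte & 3) - 2);
        qs_emit(d, emit, user);
    } else if ((byte >> 6) == 2) {            /* QOI_OP_LUMA, first byte */
        d->op = byte;
        d->need = 1;
    } else {                                  /* QOI_OP_RUN, 1..62 pixels */
        int run = (byte & 0x3F) + 1;
        while (run--) qs_emit(d, emit, user);
    }
}

/* Example sink: pack pixels as RGB565 into a small caller-owned buffer. */
typedef struct { uint16_t *dst; size_t cap, count; } qs_rgb565_sink;

static void qs_to_rgb565(qs_px p, void *user) {
    qs_rgb565_sink *s = user;
    if (s->count < s->cap)
        s->dst[s->count++] =
            (uint16_t)(((p.r >> 3) << 11) | ((p.g >> 2) << 5) | (p.b >> 3));
}
```

With a sink like this, the working set is the decoder struct plus whatever line or tile buffer the display transfer needs, rather than a full frame.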

Smokey · « **Reply #9 on:** March 14, 2022, 10:19:47 am »

I've been digging into this stuff a bit. I'm actually leaning towards heatshrink: https://github.com/atomicobject/heatshrink

It sounds like it was designed for low memory devices and can operate on a stream out of the box.

SiliconWizard · « **Reply #10 on:** March 14, 2022, 06:08:12 pm »

I've played with QOI some more.

While it performs rather well, and not too far from PNG for a whole range of images (including the author's test set), it performs pretty poorly for some of them. In particular, I have images generated with complex textures, and the compression ratio falls to just about 1/2, while the PNG equivalent is almost 1/10 of that!

But! That wasn't the end of it. I looked for a simple and fast compression that I could apply to QOI (a bit in the same vein as PNG does some pre-filtering, and then uses - I think - 'deflate' to compress it). I found XLZ, which is a simple and fast LZ77 compression: https://github.com/banebyte5115/xlz . It does happen to complement QOI pretty well: QOI+XLZ together is still MUCH faster for encoding (and even decoding) than PNG, it performs very well on a wider range of images than QOI alone, and the combined code is just a few KBytes. The images I mentioned above, which were "tough" for QOI alone, get compressed BETTER than the highest-level PNG.

While QOI can easily be turned into a streaming compression and decompression, XLZ is another beast, though. As it's LZ77, it does require a significant amount of memory - basically, it requires a "sliding window". But you can always work around that for memory-limited applications by compressing data in smaller chunks. Might not be quite as efficient compression-wise, but it's workable.

So, I'm definitely considering chaining the two for some applications.
I suggest considering the source code in both projects as "reference implementations" (as the author of QOI states), rather than production-ready code, and thus suggest rewriting those with your own constraints, code style and coding rules, if they apply, for any serious project.

Smokey · « **Reply #11 on:** March 15, 2022, 12:06:07 am »

LZ4 looks to use a default block size of 64k, but I've seen claims it works down to 1024 with lower compression ratios. The LZ4 python bindings appear to ignore you when you try to set a block size less than 64k, so I gave up on that for the moment and I'm working with heatshrink since small blocks is upfront in the feature set.

SiliconWizard · « **Reply #12 on:** March 15, 2022, 01:44:18 am »

I've tried LZ4 before XLZ, but XLZ on QOI (I'm not talking about any data in general) works almost as well as LZ4, and is much simpler (and even faster).
Both will have the same kind of memory footprint and constraints, so using that depends on your target and requirements entirely. Probably not worth it for small targets, but otherwise interesting.

Haven't tested heatshrink. Could you give examples of compression ratios with it on example images and compare that to QOI alone?

T3sl4co1l · « **Reply #13 on:** March 23, 2023, 09:37:46 pm »

Necroposting:

A new, old algorithm for a change, has been published here: https://github.com/TheRealOrange/icer_compression

Also cross referencing these pages for convenience:
https://www.eevblog.com/forum/programming/slic-simple-lossless-imaging-codec/msg4723580/#msg4723580
https://www.eevblog.com/forum/microcontrollers/best-libraryalgorithm-for-low-ram-embedded-data-compression/
https://www.eevblog.com/forum/microcontrollers/arithmetic-decoding-for-embedded/
and a few other threads we've been through here that I recall but don't find at a glance.

Tim

Smokey · « **Reply #14 on:** March 24, 2023, 06:59:38 pm »

Cool. They mention "memory-constrained embedded systems". But they don't give hard numbers on that, and their example uses a ton of memory.

From their example: https://github.com/TheRealOrange/icer_compression/blob/master/example/src/main.c

Code:

    
    const size_t out_w = 512;
    const size_t out_h = 512;

    uint8_t *resized = malloc(out_w*out_h);
    uint16_t *transform = malloc(out_w*out_h*2);
    uint16_t *compress = malloc(out_w*out_h*2);
    uint16_t *decompress = malloc(out_w*out_h*2);
    uint8_t *display = malloc(out_w*out_h);

So decompress alone needs 512*512 uint16_t? That's 512*512*2 = 524,288 bytes?
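For reference, the arguments to `malloc` in the quoted snippet are byte counts, so the sizes can be tallied directly. The sketch below only reproduces that arithmetic. The struct and function names are hypothetical, and it says nothing about how much memory the ICER library itself needs beyond these example buffers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally of the five buffers allocated in the quoted example. */
typedef struct {
    size_t resized, transform, compress, decompress, display, total;
} example_bytes;

static example_bytes example_buffer_bytes(size_t w, size_t h) {
    example_bytes b;
    assert(w > 0 && h > 0);
    b.resized    = w * h;       /* malloc(out_w*out_h): one byte per pixel */
    b.transform  = w * h * 2;   /* malloc(out_w*out_h*2): one uint16_t per pixel */
    b.compress   = w * h * 2;
    b.decompress = w * h * 2;
    b.display    = w * h;
    b.total = b.resized + b.transform + b.compress + b.decompress + b.display;
    return b;
}
```

For 512 x 512 this gives 524,288 bytes for `decompress` and 2,097,152 bytes (2 MiB) for all five buffers together.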

