Author Topic: Compression of .data (initialized variables) section? (Read 1473 times)

T3sl4co1l · « **on:** October 05, 2019, 07:32:59 am »

Curious, has anyone ever done this? Seems likely... but it's also an impossible-to-search term.

I've found that possibly TI's CCS can do this?

Explanation --

When C is compiled, variables are allocated based on initialization. Uninitialized (actually default) values are placed in the .bss section, which is zeroed during init. .data is initialized with data from an array, which for embedded usually means copying a block of Flash to RAM. (Guessing for executable applications, it's just a section in the EXE file and the OS loads it automatically? PCs have tons of memory, so it's not an interesting problem there.) That means .data wastes twice the memory, in a sense, and in constrained applications it would be of interest to compress it. And a simple compression, like RLE, or Huffman (eh, maybe not so simple), or LZ something or other, would be able to deliver reasonable overall ratios at little cost in boot time. Even better if it can be compiler/linker integrated (e.g., the linker sorts variables by init value, facilitating RLE encoding).
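For reference, the usual startup step described above looks roughly like this. It is a minimal sketch, not any particular toolchain's crt0: `init_ram` and its parameters are hypothetical names, and on a real target the addresses would come from linker-script symbols rather than function arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper. On a real target this runs before main() and the
 * addresses and lengths are provided by the linker script; here they are
 * plain parameters so the logic can be tried on a PC. */
static void init_ram(const uint8_t *data_load,  /* image of .data in Flash */
                     uint8_t *data_start,       /* where .data lives in RAM */
                     size_t data_len,
                     uint8_t *bss_start,        /* where .bss lives in RAM */
                     size_t bss_len)
{
    assert(data_load != NULL && data_start != NULL && bss_start != NULL);
    memcpy(data_start, data_load, data_len);    /* .data: copy Flash -> RAM */
    memset(bss_start, 0, bss_len);              /* .bss: zero fill */
}
```

The Flash image passed as `data_load` is the part that a compressed scheme would shrink; the `memcpy` would then become a call to a decoder.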

But I'm guessing GCC for example can't do that?

Tim

ataradov · « **Reply #1 on:** October 05, 2019, 07:55:08 am »

Funny enough, I've seen it done. I don't really remember where. But it was compressed after linking and then decompressed in the startup code. The compiler does not really need to be involved at all, although this would see much more use if compression was already integrated.

It is not worth it in general. That time it was done to ensure that devices already in the field could be updated or something like this.

Kleinstein · « **Reply #2 on:** October 05, 2019, 08:11:14 am »

In most cases I would not expect the compression to be worth it. The decompression needs extra code, especially if a decompressor is not already used elsewhere in the program. Quite often there are not that many initialized values that are not zero. Constant values could also be read directly from flash, though this can be complicated on some chips.

It may make sense if there is a kind of standard, short code for decompression, e.g. as a common lib. So the first step would likely be having such a library.

Siwastaja · « **Reply #3 on:** October 05, 2019, 08:29:32 am »

Probably not done, because not needed.

It tends to happen that storage capacity >> RAM

For example, in microcontrollers, if you have, say, 16K of RAM, you tend to have maybe 128K of flash.

Or, on PC, if you have 4GB of RAM, you tend to have 320GB of SSD, and so on...

Even if you fill half of your entire RAM with initialized variables, it's still just a few percent of available storage.

Combine that with the fact that .data often contains only a few hundred bytes. Large tables tend to be loaded from disk or generated by the user application (after main() is called), in which case using any application-specific compression is trivial.

T3sl4co1l · « **Reply #4 on:** October 05, 2019, 08:31:08 am »

I'm not asking if it's practical. Just a curiosity if it's been done!

Tim

jhpadjustable · « **Reply #5 on:** October 05, 2019, 09:38:49 am »

According to some random PDF on the Internet, it's a standard feature of Green Hills' MULTI IDE's elxr linker and crt0 since no later than 2005, and can be turned on and off per section. The document is titled "MULTI: Building Applications for Embedded ARM"

I've never seen it done, but now I kinda want to try it...

SiliconWizard · « **Reply #6 on:** October 05, 2019, 06:28:17 pm »

I've not heard of that. Some compilers may support that as an extension. Never run into one.

I get the point though. Initial values are essentially constants stored in Flash(/NVM) and copied to the corresponding RAM variables at startup; the waste factor here is that you can't actually "access" those constants anymore after that, while they are still there in NVM. If you want to keep those values, you'll have to actually make 2 copies of them in RAM. Damn. On some targets (with specific extensions) you may be able to work around that, but that's essentially non-portable. Like, on some targets, declaring a variable "const" will automatically allocate it in Flash, and it will be read from Flash each time you access it. I've seen that on some 8/16-bitters (sometimes requiring a specific qualifier, sometimes not), but if you're using an ARM core or something like that, it's usually not possible, not that easily at least.

As to compression, if you wanted that to be done transparently, I guess you'd have to modify the linker. The decompression could probably be done in the "startup" code, which you can usually modify and tweak by hand.
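To give an idea of what the startup-side decoder could look like, here is a minimal run-length sketch. The format (a sequence of count/value byte pairs) and the name `rle_decode` are assumptions made up for illustration; no particular linker produces this format, so a post-link tool would have to generate it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format: the input is a sequence of (count, value) byte
 * pairs, each meaning "write value count times".
 * Returns the number of bytes written, or 0 if the input length is odd
 * or the output buffer is too small. */
static size_t rle_decode(const uint8_t *src, size_t src_len,
                         uint8_t *dst, size_t dst_cap)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (src_len % 2 != 0)
        return 0;

    for (size_t i = 0; i < src_len; i += 2) {
        uint8_t count = src[i];
        uint8_t value = src[i + 1];

        if (count > dst_cap - out)      /* would overflow the RAM region */
            return 0;
        for (uint8_t n = 0; n < count; n++)
            dst[out++] = value;
    }
    return out;
}
```

In startup code, `src` would point at the packed image in Flash and `dst` at the start of .data in RAM, in place of the usual block copy.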

ataradov · « **Reply #7 on:** October 05, 2019, 06:40:26 pm »

Ok, I remember now. When I heard about this, I got interested in the best compression method for this application. This immediately eliminated all the methods that rely on a big dictionary.
After some experimentation, I figured out that the best algorithm for that was arithmetic coding.
The smallest (code size) C implementation I could create was about 500 bytes. There is probably some space for further optimization there, but I doubt it will be less than 300 bytes.

I ran it on a few representative binaries I could find at a time and the compression ratio was such that it barely covered the size of the decoder. Some were better than others. It will obviously work much better if you have large sparse initialized structures or large chunks of real data, like pictures or fonts.
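The break-even point is simple arithmetic: the scheme only pays off when the bytes saved in the initializer image exceed the bytes spent on the decoder. A small sketch, where `net_flash_saving` is a hypothetical helper and the sizes in the usage note are invented for illustration:

```c
#include <assert.h>

/* Net Flash bytes saved by compressing the .data image.
 * A negative result means the build got larger overall. */
static long net_flash_saving(long raw_size, long packed_size, long decoder_size)
{
    assert(raw_size >= 0 && packed_size >= 0 && decoder_size >= 0);
    return raw_size - (packed_size + decoder_size);
}
```

With a 500-byte decoder, a 2000-byte image has to pack down to less than 1500 bytes before anything is gained.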

SiliconWizard · « **Reply #8 on:** October 05, 2019, 07:16:12 pm »

Yup. For that to be worthwhile, you typically need a real crapton of initialized data.

Which, unless your whole code is gigantic, usually means you have a few initialized variables that are big, like big arrays as you just said. In that case, if memory is at a premium, the simplest approach, instead of hoping to find a compiler that supports automatic compression, is to handle this kind of data yourself, and not as just initial values for variables. Store such data compressed as object files, have it all linked in a specific section (declared in your linker script) and decompress it directly from your code.

To store binary data directly as an object file, if your compiler uses binutils (which will cover all gcc variants and probably others), the linker "ld" can do this for you. (Obviously use the "ld" for your specific target, for instance: arm-none-eabi-ld)

Code:

ld -r -b binary -o <object file> <binary file 1> [...]

You can use objdump to list the generated symbols:

Code:

objdump --syms <object file>

You'll get a start and an end symbol per binary file, typically of the form: _binary_<filename>_<fileext>_start and _binary_<filename>_<fileext>_end (ld also emits a matching _binary_<filename>_<fileext>_size symbol).
Easy to use from C or assembly.

I've done that to store binary data in executables for instance. It's more elegant, and more versatile, than generating C code with large initialized arrays.
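A rough sketch of the C side follows. The file name font.bin, the struct `blob` and the helper `blob_from_symbols` are hypothetical names chosen for illustration; only the _binary_..._start / _end symbol pattern comes from ld. The extern declarations are shown in a comment so the snippet still compiles without the generated object file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With an object made from a file called font.bin (hypothetical name),
 * the symbols would be declared like this:
 *
 *   extern const uint8_t _binary_font_bin_start[];
 *   extern const uint8_t _binary_font_bin_end[];
 *
 * and turned into a pointer/length pair with:
 *
 *   struct blob font = blob_from_symbols(_binary_font_bin_start,
 *                                        _binary_font_bin_end);
 */
struct blob {
    const uint8_t *data;   /* first byte of the embedded file */
    size_t size;           /* length in bytes */
};

static struct blob blob_from_symbols(const uint8_t *start, const uint8_t *end)
{
    assert(start != NULL && end >= start);
    struct blob b = { start, (size_t)(end - start) };
    return b;
}
```

The resulting pointer and length can then be handed to whatever decompressor the application uses.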

amyk · « **Reply #9 on:** October 05, 2019, 07:26:13 pm »

An LZ decompressor is tiny (dozens of bytes). I've seen it done on embedded systems that load into RAM and execute there, and practically all Linux kernels will be compressed.

There's also this: https://en.wikipedia.org/wiki/UPX

ataradov · « **Reply #10 on:** October 05, 2019, 07:35:49 pm »

Can we see LZ implementation that is dozens of bytes?

T3sl4co1l · « **Reply #11 on:** October 05, 2019, 08:11:27 pm »

As an example, I have a text routine that's basically Huffman encoding, so it's efficient for sparse data (namely, graphics with a lot of background color, and few other colors -- fonts). It takes up 215 words in AVR, including a lot of calls to IO routines that obviously wouldn't be necessary for a memory init.

Also, 178 words for a somewhat more powerful decoder (an area-based RLE), so I don't think it's too far out to make a compact decoder.

The same build these functions are in, only uses 0x58 bytes of .data total, so it would indeed be a tough sell.

On a related note, .text could be compressed, too. With no RAM execution, there's no point in trying on the AVR, of course. It's relevant to everything else though; in fact I ran across this just the other day, a lot of N64 games used Yaz0 (similar to LZMA?) to expand data and code overlays into RAM. And compressed (and encoded, encrypted or obfuscated) code is not unfamiliar on PCs either, though probably not as common as it once was (malware aside).

Although on another note, since the build I mentioned above is using a fair amount of PROGMEM -- I wonder how it might compare using compression on that, instead of storing it raw. Pieces can be expanded to RAM as needed (into local function stack frames, heap?). That can be written into the code (probably with great pain..). Maybe some tools and macros can help automate that; unclear if it could ever be as, or more, syntactically convenient than PROGMEM already is. Which is already on the annoying side, so...

But I digress. Neat that some tried, or offer it. Not surprised it's uncommon, of course.

Tim

amyk · « **Reply #12 on:** October 05, 2019, 11:35:48 pm »

Quote from: ataradov on October 05, 2019, 07:35:49 pm

Can we see LZ implementation that is dozens of bytes?

http://cbloomrants.blogspot.com/2011/10/10-27-11-tiny-lz-decoder.html
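For a feel of how little logic an LZ-style decoder needs, here is a minimal sketch in C. It is not the decoder from the linked post: the token format (literal runs and one-byte back-references) and the name `lz_decode` are assumptions made up for illustration, and the bounds checks cost extra code that a hand-tuned assembly version might omit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format, one token byte at a time:
 *   token < 0x80 : copy (token + 1) literal bytes from the input
 *   token >= 0x80: copy ((token & 0x7F) + 2) bytes from the output,
 *                  starting `dist` bytes back; dist is the next input byte
 * Returns the number of bytes written, or 0 on malformed input or if the
 * output buffer is too small. */
static size_t lz_decode(const uint8_t *src, size_t src_len,
                        uint8_t *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    assert(src != NULL && dst != NULL);
    while (in < src_len) {
        uint8_t token = src[in++];

        if (token < 0x80) {                      /* literal run */
            size_t len = (size_t)token + 1;
            if (len > src_len - in || len > dst_cap - out)
                return 0;
            while (len--)
                dst[out++] = src[in++];
        } else {                                 /* back-reference */
            size_t len = (size_t)(token & 0x7F) + 2;
            if (in >= src_len)
                return 0;
            size_t dist = src[in++];
            if (dist == 0 || dist > out || len > dst_cap - out)
                return 0;
            while (len--) {                      /* byte-wise, so overlapping copies work */
                dst[out] = dst[out - dist];
                out++;
            }
        }
    }
    return out;
}
```

For example, the six input bytes 0x02 'a' 'b' 'c' 0x84 0x03 expand to the nine bytes "abcabcabc".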

