Another thought on RLE -- this time not so much why it's bad, but the most likely way it could work: if your 16-bit samples are split into 8-bit bytes, we'd expect the high byte to change less, and less often, than the low byte, which is basically noise (incompressible). I still think it's more likely than not that the high byte changes between consecutive samples, but in the case where it doesn't, RLE on every other byte would work. You'd then have to interleave the runs with literal low bytes, and merge them back together for the output stream.
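A minimal sketch of that idea (names are mine, not from any particular codec): slice the 16-bit samples into separate high- and low-byte streams, RLE the slow-moving high bytes, and keep the low bytes as literals.

```python
# Sketch: byte-slice 16-bit samples, then RLE only the high bytes.
# The low-byte stream stays literal since it's essentially noise.

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Encode as (value, run_length) pairs, runs capped at 255."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        runs.append((data[i], j - i))
        i = j
    return runs

def split_bytes(samples: list[int]) -> tuple[bytes, bytes]:
    """Slice 16-bit samples into separate high- and low-byte streams."""
    hi = bytes((s >> 8) & 0xFF for s in samples)
    lo = bytes(s & 0xFF for s in samples)
    return hi, lo

samples = [0x1234, 0x1241, 0x1230, 0x12FF, 0x1305]
hi, lo = split_bytes(samples)
runs = rle_encode(hi)  # high bytes 0x12 x4, 0x13 x1 -> just two runs
```

The output stream would then interleave each run header with its corresponding literal low bytes; the break-even point is how often the high byte actually holds still.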
Basically, RLE works great on sequences with little high-frequency content -- where values don't often change from point to point -- but you have literally the opposite case here: the signal is largely high-frequency, when it's anything at all.
And, intuitively at least, it's unlikely that any other kind of skip-N pattern applies. Perhaps the values [nearly] repeat every so many samples, perhaps following Hadamard sequences or something -- but there are a LOT of sequences to check (sequency is equivalent to frequency, i.e. you'd just be doing a crummy DFT), and it's highly unlikely that the signal goes through a perfect (remember: bit-perfect) cycle over any given span. That's at least a little more likely if restricted to the upper bits, hence the note above.
Which also means that LZ compression may struggle -- I bet it would perform a lot better with the high and low bytes sliced apart, by the same logic as above; but for the typical case, it's unlikely that more than modest-length patterns can be found. Again: they must be bit-perfect matches. FLAC seems to do well enough on typical audio (ca. 50%?), so it must not be too bad. But that's also mostly low-frequency or impulsive content (i.e., percussion dominates much of the HF spectrum, so we might expect the HF part to be relatively sparse, while the LF part is more diverse).
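Easy enough to test the slicing claim with DEFLATE (zlib) standing in for a generic LZ coder -- it's LZ77 plus Huffman, not pure LZ, but close enough for a sanity check. The test signal below is synthetic (slow ramp in the high byte, seeded random noise in the low byte), purely to illustrate the effect, not a stand-in for your actual data.

```python
# Compare DEFLATE on the interleaved byte stream vs. on the high and
# low bytes sliced into separate streams. Synthetic 16-bit signal:
# slowly ramping high byte, random (incompressible) low byte.
import random
import zlib

random.seed(0)
n = 4096
samples = [((k // 64) << 8) | random.randrange(256) for k in range(n)]

interleaved = b"".join(s.to_bytes(2, "big") for s in samples)
hi = bytes(s >> 8 for s in samples)
lo = bytes(s & 0xFF for s in samples)

inter_c = zlib.compress(interleaved, 9)
sliced_c = zlib.compress(hi, 9) + zlib.compress(lo, 9)
print(len(inter_c), len(sliced_c))  # sliced should come out noticeably smaller
```

With the noisy low bytes breaking up every potential match in the interleaved stream, the sliced version wins easily here; on real data the gap would depend on how steady the high byte actually is.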
This is probably a bit redundant, as I see FLAC is described as "tailored to audio" -- it's not just naive LZ compression. As I understand it, FLAC fits a linear predictor to each block and entropy-codes the (hopefully small) residuals; and most music is stereo, which contains considerable redundancy it can exploit via inter-channel decorrelation. Probably both together are where the main savings are found? Stereo is a good point: since your signal is mono, you won't have that redundancy to save either.
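For concreteness, FLAC's stereo decorrelation (as I understand the format) stores mid = (L+R)>>1 and side = L-R; the bit dropped from the sum is recoverable from the side channel's parity, so it stays lossless. In miniature:

```python
# Mid/side stereo decorrelation, FLAC-style: for correlated channels
# the side signal is small and cheap to code. Lossless because the
# dropped low bit of (L+R) equals the parity of (L-R).

def to_mid_side(left, right):
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    # (L+R) and (L-R) always share parity, so side&1 restores the lost bit.
    left = [m + ((s + (s & 1)) >> 1) for m, s in zip(mid, side)]
    right = [l - s for l, s in zip(left, side)]
    return left, right

left = [0, 100, -5000, 32767]
right = [0, -100, 123, -32768]
mid, side = to_mid_side(left, right)
assert from_mid_side(mid, side) == (left, right)  # round trip is exact
```

A mono signal simply has no second channel to decorrelate against, so this whole avenue is closed to you.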
So, aside from noise blanking, bit slicing and some lucky pattern repeats, there's probably not much else to save here -- unless there's something much more specific (but also potentially lossy, even if only minimally so) about the signal that can be exploited (e.g., how spectrally pure are these whistles -- would a wavelet transform give low error?).
Tim