Author Topic: Halving the range of audio samples. (Read 4112 times)

hamster_nz · « **on:** November 23, 2016, 05:58:05 pm »

I want to convert 16-bit audio samples into 15-bit audio samples. First guess is

 int16_t sample_out, sample_in;
 ....
 sample_out = sample_in/2;

Is this the correct way to do this? I'm sure your first reaction will be "yep! that is the way".

Now here's why I ask... negative values round up towards zero and positive values round down towards zero, i.e. in opposite directions. So -1, 0 and 1 all map to 0 on the output, which introduces a small amount of crossover distortion.

Should I be doing this way:

Code: [Select]

  int16_t sample_out, sample_in;
  uint16_t temp;
  ....
   /* sample_in is from -32768 to 32767 */

   /* Make temp an unsigned value with a range of 0 to 65535 */
   temp = sample_in+0x8000;
   /* Make it 0 to 32767 */
   temp = temp/2;
   /* Make it back to signed, from -16384 to 16383 */
   sample_out = temp - 0x4000;

... so negative values are rounded down (away from zero), ensuring that only two input values map to each output value (but adding a small DC offset).

I'm most likely over-thinking this.
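
For reference, a minimal sketch of the two variants above as standalone functions. The names `halve_div` and `halve_offset` are made up for illustration, and the sketch assumes `int` is wider than 16 bits so the intermediate sums cannot overflow.

```c
#include <stdint.h>

/* Truncating division: C99 and later round the quotient toward zero,
 * so -1, 0 and 1 all map to 0. */
static int16_t halve_div(int16_t s) {
    return (int16_t)(s / 2);
}

/* Offset method: bias to unsigned, halve, then remove the bias.
 * The result is floor(s / 2), i.e. negatives round away from zero. */
static int16_t halve_offset(int16_t s) {
    uint16_t temp = (uint16_t)(s + 0x8000); /* 0 .. 65535 */
    temp = (uint16_t)(temp / 2);            /* 0 .. 32767 */
    return (int16_t)(temp - 0x4000);        /* -16384 .. 16383 */
}
```

With these, `halve_div(-1)` is 0 while `halve_offset(-1)` is -1, which is the difference in question.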

bktemp · « **Reply #1 on:** November 23, 2016, 06:06:16 pm »

Code: [Select]

sample_out = sample_in>>1;

This gives -1 -> -1, 0 -> 0 and 1 -> 1, so introduces no distortion.

helius · « **Reply #2 on:** November 23, 2016, 06:36:40 pm »

Quote from: bktemp on November 23, 2016, 06:06:16 pm

Code: [Select]
sample_out = sample_in>>1;
This gives -1 -> -1, 0 -> 0 and 1 -> 1, so introduces no distortion.

I think you meant to say 1 -> 0. But it is obvious that simply dropping a bit keeps the range evenly distributed.
Unfortunately, two's-complement range is not symmetric around zero.

hamster_nz · « **Reply #3 on:** November 23, 2016, 06:42:32 pm »

Quote from: bktemp on November 23, 2016, 06:06:16 pm

Code: [Select]
sample_out = sample_in>>1;
This gives -1 -> -1, 0 -> 0 and 1 -> 1, so introduces no distortion.

Yep, that works!

So signed integer division does add distortion?

Humm. I might have to do some investigations into the casting of floats and doubles to ints... IIRC these round towards zero too.

(Sorry about the rambling. My subconscious must be working on something I am not aware of...)

bktemp · « **Reply #4 on:** November 23, 2016, 06:47:17 pm »

Quote from: helius on November 23, 2016, 06:36:40 pm

Quote from: bktemp on November 23, 2016, 06:06:16 pm
Code: [Select]
sample_out = sample_in>>1;
This gives -1 -> -1, 0 -> 0 and 1 -> 1, so introduces no distortion.
I think you meant to say 1 -> 0. But it is obvious that simply dropping a bit keeps the range evenly distributed.
Unfortunately, two's-complement range is not symmetric around zero.

Right, 1->0.
You can avoid the generated rounding / offset errors by using dithering / noise shaping.

Kalvin · « **Reply #5 on:** November 23, 2016, 07:02:46 pm »

Quote from: bktemp on November 23, 2016, 06:47:17 pm

Quote from: helius on November 23, 2016, 06:36:40 pm
Quote from: bktemp on November 23, 2016, 06:06:16 pm
Code: [Select]
sample_out = sample_in>>1;
This gives -1 -> -1, 0 -> 0 and 1 -> 1, so introduces no distortion.
I think you meant to say 1 -> 0. But it is obvious that simply dropping a bit keeps the range evenly distributed.
Unfortunately, two's-complement range is not symmetric around zero.
Right, 1->0.
You can avoid the generated rounding / offset errors by using dithering / noise shaping.

Should the dithering be applied before the division, and should the dithering amplitude be 1 LSB (i.e. 0 | 1), or +1 | -1, or rather +1, 0, -1?

whollender · « **Reply #6 on:** November 23, 2016, 07:14:30 pm »

You have to dither before doing the integer division, or you end up adding more noise than necessary.

I think the optimum range is +/- 0.5 LSB ( so one LSB before division) with a triangular probability distribution, but I can't remember where I got that from. I'll have to search to see if I can find a link.

Edit: Thinking about this more from an integer math perspective (I'm more used to it from an ADC perspective), the goal is just to make sure that the values that get rounded have a fifty-fifty chance of rounding up vs down. I think you can achieve this by either adding or subtracting one to/from random samples depending on the rounding direction so that even numbers always round to themselves, but odds can round either way.
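
A rough sketch of that fifty-fifty idea, assuming the caller supplies one random bit per sample (for example `rand() & 1`, although the low bits of `rand()` are weak on some C libraries). The function name `halve_random_round` is hypothetical, and the code relies on `>>` being an arithmetic shift for negative values, which is implementation-defined in C but true on common compilers.

```c
#include <stdint.h>

/* Halve a 16-bit sample, letting a random bit decide the rounding.
 * Even inputs always map to in/2; odd inputs round down when
 * rnd_bit is 0 and up when rnd_bit is 1. */
static int16_t halve_random_round(int16_t in, int rnd_bit) {
    int sum = in + (rnd_bit & 1);
    int out = sum >> 1;   /* assumes arithmetic shift: floor(sum / 2) */
    if (out > 16383) {
        out = 16383;      /* clamp 32767 + 1 back into the 15-bit range */
    }
    return (int16_t)out;
}
```

This is plain one-bit random rounding rather than true triangular dither, so treat it as an illustration of the integer view only.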

bktemp · « **Reply #7 on:** November 23, 2016, 07:29:54 pm »

Dithering must be applied before dividing/truncation, otherwise the information is already lost. You need to add enough "noise" to barely reach the next step when the truncated LSBs are zero. If you add more dithering, it adds noise to the signal but does not improve the rounding errors.
The simplest dithering would be adding the LSB to an error accumulator:

Code: [Select]

signed short sample_out, sample_in;
static unsigned char erracc=0;
....
sample_out = sample_in>>1;
erracc+=sample_in&1;
if (erracc>=2)
{
    sample_out++;
    erracc=0;
}

This code preserves the average value of the signal.
There are many other, more advanced ways of dithering a signal by trying to distribute the noise over a wider frequency range or moving the noise towards higher frequencies.
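
The same accumulator wrapped as a buffer-processing function, as a sketch only. The name `halve_with_accumulator` and the explicit state pointer are assumptions added for illustration; the logic is the snippet above, again relying on `>>` being an arithmetic shift for negative samples.

```c
#include <stddef.h>
#include <stdint.h>

/* Halve n samples, carrying the dropped LSBs in *erracc so that every
 * second dropped 1 bit is paid back as +1 on the output. */
static void halve_with_accumulator(const int16_t *in, int16_t *out,
                                   size_t n, uint8_t *erracc) {
    for (size_t i = 0; i < n; i++) {
        int16_t s = (int16_t)(in[i] >> 1);          /* floor(in / 2) */
        *erracc = (uint8_t)(*erracc + (in[i] & 1)); /* collect dropped bit */
        if (*erracc >= 2) {
            s++;            /* note: can reach 16384 for inputs of 32767 */
            *erracc = 0;
        }
        out[i] = s;
    }
}
```

For the input 1, 1, 1, 1 this produces 0, 1, 0, 1, so the output sums to 2, exactly half of the input sum.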

snarkysparky · « **Reply #8 on:** November 23, 2016, 08:05:24 pm »

If the divisor is a signed quantity then the compiler should take care of it for you using:

sample_out = sample_in/2;

hamster_nz · « **Reply #9 on:** November 23, 2016, 08:26:51 pm »

Hey, thanks all for the input.

So here is a fuller description of what I am thinking about.

I am involved with a project that requires audio monitoring of birdsong. We are building up a library of recordings of ambient noise, and will provide an interface to extract selected parts of very long records for detailed analysis.

I want to replace the LSB of the audio samples with metadata about the stream - time, date, GPS location, sample rates, recording device, channel, recording gain and so on. This way the metadata can be included in standard lossless audio file formats (like WAV files), and it can be played with standard media players (of course anything that manipulates the samples will cause the loss of this data, but that isn't my problem).

Eventually this process won't actually involve much data loss as we will be capturing using a 24-bit audio codec, and bit-shifting to keep the 15 bits of useful data. The bit-shift factor will then be included in the block metadata allowing the original dynamic range to be restored. (Of course for playback by standard media players all the blocks in the same file will need to use the same bit-shift factor)

The main reason I don't want to use a separate metadata file or embedded metadata blocks in the file is that it needs to have a stream of location, front-end gain, and a reference clock signal (e.g. UTC time), all of which change over time, and we need to be able to randomly access sections of the recordings.

The metadata is a 256-bit block (including a CRC), which is scrambled using the bit stream from a 16-bit LFSR. To allow the state of the LFSR to be recovered, the first 16 bits of the block are zeros. Basically it looks like a single bit of random noise. If you have 511 samples, you can then sync up with the LFSR, and find out when, where, by whom and how the samples were obtained.
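
A minimal sketch of that scrambling step. The tap positions (polynomial x^16 + x^14 + x^13 + x^11 + 1, a commonly used maximal-length choice), the Fibonacci structure, the one-bit-per-byte buffer layout and the names `lfsr_step` and `scramble_block` are all assumptions for illustration, not the actual design.

```c
#include <stdint.h>

#define BLOCK_BITS 256

/* Advance a 16-bit Fibonacci LFSR by one step and return the bit that
 * was shifted out. The state must never be zero. */
static uint8_t lfsr_step(uint16_t *state) {
    uint16_t s = *state;
    uint8_t out = (uint8_t)(s & 1u);
    unsigned fb = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    *state = (uint16_t)((s >> 1) | (fb << 15));
    return out;
}

/* XOR one block with the LFSR stream, in place. bits[] holds one bit
 * (0 or 1) per element and bits[0..15] are zero before scrambling, so
 * the first 16 scrambled bits expose the LFSR state at block start. */
static void scramble_block(uint8_t bits[BLOCK_BITS], uint16_t *state) {
    for (int i = 0; i < BLOCK_BITS; i++) {
        bits[i] ^= lfsr_step(state);
    }
}
```

Because scrambling is a plain XOR, running `scramble_block` again from the same starting state restores the original bits.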

I am now pondering how to merge this data in with the audio samples. One option is to just replace the least significant bit of the sample with the metadata bit:

Code: [Select]

out[i] = (in[i] & 0xFFFE) | metadata_bit[i];

The other option is to flip those LSBs that don't match what I need them to be:

Code: [Select]

out[i] = in[i];
if((out[i]&1) != metadata_bit[i]) {
   out[i] ^= 0x1;   /* flip the LSB */
}

Which is the same as just replacing the LSB.

I'm now thinking I might treat it as a dither:

Code: [Select]

dither = 1;
...
out[i] = in[i];
/* != binds tighter than &, so the mask needs its own parentheses */
if((out[i]&1) != metadata_bit[i]) {
   out[i] += dither;
   dither = -dither;
}

Hum... so many options. The last one sort of feels right.

I'm trying to work out which makes the most sense before I commit to anything.

AndyC_772 · « **Reply #10 on:** November 23, 2016, 08:36:19 pm »

Quote from: hamster_nz on November 23, 2016, 06:42:32 pm

Yep, that works!

So signed integer division does add distortion?

Looks like it does, thanks for the insight. I don't think I'd fully appreciated that a shift vs a divide by two would result in different answers, but they definitely do:

Code: [Select]

-10 /2 = -5, >> 1 = -5
-9 /2 = -4, >> 1 = -5
-8 /2 = -4, >> 1 = -4
-7 /2 = -3, >> 1 = -4
-6 /2 = -3, >> 1 = -3
-5 /2 = -2, >> 1 = -3
-4 /2 = -2, >> 1 = -2
-3 /2 = -1, >> 1 = -2
-2 /2 = -1, >> 1 = -1
-1 /2 = 0, >> 1 = -1
0 /2 = 0, >> 1 = 0
1 /2 = 0, >> 1 = 0
2 /2 = 1, >> 1 = 1
3 /2 = 1, >> 1 = 1
4 /2 = 2, >> 1 = 2
5 /2 = 2, >> 1 = 2
6 /2 = 3, >> 1 = 3
7 /2 = 3, >> 1 = 3
8 /2 = 4, >> 1 = 4
9 /2 = 4, >> 1 = 4

hamster_nz · « **Reply #11 on:** November 23, 2016, 08:57:02 pm »

Quote from: AndyC_772 on November 23, 2016, 08:36:19 pm

Quote from: hamster_nz on November 23, 2016, 06:42:32 pm
Yep, that works!

So signed integer division does add distortion?

Looks like it does, thanks for the insight. I don't think I'd fully appreciated that a shift vs a divide by two would result in different answers, but they definitely do:

Phew - I'm not going mad..

whollender · « **Reply #12 on:** November 23, 2016, 09:23:03 pm »

Quote from: bktemp on November 23, 2016, 07:29:54 pm

Dithering must be applied before dividing/truncation, otherwise the information is already lost. You need to add enough "noise" to barely reach the next step when the truncated LSBs are zero. If you add more dithering, it adds noise to the signal but does not improve the rounding errors.
The simplest dithering would be adding the LSB to an error accumulator:
Code: [Select]
signed short sample_out, sample_in;
static unsigned char erracc=0;
....
sample_out = sample_in>>1;
erracc+=sample_in&1;
if (erracc>=2)
{
    sample_out++;
    erracc=0;
}
This code preserves the average value of the signal.
There are many other, more advanced ways of dithering a signal by trying to distribute the noise over a wider frequency range or moving the noise towards higher frequencies.

One downside to this approach is that you are implicitly noise shaping the output (towards higher frequencies), which is why it's often used in downsampling converters. If the output isn't being downsampled, this will result in a higher pitched noise spectrum, which may or may not be a problem. If you have a random source available, it's better to use it so that you have a flat output spectrum, but this works well if you don't have one available.

Quote from: hamster_nz on November 23, 2016, 08:26:51 pm

I'm now thinking I might treat it as a dither:

Code: [Select]
dither = 1;
...
out[i] = in[i];
if((out[i]&1) != metadata_bit[i]) {
   out[i] += dither;
   dither = -dither;
}

I'm having trouble seeing how you would extract your metadata in this case. How can you tell that a bit has been flipped in the output stream?

I'd suggest dithering your input data using your 'random' number generator (ie, LFSR output), truncating to 15 bits, then tacking on the LFSR output to the LSB. I've been wracking my brain trying to figure out if there's a better way to do this than the direct way, but I can't think of one (they all seem to require a bunch of if/else statements):

Code: [Select]

out[i] = ((in[i] + lfsr[i]) & 0xFFFE) + lfsr[i];

amirm · « **Reply #13 on:** November 23, 2016, 09:39:42 pm »

You might want to look at audio watermarking algorithms. They use data hiding techniques (i.e. determining when the changed audio samples are not audible) to insert bits across many samples. You create a channel, insert the data and then add error correction to it.

That said, if the content is bird songs, then you don't need something this complex or even dithering.

hamster_nz · « **Reply #14 on:** November 23, 2016, 10:33:54 pm »

Quote from: whollender on November 23, 2016, 09:23:03 pm

Quote from: hamster_nz on November 23, 2016, 08:26:51 pm
I'm now thinking I might treat it as a dither:

Code: [Select]
dither = 1;
...
out[i] = in[i];
if((out[i]&1) != metadata_bit[i]) {
   out[i] += dither;
   dither = -dither;
}

I'm having trouble seeing how you would extract your metadata in this case. How can you tell that a bit has been flipped in the output stream?

The metadata_bits are the raw metadata XORed with the pseudo-random bitstream from the LFSR, before being inserted into the samples. With the above code, the LSB should be the scrambled metadata.

So to extract the metadata the process isn't too hard.

- Take a guess where the block begins. Most likely sample 0 is the first sample of a data block.

- extract the 256 LSBs of the samples.

- load the first 16 bits into the LFSR

- generate the 256 bits of the pseudo-random bit stream.

- XOR the 256 pseudo-random bits with the 256 LSBs to recover the 256 bits of unscrambled metadata.

- Check the 16-bit version number is valid

- Check the CRC-8 held in the last 8 bits of the block

If the CRC passes and the 16-bit version number field is valid, then the likelihood is that you have a valid block of metadata (the chance of a false match is about 1 in 2^24).

If the block doesn't validate, try the next 255 possible alignments. If none of them work out then the audio samples do not have the embedded metadata, or there is a data error in the stream.
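
The descrambling part of those steps could look roughly like this for one candidate alignment. It is a sketch under assumptions: the LFSR taps and Fibonacci structure are illustrative guesses, `descramble_block` is a made-up name, and the version and CRC-8 checks are left out because the field layout and CRC polynomial are not given here.

```c
#include <stdint.h>

#define BLOCK_BITS 256

/* One step of an assumed 16-bit Fibonacci LFSR
 * (taps for x^16 + x^14 + x^13 + x^11 + 1); returns the output bit. */
static uint8_t lfsr_step(uint16_t *state) {
    uint16_t s = *state;
    uint8_t out = (uint8_t)(s & 1u);
    unsigned fb = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1u;
    *state = (uint16_t)((s >> 1) | (fb << 15));
    return out;
}

/* Recover 256 metadata bits from the LSBs of 256 samples, assuming
 * samples[0] is the first sample of a block. */
static void descramble_block(const int16_t *samples,
                             uint8_t bits[BLOCK_BITS]) {
    uint16_t state = 0;
    for (int i = 0; i < BLOCK_BITS; i++) {
        bits[i] = (uint8_t)(samples[i] & 1);         /* extract the LSBs */
    }
    for (int i = 0; i < 16; i++) {
        state |= (uint16_t)((unsigned)bits[i] << i); /* load the LFSR */
    }
    for (int i = 0; i < BLOCK_BITS; i++) {
        bits[i] ^= lfsr_step(&state);                /* undo the scrambling */
    }
    /* The caller would now check the version field and the CRC, and
     * move on to the next alignment if either check fails. */
}
```

On a correct alignment the first 16 recovered bits come out as zeros, which gives a cheap early sanity check before the CRC is computed.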

snarkysparky · « **Reply #15 on:** November 23, 2016, 11:40:36 pm »

It doesn't add distortion if the rounding for signed and unsigned is toward zero. Right ??

Gcc via the Atmel Studio 7 rounds toward zero using shifts when dividing by a power of two

Keep it simple

hamster_nz · « **Reply #16 on:** November 24, 2016, 12:32:32 am »

Quote from: snarkysparky on November 23, 2016, 11:40:36 pm

It doesn't add distortion if the rounding for signed and unsigned is toward zero. Right ??

Gcc via the Atmel Studio 7 rounds toward zero using shifts when dividing by a power of two

Keep it simple

Sweeping theory under the rug for a few moments...

Take the series -5, -3, -1, 1, 3, 5 repeated a few times.

Divide by two rounding towards zero gives... -2, -1, 0, 0, 1, 2, repeated.

Divide by two rounding down gives: -3, -2, -1, 0, 1, 2 repeated.

The error (output minus ideal) for round to zero is 0.5, 0.5, 0.5, -0.5, -0.5, -0.5 repeated - a square wave

The difference from the ideal for round down is -0.5, -0.5, -0.5, -0.5, -0.5, -0.5 repeated - a DC offset.

It then becomes a question of what is important to your application? The DC offset, or the introduced harmonic errors when you cross zero? For audio I'll take the DC offset.
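
A small sketch that totals the error for both roundings, taking error as output minus ideal and doubling it so everything stays in integers (+1 means +0.5, -1 means -0.5). The helper names are made up, and `>>` on a negative value is assumed to be an arithmetic shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Twice the rounding error of one halved sample: 2*out - in. */
static int err2_trunc(int16_t in) { return 2 * (in / 2) - in; }  /* toward zero */
static int err2_floor(int16_t in) { return 2 * (in >> 1) - in; } /* round down */

/* Sum the doubled error over a buffer for the chosen rounding. */
static int sum_err2(const int16_t *in, size_t n, int (*err2)(int16_t)) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += err2(in[i]);
    }
    return sum;
}
```

For the series -5, -3, -1, 1, 3, 5 the round-to-zero total is 0 (zero mean, but the sign follows the signal), while the round-down total is -6, i.e. a constant -0.5 per sample.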

Quote

Gcc via the Atmel Studio 7 rounds toward zero using shifts when dividing by a power of two

Are you sure this is true? Can you give an example? For signed values it might add the value of the sign bit before bit-shifting right.

This is the code to prove that (x>>1) != (x/2) :

Code: [Select]

C:\Users\hamster\tcc>type test.c
#include <stdio.h>

int main(void) {
  printf("%i vs %i\n", -3>>1, -3/2);
}
C:\Users\hamster\tcc>tcc test.c
C:\Users\hamster\tcc>test
-2 vs -1

snarkysparky · « **Reply #17 on:** November 24, 2016, 01:34:50 pm »

I see what you mean. Most of the time one would want the error to have zero mean so that it wouldn't show up in further calculations that might integrate the value. For audio without additional processing, the lower-distortion option may be OK.

In Atmel Studio, the assembler output for a signed int divide by a power of two showed generated code implementing this:

output = (x<0 ? x+(1<<k)-1 : x) >> k

I checked because I want to do a lot of math using shifts instead of calling the snail slow ldiv routine.
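
That expression can be checked against plain division with a short sketch. The function name `div_pow2_toward_zero` is hypothetical, and the code assumes `>>` on a negative `int` is an arithmetic shift, as it is with GCC.

```c
#include <stdint.h>

/* Divide x by 2^k, rounding toward zero, using only an add and a shift.
 * Negative inputs get a bias of 2^k - 1 so the floor performed by the
 * shift lands on the same value as truncating division. */
static int div_pow2_toward_zero(int x, unsigned k) {
    int biased = (x < 0) ? x + (1 << k) - 1 : x;
    return biased >> k;   /* assumes arithmetic shift for negatives */
}
```

For example, `div_pow2_toward_zero(-3, 1)` gives -1, matching `-3/2`, whereas a bare `-3>>1` gives -2.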

whollender · « **Reply #18 on:** November 24, 2016, 05:10:00 pm »

Quote from: hamster_nz on November 23, 2016, 10:33:54 pm

The metadata_bits are the raw metadata XORed with the pseudo-random bitstream from the LFSR, before being inserted into the samples. With the above code, the LSB should be the scrambled metadata.

I read the if statement as equal to instead of not equal, and managed to confuse myself.

Anyway, I think that looks pretty good, but I think adding and subtracting is adding more dither than actually necessary. You should be able to get away with just one or the other, because all you're really trying to do is randomly force the rounding process one way or the other, and not adding anything implicitly rounds one direction already when you truncate.

Your application probably doesn't need the absolute maximum performance possible, so it's probably not worth spending too much time on finding the absolute best method for 16 -> 15 bits, but it is a fun theoretical exercise.

